class email.charset.Charset(input_charset=DEFAULT_CHARSET)
Map character sets to their email properties.
This class provides information about the requirements imposed on email for a specific character set. It also provides convenience routines for converting between character sets, given the availability of the applicable codecs. Given a character set, it will do its best to provide information on how to use that character set in an email message in an RFC-compliant way.
Certain character sets must be encoded with quoted-printable or base64 when used in email headers or bodies. Certain character sets must be converted outright, and are not allowed in email.
Optional input_charset is as described below; it is always coerced to lower case. After being alias normalized it is also used as a lookup into the registry of character sets to find out the header encoding, body encoding, and output conversion codec to be used for the character set. For example, if input_charset is iso-8859-1, then headers and bodies will be encoded using quoted-printable and no output conversion codec is necessary. If input_charset is euc-jp, then headers will be encoded with base64, bodies will not be encoded, but output text will be converted from the euc-jp character set to the iso-2022-jp character set.
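The registry lookup described above can be checked directly. This is a minimal sketch that assumes the CPython implementation, where QP and BASE64 are module-level constants of email.charset and not attributes of the Charset class:

```python
from email.charset import BASE64, QP, Charset

# iso-8859-1: quoted-printable for headers and bodies, no output conversion.
latin = Charset('ISO-8859-1')  # the name is coerced to lower case
print(latin.input_charset, latin.header_encoding == QP, latin.body_encoding == QP)

# euc-jp: base64 headers, unencoded bodies, output converted to iso-2022-jp.
jp = Charset('euc-jp')
print(jp.header_encoding == BASE64, jp.body_encoding, jp.output_charset)

# With no argument, the default character set is us-ascii.
default = Charset()
print(default.input_charset)
```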
Charset instances have the following data attributes:
-
input_charset -
The initial character set specified. Common aliases are converted to their official email names (e.g. latin_1 is converted to iso-8859-1). Defaults to 7-bit us-ascii.
-
header_encoding -
If the character set must be encoded before it can be used in an email header, this attribute will be set to email.charset.QP (for quoted-printable), email.charset.BASE64 (for base64 encoding), or email.charset.SHORTEST (for whichever of QP or base64 produces the shorter result). Otherwise, it will be None.
-
body_encoding -
Same as header_encoding, but describes the encoding for the mail message’s body, which may differ from the header encoding.
SHORTEST is not allowed for body_encoding.
-
output_charset -
Some character sets must be converted before they can be used in email headers or bodies. If the input_charset is one of them, this attribute will contain the name of the character set output will be converted to. Otherwise, it will be None.
-
input_codec -
The name of the Python codec used to convert the input_charset to Unicode. If no conversion codec is necessary, this attribute will be None.
-
output_codec -
The name of the Python codec used to convert Unicode to the output_charset. If no conversion codec is necessary, this attribute will have the same value as the input_codec.
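To see these attributes together, the sketch below collects them into a dict and uses input_codec and output_codec to convert euc-jp bytes to iso-2022-jp via Unicode. The helper charset_attributes and the ATTRS tuple are hypothetical names for illustration; the attribute values shown assume CPython's built-in charset registry.

```python
from email.charset import Charset

# The data attributes documented above (tuple name is hypothetical).
ATTRS = ('input_charset', 'header_encoding', 'body_encoding',
         'output_charset', 'input_codec', 'output_codec')

def charset_attributes(name):
    """Hypothetical helper: collect a Charset's data attributes into a dict."""
    cs = Charset(name)
    return {attr: getattr(cs, attr) for attr in ATTRS}

# The alias latin_1 is normalized to the official name iso-8859-1.
print(charset_attributes('latin_1')['input_charset'])

# Convert euc-jp bytes to iso-2022-jp: decode with input_codec,
# then re-encode with output_codec.
jp = Charset('euc-jp')
raw = 'テスト'.encode('euc-jp')
converted = raw.decode(jp.input_codec).encode(jp.output_codec)
print(converted)
```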
Charset instances also have the following methods:
-
get_body_encoding() -
Return the content transfer encoding used for body encoding.
This is either the string quoted-printable or base64, depending on the encoding used, or it is a function, in which case you should call the function with a single argument, the Message object being encoded. The function should then set the Content-Transfer-Encoding header itself to whatever is appropriate.
Returns the string quoted-printable if body_encoding is QP and the string base64 if body_encoding is BASE64; otherwise it returns such a function (in CPython, email.encoders.encode_7or8bit, which sets the header to 7bit or 8bit).
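Because the return value may be either a string or a function, calling code has to handle both. This sketch assumes CPython's registry (us-ascii has no body encoding); body_cte is a hypothetical helper, and the final step only sets the header, it does not encode the payload.

```python
from email.charset import Charset
from email.message import Message

def body_cte(name):
    """Hypothetical helper: show get_body_encoding()'s result as text."""
    cte = Charset(name).get_body_encoding()
    return cte if isinstance(cte, str) else 'function ' + cte.__name__

print(body_cte('iso-8859-1'))
print(body_cte('utf-8'))

# For us-ascii no transfer encoding is registered, so a function comes back;
# call it with the Message and let it set Content-Transfer-Encoding itself.
msg = Message()
msg.set_payload('hello')
cte = Charset('us-ascii').get_body_encoding()
if callable(cte):
    cte(msg)  # the function inspects the payload and sets the header
else:
    msg['Content-Transfer-Encoding'] = cte
print(msg['Content-Transfer-Encoding'])
```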
-
get_output_charset() -
Return the output character set.
This is the output_charset attribute if that is not None, otherwise it is input_charset.
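get_output_charset() names the character set that text actually travels in, so it is the natural argument for str.encode() when preparing content. A minimal sketch; encode_for_output is a hypothetical helper, and the euc-jp result assumes CPython's registry entry mapping euc-jp to iso-2022-jp.

```python
from email.charset import Charset

def encode_for_output(text, name):
    """Hypothetical helper: encode text in the charset a message would carry."""
    return text.encode(Charset(name).get_output_charset())

for name in ('iso-8859-1', 'euc-jp', 'utf-8'):
    print(name, '->', Charset(name).get_output_charset())

wire_bytes = encode_for_output('テスト', 'euc-jp')  # iso-2022-jp bytes
print(wire_bytes)
```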
-
header_encode(string) -
Header-encode the string string.
The type of encoding (base64 or quoted-printable) will be based on the header_encoding attribute.
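The sketch below encodes the same word under three character sets and round-trips the encoded words through email.header.decode_header to confirm them. It assumes CPython's registry: iso-8859-1 uses quoted-printable, utf-8 uses SHORTEST (so either encoding may be chosen), and us-ascii needs no encoding at all.

```python
from email.charset import Charset
from email.header import decode_header

latin_word = Charset('iso-8859-1').header_encode('café')  # quoted-printable
utf8_word = Charset('utf-8').header_encode('café')        # QP or base64, whichever is shorter
plain = Charset('us-ascii').header_encode('hello')        # returned unchanged

def decode_word(word):
    """Decode a single RFC 2047 encoded word back to str."""
    data, charset = decode_header(word)[0]
    return data.decode(charset)

print(latin_word, '->', decode_word(latin_word))
print(utf8_word, '->', decode_word(utf8_word))
print(plain)
```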
-
header_encode_lines(string, maxlengths) -
Header-encode a string by converting it first to bytes.
This is similar to header_encode() except that the string is fit into maximum line lengths as given by the argument maxlengths, which must be an iterator: each element returned from this iterator provides the next maximum line length. The result is a list of encoded lines.
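Since maxlengths is consumed with one value per output line, an endless iterator such as itertools.repeat() is the simplest choice. This sketch assumes CPython, where the lengths exclude the RFC 2047 wrapper and the splitter breaks only between characters, so decoding each line and joining the results should restore the input. The 40-character limit and the decode_word helper are illustrative choices.

```python
from itertools import repeat
from email.charset import Charset
from email.header import decode_header

text = 'Grüße aus Köln, ' * 4
# One maximum length per output line; repeat(40) never runs out.
lines = Charset('utf-8').header_encode_lines(text, repeat(40))

def decode_word(word):
    """Decode a single RFC 2047 encoded word back to str."""
    data, charset = decode_header(word)[0]
    return data.decode(charset)

for line in lines:
    print(line)

# Each line is a self-contained encoded word, so the pieces rejoin cleanly.
restored = ''.join(decode_word(line) for line in lines)
```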
-
body_encode(string) -
Body-encode the string string.
The type of encoding (base64 or quoted-printable) will be based on the body_encoding attribute.
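A minimal sketch of body_encode() under CPython's registry (iso-8859-1 bodies use quoted-printable, utf-8 bodies use base64), checked by decoding each result with the standard-library quopri and base64 modules:

```python
import base64
import quopri
from email.charset import Charset

text = 'café'
qp_body = Charset('iso-8859-1').body_encode(text)  # quoted-printable str
b64_body = Charset('utf-8').body_encode(text)      # base64 str

# Decode each result with the matching standard-library codec.
print(qp_body, '->', quopri.decodestring(qp_body.encode('ascii')).decode('iso-8859-1'))
print(b64_body.strip(), '->', base64.b64decode(b64_body).decode('utf-8'))
```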
The Charset class also provides a number of methods to support standard operations and built-in functions.
-
__str__() -
Returns input_charset as a string coerced to lower case. __repr__() is an alias for __str__().
-
__eq__(other) -
This method allows you to compare two Charset instances for equality.
-
__ne__(other) -
This method allows you to compare two Charset instances for inequality.
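The string form and the comparisons both work on the normalized, lower-cased name, so differently spelled aliases of one character set compare equal. A short sketch, assuming CPython's alias table (which maps latin_1 to iso-8859-1):

```python
from email.charset import Charset

utf8 = Charset('UTF-8')
print(str(utf8), repr(utf8))                        # lower-cased input_charset
print(utf8 == Charset('utf-8'))                     # same character set
print(Charset('latin_1') == Charset('iso-8859-1'))  # alias normalized first
print(utf8 != Charset('iso-8859-1'))                # different character sets
```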