email.charset.Charset

class email.charset.Charset(input_charset=DEFAULT_CHARSET)

Map character sets to their email properties.

This class provides information about the requirements imposed on email for a specific character set. It also provides convenience routines for converting between character sets, given the availability of the applicable codecs. Given a character set, it will do its best to provide information on how to use that character set in an email message in an RFC-compliant way.

Certain character sets must be encoded with quoted-printable or base64 when used in email headers or bodies. Certain character sets must be converted outright, and are not allowed in email.

Optional input_charset is as described below; it is always coerced to lower case. After being alias normalized it is also used as a lookup into the registry of character sets to find out the header encoding, body encoding, and output conversion codec to be used for the character set. For example, if input_charset is iso-8859-1, then headers and bodies will be encoded using quoted-printable and no output conversion codec is necessary. If input_charset is euc-jp, then headers will be encoded with base64, bodies will not be encoded, but output text will be converted from the euc-jp character set to the iso-2022-jp character set.

Charset instances have the following data attributes:

input_charset

The initial character set specified. Common aliases are converted to their official email names (e.g. latin_1 is converted to iso-8859-1). Defaults to 7-bit us-ascii.

header_encoding

If the character set must be encoded before it can be used in an email header, this attribute will be set to Charset.QP (for quoted-printable), Charset.BASE64 (for base64 encoding), or Charset.SHORTEST for the shortest of QP or BASE64 encoding. Otherwise, it will be None.

body_encoding

Same as header_encoding, but describes the encoding for the mail message’s body, which indeed may be different than the header encoding. Charset.SHORTEST is not allowed for body_encoding.

output_charset

Some character sets must be converted before they can be used in email headers or bodies. If the input_charset is one of them, this attribute will contain the name of the character set output will be converted to. Otherwise, it will be None.

input_codec

The name of the Python codec used to convert the input_charset to Unicode. If no conversion codec is necessary, this attribute will be None.

output_codec

The name of the Python codec used to convert Unicode to the output_charset. If no conversion codec is necessary, this attribute will have the same value as the input_codec.

Charset instances also have the following methods:

get_body_encoding()

Return the content transfer encoding used for body encoding.

This is either the string quoted-printable or base64 depending on the encoding used, or it is a function, in which case you should call the function with a single argument, the Message object being encoded. The function should then set the Content-Transfer-Encoding header itself to whatever is appropriate.

Returns the string quoted-printable if body_encoding is QP, returns the string base64 if body_encoding is BASE64, and returns the string 7bit otherwise.

get_output_charset()

Return the output character set.

This is the output_charset attribute if that is not None, otherwise it is input_charset.

header_encode(string)

Header-encode the string string.

The type of encoding (base64 or quoted-printable) will be based on the header_encoding attribute.

header_encode_lines(string, maxlengths)

Header-encode a string by converting it first to bytes.

This is similar to header_encode() except that the string is fit into maximum line lengths as given by the argument maxlengths, which must be an iterator: each element returned from this iterator will provide the next maximum line length.

body_encode(string)

Body-encode the string string.

The type of encoding (base64 or quoted-printable) will be based on the body_encoding attribute.

The Charset class also provides a number of methods to support standard operations and built-in functions.

__str__()

Returns input_charset as a string coerced to lower case. __repr__() is an alias for __str__().

__eq__(other)

This method allows you to compare two Charset instances for equality.

__ne__(other)

This method allows you to compare two Charset instances for inequality.

doc_python
2016-10-07 17:32:18
Comments
Leave a Comment

Please login to continue.