Post-Delivery Message Downgrading for Internationalized Email Messages
Japan Registry Services Co., Ltd.Chiyoda First Bldg. East 13F, 3-8-1 Nishi-KandaChiyoda-kuTokyo101-0065Japan+81 3 5215 8451fujiwara@jprs.co.jp
Applications
Email Address Internationalization (EAI)EAIEmail Address InternationalizationDowngradeMAIL
The Email Address Internationalization (SMTPUTF8)
extension to SMTP allows Unicode characters encoded in
UTF-8 and outside the ASCII repertoire in mail header
fields.
Upgraded POP and IMAP servers support internationalized messages.
If a POP or IMAP client does not support Email Address Internationalization,
a POP or IMAP server cannot deliver internationalized messages to the client
and cannot remove the message.
To avoid that situation,
this document describes a mechanism
for converting internationalized messages into the traditional message format.
As part of the conversion process, message elements that require
internationalized treatment are recoded or removed,
and receivers are able to recognize that they received messages containing such
elements, even if they cannot process the internationalized elements.
Traditional (legacy) mail systems, which are defined by the Internet
Message Format and other specifications,
allow only ASCII characters in mail header field values.
The SMTPUTF8 extension
allows Unicode characters encoded in UTF&nbhy;8 in these mail header fields. "Raw non&nbhy;ASCII strings" refers to strings
of those characters in which at least one of them is not
part of the ASCII repertoire.
If a header field contains non&nbhy;ASCII strings,
a POP or IMAP server cannot deliver internationalized messages to
legacy clients that do not send UTF8 commands or have UTF8 capability.
Also, because they have no obvious or standardized way to explain what
is going on to clients, a POP or IMAP server cannot even safely discard the message.
There
are four plausible approaches to the problem. The preferred approach
depends on the particular circumstances and relationship
among the delivery SMTP server, the mail store, the POP or IMAP
server, and the users and their Mail User Agent (MUA) clients. The
four approaches are as follows:
If the delivery Mail Transport Agent (MTA) has sufficient knowledge about the POP
or IMAP server and the clients being used, the message
may be rejected as undeliverable.A new, surrogate, message may be created by
downgrading the original one in the POP or IMAP server
in a way that preserves maximum information at the
expense of some complexity and that does not create
security or operational problems in the mail system.
These surrogate messages are referred to as
"downgraded" in this specification and as "surrogate
messages" elsewhere.
Some intermediate downgrading may be applied that balances
additional information loss against lower complexity and greater
ease of implementation.The POP or IMAP server may fabricate a message that is
intended to notify the client that an internationalized
message is waiting but cannot be delivered until an
upgraded client is available. This specification describes the second of these options. It
is worth noting that, at least in the general case, none of
these options preserves sufficient information to guarantee that
it is possible to reply to an incoming message without loss of
information, so the choice may be considered one of the available "least
bad" options.
While this document specifies a well&nbhy;designed mechanism,
it is only an interim solution while clients are being upgraded
.
This message downgrading mechanism converts mail header fields to an
all&nbhy;ASCII representation. The POP or IMAP server can use the
downgrading mechanism and then deliver the internationalized message in
a traditional form, which allows receivers to know whether a message is
internationalized or unknown or broken.
The Internationalized Mail Header specification
allows UTF&nbhy;8 characters (see ) to be used in
mail header fields and MIME header fields.
The Internationalized Mail Transport specification
allows UTF&nbhy;8 characters to be used in
some trace header fields.
The message downgrading mechanism specified here describes
the method by which
internationalized messages
are converted to traditional email messages
.
This document provides a precise definition of the
minimum-information-loss message downgrading process.
Downgrading consists of the following two parts:
Email header field downgradingMIME header field downgrading
Email header field downgrading is described in .
It generates ASCII&nbhy;only header fields.
Header fields starting with Downgraded- are introduced in .
They preserve the information that appeared in the original
header fields.
The definition of MIME header fields in internationalized messages
is described in RFC 6532.
A delivery status notification may contain non&nbhy;ASCII addresses.
MIME header field downgrading is described in .
Delivery status
notification downgrading is described in .
It generates ASCII&nbhy;only MIME header fields.
Displaying downgraded messages that originally contained
internationalized header fields is out of scope of this
document. A POP or IMAP client that does not support UTF8
extensions as defined for POP3 "UTF8 command"
and IMAP "ENABLE UTF8=ACCEPT command"
does not recognize the internationalized message format
.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
in this document are to be interpreted as described in RFC
2119 .
Many of the specialized terms used in this specification are defined in other
documents. They include "Overview and Framework for Internationalized Email"
, the Internet Message Format specification , and some of the basic MIME documents
. This specification makes
extensive use of the MIME Message Header Extensions and extended MIME parameter encodings . For convenience, both are described as "encoded&nbhy;words" or
"encoded&nbhy;word encoding". All of the encoded&nbhy;words generated
according to this specification use UTF-8 as their charset.
The terms "U&nbhy;label", "A&nbhy;label", and "IDNA" are used as defined in the IDNA
Definitions document .
The terms "ASCII address", "non&nbhy;ASCII address", "SMTPUTF8", "message", and
"internationalized message" are used as defined RFC 6530.
The term "non&nbhy;ASCII string"
is used with the definition provided in the Internationalized Email Headers
document .
The term "UTF&nbhy;8 character" is used informally in this
document to denote a Unicode character, encoded in
UTF-8, outside the ASCII repertoire.
Such characters
are more formally described using the ABNF element
<UTF8-non-ascii>, defined in RFC 6532.This document refers to the Augmented Backus-Naur Form (ABNF) elements that appear in RFC 5322 and RFC 2045.
RFC 5322 describes the
ABNF elements <CFWS>, <comment>, <display&nbhy;name>,
<group>,
<id&nbhy;left>, <id&nbhy;right>,
<mailbox>, <quoted&nbhy;string>, <unstructured>, and
<word>.
RFC 2045 describes the ABNF element <value>.
Section 3.3 of the Internationalized Mail Transport specification
and
Section 3.2 of the Internationalized Email Headers document updated <domain> to allow non&nbhy;ASCII characters.
Some additional terms are defined locally
in&nbhy;line below.
This section defines the method for converting each header field
that may contain non-ASCII strings into ASCII.
describes the methods for rewriting
each ABNF element.
describes the methods for
rewriting each header field.
Header field downgrading is defined below for each ABNF element.
Conversion of the header field terminates when no characters other than
those in the ASCII repertoire remain in the header field.
If the header field has an <unstructured> field that
contains non&nbhy;ASCII strings,
apply encoded-word encoding.
If the header field has any <word>
fields that contain non&nbhy;ASCII strings,
apply encoded-word encoding.
If the header field has any <comment>
fields that contain non&nbhy;ASCII strings,
apply encoded-word encoding.
If the header field has any <value> elements that contain non&nbhy;ASCII
strings, remove any <CFWS> that appear outside DQUOTE that appear in
those elements, then encode the <value> elements as extended MIME parameter
encodings and leave the language information empty.
If the header field has any <address>
(<mailbox> or <group>) elements, and
they have <display&nbhy;name> elements that contain non&nbhy;ASCII strings,
encode the <display&nbhy;name> elements
as encoded-words.
Display&nbhy;Name downgrading uses the same algorithm as Word downgrading.
If the header field has any <domain> elements
that contain U&nbhy;labels,
rewrite the non&nbhy;ASCII domain name into an ASCII domain name using A&nbhy;labels
.
<group> is defined in Section 3.4 of the Internet Message Format specification .
The <group> element may contain <mailbox> elements that contain non&nbhy;ASCII addresses.
If a <group> element contains <mailbox> elements and one of those <mailbox> elements contains a non&nbhy;ASCII <local&nbhy;part>,
rewrite the <group> element as
where the <ENCODED_WORD> is the original <group&nbhy;list> encoded
as encoded-words.
Otherwise, the <group> element contains an ASCII-only
<local&nbhy;part>.
If the <group> element contains non&nbhy;ASCII <mailbox> elements,
they contain non&nbhy;ASCII domain names.
Rewrite the non&nbhy;ASCII domain names into ASCII domain names using A&nbhy;labels
.
Generated <mailbox> elements contain ASCII addresses only.
If the <local&nbhy;part> of the <mailbox> element contains no
characters other than
those in the ASCII repertoire,
the <domain> element may contain non&nbhy;ASCII characters.
Rewrite the non&nbhy;ASCII domain names into ASCII domain names using A&nbhy;labels
.
Otherwise, the <local&nbhy;part> may contain non&nbhy;ASCII characters.
The <local&nbhy;part> that contains characters outside the ASCII
repertoire has no equivalent format for ASCII addresses.
The <addr&nbhy;spec> element that contains non&nbhy;ASCII strings may
appear in two forms as:
Rewrite both as:
where the <ENCODED&nbhy;WORD> is the original
<addr&nbhy;spec> encoded as encoded-words.
If the header field contains <utf&nbhy;8&nbhy;type&nbhy;addr>
and the <utf&nbhy;8&nbhy;type&nbhy;addr> contains raw
non&nbhy;ASCII strings (<UTF8-non-ascii>),
it is in utf&nbhy;8&nbhy;address form .
Convert it to utf&nbhy;8&nbhy;addr&nbhy;xtext form .
Comment downgrading is also performed in this case.
If the address type is unrecognized and the header field contains non&nbhy;ASCII strings,
then fall back to using Encapsulation on the entire header field as
specified in .
As a last resort, when header fields cannot be converted as
discussed in the previous subsection, the fields are deleted
and replaced by specialized new header fields. Those fields
are defined to preserve, in encoded form, as much
information as possible from the header field values of the
incoming message. This mechanism is known as Encapsulation downgrading in
this specification because it preserves the original information in a different
form. The syntax of these new header fields is:
Applying this procedure to the "Received:" header field is
prohibited. Encapsulation downgrading is allowed for
"Message&nbhy;ID:", "In&nbhy;Reply&nbhy;To:", "References:",
"Original&nbhy;Recipient:", and "Final&nbhy;Recipient:" header fields.
To preserve a header field in a Downgraded- header field:
Generate a new header field.
The field name is a concatenation of Downgraded- and the original field name.The initial new field value is the original header field value.
Treat the initial new header field value as if it were unstructured,
and then apply the encoded-word encoding as necessary so that the resulting new header field value
is completely in ASCII.
Remove the original header field.
The Mail and MIME Header Fields document establishes a registry of header fields.
This section describes the downgrading method for each header field
listed in that registry as of the date of publication of
this specification.
If the entire mail header field contains no characters other than those
in the ASCII repertoire,
email header field downgrading is not required.
Each header field's downgrading method is described below.
If the header field contains non&nbhy;ASCII characters, first perform
Comment downgrading and Display&nbhy;Name downgrading as
described in the corresponding subsections of
.
If the header field still contains non&nbhy;ASCII characters,
complete the following two steps:
If the header field contains <group> elements
that contain non&nbhy;ASCII addresses,
perform Group downgrading on those elements.
If the header field contains <mailbox> elements
that contain non&nbhy;ASCII addresses,
perform Mailbox downgrading on those elements.
This procedure may generate empty <group> elements in the "From:" and "Sender:" header fields.
The Group Syntax document updates the
Internet Message Format specification
to allow (empty) <group> elements in the "From:" and "Sender:"
header fields.
Except in comments, these header fields do not contain characters other than those in
the ASCII repertoire.
If the header field contains UTF&nbhy;8 characters in comments, perform Comment downgrading.
If there are non-ASCII strings in <id-left> or
<id-right> elements,
perform Encapsulation. Otherwise, the header field contains UTF-8 characters
in comments and Comment downgrading should be performed.
If <domain> elements or <mailbox> elements contain U&nbhy;labels, perform Domain downgrading as specified in .
Comments may contain non&nbhy;ASCII strings; if so, perform Comment downgrading.
After the Domain downgrading and the Comment downgrading,
if the "FOR" clause contains a non&nbhy;ASCII <local&nbhy;part>, remove the
FOR clause.
If the "ID" clause contains a non&nbhy;ASCII value, remove the ID clause.
If there are non&nbhy;ASCII strings in <value> or <CFWS> elements, perform
MIME-Value and Comment downgrading.
If non-ASCII strings are present in <unstructured> elements, perform
Unstructured downgrading.
If non-ASCII strings are present in <phrase> elements, perform Word downgrading.
Other header fields that are not covered in this document (such as
implementation&nbhy;specific or user&nbhy;defined fields) might also contain
non&nbhy;ASCII strings. Any header field that does not have a conversion
method defined above will be in this category and treated as follows.If there are non-ASCII strings present in the header fields, perform
Unstructured downgrading.If the software understands the header field's structure and a downgrading
algorithm other than Unstructured is applicable, that software SHOULD use
that algorithm; Unstructured downgrading is used when there is no other option.
Mailing list header fields (those that start in "List-")
are part of this category.
Both the MIME body part header fields and the contents of a delivery
status notification may contain non&nbhy;ASCII characters.
RFC 6532 specifies an extension that permits MIME header fields, including body
part header fields, to contain non&nbhy;ASCII strings.
This section defines the conversion method to ASCII&nbhy;only header fields
for each MIME header field that contains non&nbhy;ASCII strings.
Parse the message body's MIME structure at all levels and
check each MIME header field to see whether it contains non&nbhy;ASCII strings.
If the header field contains non&nbhy;ASCII strings in the header field value,
the header field is a target of the MIME body part header field's downgrading.
The downgrading methods used for the MIME body part header fields Content&nbhy;ID,
Content&nbhy;Type, Content&nbhy;Disposition, and Content&nbhy;Description are the same as
those used for the header fields of the same name described in
If the message contains a delivery status notification
(see Section 6 of the SMTP DSN Extension ),
perform the following tests and conversions.
If there are "Original&nbhy;Recipient:" and "Final&nbhy;Recipient:" header fields,
and the header fields contain non&nbhy;ASCII strings, perform
Type&nbhy;Addr downgrading.
The purpose of post&nbhy;delivery message downgrading is to allow
POP and IMAP servers to deliver internationalized messages
to traditional POP and IMAP clients
and to permit the clients to display those messages.
Users that receive such messages can know that they were
internationalized.
It does not permit receivers to read the messages in their
original form and, in general, will not permit generating
replies, at least without significant user intervention.
After downgrading as specified in this document, the
header fields of a message will contain ASCII characters
only, some of them in encoded&nbhy;word form. Nothing in
this document or other SMTPUTF8 specifications alters the basic properties of MIME that allow
characters outside the ASCII repertoire in encodings as
specified for them. Thus,
this document inherits the security considerations associated with
MIME&nbhy;encoded header fields as specified in RFC 2047 and with UTF-8 itself as specified in RFC 3629 .
Rewriting header fields increases the opportunities for
undetected spoofing by malicious senders.
However, the rewritten header field values are
preserved in equivalent MIME form
or in newly defined header fields for which traditional MUAs have no special processing procedures.
The techniques described here may invalidate methods that depend on digital
signatures over any part of the message, which includes the top&nbhy;level header
fields and body part header fields. Depending on the specific message being
downgraded, at least the following techniques are likely to break: DomainKeys
Identified Mail (DKIM) and possibly S/MIME and Pretty Good Privacy (PGP). The
downgrade mechanism SHOULD NOT remove signatures even if the signatures will
fail validation after downgrading. As much of the information as possible from
the original message SHOULD be preserved. In addition, MUAs may be able to use the presence of
an Authentication-Results header field to assess whether the digital
signatures were valid before the header fields were downgraded.
While information in any email header field should usually be treated with
some suspicion, current email systems commonly employ various
mechanisms and protocols to make the information more trustworthy.
Information in the new Downgraded&nbhy;* header fields is
not inspected by traditional MUAs and may be even less trustworthy
than the traditional header fields.
Note that the Downgraded&nbhy;* header fields could have been inserted with malicious intent
(and with content unrelated to the traditional header fields); however,
traditional MUAs do not evaluate Downgraded&nbhy;* header fields.
See the Security Considerations sections in the Group Syntax document and the Internationalized Email Framework for more discussion.
While the specification of encoded-words includes specific rules for dealing
with whitespace in adjacent encoded words , there are
a number of deployed implementations
that fail to implement the algorithm correctly. As a result, whitespace
behavior is somewhat unpredictable, in practice, when multiple encoded words
are used.
While RFC 5322 states that implementations SHOULD limit lines
to 78 characters or less, implementations MAY choose to allow
overly long encoded words to work around faulty implementations of
encoded-words. Implementations that choose to do so SHOULD have an
optional mechanism to limit line length to 78 characters.
The experimental specification from which this document was partially
derived specifies that no new
header fields beginning with Downgraded- are to be registered. That
restriction is now lifted, and this document
makes a new set of registrations, replacing the experimental fields with
standard ones.
The Downgraded&nbhy;* header fields that were registered as experimental
fields in RFC 5504 are no longer in use. IANA has changed the status
from "experimental" to "obsoleted" for every name in the "Permanent Message
Header Field Names" registry that began with Downgraded-.
The following header fields have been registered in the
"Permanent Message Header Field Names" registry, in accordance with the
procedures set out in the Header Field Registration document .
Downgraded-Message-IdmailstandardIETFThis document ()Downgraded-In-Reply-TomailstandardIETFThis document ()Downgraded-ReferencesmailstandardIETFThis document ()Downgraded-Original-RecipientmailstandardIETFThis document ()Downgraded-Final-RecipientmailstandardIETFThis document ()
This document draws heavily from the experimental in&nbhy;transit
message downgrading procedure described RFC 5504. The contributions of the coauthor
of that earlier document, Y. Yoneya, are gratefully
acknowledged.
Significant comments and suggestions were received from John
Klensin, Barry Leiba, Randall Gellens, Pete Resnick, Martin J. Durst,
and other WG participants.
Update to Internet Message Format to Allow Group Syntax in the "From:" and "Sender:" Header FieldsThe Internet Message Format (RFC 5322) allows "group" syntax in some email header fields, such as "To:" and "CC:", but not in "From:" nor "Sender:". This document updates RFC 5322 to relax that restriction, allowing group syntax in those latter fields, as well as in "Resent- From:" and "Resent&nbhy;Sender:", in certain situations.Post Office Protocol Version 3 (POP3) Support for UTF-8This specification extends the Post Office Protocol version 3 (POP3) to support UTF-8 encoded international string in user names, passwords, mail addresses, message headers, and protocol-level textual strings.IMAP Support for UTF-8This specification extends the Internet Message Access Protocol version 4rev1 (IMAP4rev1) to support UTF-8 encoded international characters in user names, mail addresses and message headers. This specification replaces RFC 5738.
This appendix shows a message downgrading example.
Consider a received mail message where:
The sender address is a non&nbhy;ASCII address, "NON&nbhy;ASCII&nbhy;LOCAL@example.com". Its display&nbhy;name is "DISPLAY&nbhy;LOCAL".
The "To:" header field contains two non&nbhy;ASCII addresses,
"NON&nbhy;ASCII&nbhy;REMOTE1@example.net" and "NON&nbhy;ASCII&nbhy;REMOTE2@example.com". Their
display&nbhy;names are "DISPLAY&nbhy;REMOTE1" and "DISPLAY&nbhy;REMOTE2".
The "Cc:" header field contains a non&nbhy;ASCII address, "NON&nbhy;ASCII&nbhy;REMOTE3@example.org".
Its display&nbhy;name is "DISPLAY&nbhy;REMOTE3".
Four display&nbhy;names contain non&nbhy;ASCII characters.
The "Subject:" header field is "NON&nbhy;ASCII&nbhy;SUBJECT", which contains non&nbhy;ASCII strings.
The "Message&nbhy;Id:" header field contains "NON&nbhy;ASCII&nbhy;MESSAGE_ID",
which contains non&nbhy;ASCII strings.
There is an unknown header field "X&nbhy;Unknown&nbhy;Header:", which contains non&nbhy;ASCII strings.
The downgraded message is shown in Figure 2. "Return&nbhy;Path:",
"From:", "To:", and "Cc:" header fields are rewritten. "Subject:"
and "X&nbhy;Unknown&nbhy;Header:" header fields are encoded as encoded&nbhy;words.
The "Message&nbhy;Id:" header field is encapsulated as a "Downgraded&nbhy;Message&nbhy;Id:" header field.