Post Office Protocol Version 3 (POP3) Support for UTF-8

QUALCOMM Incorporated, 5775 Morehouse Drive, San Diego, CA 92651, USA, rg+ietf@qualcomm.com
Oracle, 800 Royal Oaks, Monrovia, CA 91016-6347, USA, chris.newman@oracle.com
CNNIC, No.4 South 4th Street, Zhongguancun, Beijing, China, +86 10 58813007, yaojk@cnnic.cn
Japan Registry Services Co., Ltd., Chiyoda First Bldg. East 13F, 3-8-1 Nishi-Kanda, Tokyo, Japan, +81 3 5215 8451, fujiwara@jprs.co.jp

This specification extends the Post Office Protocol
version 3 (POP3) to support international strings encoded in UTF-8 in
usernames, passwords, mail addresses, message headers, and
protocol-level text strings.

This document forms part of the Email Address Internationalization
protocols described in the Email Address Internationalization Framework
document. As part of the overall Email Address Internationalization
work, email messages can be transmitted and delivered containing a
Unicode string encoded in UTF-8 in the header and/or body, and
maildrops that are accessed using POP3 might natively store Unicode
characters.

This specification extends POP3 using the POP3 extension mechanism to
permit unencoded UTF-8 in headers and bodies (e.g., transferred using
8-bit content-transfer-encoding) as described in "Internationalized
Email Headers". It also adds a mechanism to support login names and
passwords containing a UTF-8 string (see below), a mechanism to support
UTF-8 strings in protocol-level response strings, and the ability to
negotiate a language for such response strings.

This specification also adds a new response code to indicate that a
message was not delivered because it required UTF-8 mode (as discussed
below) and the server was unable or unwilling to create and deliver a
surrogate form of the message as discussed in Section 7 of "IMAP
Support for UTF-8".

This specification replaces an earlier, experimental approach to the
same problem.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
this document are to be interpreted as described in
"Key words for use in RFCs to Indicate Requirement Levels".

The terms "UTF-8 string" or "UTF-8 character" are used to refer to
Unicode characters, which may or may not be members of the ASCII
repertoire, encoded in UTF-8, a standard Unicode encoding form. All
other specialized terms used in this specification are defined in the
Email Address Internationalization framework document.

In examples, "C:" and "S:" indicate lines sent by the client and
server, respectively. If a single "C:" or "S:" label applies to multiple
lines, then the line breaks between those lines are for editorial
clarity only and are not part of the actual protocol exchange.

Note that examples always use ASCII characters due to
limitations of the RFC format; otherwise, some examples
for the "LANG" command would have appeared incorrectly.

This specification adds a new POP3 Extension
capability response tag and command to specify support for
header field information outside the ASCII repertoire.
The capability tag and new command and functionality are described
below.

CAPA tag:
   UTF8

Arguments with CAPA tag:
   USER

Added commands:
   UTF8

Standard commands affected:
   USER, PASS, APOP, LIST, TOP, RETR

Announced states / possible differences:
   both / no

Commands valid in states:
   AUTHORIZATION

Specification reference:
   this document

Discussion:
   This capability adds the "UTF8" command to POP3. The "UTF8" command
   switches the session from the ASCII-only mode of POP3 to UTF-8 mode.
   The UTF-8 mode means that all messages transmitted between servers
   and clients are UTF-8 strings, and both servers and clients can send
   and accept UTF-8 strings.

The "UTF8" command enables UTF-8 mode. The "UTF8" command has no parameters.

UTF-8 mode has no effect on messages in an ASCII-only maildrop.
Messages in native Unicode maildrops can be encoded in
UTF-8 using internationalized headers, in 8bit
content-transfer-encoding (see Section 2.8 of MIME),
in ASCII, or in any combination of these options.
In UTF-8 mode, if the character encoding format of maildrops is UTF-8
or ASCII, the messages are sent to the client as is; if the character
encoding format of maildrops is a format other than UTF-8 or ASCII,
the messages' encoding format SHOULD be converted to UTF-8 before they
are sent to the client.
When UTF-8 mode has not been enabled, character strings outside the
ASCII repertoire MUST NOT be sent to the client as is.
If a client requests a UTF-8 message when
UTF-8 mode is not enabled, the server MUST either send the client
a surrogate message that complies with unextended POP and
Internet Mail Format without UTF-8 mode support, or fail the request
with an -ERR response.
See Section 7 of "IMAP Support for UTF-8" for information about
creating a surrogate message and for a discussion of potential issues.
Section 5 of this document discusses "UTF8" response codes.
The server MAY respond to the "UTF8" command with an
-ERR response.
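
The choice a server faces when a message is requested outside UTF-8
mode can be sketched as follows. This is a minimal illustration, not a
definitive implementation: the function names, the status-line wording,
and the `make_surrogate` callback are hypothetical, the "any non-ASCII
octet" test is a simplifying assumption about when a message requires
UTF-8 mode, and the bracketed `[UTF8]` form assumes the extended
response code syntax of the POP3 extension mechanism.

```python
from typing import Callable, Optional, Tuple


def needs_utf8_mode(raw_message: bytes) -> bool:
    """Assumption: any octet outside ASCII means the message needs UTF-8 mode."""
    return any(octet >= 0x80 for octet in raw_message)


def retr_outcome(
    raw_message: bytes,
    utf8_mode: bool,
    make_surrogate: Optional[Callable[[bytes], bytes]] = None,
) -> Tuple[str, Optional[bytes]]:
    """Return (status line, octets to send), or (error line, None)."""
    if utf8_mode or not needs_utf8_mode(raw_message):
        # ASCII-only messages, or any message in UTF-8 mode, go out as is.
        return "+OK message follows", raw_message
    if make_surrogate is not None:
        # The server is able and willing to build a surrogate message.
        return "+OK surrogate message follows", make_surrogate(raw_message)
    # Otherwise the request fails with the "UTF8" response code.
    return "-ERR [UTF8] message requires UTF-8 mode", None
```

Creating a good surrogate is the hard part and is deliberately left to
the caller here; see Section 7 of "IMAP Support for UTF-8" for the
issues involved.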

Note that even in UTF-8 mode, MIME binary
content-transfer-encoding as defined in Section 6.2 of
MIME is still not permitted. MIME 8bit
content-transfer-encoding (8BITMIME) is
obviously allowed.

The octet count (size) of a message reported in a response to
the "LIST" command SHOULD match the actual number of octets sent in a
"RETR" response (not counting byte-stuffing). Sizes reported elsewhere,
such as in "STAT" responses and non-standardized, free-form text in
positive status indicators (following "+OK"), need not be accurate,
but it is preferable if they are.
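
Because sizes are counted in octets, a UTF-8 character outside ASCII
contributes more than one octet to the total. The following sketch
shows one way a client could compare a "LIST" size with what it
received from "RETR". It assumes the byte-stuffing rule of the POP3
base specification (a line beginning with "." is sent with an extra
leading "."), that lines are held as bytes without their CRLF, and that
the terminating "." line has already been dropped; the function names
are hypothetical.

```python
def unstuff(lines):
    """Undo POP3 byte-stuffing: drop the extra leading '.' on stuffed lines."""
    return [line[1:] if line.startswith(b".") else line for line in lines]


def message_octets(stuffed_lines):
    """Count message octets: each unstuffed line plus its CRLF (2 octets)."""
    return sum(len(line) + 2 for line in unstuff(stuffed_lines))


received = [b"Subject: test", b"", b"..hidden dot", "caf\u00e9".encode("utf-8")]
size = message_octets(received)  # value to compare with the "LIST" size
```

In this example the last line has four characters but five octets,
which is why the comparison has to be made on bytes rather than on
decoded text.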

Normal operation for maildrops that natively support non-ASCII
characters will be for both servers and clients to support the
extension discussed in this specification. Upgrading both clients and
servers is the only fully satisfactory way to support the
capabilities offered by the "UTF8" extension and SMTPUTF8
mail more generally. Servers must, however, anticipate the
possibility of a client attempting to access a message that
requires this extension without having issued the "UTF8"
command. There are no completely satisfactory responses
for this case other than upgrading the client to support
this specification. One solution, unsatisfactory because
the user may be confused by being able to access the
message through some means and not others, is that a server
MAY choose to reject the command to retrieve the message, as
described in the discussion of the "UTF8" response code. Other
alternatives, including the possibility of creating and delivering a
surrogate form of the message, are discussed in Section 7 of "IMAP
Support for UTF-8".

Clients MUST NOT issue the "STLS" command after issuing UTF8; servers
MAY (but are not required to) enforce this by rejecting with an -ERR
response an "STLS" command issued subsequent to a successful "UTF8"
command. (Because this is a protocol error as opposed to a failure
based on conditions, an extended response code is not specified.)

If the USER argument is included with this capability, it indicates
that the server accepts UTF-8 usernames and passwords.

Servers that include the USER argument in the "UTF8" capability
response SHOULD apply SASLprep or one of its
Standards Track successors to the
arguments of the "USER" and "PASS" commands.

A client or server that supports APOP and permits UTF-8 in usernames
or passwords MUST apply SASLprep or one of its Standards Track
successors to the username and password used to compute the APOP
digest.

When applying SASLprep, servers MUST reject UTF-8 usernames or
passwords that contain a UTF-8 character listed in
Section 2.3 of SASLprep.
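
To make the preparation step concrete, here is a rough sketch built on
the stringprep tables in Python's standard `stringprep` module. It is
an illustration based on one reading of SASLprep and its underlying
stringprep framework, not a conformant implementation; the name
`saslprep_sketch` and the `stored` flag are hypothetical. The flag
models the difference between stored strings, which may not contain
unassigned code points, and query strings, which may.

```python
import stringprep
import unicodedata

# Prohibited-output checks corresponding to Section 2.3 of SASLprep.
_PROHIBITED_TABLES = (
    stringprep.in_table_c12,      # non-ASCII space characters
    stringprep.in_table_c21_c22,  # ASCII and non-ASCII control characters
    stringprep.in_table_c3,       # private use
    stringprep.in_table_c4,       # non-character code points
    stringprep.in_table_c5,       # surrogate codes
    stringprep.in_table_c6,       # inappropriate for plain text
    stringprep.in_table_c7,       # inappropriate for canonical representation
    stringprep.in_table_c8,       # change display properties or deprecated
    stringprep.in_table_c9,       # tagging characters
)


def saslprep_sketch(text: str, stored: bool = False) -> str:
    """Prepare a username or password; raise ValueError if it must be rejected."""
    # Map: drop "mapped to nothing" characters, turn non-ASCII spaces into U+0020.
    mapped = "".join(
        " " if stringprep.in_table_c12(ch) else ch
        for ch in text
        if not stringprep.in_table_b1(ch)
    )
    # Normalize with NFKC, using the Unicode 3.2 data that stringprep is based on.
    normalized = unicodedata.ucd_3_2_0.normalize("NFKC", mapped)
    for ch in normalized:
        if any(in_table(ch) for in_table in _PROHIBITED_TABLES):
            raise ValueError(f"prohibited character U+{ord(ch):04X}")
        if stored and stringprep.in_table_a1(ch):
            raise ValueError(f"unassigned code point U+{ord(ch):04X}")
    # Bidirectional check: right-to-left text may not be mixed with left-to-right
    # text and must begin and end with a right-to-left character.
    if any(stringprep.in_table_d1(ch) for ch in normalized):
        if any(stringprep.in_table_d2(ch) for ch in normalized):
            raise ValueError("mixed right-to-left and left-to-right characters")
        if not (stringprep.in_table_d1(normalized[0])
                and stringprep.in_table_d1(normalized[-1])):
            raise ValueError("right-to-left string has a non-RTL first or last character")
    return normalized
```

A production system would more likely rely on a maintained SASLprep
implementation (or one of its Standards Track successors) than on a
hand-written routine like this one.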

When applying SASLprep to the USER argument, the PASS argument,
or the APOP username argument, a compliant server or client MUST
treat them as a query string. When applying SASLprep to the APOP
password argument, a compliant server or client MUST treat it as a
stored string.

If the server includes the USER argument in the UTF8 capability response,
the client MAY use UTF-8 characters with a "USER", "PASS", or "APOP" command; the
client MAY do so before issuing the "UTF8" command. Clients MUST NOT use UTF-8
characters when authenticating if the server did not include the USER argument
in the UTF8 capability response.
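
A client-side check for these rules might look like the sketch below.
It assumes the capability lines of a "CAPA" response are already
available as text, one capability per line, without the status line or
the terminating "."; the function names are hypothetical.

```python
def utf8_capability(capa_lines):
    """Return (utf8_supported, utf8_credentials_allowed) for a CAPA listing."""
    for line in capa_lines:
        parts = line.split()
        if not parts:
            continue
        tag, arguments = parts[0], parts[1:]
        if tag.upper() == "UTF8":
            # The USER argument must appear on the UTF8 line itself; the
            # separate "USER" capability tag says nothing about UTF-8.
            return True, "USER" in (arg.upper() for arg in arguments)
    return False, False


def may_send_credentials(username, password, capa_lines):
    """True if these credentials may be sent to a server with this CAPA listing."""
    _, utf8_credentials = utf8_capability(capa_lines)
    return utf8_credentials or (username.isascii() and password.isascii())
```

For reference, Python's standard `poplib.POP3` class offers `capa()`
and `utf8()` methods that issue the corresponding commands; the sketch
above only covers the decision logic.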

The server MUST reject UTF-8 usernames or passwords that fail to
comply with the formal syntax in UTF-8.

Use of UTF-8 strings in the "AUTH" command is governed by
the POP3 SASL mechanism.

This document adds a new POP3 extension capability response tag to
indicate support for a new command: "LANG".
The capability tag and new command are
described below.

CAPA tag:
   LANG

Arguments with CAPA tag:
   none

Added commands:
   LANG

Standard commands affected:
   All

Announced states / possible differences:
   both / no

Commands valid in states:
   AUTHORIZATION, TRANSACTION

Specification reference:
   this document

Discussion:
   POP3 allows most +OK and -ERR server responses to include
   human-readable text that, in some cases, might be presented to the
   user. But that text is limited to ASCII by the POP3 specification.
   The "LANG" capability and command permit a POP3 client to negotiate
   which language the server uses when sending human-readable text.

The "LANG" command requests that human-readable text included in all
subsequent +OK and -ERR responses be localized to a language matching
the language range argument (the "basic language range" as described by
the "Matching of Language Tags"). If the command succeeds, the server
returns a +OK response followed by a single space, the exact language
tag selected, and another space. Human-readable text in the
appropriate language then appears in the rest of the line. This, and
subsequent protocol-level human-readable text, is encoded in the UTF-8
charset.

If the command fails, the server returns an -ERR response and
subsequent human-readable response text continues to use the language
that was previously used.

If the client issues a "LANG" command with the special "*"
language range argument, it indicates a request
to use a language designated as preferred by the server administrator.
The preferred language MAY vary based on the currently active user.

If no argument is given and the POP3 server issues a positive
response, that response will usually consist of multiple lines. After
the initial +OK, for each language tag the server supports, the POP3
server responds with a line for that language. This line is called a
"language listing".

In order to simplify parsing, all POP3 servers are required to use
a certain format for language listings. A language listing consists
of the language tag of the message, optionally
followed by a single space and a human-readable description of the
language in the language itself, using the UTF-8 charset.
There is no specific order to the listing of languages;
the order may depend on configuration or
implementation.
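
The "LANG" formats described above are simple enough to handle with
plain string splitting. The sketch below is illustrative only: all
function names are hypothetical, the sample response text is invented,
and the range matching follows the usual reading of basic language
range matching (case-insensitive, with the range equal to the tag or a
prefix of it that ends at a "-").

```python
def matches_basic_range(language_range, tag):
    """Case-insensitive match: the range equals the tag or is a '-' delimited prefix."""
    wanted, candidate = language_range.lower(), tag.lower()
    return candidate == wanted or candidate.startswith(wanted + "-")


def select_language(language_range, supported, preferred):
    """Server side: choose a tag for 'LANG <range>', or None to answer -ERR."""
    if language_range == "*":
        return preferred  # language designated by the server administrator
    for tag in supported:
        if matches_basic_range(language_range, tag):
            return tag
    return None


def parse_lang_success(line):
    """Client side: split '+OK <tag> <text>' into (tag, text)."""
    status, _, rest = line.partition(" ")
    if status != "+OK":
        raise ValueError("not a positive response")
    tag, _, text = rest.partition(" ")
    return tag, text


def parse_language_listing(line):
    """Client side: split a language listing into (tag, description or None)."""
    tag, _, description = line.partition(" ")
    return tag, description or None
```

Since the human-readable parts are UTF-8, a client would decode the
received octets as UTF-8 before passing lines to these helpers.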

When a POP3 server uses a native non-ASCII character maildrop, it is
the responsibility of the server to comply with the POP3 base
specification and Internet Message
Format when not in UTF-8 mode.
When the server is not in UTF-8 mode and the message
requires that mode, requests to download the message MAY
be rejected (as specified in the next section) or the
various alternatives outlined above,
including creation and delivery of surrogates for the original
message, MAY be considered.

Per "POP3 Extension Mechanism",
this document adds a new response code: UTF8, described
below.

Complete response code:
   UTF8

Valid for responses:
   -ERR

Valid for commands:
   LIST, TOP, RETR

Response code meaning and expected client action:
   The "UTF8" response code indicates that a failure is due to a
   request for message content that contains a UTF-8 string when the
   client is not in UTF-8 mode.
   The client MAY reissue the command after entering UTF-8 mode.
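
On the client side, reacting to this response code amounts to
recognizing it in the status line. The following is a small sketch
under the assumption that extended response codes appear in square
brackets immediately after "-ERR", as in the POP3 extension mechanism;
the function names and parameters are hypothetical.

```python
import re

# Matches the bracketed response code at the start of an -ERR status line.
_RESPONSE_CODE = re.compile(r"^-ERR \[([^\]]+)\]")


def response_code(status_line):
    """Return the extended response code of an -ERR line, or None if absent."""
    match = _RESPONSE_CODE.match(status_line)
    return match.group(1).upper() if match else None


def should_retry_in_utf8_mode(status_line, utf8_mode_enabled, server_supports_utf8):
    """True if reissuing the command after the "UTF8" command could help."""
    return (
        response_code(status_line) == "UTF8"
        and not utf8_mode_enabled
        and server_supports_utf8
    )
```

A client that cannot enter UTF-8 mode has no useful retry available and
would typically report the failure to the user instead.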

This specification updates two capabilities ("UTF8" and "LANG") in
the POP3 capability registry.

Section 5 of this specification adds one new response code ("UTF8") to
the POP3 response codes registry.

The security considerations of UTF-8, SASLprep, and the Unicode Format
for Network Interchange apply to this specification, particularly with
respect to use of UTF-8 strings in usernames and passwords.

The "LANG *" command might reveal the existence and preferred
language of a user to an active attacker probing the system if the
active language changes in response to the "USER", "PASS", or "APOP"
commands prior to validating the user's credentials. Servers are
strongly advised to implement a configuration to prevent this exposure.

It is possible for a man-in-the-middle attacker to insert a "LANG"
command in the command stream, thus making protocol-level diagnostic
responses unintelligible to the user.
A mechanism to protect the
integrity of the session can be used to defeat such attacks. For example,
a client can issue the "STLS" command before issuing the "LANG"
command.

As with other internationalization upgrades, modifications
to server authentication code (in this case, to support
non-ASCII strings) need to be done with care to avoid
introducing vulnerabilities (for example, in string parsing
or matching). This is particularly important if the native
databases or mailstore of the operating system use some
character set or encoding other than Unicode in UTF-8.

"IMAP Support for UTF-8". (Replaces RFC 5738.)

International Organization for Standardization, "ISO 639-2:1998.
Codes for the representation of names of languages -- Part 2: Alpha-3
code".

This non-normative section discusses the reasons behind some of the
design choices in this specification.

Due to interoperability problems with the MIME Message Header
Extensions and limited deployment of the extended MIME parameter
encodings, it is hoped these 7-bit encoding mechanisms
can be deprecated in the future when UTF-8 header support becomes
prevalent.

The USER capability (Section 2.2), and hence the upgraded "USER"
command and additional support for non-ASCII credentials, are
optional because the implementation burden
of SASLprep is not well understood, and
mandating such support in all cases could negatively impact
deployment.

Thanks to John Klensin, Joseph Yee, Tony Hansen, Alexey Melnikov, and
other Email Address Internationalization
working group participants who provided helpful suggestions and
interesting debate that improved this specification.