APP area C. Bormann Internet-Draft Universitaet Bremen TZI Intended status: Informational February 25, 2013 Expires: August 29, 2013 The BinaryPack1pre2 JSON-like representation format draft-bormann-apparea-bpack-01 Abstract JSON (RFC 4627) is an extremely successful format for the representation of structured information, supporting Boolean values, numbers, strings, arrays, and tables. Recently, a number of applications have started to look for binary representation formats that solve a similar problem. In particular, constrained node networks can benefit from such a binary representation format. A very successful binary representation that is otherwise comparable to JSON is MessagePack. Recently, a number of implementations have modified or extended MessagePack such that it allows for distinguishing UTF-8 strings from binary data. Further discussion on the MessagePack repository has resulted in proposals how to integrate such an addition back into the MessagePack community. This draft, as an independent effort, documents one such format, tentatively calling it BinaryPack1pre2 while the MessagePack extension proposals make their way through the MessagePack community. The current version -01 of this document is a snapshot that demonstrates a general direction. The details may change in future versions based on the development of the MessagePack specification. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." Bormann Expires August 29, 2013 [Page 1] Internet-Draft binarypack1pre2 February 2013 This Internet-Draft will expire on August 29, 2013. Copyright Notice Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Objectives . . . . . . . . . . . . . . . . . . . . . . . 3 1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 1.3. Notation . . . . . . . . . . . . . . . . . . . . . . . . 4 2. The BinaryPack1pre2 Representation Format . . . . . . . . . . 5 2.1. Data Types . . . . . . . . . . . . . . . . . . . . . . . 5 2.2. Integers . . . . . . . . . . . . . . . . . . . . . . . . 6 2.3. Floating Point Values . . . . . . . . . . . . . . . . . . 6 2.4. Special Values . . . . . . . . . . . . . . . . . . . . . 6 2.5. Binary: Opaque Byte Strings . . . . . . . . . . . . . . . 7 2.6. UTF-8 Strings . . . . . . . . . . . . . . . . . . . . . . 7 2.7. Arrays . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.8. Tables . . . . . . . . . . . . . . . . . . . . . . . . . 8 3. Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 8 3.1. JSON roundtripping . . . . . . . . . . . . . . . . . . . 9 4. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9 5. Security Considerations . . . . . . . . . . . . . . . . . . . 9 6. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 9 7. References . . . . . . . . . . . . . . . . . . . . . . . . . 10 7.1. Normative References . . . . . . . . . . . . . . . . . . 10 7.2. Informative References . . . . . . . . . . . . . . . . . 10 Appendix A. Unicode Considerations . . . . . . . . . . . . . . . 11 Appendix B. Potential future work . . . . . . . . . . . . . . . 12 B.1. Reserved Code Points . . . . . . . . . . . . . . . . . . 12 B.2. 16-bit floating point . . . . . . . . . . . . . . . . . . 12 B.3. DateTime . . . . . . . . . . . . . . . . . . . . . . . . 12 B.4. Prefixing extensions . . . . . . . . . . . . . . . . . . 13 B.5. Extension Points . . . . . . . . . . . . . . . . . . . . 13 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 13 Bormann Expires August 29, 2013 [Page 2] Internet-Draft binarypack1pre2 February 2013 1. Introduction (To be written - for now please see the Abstract.) A description of the MessagePack binary representation format can be found in [msgpack]. A recent proposal for an update, still under discussion, is in [msgpack-update]. One of the early proposals implementing separate types for byte strings and UTF-8 strings was called BinaryPack. An implementation of BinaryPack is available in [binarypack]. (An extension similar in spirit, but different in details, was made for the [msgpack-js] and [msgpack-js-browser] projects.) 1.1. Objectives (TBD, but this is a rough first approach:) The objectives of the present specification, roughly in decreasing order of importance, are: o Representing a reasonable set of basic data types and structures using binary encoding. "Reasonable" here is largely influenced by the capabilities of JSON, with the single addition of adding raw byte strings. The structures supported are limited to trees; no loops or lattice-style graphs. o Being implementable in a very small amount of code, thus being applicable to constrained nodes [I-D.ietf-lwig-terminology], even of class 1. (Complexity goal.) As a corollary: Being close to contemporary machine representations of data (e.g., not requiring binary-to-decimal conversion). o Being applicable to schema-less use. For schema-informed binary encoding, a number of approaches are already available in the IETF, including XDR [RFC4506]. (However, schema-informed use of the present specification, such as for a marshaling scheme for an RPC IDL, is not at all excluded. Any IDL for this is out of scope for this specification.) o Being reasonably compact. "Reasonable" here is bounded by JSON as an upper bound in size, and by implementation complexity maintaining a lower bound. The use of general compression schemes violates both of the complexity goals. o Being reasonably frugal in CPU usage. (The other complexity goal.) This is relevant both for constrained nodes and for potential usage in high-volume applications. Bormann Expires August 29, 2013 [Page 3] Internet-Draft binarypack1pre2 February 2013 o Supporting a reasonable level of round-tripping with JSON, as long as the data represented are within the capabilities of JSON. Defining a unidirectional mapping towards JSON for all types of data. 1.2. Terminology The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. The term "byte" is used in its now customary sense as a synonym for "octet". All multi-byte integers in this protocol are interpreted in network byte order. Where arithmetic is used, this specification uses the notation familiar from the programming language C, except that the operator "**" stands for exponentiation. 1.3. Notation This specification uses a trivial notation for code bytes and the bitfields in them the meaning of which should be mostly obvious. More formally speaking, the meaning of the notation is: Potential values for the code bytes themselves are expressed by templates that represent 8-bit most-significant-bit-first binary numbers (without any special prefix), where 0 stands for 0, 1 for 1, and variable segments in these code byte templates are indicated by sequences of the same letter such as kkkkkkk or ssss, the length of which indicates the length of the variable segment in bits. In the notation of values derived from the code bytes, 0b is used as a prefix for expressing binary numbers in most-significant-bit first notation (akin to the use of 0x for most-significant-digit-first hexadecimal numbers in the C programming language). Where the above- mentioned sequences of letters are then referenced in such a binary number in the text, the intention is that the value from these bitfields in the actual code byte be inserted. Example: The code byte template 101nssss stands for a byte that starts (most-significant-bit-first) with the bits 1, 0, and 1, and continues with five variable bits, the first of Bormann Expires August 29, 2013 [Page 4] Internet-Draft binarypack1pre2 February 2013 which is referenced as "n" and the next four are referenced as "ssss". Based on this code byte template, a reference to 0b0ssss000 means a binary number composed from a zero bit, the four bits that are in the "ssss" field (for 101nssss, the four least significant bits) in the actual byte encountered, kept in the same order, and three more zero bits. Also, 0xhh stands for the hexadecimal value hh, and 1B, 2B, 4B, 8B, nB stand for 1, 2, 4, 8, or n bytes of data following; (1B) etc. stand for the numerical value of these bytes as an integer interpreted in network byte order; nD stands for n data objects, each in turn in BinaryPack1pre2 representation format. 2. The BinaryPack1pre2 Representation Format 2.1. Data Types The BinaryPack1pre2 representation format is able to represent the following data types: o Integers (represented in signed and unsigned forms) o Floating point values (in IEEE 754 32-bit and 64-bit forms) o special values nil, false, true o opaque ("raw") byte strings, or "binary strings" o UTF-8 strings o arrays, which can contain any combination of data types o tables (often called maps, hashes, dictionaries; objects in JSON), which contain pairs, key and value, which may in turn be of any data type This list is mostly faithful to JSON [RFC4627], which however does not distinguish integer from floating point number types. Based on recent discussions on the use of binary representation formats, the present specification distinguishes UTF-8 strings from opaque binary strings. (Interestingly, such a separation was already done in the binaryjs implementation of a "95 % MessagePack" format [binarypack], so the author of the present specification started out by just lazily copying that; more recent input taken from the msgpack developers [msgpack-update] is the technical basis for the current proposal.) Bormann Expires August 29, 2013 [Page 5] Internet-Draft binarypack1pre2 February 2013 2.2. Integers BinaryPack1pre2 provides a number of representations for integer values, assuming that these occur often. The encoder is free to choose any of these representations that is able to represent the desired value. +----------+--------------+----------------------------+ | Bits | Value | Description | +----------+--------------+----------------------------+ | 0nnnnnnn | 0bnnnnnnn | Positive Integer (0..127) | | | | | | 111nnnnn | 0bnnnnn - 32 | Negative Integer (-32..-1) | | | | | | 0xcc 1B | 1B as uint | Unsigned Integer | | | | | | 0xcd 2B | 2B as uint | Unsigned Integer | | | | | | 0xce 4B | 4B as uint | Unsigned Integer | | | | | | 0xcf 8B | 8B as uint | Unsigned Integer | | | | | | 0xd0 1B | 1B as sint | Signed Integer | | | | | | 0xd1 2B | 2B as sint | Signed Integer | | | | | | 0xd2 4B | 4B as sint | Signed Integer | | | | | | 0xd3 8B | 8B as sint | Signed Integer | +----------+--------------+----------------------------+ 2.3. Floating Point Values BinaryPack1pre2 provides 32-bit and 64-bit IEEE 754 values. (See also Appendix B.2.) +---------+-----------------------+-------------+ | Bits | Value | Description | +---------+-----------------------+-------------+ | 0xca 4B | 4B as 32-bit IEEE 754 | Float | | | | | | 0xcb 8B | 8B as 64-bit IEEE 754 | Double | +---------+-----------------------+-------------+ 2.4. Special Values Bormann Expires August 29, 2013 [Page 6] Internet-Draft binarypack1pre2 February 2013 Similar to the special literals "false null true" in JSON, BinaryPack1pre2 provides three special values: +------+-------+---------------+ | Bits | Value | Description | +------+-------+---------------+ | 0xc0 | nil | null, nothing | | | | | | 0xc2 | false | Boolean false | | | | | | 0xc3 | true | Boolean true | +------+-------+---------------+ 2.5. Binary: Opaque Byte Strings (Note that the specific codepoint allocations in this section are very much up for discussion. It can also be argued that we should be spending some of the remaining reserved codepoints for short byte strings.) +------------+----------+----------------------------------+ | Bits | Value | Description | +------------+----------+----------------------------------+ | 0xd5 1B nB | n = (1B) | byte string (0..(2**8-1) bytes) | | | | | | 0xd6 2B nB | n = (2B) | byte string (0..(2**16-1) bytes) | | | | | | 0xd7 4B nB | n = (4B) | byte string (0..(2**32-1) bytes) | +------------+----------+----------------------------------+ 2.6. UTF-8 Strings +-------------+-------------+-----------------------------------+ | Bits | Value | Description | +-------------+-------------+-----------------------------------+ | 101nnnnn nB | n = 0bnnnnn | Short UTF-8 string (0..31 bytes) | | | | | | 0xd9 1B nB | n = (1B) | UTF-8 string (0..(2**8-1) bytes) | | | | | | 0xda 2B nB | n = (2B) | UTF-8 string (0..(2**16-1) bytes) | | | | | | 0xdb 4B nB | n = (4B) | UTF-8 string (0..(2**32-1) bytes) | +-------------+-------------+-----------------------------------+ Bormann Expires August 29, 2013 [Page 7] Internet-Draft binarypack1pre2 February 2013 The strings transported MUST be UTF-8 strings [RFC3629]. (The general assumption is that these UTF-8 strings are in Network Unicode form [RFC5198], see Appendix A for some more discussion.) 2.7. Arrays +-------------+------------+------------------------------------+ | Bits | Value | Description | +-------------+------------+------------------------------------+ | 1001nnnn nD | n = 0bnnnn | Short array (0..15 data elements) | | | | | | 0xdc 2B nD | n = (2B) | array (0..(2**16-1) data elements) | | | | | | 0xdd 4B nD | n = (4B) | array (0..(2**32-1) data elements) | +-------------+------------+------------------------------------+ 2.8. Tables +-------------+----------------+---------------------------------+ | Bits | Value | Description | +-------------+----------------+---------------------------------+ | 1000nnnn nD | n = 2 * 0bnnnn | Short table (0..15 data pairs) | | | | | | 0xde 2B nD | n = 2 * (2B) | table (0..(2**16-1) data pairs) | | | | | | 0xdf 4B nD | n = 2 * (4B) | table (0..(2**32-1) data pairs) | +-------------+----------------+---------------------------------+ The sequence of n elements is a sequence of pairs of data objects, each pair represented as one data object representing the key followed by the data object representing its associated value. 3. Discussion This draft tries to be faithful to the successful MessagePack [msgpack] format, including an recent extension proposal that enables the distinction between opaque binary byte strings and UTF-8 byte strings [msgpack-update]. Little analysis has been made whether a slightly different bit allocation (e.g., using up fewer of the code combination for single- byte integers) would be advantageous. However, the gains from a different allocation are likely to be limited except for pathological cases. (The main benefit achievable may be to have more codepoints reserved for future expansion.) Bormann Expires August 29, 2013 [Page 8] Internet-Draft binarypack1pre2 February 2013 A short floating point (e.g., based on the 16-bit IEEE 754 floating point value) might be a useful additional representation format. Adding decimal floating point values probably is not so useful, except where high fidelity to JSON is desired. Some additional data types might be useful for some protocols, e.g. UUIDs [RFC4122], date/time. See also Appendix B. This would further increase the distance from JSON that BinaryPack1pre2 creates by distinguishing opaque and UTF-8 strings. 3.1. JSON roundtripping BinaryPack1pre2 enables mostly lossless translation to JSON. JSON [RFC4627]. JSON roundtripping, however, is not necessarily the primary design goal of BinaryPack1pre2, but it is a consideration. In the translation of BinaryPack1pre2 to JSON, opaque byte strings SHOULD be converted to equivalent base64url [RFC4648] UTF-8 strings. Without a schema, it is hard to do the inverse consistently, as base64url encoded byte strings are not specially marked up in JSON. When translating BinaryPack1pre2 floating point values to JSON, the usual problem of converting binary fractions to decimal representation arises. In the other direction, the choice of a floating point format may be hard to do properly. Clearly, any number that can be transformed from a 64-bit IEEE 754 number to a 32-bit IEEE 754 number without loss of information can be represented as the latter. Without schema information, it may be hard to find other cases where the precision maybe is not that important. 4. IANA Considerations Once this has received some discussion, we will understand how exactly to register Internet media types for this. The potential extension mechanisms discussed in Appendix B may need an IANA registry. 5. Security Considerations (Nothing but generic warnings about correctly implementing protocol encoders/decoders so far; this section will certainly grow as additional security considerations become known.) 6. Acknowledgements MessagePack was developed and promoted by Sadayuki Furuhashi ("frsyuki"). Bormann Expires August 29, 2013 [Page 9] Internet-Draft binarypack1pre2 February 2013 BinaryPack is a minor derivation of MessagePack that was developed by Eric Zhang for the binaryjs project. A similar, but different extension was made by Tim Caswell for his [msgpack-js] and [msgpack-js-browser] projects. The author of the present specification deserves absolutely no credits whatsoever for any of this. 7. References 7.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, November 2003. [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data Encodings", RFC 4648, October 2006. [RFC5198] Klensin, J. and M. Padlipsky, "Unicode Format for Network Interchange", RFC 5198, March 2008. [RFC5905] Mills, D., Martin, J., Burbank, J., and W. Kasch, "Network Time Protocol Version 4: Protocol and Algorithms Specification", RFC 5905, June 2010. 7.2. Informative References [I-D.ietf-lwig-terminology] Bormann, C. and M. Ersue, "Terminology for Constrained Node Networks", draft-ietf-lwig-terminology-00 (work in progress), February 2013. [N4246R2] Lunde, K., "Stabilizing CJK Compatibility Ideographs through the use of Standardized Variants", ISO/IEC JTC1/ SC2/WG2 N4246R2, March 2012, . [RFC3339] Klyne, G., Ed. and C. Newman, "Date and Time on the Internet: Timestamps", RFC 3339, July 2002. [RFC4122] Leach, P., Mealling, M., and R. Salz, "A Universally Unique IDentifier (UUID) URN Namespace", RFC 4122, July 2005. Bormann Expires August 29, 2013 [Page 10] Internet-Draft binarypack1pre2 February 2013 [RFC4506] Eisler, M., "XDR: External Data Representation Standard", STD 67, RFC 4506, May 2006. [RFC4627] Crockford, D., "The application/json Media Type for JavaScript Object Notation (JSON)", RFC 4627, July 2006. [binarypack] Zhang, E., "BinaryPack for Javascript browsers", 2012, . [msgpack-js-browser] Caswell, T., "msgpack for the browser", 2012, . [msgpack-js] Caswell, T., "msgpack for node", 2012, . [msgpack-update] Furuhashi, S., "msgpack-update-proposal1.md", February 2012, . [msgpack] Ohta, K. and S. Colebourne, "MessagePack format specification", 2011, . Appendix A. Unicode Considerations (TBD. Some initial guidelines at [msgpack-update]. This section should make clear that:) o At the BinaryPack1pre2 encoding/decoding layer, implementations are never concerned about Unicode normalization. o Internet usage of Unicode is governed by [RFC5198]. The present specification will not try to second-guess the evolution of this standards-track document. o [RFC5198] states that >>Before transmission, all character sequences SHOULD be normalized according to Unicode normalization form "NFC"<<. There may be some need to interpret this "SHOULD" in the context of the present specification, as follows. o There is a very strong expectation that applications making use of BinaryPack1pre2 will lean towards using Unicode in NFC form, as opposed to NFD. In other words, receivers may expect data in the maximally composed form, as opposed to decomposed form. Bormann Expires August 29, 2013 [Page 11] Internet-Draft binarypack1pre2 February 2013 o The Normalization component of NFC may create problems in some applications (e.g., see [N4246R2]). Before this is repaired in some future version of Unicode, there is no expectation that all applications generating BinaryPack1pre2 always perform the canonical normalization where information loss would result. o There is a strong expectation that BinaryPack1pre2 receivers be resilient to the small variations in Unicode usage discussed here. Appendix B. Potential future work Two data types have been discussed for addition to BinaryPack1pre2. B.1. Reserved Code Points As of today, the following code points are reserved and could be used for further extension, if required: 0xc1, 0xc4..0xc9, 0xd4, 0xd8 B.2. 16-bit floating point 16-bit floating points have become popular recently. BinaryPack1pre2 could enable the efficient transport of small floating point numbers by adding a Half-precision floating point representation: +---------+-----------------------+-------------+ | Bits | Value | Description | +---------+-----------------------+-------------+ | 0xc9 2B | 2B as 16-bit IEEE 754 | Half | | | | | | 0xca 4B | 4B as 32-bit IEEE 754 | Float | | | | | | 0xcb 8B | 8B as 64-bit IEEE 754 | Double | +---------+-----------------------+-------------+ B.3. DateTime Many applications need the transport of Date/Time information. Some need micro- or nanosecond resolution, some are more concerned about significant range. In the IETF, both NTP timestamps [RFC5905] and ISO8601 dates [RFC3339] are popular. The former probably require short and long versions to accommodate the different requirements in precision and range. As a start, a 32.32 and a 64.64 NTP timestamp could be defined. ISO8601 dates would need a length indicator and could Bormann Expires August 29, 2013 [Page 12] Internet-Draft binarypack1pre2 February 2013 therefore look close to the string8 form in BinaryPack1pre2. It is worth limiting the set of choices based on some more input on what is actually required. B.4. Prefixing extensions As the small number of remaining code points could be used up quickly, some additions might preferably be expressed by a prefixing scheme. E.g., if 0xc1 is picked for prefixing, the format 0xc1 0xnn 0xd5 0x08 ... could be used for designating an 8-byte binary string (0xd5 0x08 ...) as e.g. a date/time in 32.32 NTP timestamp format; the same value for 0xnn could also be followed by a 16-byte binary string for a full 64.64 NTP timestamp and maybe even followed by an UTF-8 string for GeneralizedTime _or_ an ISO8601 time, depending on which of these formats are desirable. Implementations unaware of the semantics for a specific value of 0xnn could still process the information as a binary or UTF-8 string. The number of extensions defined this way should be kept very small, not only to preserve coding efficiency by making do with the single- byte discriminator. The values for 0xnn would then be maintained in an IANA registry, with a suitably careful allocation policy. This needs further discussion. B.5. Extension Points More generally, evolution of a format always raises considerations about compatibility. There are two directions of compatibility: - Old data/old senders to new receivers (forward compatibility) and - new data/new senders to old receivers (backward compatibility). Further extension of the msgpack format currently always loses backward compatibility, as there is no way for an older implementation to find out the length that is consumed by a construct that uses a new codepoint. In addition to a prefixing mechanism, the BinaryPack1pre2 format could include deliberate extension points that would at least allow an old receiver to decode future versions of the BinaryPack1pre2 format without losing synchronization in the byte stream, while possibly having to treat some of the information as opaque. Author's Address Bormann Expires August 29, 2013 [Page 13] Internet-Draft binarypack1pre2 February 2013 Carsten Bormann Universitaet Bremen TZI Postfach 330440 Bremen D-28359 Germany Phone: +49-421-218-63921 Email: cabo@tzi.org Bormann Expires August 29, 2013 [Page 14]