Internet Engineering Task Force (IETF) N. Williams
Request for Comments: 7464 Cryptonector
Category: Standards Track February 2015
ISSN: 2070-1721

JavaScript Object Notation (JSON) Text Sequences
draft-ietf-json-text-sequence-13

Abstract

This document describes the JavaScript Object Notation (JSON) text
sequence format and associated media type "application/json-seq". A
JSON text sequence consists of any number of JSON texts, all encoded
in UTF-8, each prefixed by an ASCII Record Separator (0x1E), and each
ending with an ASCII Line Feed character (0x0A).

Status of This Memo

This is an Internet Standards Track document.

This document is a product of the Internet Engineering Task Force
(IETF). It represents the consensus of the IETF community. It has
received public review and has been approved for publication by the
Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 5741.

Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
http://www.rfc-editor.org/info/rfc7464.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.

Table of Contents

1. Introduction and Motivation
1.1. Conventions Used in This Document
2. JSON Text Sequence Format
2.1. JSON Text Sequence Parsing
2.2. JSON Text Sequence Encoding
2.3. Incomplete/Invalid JSON Texts Need Not Be Fatal
2.4. Top-Level Values: numbers, true, false, and null
3. Security Considerations
4. IANA Considerations
5. Normative References
Acknowledgements
Author's Address

1. Introduction and Motivation

The JavaScript Object Notation (JSON) [RFC7159] is a very handy
serialization format. However, when serializing a large sequence of
values as an array, or a possibly indeterminate-length or never-
ending sequence of values, JSON becomes difficult to work with.

Consider a sequence of one million values, each possibly one kilobyte
when encoded -- roughly one gigabyte. It is often desirable to
process such a dataset in an incremental manner without having to
first read all of it before beginning to produce results.
Traditionally, the way to do this with JSON is to use a "streaming"
parser, but these are not widely available, widely used, or easy to
use.

This document describes the concept and format of "JSON text
sequences", which are specifically not JSON texts themselves but are
composed of (possible) JSON texts. JSON text sequences can be parsed
(and produced) incrementally without having to have a streaming
parser (nor streaming encoder).

1.1. Conventions Used in This Document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
[RFC2119].

2. JSON Text Sequence Format

Two different sets of ABNF rules are provided for the definition of
JSON text sequences: one for parsers and one for encoders. Having
two different sets of rules permits recovery by parsers from
sequences where some of the elements are truncated for whatever
reason. The syntax for parsers is specified in terms of octet
strings that are then interpreted as JSON texts, if possible. The
syntax for encoders, on the other hand, assumes that sequence
elements are not truncated.

JSON text sequences MUST use UTF-8 encoding; other encodings of JSON
(i.e., UTF-16 and UTF-32) MUST NOT be used.

2.1. JSON Text Sequence Parsing

The ABNF [RFC5234] for the JSON text sequence parser is as given in
Figure 1.

input-JSON-sequence = *(1*RS possible-JSON)
RS = %x1E; "record separator" (RS), see RFC 20
; Also known as: Unicode Character INFORMATION SEPARATOR TWO (U+001E)
possible-JSON = 1*(not-RS); attempt to parse as UTF-8-encoded
; JSON text (see RFC 7159)
not-RS = %x00-1d / %x1f-ff; any octets other than RS

Figure 1: JSON Text Sequence ABNF

In prose: a series of octet strings, each containing any octet other
than a record separator (RS) (0x1E) [RFC20]. All octet strings are
preceded by an RS byte. Each octet string in the sequence is to be
parsed as a JSON text in the UTF-8 encoding [RFC3629].

If parsing of such an octet string as a UTF-8-encoded JSON text
fails, the parser SHOULD nonetheless continue parsing the remainder
of the sequence. The parser can report such failures to applications,
which might then choose to terminate parsing of a sequence. Multiple
consecutive RS octets do not denote empty sequence elements between
them and can be ignored.

This document does not define a mechanism for reliably identifying
text sequence elements by position (for example, when sending
individual elements of an array as unique text sequences). For
applications where truncation is a possibility, this means that
intended sequence elements can be truncated and can even be missing
entirely; therefore, a reference to an nth element would be
unreliable.

There is no end-of-sequence indicator.

2.2. JSON Text Sequence Encoding

The ABNF for the JSON text sequence encoder is given in Figure 2.

JSON-sequence = *(RS JSON-text LF)
RS = %x1E; see RFC 20
; Also known as: Unicode Character INFORMATION SEPARATOR TWO (U+001E)
LF = %x0A; "line feed" (LF), see RFC 20
JSON-text = <given by RFC 7159, using UTF-8 encoding>

Figure 2: JSON Text Sequence ABNF

In prose: any number of JSON texts, each encoded in UTF-8 [RFC3629],
each preceded by one ASCII RS character, and each followed by a line
feed (LF). Since RS is an ASCII control character, it may only
appear in JSON strings in escaped form (see [RFC7159]), and since RS
may not appear in JSON texts in any other form, RS unambiguously
delimits the start of any element in the sequence. RS is sufficient
to unambiguously delimit all top-level JSON value types other than
numbers. Following each JSON text in the sequence with an LF allows
detection of truncated JSON texts consisting of a number at the top-
level; see Section 2.4.
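A minimal Python sketch of an encoder following Figure 2, assuming the values are ordinary Python objects serialized with the standard `json` module; the function name `encode_json_seq` is hypothetical. `json.dumps` escapes control characters, including RS, inside strings, so the only raw RS octets in the output are the element prefixes. `allow_nan=False` is used because NaN and Infinity are not valid JSON [RFC7159].

```python
import json
from typing import Iterable

RS = b"\x1e"  # record separator, prefixes every element
LF = b"\n"    # line feed, terminates every element


def encode_json_seq(values: Iterable[object]) -> bytes:
    """Encode each value as RS JSON-text LF (the encoder ABNF in Figure 2)."""
    out = bytearray()
    for value in values:
        # ensure_ascii=False emits non-ASCII characters as raw UTF-8 instead
        # of \uXXXX escapes; control characters such as RS are still escaped.
        # allow_nan=False raises ValueError for NaN/Infinity (not valid JSON).
        text = json.dumps(value, ensure_ascii=False, allow_nan=False,
                          separators=(",", ":"))
        out += RS + text.encode("utf-8") + LF
    return bytes(out)


if __name__ == "__main__":
    print(encode_json_seq([{"a": 1}, 42, "x\x1ey"]))
```

Because every element, including the top-level number 42, is followed by LF, a parser can apply the truncation check from Section 2.4 to each element.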

JSON text sequence encoders are expected to ensure that the sequence
elements are properly formed. When the JSON text sequence encoder
does the JSON text encoding, the sequence elements will naturally be
properly formed. When the JSON text sequence encoder accepts
already-encoded JSON texts, the JSON text sequence encoder ought to
parse them before adding them to a sequence.
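For the case where the encoder receives already-encoded JSON texts, a sketch in Python might validate each text with `json.loads` before framing it. The names `append_encoded` and `_reject_constant` are hypothetical. Python's `json.loads` accepts NaN and Infinity by default, so the sketch rejects them through the `parse_constant` hook; its default strict mode already rejects raw control characters inside strings, which means a validated text cannot contain a raw RS.

```python
import json

RS = b"\x1e"
LF = b"\n"
JSON_WS = b" \t\n\r"  # the four whitespace octets allowed by RFC 7159


def _reject_constant(name: str) -> None:
    # Called by json.loads for NaN, Infinity, and -Infinity.
    raise ValueError(f"non-standard constant {name}")


def append_encoded(seq: bytearray, encoded: bytes) -> None:
    """Validate a caller-supplied JSON text, then append it to `seq`.

    Raises ValueError (leaving `seq` unchanged) if `encoded` is not a
    complete UTF-8 JSON text.
    """
    # UnicodeDecodeError and json.JSONDecodeError are both ValueErrors.
    json.loads(encoded.decode("utf-8"), parse_constant=_reject_constant)
    # Surrounding whitespace is trimmed so each element ends with one LF.
    seq += RS + encoded.strip(JSON_WS) + LF
```

Rejecting the text outright is one design choice; an encoder could instead re-encode the parsed value, at the cost of changing the caller's exact bytes.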

Note that on some systems it's possible to input RS by typing
"ctrl-^"; on some systems or applications, the correct sequence may
be "ctrl-v ctrl-^". This is helpful when constructing a sequence
manually with a text editor.

2.3. Incomplete/Invalid JSON Texts Need Not Be Fatal

Per Section 2.1, JSON text sequence parsers should not abort when an
octet string contains a malformed JSON text. Instead, the JSON text
sequence parser should skip to the next RS. Such a situation may
arise in contexts where, for example, data that is appended to log
files is truncated by the filesystem (e.g., due to a crash or
administrative process termination).

Incremental JSON text parsers may be used, though of course failure
to parse a given text may result after first producing some
incremental parse results.

Sequence parsers should have an option to warn about truncated JSON
texts.
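A sketch of an incremental reader in Python, assuming the sequence arrives on a binary file-like object (for example, a log file). It reads fixed-size chunks, yields each value once the RS that follows it (or the end of the stream) has been read, skips malformed elements, and optionally issues a Python warning for each one. The names `iter_json_seq`, `_decode`, `chunk_size`, and `warn` are hypothetical, and the Section 2.4 truncation check is omitted for brevity.

```python
import io
import json
import warnings
from typing import BinaryIO, Iterator

RS = b"\x1e"


def _decode(element: bytes, warn: bool) -> Iterator[object]:
    """Yield the parsed element, or nothing if it is empty or malformed."""
    if not element:
        return  # consecutive RS octets do not denote an empty element
    try:
        value = json.loads(element.decode("utf-8"))
    except ValueError:  # covers UnicodeDecodeError and JSONDecodeError
        if warn:
            warnings.warn(f"skipping malformed element: {element[:40]!r}")
        return
    yield value


def iter_json_seq(stream: BinaryIO, chunk_size: int = 65536,
                  warn: bool = True) -> Iterator[object]:
    """Yield values from a json-seq byte stream without reading it all."""
    buf = b""        # bytes after the most recent RS (element in progress)
    seen_rs = False  # bytes before the first RS belong to no element
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parts = (buf + chunk).split(RS)
        buf = parts.pop()  # the last piece may still be growing
        if parts:  # at least one new RS arrived in this chunk
            complete = parts if seen_rs else parts[1:]
            seen_rs = True
            for element in complete:
                yield from _decode(element, warn)
    if seen_rs:
        # No end-of-sequence indicator: end of stream ends the last element.
        yield from _decode(buf, warn)


if __name__ == "__main__":
    print(list(iter_json_seq(io.BytesIO(b'\x1e1\n\x1e"two"\n'))))  # prints [1, 'two']
```

Only the element in progress is buffered, so memory use is bounded by the largest element rather than by the length of the sequence.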

2.4. Top-Level Values: numbers, true, false, and null

While objects, arrays, and strings are self-delimited in JSON texts,
numbers and the values 'true', 'false', and 'null' are not. Only
whitespace can delimit the latter four kinds of values.

JSON text sequences use 0x0A as a "canary" octet to detect
truncation.

Parsers MUST check that any JSON texts that are a top-level number,
or that might be 'true', 'false', or 'null', include JSON whitespace
(at least one byte matching the "ws" ABNF rule from [RFC7159]) after
that value; otherwise, the JSON-text may have been truncated. Note
that the LF following each JSON text matches the "ws" ABNF rule.

Parsers MUST drop JSON-text sequence elements consisting of non-self-
delimited top-level values that may have been truncated (that are not
delimited by whitespace). Parsers can report such texts as warnings
(including, optionally, the parsed text and/or the original octet
string).

For example, '<RS>123<RS>' might have been intended to carry the
top-level number 1234, but it got truncated. Similarly, '<RS>true<RS>'
might have been intended to carry the invalid text 'trueish'.
'<RS>truefalse<RS>' is not two top-level values, 'true', and 'false';
it is simply not a valid JSON text.
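The whitespace rule can be sketched in Python as a per-element check, with `json.loads` standing in for a non-incremental JSON text parser; the function name `parse_element` is hypothetical. Objects, arrays, and strings are accepted as parsed, while a top-level number, true, false, or null is dropped unless the element's last octet is JSON whitespace. The test values mirror the examples above.

```python
import json
from typing import Optional, Tuple

JSON_WS = b" \t\n\r"  # octets matching the RFC 7159 "ws" rule


def parse_element(octets: bytes) -> Tuple[bool, Optional[object]]:
    """Parse the octets between two RS bytes.

    Returns (True, value) for an acceptable element, or (False, None) when
    the element is malformed or is a non-self-delimited top-level value
    with no trailing whitespace (and so may have been truncated).
    """
    try:
        value = json.loads(octets.decode("utf-8"))
    except ValueError:  # UnicodeDecodeError or json.JSONDecodeError
        return False, None
    # Objects, arrays, and strings carry their own closing delimiter.
    if isinstance(value, (dict, list, str)):
        return True, value
    # Numbers, true, false, and null: require a trailing whitespace octet.
    # (json.loads fails on empty input, so octets[-1] always exists here.)
    if octets[-1] in JSON_WS:
        return True, value
    return False, None
```

A caller that wants warnings can log the original octets whenever the first element of the result is False.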

Implementations may produce a value when parsing '<RS>"foo"<RS>'
because their JSON text parser might be able to consume bytes
incrementally; since the JSON text in this case is a self-delimiting
top-level value, the parser can produce the result without consuming
an additional byte. Such implementations ought to skip to the next
RS byte, possibly reporting any intervening non-whitespace bytes.

Detection of truncation of non-self-delimited sequence elements
(numbers, true, false, and null) is only possible when the sequence
encoder produces or receives complete JSON texts. Implementations
where the sequence encoder is not also in charge of encoding the
individual JSON texts should ensure that those JSON texts are
complete.

3. Security Considerations

All the security considerations of JSON [RFC7159] apply. This format
provides no cryptographic integrity protection of any kind.

As usual, parsers must operate on input that is assumed to be
untrusted. This means that parsers must fail gracefully in the face
of malicious inputs.

Note that incremental JSON text parsers can produce partial results
and later indicate failure to parse the remainder of a text. A
sequence parser that uses an incremental JSON text parser might treat
a sequence like '<RS>"foo"<LF>456<LF><RS>' as a sequence of one
element ("foo"), while a sequence parser that uses a non-incremental
JSON text parser might treat the same sequence as being empty. This
effect, and texts that fail to parse and are ignored, can be used to
smuggle data past sequence parsers that don't warn about JSON text
failures.
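The divergence can be reproduced with Python's standard `json` module. In this sketch, `json.loads` plays the non-incremental parser, and `json.JSONDecoder.raw_decode`, which decodes the first JSON value in a string and returns the index where it stopped, stands in for an incremental parser; it is not a true incremental parser. The function names `strict_parse` and `prefix_parse` are hypothetical. Issuing a warning for the leftover bytes is one way to avoid ignoring them silently.

```python
import json
import warnings


def strict_parse(element: bytes) -> object:
    """Non-incremental: the whole element must be exactly one JSON text."""
    return json.loads(element.decode("utf-8"))


def prefix_parse(element: bytes) -> object:
    """Incremental-style: take the first complete value, warn on leftovers."""
    text = element.decode("utf-8").lstrip(" \t\n\r")
    value, end = json.JSONDecoder().raw_decode(text)
    leftover = text[end:].strip(" \t\n\r")
    if leftover:
        # Dropping these bytes without a warning is what enables smuggling.
        warnings.warn(f"ignored trailing data: {leftover!r}")
    return value


if __name__ == "__main__":
    element = b'"foo"\n456\n'  # the octets between the two RS bytes
    print(prefix_parse(element))
    try:
        strict_parse(element)
    except json.JSONDecodeError as exc:
        print("strict parser rejected the element:", exc)
```

Given the same element, one parser yields "foo" while the other yields nothing, which is exactly the inconsistency an attacker could exploit.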

Repeated parsing and re-encoding of a JSON text sequence can result
in the addition (or stripping) of trailing LF bytes from (to)
individual sequence element JSON texts. This can break signature
validation. JSON has no canonical form for JSON texts; therefore,
neither does the JSON text sequence format.

4. IANA Considerations

The MIME media type for JSON text sequences is application/json-seq.

Type name: application

Subtype name: json-seq

Required parameters: N/A

Optional parameters: N/A

Encoding considerations: binary

Security considerations: See RFC 7464, Section 3.

Interoperability considerations: Described herein.

Published specification: RFC 7464.

Applications that use this media type:

<https://stedolan.github.io/jq>
<https://github.com/mapbox/cligj>
<https://github.com/hildjj/json-text-sequence>

Fragment identifier considerations: N/A

Additional information:

o Deprecated alias names for this type: N/A

o Magic number(s): N/A

o File extension(s): N/A

o Macintosh file type code(s): N/A

Person & email address to contact for further information:

json@ietf.org

Intended usage: COMMON

Author: See the "Author's Address" section of this document.

Change controller: IETF

5. Normative References

[RFC20] Cerf, V., "ASCII format for network interchange", STD 80,
RFC 20, October 1969,
<http://www.rfc-editor.org/info/rfc20>.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>.

[RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO
10646", STD 63, RFC 3629, November 2003,
<http://www.rfc-editor.org/info/rfc3629>.

[RFC5234] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", STD 68, RFC 5234, January 2008,
<http://www.rfc-editor.org/info/rfc5234>.

[RFC7159] Bray, T., Ed., "The JavaScript Object Notation (JSON) Data
Interchange Format", RFC 7159, March 2014,
<http://www.rfc-editor.org/info/rfc7159>.

Acknowledgements

Phillip Hallam-Baker proposed the use of JSON text sequences for
logfiles and pointed out the need for resynchronization. Stephen
Dolan created <https://github.com/stedolan/jq>, which uses something
like JSON text sequences (with LF as the separator between texts on
output, and requiring only such whitespace as needed to disambiguate
on input). Carsten Bormann suggested the use of ASCII RS, and Joe
Hildebrand suggested the use of LF in addition to RS for
disambiguating top-level number values. Paul Hoffman shepherded the
document. Many others contributed reviews and comments on the JSON
Working Group mailing list.

Author's Address

Nicolas Williams
Cryptonector, LLC

EMail: nico@cryptonector.com