From a89cfa6fbb8dec24d76999628936eb59351cd2ef Mon Sep 17 00:00:00 2001
From: Lucas Pardue
Date: Thu, 17 Oct 2024 00:53:32 +0100
Subject: [PATCH] Return more-specific error when input might be application/json-seq

JSON Text Sequences, aka JSON-SEQ, aka application/json-seq, are defined
in https://datatracker.ietf.org/doc/html/rfc7464. Per the RFC, the
format is:

    any number of JSON texts, each encoded in UTF-8 [RFC3629], each
    preceded by one ASCII RS character, and each followed by a line
    feed (LF).

jq supports this format but requires the --seq option in order to parse
it correctly. If the option is omitted, an ambiguous and confusing error
message is printed.

The RFC is designed to avoid this ambiguity:

    Since RS is an ASCII control character, it may only appear in JSON
    strings in escaped form (see [RFC7159]), and since RS may not
    appear in JSON texts in any other form, RS unambiguously delimits
    the start of any element in the sequence. RS is sufficient to
    unambiguously delimit all top-level JSON value types other than
    numbers.

This change detects the ASCII RS character (0x1e) when --seq is omitted
and prints a useful error message recommending a retry with the option.

Fixes #3156.
---
 src/jv_parse.c | 2 ++
 tests/jq.test  | 4 ++++
 2 files changed, 6 insertions(+)

diff --git a/src/jv_parse.c b/src/jv_parse.c
index 519c2047f2..74ea7530a9 100644
--- a/src/jv_parse.c
+++ b/src/jv_parse.c
@@ -514,6 +514,8 @@ static pfunc check_literal(struct jv_parser* p) {
   case 'f': pattern = "false"; plen = 5; v = jv_false(); break;
   case '\'':
     return "Invalid string literal; expected \", but got '";
+  case 0x1e:
+    return "Record Separator (RS) detected, this might be application/json-seq. Try using the --seq option.";
   case 'n':
     // if it starts with 'n', it could be a literal "nan"
     if (p->tokenbuf[1] == 'u') {
diff --git a/tests/jq.test b/tests/jq.test
index 404994e268..6f2a88a395 100644
--- a/tests/jq.test
+++ b/tests/jq.test
@@ -2191,6 +2191,10 @@ try fromjson catch .
 "{'a': 123}"
 "Invalid string literal; expected \", but got ' at line 1, column 5 (while parsing '{'a': 123}')"
 
+try fromjson catch .
+"\u001e{\"a\": 123}"
+"Record Separator (RS) detected, this might be application/json-seq. Try using the --seq option. at line 1, column 2 (while parsing '\u001e{\"a\": 123}')"
+
 # ltrimstr/1 rtrimstr/1 don't leak on invalid input #2977
 try ltrimstr(1) catch "x", try rtrimstr(1) catch "x" | "ok"
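
Reviewer note: the framing rule quoted from the RFC (one RS byte before each JSON text, one LF after it) can be illustrated outside the parser. The following is a minimal standalone sketch, not code from this patch or from jq; the names `ASCII_RS`, `seq_hint`, and `count_seq_elements` are hypothetical and exist only for illustration. It assumes the input is already in memory as a byte buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Record Separator: the byte RFC 7464 places before each JSON text. */
#define ASCII_RS 0x1e

/* Hypothetical helper (not part of jq): returns a hint message when the
 * buffer starts with RS, or NULL when it does not. */
static const char *seq_hint(const char *buf, size_t len) {
  assert(buf != NULL || len == 0);
  if (len > 0 && (unsigned char)buf[0] == ASCII_RS)
    return "Record Separator (RS) detected, this might be "
           "application/json-seq. Try using the --seq option.";
  return NULL;
}

/* Hypothetical helper: counts sequence elements by counting RS bytes.
 * This relies on the RFC's guarantee that a raw RS byte cannot appear
 * inside a JSON text, so every RS marks the start of an element. */
static size_t count_seq_elements(const char *buf, size_t len) {
  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    if ((unsigned char)buf[i] == ASCII_RS)
      n++;
  return n;
}
```

The patch itself takes a narrower route: it matches 0x1e as the first byte of the token buffer inside `check_literal`, so the hint only appears when the parser would otherwise have reported an invalid literal.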