JSON Lines

JSON Lines is a text format in which each line is a valid JSON value (typically an object) and lines are separated by the newline character \n.

Venice has built-in support for the JSON Lines text format as described in the JSON Lines documentation. It reads JSON Lines into Venice data structures and writes Venice data structures as JSON Lines. No 3rd-party libraries are required.

Usage

To convert to/from a JSON Lines string, use jsonl/write-str and jsonl/read-str:

(do
  (load-module :jsonl)
  
  ;; write two JSON lines (passed as a vector with two maps)
  (println (jsonl/write-str [{"a" 10 :b 20} {"a" 11 "b" 21}])))
  
;; output (a two-line string)
;; {"a":10,"b":20}
;; {"a":11,"b":21}

(do
  (load-module :jsonl)

  ;; read three JSON lines (returned as list with three values)
  (println (jsonl/read-str """
                           {"a":10,"b":20}
                           {"a":11,"b":21}
                           {"a":12,"b":23}
                           """)))
                           
;; output (a Venice list with 3 maps)
;; ({"a" 10 "b" 20} {"a" 11 "b" 21} {"a" 12 "b" 23})

Note that these operations are not symmetric: converting Venice data into JSON is lossy. JSON has a restricted set of data types, so not all Venice data types can be converted adequately. For example, JSON has no real decimal type, and a Venice int is read back as a long.
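
The lossy round trip can be seen in a minimal sketch that uses only the functions shown above. The sample map is made up for illustration. Going by the conversions described in this document, the keyword keys should come back as string keys, the int as a long, and the decimal as a string.

```venice
(do
  (load-module :jsonl)

  ;; round trip: Venice data -> JSON Lines string -> Venice data
  ;; (the sample data is chosen for illustration only)
  (let [data [{:a 100I :b 100.23M}]
        text (jsonl/write-str data)]
    ;; print the JSON Lines text
    (println text)
    ;; print the data as read back
    (println (jsonl/read-str text))))
```

Comparing the two printed values with the original data shows which type information is lost on the way through JSON.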

Streams

JSON Lines can be spit to Java OutputStreams, Writers, or files:

  • io/bytebuf-out-stream
  • io/file-out-stream
  • io/buffered-writer
  • io/file

(do
  (load-module :jsonl)
  
  ;; spit a list of JSON lines (linefeeds are added implicitly)
  (try-with [wr (io/buffered-writer (io/file "data.jsonl"))]
     (jsonl/spit wr [{"a" 100, "b" 200} 
                     {"a" 101, "b" 201} 
                     {"a" 102, "b" 202}])
     (flush wr)))

(do
  (load-module :jsonl)
  
  ;; spit a list of json lines, line by line 
  (try-with [wr (io/buffered-writer (io/file "data.jsonl"))]
    (jsonl/spitln wr {"a" 100, "b" 200})
    (jsonl/spitln wr {"a" 101, "b" 201})
    (jsonl/spit wr {"a" 102, "b" 202})   ;; no LF after last line
    (flush wr)))

JSON Lines can be slurped from byte buffers, Java InputStreams, Readers, or files:

  • bytebuf
  • io/file-in-stream
  • io/bytebuf-in-stream
  • io/buffered-reader
  • io/file

(do
  (load-module :jsonl)
  
  (jsonl/slurp (io/file "data.jsonl")))

(do
  (load-module :jsonl)
  
  (try-with [rd (io/buffered-reader (io/file "data.jsonl"))]
    (jsonl/slurp rd)))
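
Writing and reading can be combined into one round trip through a file. The following is a minimal sketch that reuses only the calls shown above. The file name "data.jsonl" and the sample values are placeholders.

```venice
(do
  (load-module :jsonl)

  (let [file (io/file "data.jsonl")]
    ;; write two JSON lines to the file
    (try-with [wr (io/buffered-writer file)]
      (jsonl/spit wr [{"a" 100, "b" 200}
                      {"a" 101, "b" 201}])
      (flush wr))

    ;; read the same file back and print the result
    (println (jsonl/slurp file))))
```

The writer is closed by try-with before the file is read again, so the data is on disk by the time jsonl/slurp runs.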

For memory-efficient reading of large JSON Lines datasets, read the lines lazily with jsonl/lazy-seq-slurper and process them with a transducer that filters, maps, and reduces them.

Note: make sure that Venice's up-front macro expansion is activated when processing large datasets to get the best performance!

(do
  (load-module :jsonl)

  (defn test-data [lines]
    (let [template {"a" 100, "b" 200}
          data     (reduce #(conj %1 (assoc template :id %2)) 
                           [] 
                           (range 0 lines))]
      (jsonl/write-str data)))

  ;; transducer filter-map
  (def xform (comp (map #(dissoc % :c))
                   (map #(update % :b (fn [x] (+ x 5))))
                   (filter #(= 100 (:a %)))))

  (let [json (test-data 1_000)]
    (try-with [rd (io/buffered-reader json)]
      (let [slurper (jsonl/lazy-seq-slurper rd :key-fn keyword)]
        ;; transduce the lazy sequence
        (pr-str (transduce xform conj slurper))))))
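
A transducer can also reduce the lazy sequence to a single value instead of collecting every record. The sketch below sums one field. It assumes that transduce accepts + as the reducing function in the same way it accepts conj above. The sample data is made up for illustration.

```venice
(do
  (load-module :jsonl)

  ;; three JSON lines as sample input (values chosen for illustration)
  (def json (jsonl/write-str [{"a" 100, "b" 1}
                              {"a" 100, "b" 2}
                              {"a" 101, "b" 3}]))

  (try-with [rd (io/buffered-reader json)]
    (let [slurper (jsonl/lazy-seq-slurper rd :key-fn keyword)]
      ;; extract :b from each record and sum the values
      ;; (assumption: + can be used as the reducing function)
      (println (transduce (map #(:b %)) + slurper)))))
```

Because only the running sum is kept, the records do not have to be held in memory at the same time.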

Converting JSON Lines object key/value types

For JSON Lines, Venice supports the same data type customizations as for its standard JSON handling.

Map JSON object keys to keywords

(do
  (load-module :jsonl)
  
  (jsonl/read-str """{"a":100,"b":100}""" :key-fn keyword)
  ;;=> {:a 100 :b 100}
)

Mapping JSON Lines object values explicitly

(do
  (load-module :jsonl)
  
  (jsonl/read-str """{"a": "2018-08-01T10:15:30", "b": "100.23", "c": 100}""" 
                  :key-fn keyword 
                  :value-fn (fn [k v] (case k 
                                        :a (time/local-date-time v)
                                        :b (decimal v) 
                                        v)))
                 
  ;;=> {:a 2018-08-01T10:15:30 :b 100.23M :c 100}
)

Note: the value function value-fn is applied after the key function key-fn and thus receives the mapped keys :a, :b, ... in the example above.

Special data types

Decimals

When dealing with floating-point numbers, we often encounter rounding errors known as the double precision issue.

(do
  (load-module :jsonl)
  
  (jsonl/write-str {:a (+ 0.1 0.2)})
  ;;=> "{\"a\":0.30000000000000004}"
)

Decimals avoid this problem and are the means of choice when dealing with financial amounts. However, JSON does not support decimals as a data type.

Venice decimals are converted to strings by default:

(do
  (load-module :jsonl)
  
  (jsonl/write-str {:a 100.23M})
  ;;=> "{\"a\":\"100.23\"}"
)

Venice decimals can also be written as JSON floating-point numbers:

(do
  (load-module :jsonl)
  
  (jsonl/write-str {:a (+ 0.1M 0.2M)} :decimal-as-double true)
  ;;=> "{\"a\":0.3}"
  
  (jsonl/write-str {:a 100.23M} :decimal-as-double true)
  ;;=> "{\"a\":100.23}"
)

With the :decimal-as-double option, Venice emits decimals as JSON floating-point numbers in their exact decimal representation. When such a number is read back with the :decimal option, it is converted directly into a decimal without an intermediate double conversion, which keeps the precision and allows for the full decimal value range.

(do
  (load-module :jsonl)
 
  (jsonl/write-str {:a 100.33M} :decimal-as-double true)
  ;;=> "{\"a\":100.33}"

  (jsonl/write-str {:a 99999999999999999999999999999999999999999999999999.33M} 
                   :decimal-as-double true)
  ;;=> "{\"a\":99999999999999999999999999999999999999999999999999.33}"
  
  
  (jsonl/read-str """{"a":10.33}""" :decimal true)
  ;;=> {"a" 10.33M}  
  
  (jsonl/read-str """{"a":99999999999999999999999999999999999999999999999999.33}""" 
                  :decimal true)
  ;;=> {"a" 99999999999999999999999999999999999999999999999999.33M}  
)
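
The two options can be combined into a round trip. This is a minimal sketch that uses only the :decimal-as-double and :decimal options described above. The sample value is chosen for illustration.

```venice
(do
  (load-module :jsonl)

  ;; write the decimal as a JSON number, then read it back as a decimal
  (let [text (jsonl/write-str {:a 100.33M} :decimal-as-double true)]
    ;; print the JSON Lines text
    (println text)
    ;; print the data read back with the :decimal option
    (println (jsonl/read-str text :decimal true))))
```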

Alternatively, decimals can be parsed explicitly with a value mapping function:

(do
  (load-module :jsonl)
  
  (jsonl/read-str """{"a": "2018-08-01T10:15:30", "b": "100.23"}""" 
                  :key-fn keyword 
                  :value-fn (fn [k v] (case k 
                                        :a (time/local-date-time v)
                                        :b (decimal v) 
                                        v)))
                 
  ;;=> {:a 2018-08-01T10:15:30 :b 100.23M}
)

Binary data

Venice binary data is converted to a Base64 encoded string:

(do
  (load-module :jsonl)
  
  (jsonl/write-str {:a (bytebuf-from-string "abcdefgh" :utf-8)})
  
  ;;=> "{\"a\":\"YWJjZGVmZ2g=\"}"
)

Date & Time

Venice date/time data types are formatted as ISO date/time strings:

(do
  (load-module :jsonl)
  
  (jsonl/write-str {:a (time/local-date 2018 8 1)})
  ;;=> "{\"a\":\"2018-08-01\"}"

  (jsonl/write-str {:a (time/local-date-time "2018-08-01T14:20:10.200")})
  ;;=> "{\"a\":\"2018-08-01T14:20:10.2\"}"

  (jsonl/write-str {:a (time/zoned-date-time "2018-08-01T14:20:10.200+01:00")})
  ;;=> "{\"a\":\"2018-08-01T14:20:10.2+01:00\"}"
)
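
Since date/time values are written as strings, they come back as strings unless they are parsed explicitly. The following sketch reuses the :value-fn mapping shown earlier in this document to restore a local date-time. The key :a and the sample timestamp are placeholders.

```venice
(do
  (load-module :jsonl)

  ;; write a local date-time, then parse the ISO string when reading back
  (-> (jsonl/write-str {:a (time/local-date-time "2018-08-01T10:15:30")})
      (jsonl/read-str :key-fn keyword
                      :value-fn (fn [k v] (case k
                                            :a (time/local-date-time v)
                                            v)))))
```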

Integers

JSON does not distinguish between integer and long values, so Venice integers are always converted to longs on a JSON write/read round trip:

(do
  (load-module :jsonl)
  
  (-> (jsonl/write-str {:a 100I})
      (jsonl/read-str :key-fn keyword))
  
  ;;=> {:a 100}
)

However, if integers are required they can be parsed explicitly:

(do
  (load-module :jsonl)
  
  (-> (jsonl/write-str {:a 100I})
      (jsonl/read-str :key-fn keyword 
                      :value-fn (fn [k v] (case k 
                                            :a (int v) 
                                            v))))
                 
  ;;=> {:a 100I}
)