JSON vs CBOR performance for ASCII text #519

@sugmanue

Hi there,

We're testing the performance of CBOR vs plain JSON, and it looks like JSON is considerably faster, at least for ASCII text. This speaks volumes about Jackson's JSON performance, but it also suggests that CBOR still has room for improvement.

The performance tests can be found in this repository. For a simple class with five String fields holding ASCII (non-escaped) strings, JSON is roughly 1.7 times as fast as CBOR for the larger strings (between 193 and 231 chars):

Benchmark                (flavor)  (size)  Mode  Cnt    Score    Error  Units
MyBenchmark.cbor  ASCII_PRINTABLE   SMALL  avgt    5  266.316 ±  9.938  ns/op
MyBenchmark.json  ASCII_PRINTABLE   SMALL  avgt    5  243.984 ± 13.422  ns/op

Benchmark                (flavor)  (size)  Mode  Cnt    Score    Error  Units
MyBenchmark.cbor  ASCII_PRINTABLE  MEDIUM  avgt    5  725.376 ±  8.700  ns/op
MyBenchmark.json  ASCII_PRINTABLE  MEDIUM  avgt    5  464.803 ± 20.404  ns/op

Benchmark                (flavor)  (size)  Mode  Cnt     Score    Error  Units
MyBenchmark.cbor  ASCII_PRINTABLE   LARGE  avgt    5  1126.297 ±  7.843  ns/op
MyBenchmark.json  ASCII_PRINTABLE   LARGE  avgt    5   664.466 ± 23.541  ns/op

As expected, this is not the case for multi-byte characters (for instance, characters from the CJK block, or emojis), nor for full ASCII, some of which requires escaping in plain JSON. See below:

Benchmark         (flavor)  (size)  Mode  Cnt     Score    Error  Units
MyBenchmark.cbor       CJK   LARGE  avgt    5  2076.014 ± 67.569  ns/op
MyBenchmark.json       CJK   LARGE  avgt    5  2939.622 ± 16.501  ns/op

Benchmark         (flavor)  (size)  Mode  Cnt     Score     Error  Units
MyBenchmark.cbor     EMOJI   LARGE  avgt    5  2400.312 ±  11.203  ns/op
MyBenchmark.json     EMOJI   LARGE  avgt    5  8467.852 ± 243.559  ns/op

Benchmark           (flavor)  (size)  Mode  Cnt     Score     Error  Units
MyBenchmark.cbor  FULL_ASCII   LARGE  avgt    5  1106.835 ±  33.094  ns/op
MyBenchmark.json  FULL_ASCII   LARGE  avgt    5  2084.745 ± 104.819  ns/op

Given the prevalence of ASCII text, it would be great if CBOR performance were at least as good as JSON's, and I feel that it should be better.

I played a bit with the tight loop inside _finishShortText (see here) and saw some improvements, but they were not consistent across architectures (better on my M1 laptop, not so much on x86).
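For readers unfamiliar with this kind of loop, here is a minimal sketch of an ASCII fast path: copy bytes to chars while the high bit is clear, and stop at the first byte that starts a multi-byte sequence. This is only an illustration, not the actual _finishShortText code; the class name AsciiFastPathSketch and the method copyAsciiPrefix are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of an ASCII fast path; NOT the actual Jackson parser code.
final class AsciiFastPathSketch {

    /**
     * Copies the leading run of ASCII bytes (high bit clear) from in[start, end)
     * into out, starting at out[0]. Returns the number of bytes copied.
     * The caller would hand the remainder to a full UTF-8 decoder.
     */
    public static int copyAsciiPrefix(byte[] in, int start, int end, char[] out) {
        int i = start;
        int o = 0;
        while (i < end) {
            byte b = in[i];
            if (b < 0) {
                break; // high bit set: not ASCII, leave it for the slow path
            }
            out[o++] = (char) b;
            i++;
        }
        return o;
    }

    public static void main(String[] args) {
        // "abc", then U+00E9 (two bytes in UTF-8), then "def"
        byte[] input = "abc\u00e9def".getBytes(StandardCharsets.UTF_8);
        char[] out = new char[input.length];
        int n = copyAsciiPrefix(input, 0, input.length, out);
        System.out.println(n + " ASCII chars copied: " + new String(out, 0, n));
        // prints "3 ASCII chars copied: abc"
    }
}
```

How well a loop like this performs depends on how the JIT compiles the branch and the byte-to-char widening on each platform, which may be one reason results differ between architectures.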

I also played a bit with creating the String directly from the input buffer and letting Java take care of the UTF-8 decoding (see here; it is missing some other spots and is possibly flawed, as I was just playing). That approach looks promising (see below) but has a drawback: it won't be able to detect malformed UTF-8 as Jackson does now. I think it could be added behind a feature flag for cases where we know and trust the source of the data.

Benchmark                (flavor)  (size)  Mode  Cnt    Score    Error  Units
MyBenchmark.cbor  ASCII_PRINTABLE   LARGE  avgt    5  286.758 ± 11.447  ns/op
MyBenchmark.json  ASCII_PRINTABLE   LARGE  avgt    5  664.253 ± 20.294  ns/op
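To make the trade-off concrete, here is a minimal sketch using only the JDK. It contrasts building the String directly from the buffer, which silently replaces malformed input with U+FFFD, with a strict CharsetDecoder that reports it. The class and method names (Utf8DecodeSketch, decodeLenient, decodeStrict) are hypothetical and are not part of Jackson; a feature flag, if one were added, could choose between two paths along these lines.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration using only JDK classes; not Jackson API.
final class Utf8DecodeSketch {

    // Fast path: let the JDK decode. Malformed sequences are NOT reported;
    // they are silently replaced with U+FFFD.
    public static String decodeLenient(byte[] buf, int offset, int len) {
        return new String(buf, offset, len, StandardCharsets.UTF_8);
    }

    // Strict path: a decoder configured to report malformed input,
    // which throws CharacterCodingException instead of substituting.
    public static String decodeStrict(byte[] buf, int offset, int len)
            throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(buf, offset, len))
                .toString();
    }

    public static void main(String[] args) {
        // 0xC3 starts a two-byte sequence, but the continuation byte is missing.
        byte[] bad = { 'o', 'k', (byte) 0xC3 };

        String lenient = decodeLenient(bad, 0, bad.length);
        System.out.println(lenient.equals("ok\uFFFD")); // prints "true"

        try {
            decodeStrict(bad, 0, bad.length);
            System.out.println("no error");
        } catch (CharacterCodingException e) {
            System.out.println("malformed input detected");
        }
    }
}
```

The lenient path is the one that corresponds to the faster numbers above; the strict path is shown only to illustrate what is lost, and is not meant as a performance recommendation.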

Any thoughts, better ideas?
