JSON vs CBOR performance for ASCII text

Hi there,

We're testing the performance of CBOR vs. plain JSON, and it looks like JSON is noticeably faster, at least for ASCII text. This speaks volumes about Jackson's JSON performance, but it also suggests that CBOR still has room for improvement.

The performance tests can be found in [this repository](https://github.com/sugmanue/cbor-vs-json-perf-test). For a simple class with five `String` fields holding ASCII strings that need no escaping, JSON is almost twice as fast as CBOR for the larger strings (between 193 and 231 chars):

```
Benchmark                (flavor)  (size)  Mode  Cnt    Score    Error  Units
MyBenchmark.cbor  ASCII_PRINTABLE   SMALL  avgt    5  266.316 ±  9.938  ns/op
MyBenchmark.json  ASCII_PRINTABLE   SMALL  avgt    5  243.984 ± 13.422  ns/op

Benchmark                (flavor)  (size)  Mode  Cnt    Score    Error  Units
MyBenchmark.cbor  ASCII_PRINTABLE  MEDIUM  avgt    5  725.376 ±  8.700  ns/op
MyBenchmark.json  ASCII_PRINTABLE  MEDIUM  avgt    5  464.803 ± 20.404  ns/op

Benchmark                (flavor)  (size)  Mode  Cnt     Score    Error  Units
MyBenchmark.cbor  ASCII_PRINTABLE   LARGE  avgt    5  1126.297 ±  7.843  ns/op
MyBenchmark.json  ASCII_PRINTABLE   LARGE  avgt    5   664.466 ± 23.541  ns/op

```

As expected, this is not the case for multi-byte chars (for instance, chars from the CJK block or emojis), nor for full ASCII, some of which requires escaping in plain JSON. See below:

```
Benchmark         (flavor)  (size)  Mode  Cnt     Score    Error  Units
MyBenchmark.cbor       CJK   LARGE  avgt    5  2076.014 ± 67.569  ns/op
MyBenchmark.json       CJK   LARGE  avgt    5  2939.622 ± 16.501  ns/op

Benchmark         (flavor)  (size)  Mode  Cnt     Score     Error  Units
MyBenchmark.cbor     EMOJI   LARGE  avgt    5  2400.312 ±  11.203  ns/op
MyBenchmark.json     EMOJI   LARGE  avgt    5  8467.852 ± 243.559  ns/op

Benchmark           (flavor)  (size)  Mode  Cnt     Score     Error  Units
MyBenchmark.cbor  FULL_ASCII   LARGE  avgt    5  1106.835 ±  33.094  ns/op
MyBenchmark.json  FULL_ASCII   LARGE  avgt    5  2084.745 ± 104.819  ns/op
```

Given the prevalence of ASCII text, it would be great if CBOR performance were at least as good as JSON here, though I feel that it *should* be better.

I played a bit with the tight loop inside `_finishShortText` (see [here](https://gist.github.com/sugmanue/44124c3b12095e82ef82e6d303a3ae35)) and I see some improvements, but they are not consistent across architectures (better on my M1 laptop, not so much on x86).
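For readers who haven't opened the gist, the general shape of an ASCII fast path can be sketched roughly as follows. This is a hypothetical illustration only, not the gist's code and not Jackson's actual `_finishShortText`; the class and method names (`AsciiFastPath`, `copyAsciiPrefix`, `decode`) are made up for the example, and the fallback simply hands off to the JDK decoder rather than doing Jackson's own multi-byte handling.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of Jackson): ASCII fast path for short text. */
final class AsciiFastPath {
    private AsciiFastPath() {}

    /**
     * Copies bytes from in[offset .. offset+len) into out[0 ..) while they are
     * plain ASCII (high bit clear). Returns how many bytes were copied; a result
     * smaller than len means a non-ASCII byte was found at that position.
     */
    static int copyAsciiPrefix(byte[] in, int offset, int len, char[] out) {
        int i = 0;
        while (i < len) {
            byte b = in[offset + i];
            if (b < 0) {        // high bit set: start of a multi-byte sequence
                break;
            }
            out[i++] = (char) b;
        }
        return i;
    }

    /** Decodes a short text segment, taking the fast path when it is all ASCII. */
    static String decode(byte[] in, int offset, int len) {
        char[] out = new char[len];
        int copied = copyAsciiPrefix(in, offset, len, out);
        if (copied == len) {
            return new String(out, 0, len);
        }
        // Assumption for this sketch: fall back to the JDK decoder for the
        // whole segment instead of a hand-written, validating UTF-8 loop.
        return new String(in, offset, len, StandardCharsets.UTF_8);
    }
}
```

Whether a loop like this actually beats the existing one depends on how the JIT compiles it on a given architecture, which would fit the inconsistent results mentioned above.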

I also played a bit with creating the `String` directly from the input buffer and letting Java take care of the UTF-8 decoding (see [here](https://gist.github.com/sugmanue/0d763382c429dac2f8e94a08210622bc); it is missing some other spots and is possibly flawed, as I was just playing). That approach looks promising (see below) but has a drawback: it won't be able to detect malformed UTF-8 as Jackson does now. I think it could be added behind a feature flag, for when we know and trust the source of the data.

```
Benchmark                (flavor)  (size)  Mode  Cnt    Score    Error  Units
MyBenchmark.cbor  ASCII_PRINTABLE   LARGE  avgt    5  286.758 ± 11.447  ns/op
MyBenchmark.json  ASCII_PRINTABLE   LARGE  avgt    5  664.253 ± 20.294  ns/op
```
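To make the trade-off concrete, here is a minimal sketch using only JDK APIs. It is not the code from the gist, and the class and method names (`Utf8DecodeSketch`, `decodeTrusted`, `decodeStrict`) are hypothetical. The first method is the "let Java handle it" path, which silently replaces malformed sequences with U+FFFD; the second shows one way the JDK can be asked to report malformed input instead, at some extra cost.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch (not Jackson code): two ways to decode UTF-8 with the JDK. */
final class Utf8DecodeSketch {
    private Utf8DecodeSketch() {}

    /**
     * Fast path for trusted input: builds the String straight from the buffer.
     * Malformed sequences are NOT reported; the JDK substitutes U+FFFD.
     */
    static String decodeTrusted(byte[] buf, int offset, int len) {
        return new String(buf, offset, len, StandardCharsets.UTF_8);
    }

    /**
     * Strict path: a CharsetDecoder configured to REPORT errors throws a
     * CharacterCodingException on malformed input instead of replacing it.
     */
    static String decodeStrict(byte[] buf, int offset, int len)
            throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(buf, offset, len))
                .toString();
    }
}
```

A feature flag along the lines suggested above could choose between two such paths; what that flag would be called and where it would live in Jackson is left open here.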

Any thoughts, better ideas?


