Description
Hi there,
We're testing the performance of CBOR against plain JSON, and it looks like JSON is considerably faster, at least for ASCII text. This speaks volumes about Jackson's JSON performance, but it also suggests that CBOR still has room for improvement.
The performance tests can be found on this repository. For a simple class with five String fields and ASCII (non-escaped) strings, JSON is about 1.7 times as fast as CBOR for larger strings (between 193 and 231 chars):
Benchmark         (flavor)         (size)  Mode  Cnt     Score     Error  Units
MyBenchmark.cbor  ASCII_PRINTABLE  SMALL   avgt    5   266.316 ±   9.938  ns/op
MyBenchmark.json  ASCII_PRINTABLE  SMALL   avgt    5   243.984 ±  13.422  ns/op
MyBenchmark.cbor  ASCII_PRINTABLE  MEDIUM  avgt    5   725.376 ±   8.700  ns/op
MyBenchmark.json  ASCII_PRINTABLE  MEDIUM  avgt    5   464.803 ±  20.404  ns/op
MyBenchmark.cbor  ASCII_PRINTABLE  LARGE   avgt    5  1126.297 ±   7.843  ns/op
MyBenchmark.json  ASCII_PRINTABLE  LARGE   avgt    5   664.466 ±  23.541  ns/op
As expected, this is not the case for multi-byte characters, such as those from the CJK block or emojis, nor for full ASCII (some of which requires escaping in plain JSON). See below:
Benchmark         (flavor)    (size)  Mode  Cnt     Score     Error  Units
MyBenchmark.cbor  CJK         LARGE   avgt    5  2076.014 ±  67.569  ns/op
MyBenchmark.json  CJK         LARGE   avgt    5  2939.622 ±  16.501  ns/op
MyBenchmark.cbor  EMOJI       LARGE   avgt    5  2400.312 ±  11.203  ns/op
MyBenchmark.json  EMOJI       LARGE   avgt    5  8467.852 ± 243.559  ns/op
MyBenchmark.cbor  FULL_ASCII  LARGE   avgt    5  1106.835 ±  33.094  ns/op
MyBenchmark.json  FULL_ASCII  LARGE   avgt    5  2084.745 ± 104.819  ns/op
Given the prevalence of ASCII text, it would be great if CBOR performance were at least as good as JSON's, though I feel it should be better.
I experimented with the tight loop inside _finishShortText (see here) and I see some improvements, but they are not consistent across architectures (better on my M1 laptop, not so much on x86).
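For readers unfamiliar with this kind of loop, here is a minimal sketch of an ASCII fast path: copy bytes to a char buffer until the first byte with the high bit set, then hand the rest to a slower multi-byte path. This is not Jackson's actual `_finishShortText` code; the class name `AsciiFastPath` and the method `copyAscii` are hypothetical and only illustrate the general technique.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not taken from jackson-dataformats-binary.
final class AsciiFastPath {

    /**
     * Copies leading ASCII bytes from in[start, end) into out, starting at outPos.
     * Stops at the first byte with the high bit set (the start of a multi-byte
     * UTF-8 sequence) and returns the number of bytes copied.
     */
    static int copyAscii(byte[] in, int start, int end, char[] out, int outPos) {
        int i = start;
        while (i < end) {
            byte b = in[i];
            if (b < 0) {
                break; // high bit set: not ASCII, leave it to the slow path
            }
            out[outPos++] = (char) b;
            i++;
        }
        return i - start;
    }

    public static void main(String[] args) {
        byte[] input = "hello w\u00f6rld".getBytes(StandardCharsets.UTF_8);
        char[] out = new char[input.length];
        int copied = copyAscii(input, 0, input.length, out, 0);
        System.out.println(new String(out, 0, copied)); // prints "hello w"
    }
}
```

How well a loop like this performs depends heavily on what the JIT does with it (unrolling, bounds-check elimination, vectorization), which may be one reason the results differ between ARM and x86.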
I also experimented with creating the String directly from the input buffer and letting Java take care of UTF-8 decoding (see here; it is missing some other spots and is possibly flawed, as I was just playing). That approach looks promising (see below) but has a drawback: it won't be able to detect malformed UTF-8 as Jackson does now. I think it could be added behind a feature flag for cases where we know and trust the source of the data.
Benchmark (flavor) (size) Mode Cnt Score Error Units
MyBenchmark.cbor ASCII_PRINTABLE LARGE avgt 5 286.758 ± 11.447 ns/op
MyBenchmark.json ASCII_PRINTABLE LARGE avgt 5 664.253 ± 20.294 ns/op
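To make the trade-off concrete, here is a minimal sketch using only JDK classes. It assumes the text bytes are already available in a `byte[]`, and the class and method names (`Utf8DecodeSketch`, `decodeLenient`, `decodeStrict`) are hypothetical, not part of Jackson. The lenient variant is the "let Java handle UTF-8" idea: `new String(..., UTF_8)` silently replaces malformed sequences with U+FFFD. The strict variant shows one way the JDK can report malformed input, at the cost of going through a `CharsetDecoder`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not taken from jackson-dataformats-binary.
final class Utf8DecodeSketch {

    /** Lenient: malformed sequences become U+FFFD; no exception is thrown. */
    static String decodeLenient(byte[] buf, int offset, int len) {
        return new String(buf, offset, len, StandardCharsets.UTF_8);
    }

    /** Strict: malformed input raises a CharacterCodingException. */
    static String decodeStrict(byte[] buf, int offset, int len)
            throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(buf, offset, len))
                .toString();
    }

    public static void main(String[] args) {
        // 0xC3 starts a two-byte sequence but is followed by 'b', not a continuation byte.
        byte[] malformed = {'a', (byte) 0xC3, 'b'};

        String lenient = decodeLenient(malformed, 0, malformed.length);
        System.out.println("lenient contains U+FFFD: " + (lenient.indexOf('\uFFFD') >= 0));

        try {
            decodeStrict(malformed, 0, malformed.length);
            System.out.println("strict: decoded without error");
        } catch (CharacterCodingException e) {
            System.out.println("strict: rejected malformed input");
        }
    }
}
```

Whether a strict decoder keeps enough of the speedup to be worthwhile would need to be measured; the benchmark numbers above only cover the lenient approach.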
Any thoughts, better ideas?