| 1 | +# JSON framework |
| 2 | + |
| 3 | +## Context and Problem Statement |
| 4 | + |
| 5 | +JSON serialization and deserialization are key elements of the Java client's performance (memory and CPU). |
| 6 | + |
| 7 | +The classic approach in Java used by many libraries is to use reflection to instantiate and populate objects. The major problem of this approach is that reflection is slow. JSON frameworks also need to build complex representations of class structure that can add up in large API surfaces like Elasticsearch's. |
| 8 | + |
| 9 | +We can leverage the fact that the ES Java client is heavily based on code generation to produce code that avoids reflection altogether and uses simpler data structures. |
| 10 | + |
| 11 | +[Jackson's Afterburner](https://github.com/FasterXML/jackson-modules-base/tree/2.x/afterburner) takes a somewhat similar approach, using dynamic bytecode generation at runtime. In our context we know the data structures ahead of time, so we can generate code ahead of time and skip the overhead of runtime bytecode generation. |
| 12 | + |
| 13 | +## Decision Drivers |
| 14 | + |
| 15 | +* Limit memory usage |
| 16 | +* Avoid costly features like reflection |
| 17 | + |
| 18 | +## Considered Options |
| 19 | + |
| 20 | +### Serialization |
| 21 | + |
| 22 | +Only one option was considered: every object class implements `JsonpSerializable`, which has a single `serialize()` method that writes the object to a streaming JSON generator. The code of this method is generated for every class, delegating to common utility classes where needed. |
| 23 | + |
| 24 | +On top of that, the `JsonpSerializer` interface allows serializing any value, including primitive types and user-provided objects. A `JsonpMapper` can look up a serializer for any value type. |
| 25 | + |
| 26 | +The rest of this document will address deserialization. |
| 27 | + |
| 28 | +### Deserialization |
| 29 | + |
| 30 | +Deserialization is more involved than serialization, as we must deal with complex JSON that sometimes allows representing a single structure using different forms (e.g. single-element arrays as a single value, property shortcuts, strings for any scalar type, etc.). |
| 31 | + |
| 32 | +We considered two levels of code generation: |
| 33 | + |
| 34 | +* generate a custom deserializer function per type that reads and parses the JSON stream |
| 35 | +* generate the construction of an object that handles the deserialization and calls setter methods on the object builder, similar to Elasticsearch's `ObjectParser` |
| 36 | + |
| 37 | +## Decision Outcome |
| 38 | + |
| 39 | +Generating a custom function per type would be the most performant, since a deserializer object that calls setter methods (as lambda expressions) adds some overhead compared to direct calls. |
| 40 | + |
| 41 | +However, a code generator is a program that produces a program, a kind of "meta programming". Given the complexity of deserialization and the unknown unknowns at the beginning of this project, we decided to use the second approach (a deserializer object that calls setters), even if it's less performant, in order to speed up development. It's still a lot more performant than using reflection! |
| 42 | + |
| 43 | +## Detailed design |
| 44 | + |
| 45 | +### Building blocks |
| 46 | + |
| 47 | +`JsonpDeserializer` is the common interface for all deserializers. It provides two groups of methods: |
| 48 | +* methods to know or test if that deserializer accepts a given JSON event. This is useful to disambiguate some variations and of course check that the JSON stream is what is expected. |
| 49 | +* methods to deserialize a value, either at the current position in the stream, or from an event that was previously read. |
| 50 | + |
| 51 | +This interface also provides static deserializers for all scalar types (string, integer, etc.). |
| 52 | + |
| 53 | +The `ObjectDeserializer` is used to deserialize regular structures: |
| 54 | +- it has a map of deserializers for every field, with their aliases, |
| 55 | +- supports shortcut properties, |
| 56 | +- handles `AdditionalProperty` and `AdditionalProperties`, |
| 57 | +- handles `SingleKeyDictionary` types that are flattened when represented as Java code. |
| 58 | + |
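The field map with setter lambdas can be pictured with a minimal sketch. Every name here (`FieldMapSketch`, `PersonBuilder`, `add`, `apply`) is hypothetical, and property values are plain Java objects rather than JSON events, so this only illustrates the idea and is not the client's `ObjectDeserializer`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical builder, standing in for a generated one.
class PersonBuilder {
    String name;
    Integer age;

    PersonBuilder name(String v) { this.name = v; return this; }
    PersonBuilder age(Integer v) { this.age = v; return this; }
}

// Simplified stand-in for an object deserializer: a map from JSON property
// names (including aliases) to setter lambdas on the builder.
class FieldMapSketch<B> {
    private final Map<String, BiConsumer<B, Object>> fields = new HashMap<>();

    // Registers one setter under a main name and any number of aliases.
    <V> void add(BiConsumer<B, V> setter, Class<V> type, String name, String... aliases) {
        BiConsumer<B, Object> handler =
            (builder, raw) -> setter.accept(builder, type.cast(raw));
        fields.put(name, handler);
        for (String alias : aliases) {
            fields.put(alias, handler);
        }
    }

    // Applies already-parsed properties to the builder; unknown names are rejected.
    B apply(B builder, Map<String, Object> properties) {
        for (Map.Entry<String, Object> e : properties.entrySet()) {
            BiConsumer<B, Object> handler = fields.get(e.getKey());
            if (handler == null) {
                throw new IllegalArgumentException("Unknown property: " + e.getKey());
            }
            handler.accept(builder, e.getValue());
        }
        return builder;
    }
}
```

Registering `add(PersonBuilder::name, String.class, "name", "full_name")` makes both property names call the same setter, which is the role aliases play in the list above.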
| 59 | +### Building an object deserializer |
| 60 | + |
| 61 | +We'll use `TermQuery` as an illustration as it uses most of `ObjectDeserializer`'s features. |
| 62 | + |
| 63 | +The generator produces a `setup<TypeName>Deserializer` method that does the configuration. This separate method is needed when a class has subclasses, as it will also be called to set up deserializers of child classes, as illustrated below by calling the `QueryBase` set up method (and is also why it's `protected`). |
| 64 | + |
| 65 | +```java |
| 66 | +protected static void setupTermQueryDeserializer(ObjectDeserializer<TermQuery.Builder> op) { |
| 67 | + QueryBase.setupQueryBaseDeserializer(op); |
| 68 | + op.add(Builder::value, FieldValue._DESERIALIZER, "value"); |
| 69 | + op.add(Builder::caseInsensitive, JsonpDeserializer.booleanDeserializer(), "case_insensitive"); |
| 70 | + |
| 71 | + op.setKey(Builder::field, JsonpDeserializer.stringDeserializer()); |
| 72 | + op.shortcutProperty("value", true); |
| 73 | +} |
| 74 | +``` |
| 75 | + |
| 76 | +The `setKey` call is the implementation of `SingleKeyDictionary`: it will "lift" the enclosing property name as the value of one of the object's fields, as if `{"some-field":{"value":1.0}}` were actually `{"field":"some-field","value":1.0}`. |
| 77 | + |
| 78 | +The `shortcutProperty` call configures the "value" property as being the shortcut, i.e. `{"some-field":1.0}` is interpreted as `{"some-field":{"value":1.0}}`. |
| 79 | + |
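The combined effect of key lifting and the shortcut property can be sketched as a rewrite on plain maps. `SingleKeySketch` and `flatten` are hypothetical names used only for illustration; the real deserializer works on a stream of JSON events rather than on maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SingleKeySketch {
    // Rewrites {"<key>": {...}} or {"<key>": <scalar>} into one flat property map:
    // the enclosing key becomes the value of keyField, and a non-object body
    // is treated as the value of shortcutField.
    static Map<String, Object> flatten(Map<String, Object> outer,
                                       String keyField,
                                       String shortcutField) {
        if (outer.size() != 1) {
            throw new IllegalArgumentException("Expected exactly one key");
        }
        Map.Entry<String, Object> entry = outer.entrySet().iterator().next();

        Map<String, Object> flat = new LinkedHashMap<>();
        flat.put(keyField, entry.getKey());

        Object body = entry.getValue();
        if (body instanceof Map) {
            // Regular form: copy the nested properties next to the lifted key.
            for (Map.Entry<?, ?> e : ((Map<?, ?>) body).entrySet()) {
                flat.put(String.valueOf(e.getKey()), e.getValue());
            }
        } else {
            // Shortcut form: the body is the shortcut property's value.
            flat.put(shortcutField, body);
        }
        return flat;
    }
}
```

With `keyField` set to `"field"` and `shortcutField` set to `"value"`, both `{"some-field":1.0}` and `{"some-field":{"value":1.0}}` produce the same flat map, with `field` equal to `some-field` and `value` equal to `1.0`.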
| 80 | +And then we can create the actual deserializer: |
| 81 | + |
| 82 | +```java |
| 83 | +public static final JsonpDeserializer<TermQuery> _DESERIALIZER = |
| 84 | + ObjectBuilderDeserializer.lazy( |
| 85 | + Builder::new, |
| 86 | + TermQuery::setupTermQueryDeserializer |
| 87 | + ); |
| 88 | +``` |
| 89 | + |
| 90 | +`ObjectBuilderDeserializer` wraps an `ObjectDeserializer` with the mechanics needed to create a builder, deserialize it, and create the actual object afterward. It is built with the builder's constructor and the setup method. |
| 91 | + |
| 92 | +### Lazy deserializers |
| 93 | + |
| 94 | +The `lazy` method returns a `JsonpDeserializer` implementation that builds the `ObjectBuilderDeserializer` only the first time it's called. There are two main reasons for this approach. |
| 95 | + |
| 96 | +#### Circular dependencies and static initializers |
| 97 | + |
| 98 | +The JVM will run static field initializers as part of the class initialization that happens when a class is first referenced in an application. The static initializers of the parent class and of the classes referenced by the current class's static initializers are called before initializing the class itself. |
| 99 | + |
| 100 | +While this works for the majority of cases, this can cause issues in the case of recursive dependencies between the static initializers of different classes. This results in fancy things like NPEs or stack overflows at class loading time! |
| 101 | + |
| 102 | +And there are a number of circular dependencies in the Java client, mainly in queries and aggregations. Limiting class initialization to just creating a lazy wrapper avoids any problem at class loading time. |
| 103 | + |
| 104 | +#### Request-only classes don't need deserializers? |
| 105 | + |
| 106 | +API classes in the Java client can be used in requests, in responses, or both. Classes in the first category (requests only) don't need a deserializer. So incurring the cost of creating a deserializer when the class is loaded would just be wasteful. |
| 107 | + |
| 108 | +We could have decided to _not_ add deserializers to request-only classes, but that would have prevented the implementation of `withJson()`. This method allows users to create request objects from a JSON string. Under the hood it calls the object's deserializer. |
| 109 | + |
| 110 | +In this scenario, having a lazily initialized deserializer enables some interesting features while paying the price for its creation only if it's actually used. |
| 111 | + |
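A lazy wrapper of this kind can be sketched in a few lines. `SketchDeserializer` and `LazySketchDeserializer` are hypothetical stand-ins, not the client's classes, and the double-checked locking shown here is only one possible way to make the first-use initialization thread-safe; the source does not say which technique the client uses.

```java
import java.util.function.Supplier;

// Simplified stand-in for a deserializer: it maps a JSON string to a value.
interface SketchDeserializer<T> {
    T deserialize(String json);
}

// Wrapper that defers building the real deserializer until first use.
// Creating the wrapper is cheap and runs no setup code, so it is safe to do
// in a static initializer even when classes reference each other.
class LazySketchDeserializer<T> implements SketchDeserializer<T> {
    private final Supplier<SketchDeserializer<T>> factory;
    private volatile SketchDeserializer<T> delegate;

    LazySketchDeserializer(Supplier<SketchDeserializer<T>> factory) {
        this.factory = factory;
    }

    @Override
    public T deserialize(String json) {
        SketchDeserializer<T> d = delegate;
        if (d == null) {
            synchronized (this) {
                d = delegate;
                if (d == null) {
                    // First call: build the real deserializer once and keep it.
                    d = factory.get();
                    delegate = d;
                }
            }
        }
        return d.deserialize(json);
    }
}
```

The factory runs at most once, and only if `deserialize` is actually called, so a class that is never deserialized never pays for its deserializer.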
| 112 | +### Container deserializers |
| 113 | + |
| 114 | +Variant containers (i.e. externally tagged types like `Query`) use the `ObjectDeserializer` explained above. The container class (that implements `TaggedUnion`) has a "pseudo-property" for each of the variants and regular properties for container-level fields. |
| 115 | + |
| 116 | +### Internally tagged variants deserialization |
| 117 | + |
| 118 | +Internally tagged variants (with a `"type"` property) require peeking inside the JSON object to find out what their actual variant is. This is the role of `JsonpUtils.lookAheadFieldValue()`: it will read JSON events until finding the property that defines the variant, and return that property's value and a JSON parser that will traverse the buffered data. |
| 119 | + |
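The look-ahead idea can be illustrated with a simplified model in which the JSON event stream is reduced to an iterator over an object's top-level properties. `LookAheadSketch`, `Result` and `lookAhead` are hypothetical names; this is a sketch of the buffering technique, not the code of `JsonpUtils.lookAheadFieldValue()`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class LookAheadSketch {
    // The discriminator value, plus an iterator that replays the buffered
    // properties and then continues with the unread part of the stream.
    static final class Result {
        final Object variant;
        final Iterator<Map.Entry<String, Object>> replay;

        Result(Object variant, Iterator<Map.Entry<String, Object>> replay) {
            this.variant = variant;
            this.replay = replay;
        }
    }

    // Consumes properties until the tag property is found, buffering them.
    static Result lookAhead(String tagName, Iterator<Map.Entry<String, Object>> properties) {
        List<Map.Entry<String, Object>> buffered = new ArrayList<>();
        while (properties.hasNext()) {
            Map.Entry<String, Object> property = properties.next();
            buffered.add(property);
            if (tagName.equals(property.getKey())) {
                return new Result(property.getValue(), concat(buffered.iterator(), properties));
            }
        }
        throw new IllegalArgumentException("Property '" + tagName + "' not found");
    }

    // Iterates over the first iterator, then over the second one.
    private static <T> Iterator<T> concat(Iterator<T> first, Iterator<T> second) {
        return new Iterator<T>() {
            @Override
            public boolean hasNext() { return first.hasNext() || second.hasNext(); }

            @Override
            public T next() { return first.hasNext() ? first.next() : second.next(); }
        };
    }
}
```

The variant deserializer chosen from `variant` then reads from `replay` and sees every property in its original order, including those that appeared before the tag.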
| 120 | +### Untagged union deserialization |
| 121 | + |
| 122 | +Deserialization of untagged unions (without a discriminant) is handled by `UnionDeserializer`. It is configured by adding each of the union members with their deserializers. Union members can be of two kinds: |
| 123 | + |
| 124 | +- objects: their fields will be used to build "member handlers" associated to field names that uniquely identify the member. If member `X` has fields `a` and `b`, and member `Y` has fields `b` and `c`, then finding an `a` property identifies member `X` while `b` doesn't allow disambiguating variants. |
| 125 | + |
| 126 | +- non-objects, like array or string: the JSON event type will be used to identify the variant. |
| 127 | + |
| 128 | +As seen previously with internally tagged variants, `UnionDeserializer` will look ahead and buffer JSON events until finding the information needed to identify the variant. The events that were buffered are then replayed to deserialize the selected variant. |
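The selection of field names that uniquely identify an object member can be sketched as follows. `UnionSketch`, `uniqueFieldHandlers` and `identify` are hypothetical names, and members are represented by plain strings; this illustrates the disambiguation rule described above, not the internals of `UnionDeserializer`.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class UnionSketch {
    // Maps each field name that belongs to exactly one member to that member.
    // Field names shared by several members are left out, since they cannot
    // disambiguate.
    static Map<String, String> uniqueFieldHandlers(Map<String, Set<String>> memberFields) {
        Map<String, String> owner = new HashMap<>();
        Set<String> ambiguous = new HashSet<>();
        for (Map.Entry<String, Set<String>> member : memberFields.entrySet()) {
            for (String field : member.getValue()) {
                if (ambiguous.contains(field)) {
                    continue;
                }
                String previous = owner.putIfAbsent(field, member.getKey());
                if (previous != null) {
                    // A second member declares this field: it identifies nobody.
                    owner.remove(field);
                    ambiguous.add(field);
                }
            }
        }
        return owner;
    }

    // Returns the member identified by the first unique property name seen,
    // or null if none of the names allows a decision.
    static String identify(Map<String, String> handlers, List<String> propertyNames) {
        for (String name : propertyNames) {
            String member = handlers.get(name);
            if (member != null) {
                return member;
            }
        }
        return null;
    }
}
```

For member `X` with fields `a` and `b` and member `Y` with fields `b` and `c`, the handlers map contains only `a` (for `X`) and `c` (for `Y`), so a JSON object whose properties arrive as `b` then `a` is identified as `X` once `a` is reached.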