Commit f850d57

swallez and l-trotta authored
Add JSON deserialization design doc (#1075)
* Add JSON deserialization design doc
* Add comment on test

---------

Co-authored-by: Laura Trotta <[email protected]>
1 parent c027cda commit f850d57

3 files changed: 159 additions & 23 deletions


docs/design/0003-json-framework.md

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
# JSON framework

## Context and Problem Statement

JSON serialization and deserialization are key elements of the Java client's performance (memory and CPU).

The classic approach in Java, used by many libraries, is to use reflection to instantiate and populate objects. The major problem with this approach is that reflection is slow. JSON frameworks also need to build complex representations of class structure, which can add up in large API surfaces like Elasticsearch's.

We can leverage the fact that the ES Java client is heavily based on code generation to produce code that avoids reflection altogether and uses simpler data structures.

[Jackson's Afterburner](https://github.com/FasterXML/jackson-modules-base/tree/2.x/afterburner) uses a somewhat similar approach, using dynamic bytecode generation at runtime. In our context we know the data structures ahead of time, so we can generate code ahead of time and skip the overhead of runtime bytecode generation.

## Decision Drivers

* Limit memory usage
* Avoid costly features like reflection

## Considered Options

### Serialization

Only one option was considered: every object class implements `JsonpSerializable`, which has a single `serialize()` method that writes the object to a streaming JSON generator. The code of this method is generated for every class, delegating to common utility classes where needed.

On top of that, the `JsonpSerializer` interface allows serializing any value, including primitive types and user-provided objects. A `JsonpMapper` can look up a serializer for any value type.

The rest of this document addresses deserialization.

### Deserialization

Deserialization is more involved than serialization, as we must deal with complex JSON that sometimes allows representing a single structure using different forms (e.g. single-element arrays as a single value, property shortcuts, strings for any scalar type, etc.).

We considered two levels of code generation:

* generate a custom deserializer function per type that reads and parses the JSON stream
* generate the construction of an object that handles the deserialization and calls setter methods on the object builder, similar to Elasticsearch's `ObjectParser`

## Decision Outcome

Generating a custom function per type would be the most performant, since a deserializer object that calls setter methods (as lambda expressions) adds some overhead compared to direct calls.

However, a code generator is a program that produces a program, a kind of "meta programming". Given the complexity of deserialization and the unknown unknowns at the beginning of this project, we decided to use the second approach (a deserializer object that calls setters), even if it's less performant, in order to speed up development. It's still a lot more performant than using reflection!

## Detailed design

### Building blocks

`JsonpDeserializer` is the common interface for all deserializers. It provides two groups of methods:
* methods to know or test whether that deserializer accepts a given JSON event. This is useful to disambiguate some variations and, of course, to check that the JSON stream is what is expected.
* methods to deserialize a value, either at the current position in the stream or from an event that was previously read.

This interface also provides static deserializers for all scalar types (string, integer, etc.).

The `ObjectDeserializer` is used to deserialize regular structures:
- it has a map of deserializers for every field, with their aliases,
- supports shortcut properties,
- handles `AdditionalProperty` and `AdditionalProperties`,
- handles `SingleKeyDictionary` types that are flattened when represented as Java code.
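
To make the "deserializer object that calls setters" idea concrete, here is a minimal sketch that uses only the JDK. `FieldTableSketch` and its methods are hypothetical names chosen for illustration, and it works on an already-parsed `Map` rather than on a JSON event stream, so it should be read as an approximation of the approach and not as the client's `ObjectDeserializer`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical, simplified stand-in for a setter-based object deserializer.
final class FieldTableSketch<B> {

    // Field name (or alias) -> handler that converts the raw value and calls the setter
    private final Map<String, BiConsumer<B, Object>> setters = new HashMap<>();

    /** Registers a builder setter under a field name and any number of aliases. */
    <V> void add(BiConsumer<B, V> setter, Function<Object, V> convert, String name, String... aliases) {
        BiConsumer<B, Object> handler = (builder, raw) -> setter.accept(builder, convert.apply(raw));
        setters.put(name, handler);
        for (String alias : aliases) {
            setters.put(alias, handler);
        }
    }

    /** Applies every property of an already-parsed object to the builder. */
    B deserialize(B builder, Map<String, Object> properties) {
        for (Map.Entry<String, Object> e : properties.entrySet()) {
            BiConsumer<B, Object> handler = setters.get(e.getKey());
            if (handler == null) {
                throw new IllegalArgumentException("Unknown field '" + e.getKey() + "'");
            }
            handler.accept(builder, e.getValue());
        }
        return builder;
    }
}
```

The table is configured once and then reused for every object of that type, which is the overhead trade-off discussed in the decision outcome: one indirect call per property instead of reflection.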

### Building an object deserializer

We'll use `TermQuery` as an illustration as it uses most of the `ObjectDeserializer` features.

The generator produces a `setup<TypeName>Deserializer` method that does the configuration. This separate method is needed when a class has subclasses, as it will also be called to set up the deserializers of child classes, as illustrated below by the call to the `QueryBase` setup method (which is also why it's `protected`).

```java
protected static void setupTermQueryDeserializer(ObjectDeserializer<TermQuery.Builder> op) {
    QueryBase.setupQueryBaseDeserializer(op);
    op.add(Builder::value, FieldValue._DESERIALIZER, "value");
    op.add(Builder::caseInsensitive, JsonpDeserializer.booleanDeserializer(), "case_insensitive");

    op.setKey(Builder::field, JsonpDeserializer.stringDeserializer());
    op.shortcutProperty("value", true);
}
```

The `setKey` call is the implementation of `SingleKeyDictionary`: it will "lift" the enclosing property name as the value of one of the object's fields, as if `{"some-field":{"value":1.0}}` were actually `{"field":"some-field","value":1.0}`.

The `shortcutProperty` call configures the "value" property as being the shortcut, i.e. `{"some-field":1.0}` is interpreted as `{"some-field":{"value":1.0}}`.
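
The two rewrites can be sketched with plain `Map`s standing in for parsed JSON. `TermShapeSketch` and `normalize` are hypothetical names, and the property names `field` and `value` are taken from the `TermQuery` example above; the client applies these rules while reading the event stream, so this only illustrates the resulting shapes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not part of the Java client.
final class TermShapeSketch {

    /**
     * Takes the single-key form ({"some-field": ...}) and returns the flattened form
     * {"field": "some-field", "value": ...}. A bare value in place of the nested
     * object (the shortcut form) is treated as the "value" property.
     */
    static Map<String, Object> normalize(Map<String, Object> singleKey) {
        if (singleKey.size() != 1) {
            throw new IllegalArgumentException("Expected exactly one key, got " + singleKey.size());
        }
        Map.Entry<String, Object> entry = singleKey.entrySet().iterator().next();

        Map<String, Object> result = new LinkedHashMap<>();
        // setKey: lift the enclosing property name into the "field" property
        result.put("field", entry.getKey());

        Object body = entry.getValue();
        if (body instanceof Map) {
            // Regular form: copy the nested object's properties
            for (Map.Entry<?, ?> e : ((Map<?, ?>) body).entrySet()) {
                result.put(String.valueOf(e.getKey()), e.getValue());
            }
        } else {
            // shortcutProperty: a bare value stands for the "value" property
            result.put("value", body);
        }
        return result;
    }
}
```

Under these assumptions, both `{"some-field":{"value":1.0}}` and `{"some-field":1.0}` normalize to the same flattened map.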

With the setup method in place, we can create the actual deserializer:

```java
public static final JsonpDeserializer<TermQuery> _DESERIALIZER =
    ObjectBuilderDeserializer.lazy(
        Builder::new,
        TermQuery::setupTermQueryDeserializer
    );
```

`ObjectBuilderDeserializer` wraps an `ObjectDeserializer` with the mechanics needed to create a builder, deserialize into it, and create the actual object afterward. It is built with the builder's constructor and the setup method.

### Lazy deserializers

The `lazy` method builds an implementation of `JsonpDeserializer` that builds the `ObjectBuilderDeserializer` lazily, the first time it's called. There are two main reasons for this approach.

#### Circular dependencies and static initializers

The JVM runs static field initializers as part of the class initialization that happens when a class is first referenced in an application. The static initializers of the parent class run first, and those of the classes referenced by the current class's static initializers run before the current class's initialization completes.

While this works in the majority of cases, it can cause issues when there are recursive dependencies between the static initializers of different classes. This results in fancy things like NPEs or stack overflows at class loading time!

And there are a number of circular dependencies in the Java client, mainly in queries and aggregations. Limiting class initialization to just creating a lazy wrapper avoids any problem at class loading time.
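
As a rough illustration of why a lazy wrapper sidesteps the problem, the sketch below defines a small memoizing holder and two classes whose static fields refer to each other. All names (`LazySketch`, `NodeA`, `NodeB`) are hypothetical, and the double-checked locking shown is just one possible thread-safety strategy, not necessarily what the client uses.

```java
import java.util.function.Supplier;

// Hypothetical illustration, not the client's actual lazy deserializer.
final class LazySketch<T> implements Supplier<T> {
    private final Supplier<? extends T> factory;
    private volatile T value; // stays null until first use

    LazySketch(Supplier<? extends T> factory) {
        this.factory = factory;
    }

    @Override
    public T get() {
        T result = value;
        if (result == null) {
            synchronized (this) {
                result = value;
                if (result == null) {
                    result = factory.get(); // the expensive work happens here, once
                    value = result;
                }
            }
        }
        return result;
    }
}

// Two classes whose static fields refer to each other. Each static initializer only
// creates wrappers, so neither class touches the other during its own initialization.
final class NodeA {
    static final LazySketch<String> NAME = new LazySketch<>(() -> "A");
    static final LazySketch<String> DESCRIPTION = new LazySketch<>(() -> "A -> " + NodeB.NAME.get());
}

final class NodeB {
    static final LazySketch<String> NAME = new LazySketch<>(() -> "B");
    static final LazySketch<String> DESCRIPTION = new LazySketch<>(() -> "B -> " + NodeA.NAME.get());
}
```

The cross-class reference is only followed when `get()` is called, at which point both classes are already fully initialized.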

#### Request-only classes don't need deserializers?

API classes in the Java client can be used in requests, in responses, or both. Classes in the first category (requests only) don't need a deserializer, so incurring the cost of creating one when the class is loaded would just be wasteful.

We could have decided _not_ to add deserializers to request-only classes, but that would have prevented the implementation of `withJson()`. This method allows users to create request objects from a JSON string. Under the hood it calls the object's deserializer.

In this scenario, having a lazily initialized deserializer enables some interesting features while paying the price for its creation only if it's actually used.

### Container deserializers

Variant containers (i.e. externally tagged types like `Query`) use the `ObjectDeserializer` explained above. The container class (which implements `TaggedUnion`) has a "pseudo-property" for each of the variants and regular properties for container-level fields.

### Internally tagged variants deserialization

Internally tagged variants (with a `"type"` property) require peeking inside the JSON object to find out what their actual variant is. This is the role of `JsonpUtils.lookAheadFieldValue()`: it reads JSON events until it finds the property that defines the variant, and returns that property's value along with a JSON parser that will traverse the buffered data.
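
The buffer-then-replay idea can be sketched without any JSON library. In the hypothetical `LookAheadSketch` below, a flat object is modeled as a sequence of name/value properties instead of parser events, which is a deliberate simplification; it does not reproduce the signature or behavior of `JsonpUtils.lookAheadFieldValue()`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified model: a flat JSON object is a sequence of properties.
final class LookAheadSketch {

    static final class Property {
        final String name;
        final Object value;

        Property(String name, Object value) {
            this.name = name;
            this.value = value;
        }
    }

    /** The discriminant value, plus an iterator that replays the whole object from the start. */
    static final class Result {
        final Object tag;
        final Iterator<Property> replay;

        Result(Object tag, Iterator<Property> replay) {
            this.tag = tag;
            this.replay = replay;
        }
    }

    static Result lookAhead(String tagName, Iterator<Property> stream) {
        List<Property> buffer = new ArrayList<>();
        Object tag = null;
        boolean found = false;
        while (stream.hasNext()) {
            Property p = stream.next();
            buffer.add(p); // remember everything that was consumed
            if (p.name.equals(tagName)) {
                tag = p.value;
                found = true;
                break; // stop reading as soon as the variant is known
            }
        }
        if (!found) {
            throw new IllegalStateException("Property '" + tagName + "' not found");
        }
        // Replay the buffered properties first, then continue with the unread remainder
        Iterator<Property> buffered = buffer.iterator();
        Iterator<Property> replay = new Iterator<Property>() {
            @Override
            public boolean hasNext() {
                return buffered.hasNext() || stream.hasNext();
            }

            @Override
            public Property next() {
                return buffered.hasNext() ? buffered.next() : stream.next();
            }
        };
        return new Result(tag, replay);
    }
}
```

The variant's deserializer then consumes `replay` as if nothing had been read, so it does not need to know that a look-ahead took place.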

### Untagged union deserialization

Deserialization of untagged unions (without a discriminant) is handled by `UnionDeserializer`. It is configured by adding each of the union members with their deserializers. Union members can be of two kinds:

- objects: their fields will be used to build "member handlers" associated with field names that uniquely identify the member. If member `X` has fields `a` and `b`, and member `Y` has fields `b` and `c`, then finding an `a` property identifies member `X`, while `b` doesn't allow disambiguating variants.

- non-objects, like array or string: the JSON event type will be used to identify the variant.

As seen previously with internally tagged variants, `UnionDeserializer` will look ahead and buffer JSON events until it finds the information needed to identify the variant. The events that were buffered are then replayed to deserialize the selected variant.
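
The `X`/`Y` example for object members can be reproduced with a small sketch. `UniqueFieldsSketch` is a hypothetical class that decides uniqueness at lookup time by checking how many members declare a property. That is a different bookkeeping strategy from keeping a set of unique fields per member, chosen here only because it is short and easy to verify.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of identifying a union member by a property only it declares.
final class UniqueFieldsSketch {

    // Member name -> all fields declared by that member
    private final Map<String, Set<String>> fieldsByMember = new LinkedHashMap<>();

    void addMember(String name, Set<String> fields) {
        fieldsByMember.put(name, new HashSet<>(fields)); // defensive copy
    }

    /** Returns the only member declaring this property, or null if none or several do. */
    String identify(String propertyName) {
        String found = null;
        for (Map.Entry<String, Set<String>> e : fieldsByMember.entrySet()) {
            if (e.getValue().contains(propertyName)) {
                if (found != null) {
                    return null; // declared by more than one member: ambiguous
                }
                found = e.getKey();
            }
        }
        return found;
    }
}
```

With members `X` (`a`, `b`) and `Y` (`b`, `c`), `identify("a")` yields `X`, `identify("c")` yields `Y`, and `identify("b")` yields `null` because `b` cannot disambiguate.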

java-client/src/main/java/co/elastic/clients/json/UnionDeserializer.java

Lines changed: 26 additions & 9 deletions
@@ -35,6 +35,13 @@
 import java.util.Set;
 import java.util.function.BiFunction;

+/**
+ * A deserializer for union types that finds the actual variant using structural inspection of the JSON value.
+ *
+ * @param <Union> The union type we want to deserialize into
+ * @param <Kind> The union's discriminant type
+ * @param <Member> The base type of possible member values in the union.
+ */
 public class UnionDeserializer<Union, Kind, Member> implements JsonpDeserializer<Union> {

     public static class AmbiguousUnionException extends RuntimeException {
@@ -48,6 +55,11 @@ private abstract static class EventHandler<Union, Kind, Member> {
         abstract EnumSet<Event> nativeEvents();
     }

+    /**
+     * Handler for a single member (kind) of the union. It holds the list of properties that are unique to it
+     * among all handlers, so that we can unambiguously identify it by looking at the properties that exist
+     * in a JSON object.
+     */
     private static class SingleMemberHandler<Union, Kind, Member> extends EventHandler<Union, Kind, Member> {
         private final JsonpDeserializer<? extends Member> deserializer;
         private final Kind tag;
@@ -109,7 +121,7 @@ public static class Builder<Union, Kind, Member> implements ObjectBuilder<JsonpD

         private final BiFunction<Kind, Member, Union> buildFn;

-        private final List<UnionDeserializer.SingleMemberHandler<Union, Kind, Member>> objectMembers = new ArrayList<>();
+        private final List<SingleMemberHandler<Union, Kind, Member>> objectMembers = new ArrayList<>();
         private final Map<Event, EventHandler<Union, Kind, Member>> otherMembers = new HashMap<>();
         private final boolean allowAmbiguousPrimitive;

@@ -135,7 +147,7 @@ private void addAmbiguousDeserializer(Event e, Kind tag, JsonpDeserializer<? ext
             mmh.handlers.sort(Comparator.comparingInt(a -> a.deserializer.acceptedEvents().size()));
         }

-        private void addMember(Event e, Kind tag, UnionDeserializer.SingleMemberHandler<Union, Kind, Member> member) {
+        private void addMember(Event e, Kind tag, SingleMemberHandler<Union, Kind, Member> member) {
             if (otherMembers.containsKey(e)) {
                 if (!allowAmbiguousPrimitive || e == Event.START_OBJECT || e == Event.START_ARRAY) {
                     throw new AmbiguousUnionException("Union member '" + tag + "' conflicts with other members");
@@ -150,26 +162,31 @@ private void addMember(Event e, Kind tag, UnionDeserializer.SingleMemberHandler<
             }
         }

+        /**
+         * Adds a member to the union deserializer.
+         */
         public Builder<Union, Kind, Member> addMember(Kind tag, JsonpDeserializer<? extends Member> deserializer) {

             JsonpDeserializer<?> unwrapped = DelegatingDeserializer.unwrap(deserializer);
             if (unwrapped instanceof ObjectDeserializer) {
                 ObjectDeserializer<?> od = (ObjectDeserializer<?>) unwrapped;
                 Set<String> allFields = od.fieldNames();
-                Set<String> fields = new HashSet<>(allFields); // copy to update
-                for (UnionDeserializer.SingleMemberHandler<Union, Kind, Member> member: objectMembers) {
-                    // Remove respective fields on both sides to keep specific ones
-                    fields.removeAll(member.fields);
+
+                Set<String> uniqueFields = new HashSet<>(allFields); // copy that we'll update
+                for (SingleMemberHandler<Union, Kind, Member> member: objectMembers) {
+                    // Keep fields that are unique to this member
+                    uniqueFields.removeAll(member.fields);
+                    // Remove the new member's fields from the existing member to ensure uniqueness
                     member.fields.removeAll(allFields);
                 }
-                UnionDeserializer.SingleMemberHandler<Union, Kind, Member> member = new SingleMemberHandler<>(tag, deserializer, fields);
+                SingleMemberHandler<Union, Kind, Member> member = new SingleMemberHandler<>(tag, deserializer, uniqueFields);
                 objectMembers.add(member);
                 if (od.shortcutProperty() != null) {
                     // also add it as a string
                     addMember(Event.VALUE_STRING, tag, member);
                 }
             } else {
-                UnionDeserializer.SingleMemberHandler<Union, Kind, Member> member = new SingleMemberHandler<>(tag, deserializer);
+                SingleMemberHandler<Union, Kind, Member> member = new SingleMemberHandler<>(tag, deserializer);
                 for (Event e: deserializer.nativeEvents()) {
                     addMember(e, tag, member);
                 }
@@ -181,7 +198,7 @@ public Builder<Union, Kind, Member> addMember(Kind tag, JsonpDeserializer<? exte
         @Override
         public JsonpDeserializer<Union> build() {
             // Check that no object member had all its fields removed
-            for (UnionDeserializer.SingleMemberHandler<Union, Kind, Member> member: objectMembers) {
+            for (SingleMemberHandler<Union, Kind, Member> member: objectMembers) {
                 if (member.fields.isEmpty()) {
                     throw new AmbiguousUnionException("All properties of '" + member.tag + "' also exist in other object members");
                 }

java-client/src/test/java/co/elastic/clients/elasticsearch/model/SerializationTest.java

Lines changed: 5 additions & 14 deletions
@@ -41,6 +41,11 @@

 public class SerializationTest extends ModelTestCase {

+    /**
+     * Loads all {@code _DESERIALIZER} fields. Since the actual deserializers are lazily constructed at runtime
+     * the first time a deserializer is used, we load them all to make sure they can be created and initialized
+     * successfully.
+     */
     @Test
     public void loadAllDeserializers() throws Exception {

@@ -67,20 +72,6 @@ public void loadAllDeserializers() throws Exception {
         // Check that all classes that have a _DESERIALIZER field also have the annotation
         ClassInfoList withDeserializer = scan.getAllClasses().filter((c) -> c.hasDeclaredField("_DESERIALIZER"));
         assertFalse(withDeserializer.isEmpty(), "No classes with a _DESERIALIZER field");
-
-        // Disabled for now, empty response classes still need a deserializer object
-        // e.g. ExistsIndexTemplateResponse, PingResponse, ExistsResponse, ExistsAliasResponse
-        //
-        // Set<String> annotationNames = withAnnotation.stream().map(c -> c.getName()).collect(Collectors.toSet());
-        // Set<String> withFieldNames = withDeserializer.stream().map(c -> c.getName()).collect(Collectors.toSet());
-        //
-        // withFieldNames.removeAll(annotationNames);
-        //
-        // assertFalse(
-        // withFieldNames.size() + " classes with the field but not the annotation: " + withFieldNames,
-        // !withFieldNames.isEmpty()
-        // );
-
     }

     @Test
