Skip to content

Poor performance for String serialization #2645

@kilink

Description

@kilink

Gson version

Gson 2.10.1

Java / Android version

Happening on JDK21, but presumably is an issue for any JDK, as the relevant code hasn't changed in 10+ years.

Description

Serializing a String to JSON using toJson can result in excessive copying of the internal StringBuffer array for certain inputs, in particular for Strings with a length >= 33. This all boils down to how the internal StringBuffer used by StringWriter is initialized, and its resizing behavior: it starts with a capacity of 16, then tries to use double the previous capacity + 2 (or the size of the String being written if it is larger than that). If the capacity is increased only to accommodate the size of the String being written, then the final call to write the closing double quote will trigger another resize.

Expected behavior

Avoid excessive allocations / copying as much as possible. JsonWriter's string method could check if the Writer is a StringWriter, and call ensureCapacity on its StringBuffer, to avoid the excessive resizing.

--- a/gson/src/main/java/com/google/gson/stream/JsonWriter.java
+++ b/gson/src/main/java/com/google/gson/stream/JsonWriter.java
@@ -32,6 +32,7 @@ import com.google.gson.Strictness;
 import java.io.Closeable;
 import java.io.Flushable;
 import java.io.IOException;
+import java.io.StringWriter;
 import java.io.Writer;
 import java.math.BigDecimal;
 import java.math.BigInteger;
@@ -734,6 +735,9 @@ public class JsonWriter implements Closeable, Flushable {
   }
 
   private void string(String value) throws IOException {
+    if (out instanceof StringWriter) {
+      ((StringWriter) out).getBuffer().ensureCapacity(value.length() + 2);
+    }
     String[] replacements = htmlSafe ? HTML_SAFE_REPLACEMENT_CHARS : REPLACEMENT_CHARS;
     out.write('\"');
     int last = 0;

Obviously the above really only works for Strings that don't require escaping, but that may even be sufficient for most scenarios.

Actual behavior

Reproduction steps

It's easier to look at a trivial case where the input being serialized does not require any escaping:

import com.google.common.base.Strings;
import com.google.gson.Gson;
import java.io.StringWriter;

Gson gson = new Gson();
StringWriter writer = new StringWriter();
gson.toJson(Strings.repeat("A", 33), writer);
System.out.println(writer.getBuffer().capacity()); // prints 70
System.out.println(writer.getBuffer().length()); // prints 35

The above resulted in three resizes of the internal array, the last of which being the most excessive (35 -> 75 for a single character).

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions