Make behaviour consistent when big oneliners JSON is provided #44

@andsel

PR #43 introduced the decode_size_limit_bytes parameter, which limits the length of a JSON line that can be parsed, to avoid potential OOM errors with very large one-line files.
The setting has a default value of 20MB, which is a breaking change in behaviour. If a user normally consumes lines larger than 20MB without hitting an OOM error, then with the new version 3.2.0 they will experience a looping error like:

[2024-08-30T16:35:28,969][ERROR][logstash.javapipeline    ][main][1ecda24c09fdc5ba076096bc6e7499b710cb91e796741106f9e28599ed6a58a0] A plugin had an unrecoverable error. Will restart this plugin.
  Pipeline_id:main
  Plugin: <LogStash::Inputs::Stdin codec=><LogStash::Codecs::JSONLines decode_size_limit_bytes=>32768, id=>"debdf17e-41b7-48ab-a678-1c9324a1bc9d", enable_metric=>true, charset=>"UTF-8", delimiter=>"\n">, id=>"1ecda24c09fdc5ba076096bc6e7499b710cb91e796741106f9e28599ed6a58a0", enable_metric=>true>
  Error: input buffer full
  Exception: Java::JavaLang::IllegalStateException
  Stack: org.logstash.common.BufferedTokenizerExt.extract(BufferedTokenizerExt.java:83)
org.logstash.common.BufferedTokenizerExt$INVOKER$i$1$0$extract.call(BufferedTokenizerExt$INVOKER$i$1$0$extract.gen)

which leaves the pipeline stuck, making no progress.

Proposal

This issue proposes going back to the original behaviour by default. When the codec has decode_size_limit_bytes configured and a line is bigger than the limit, it should still create an event containing the partial string data, and tag the event so that the pipeline can route and manage the error condition.
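
A minimal sketch of the proposed decode behaviour, not the plugin's actual code. It assumes that "original behaviour" means no limit (modelled here as `nil`), uses a plain Hash in place of a Logstash event, and invents the helper `decode_line` and the tag name `_oversized_line` purely for illustration:

```ruby
require "json"

# Hypothetical tag name, chosen for illustration only.
OVERSIZE_TAG = "_oversized_line"

# Decode one line into a plain Hash standing in for a Logstash event.
# limit_bytes = nil models the proposed default (assumed: no limit at all).
def decode_line(line, limit_bytes = nil)
  if limit_bytes && line.bytesize > limit_bytes
    # Keep only the first limit_bytes bytes; scrub replaces any multi-byte
    # character that the cut may have split.
    partial = line.byteslice(0, limit_bytes).scrub
    return { "message" => partial, "tags" => [OVERSIZE_TAG] }
  end
  JSON.parse(line)
end
```

With a tag on the event, a pipeline could route oversized lines with a conditional instead of restarting the plugin in a loop.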

This can be implemented once BufferedTokenizerExt is fixed to throw an exception even when the offending token is not the first of the fragment (elastic/logstash#17017).
Ideally, the tokenizer should return an iterator that verifies the size limit on every call to its next method.
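
The iterator idea could look roughly like the sketch below. It is not the BufferedTokenizerExt API: the class `LimitedTokenIterator`, its methods and `TokenTooLargeError` are hypothetical names, and for brevity it splits a complete string up front rather than accumulating fragments as a real tokenizer would.

```ruby
# Hypothetical error class, for illustration only.
class TokenTooLargeError < StandardError; end

# Hypothetical iterator that checks the size limit on each call,
# regardless of the token's position in the input.
class LimitedTokenIterator
  def initialize(data, delimiter, limit_bytes)
    @tokens = data.split(delimiter)
    @limit_bytes = limit_bytes
    @index = 0
  end

  def has_next?
    @index < @tokens.length
  end

  # Returns the next token, or raises if it exceeds the limit.
  # The cursor advances either way, so the caller can keep iterating.
  def next_token
    raise StopIteration unless has_next?
    token = @tokens[@index]
    @index += 1
    if token.bytesize > @limit_bytes
      raise TokenTooLargeError, "token of #{token.bytesize} bytes exceeds limit of #{@limit_bytes}"
    end
    token
  end
end
```

Because the check happens per call, an oversized token in the middle of a fragment is reported at the point it is reached, and the tokens after it remain available.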
