Description
PR #43 introduced the decode_size_limit_bytes parameter to limit the length of the JSON line that can be parsed, in order to avoid potential OOM errors with very large single-line files.
The setting has a default value of 20MB, and this introduces a breaking change in behaviour. A user who normally consumes lines bigger than 20MB without triggering an OOM error will, after upgrading to version 3.2.0, experience a looping error like:
```
[2024-08-30T16:35:28,969][ERROR][logstash.javapipeline ][main][1ecda24c09fdc5ba076096bc6e7499b710cb91e796741106f9e28599ed6a58a0] A plugin had an unrecoverable error. Will restart this plugin.
Pipeline_id:main
Plugin: <LogStash::Inputs::Stdin codec=><LogStash::Codecs::JSONLines decode_size_limit_bytes=>32768, id=>"debdf17e-41b7-48ab-a678-1c9324a1bc9d", enable_metric=>true, charset=>"UTF-8", delimiter=>"\n">, id=>"1ecda24c09fdc5ba076096bc6e7499b710cb91e796741106f9e28599ed6a58a0", enable_metric=>true>
Error: input buffer full
Exception: Java::JavaLang::IllegalStateException
Stack: org.logstash.common.BufferedTokenizerExt.extract(BufferedTokenizerExt.java:83)
org.logstash.common.BufferedTokenizerExt$INVOKER$i$1$0$extract.call(BufferedTokenizerExt$INVOKER$i$1$0$extract.gen)
```
which leaves the pipeline stuck without making any progress.
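For context, here is a simplified sketch (not the actual Logstash source) of the size check in BufferedTokenizerExt.extract that raises the error above. Once the accumulated fragment exceeds the limit, the tokenizer throws instead of emitting a token; since the plugin restart does not make the oversized line go away, the same exception fires again and the pipeline loops:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the tokenizer's size-limit behaviour.
// `sizeLimit` corresponds to decode_size_limit_bytes.
public class BufferedTokenizerSketch {
    private final StringBuilder input = new StringBuilder();
    private final int sizeLimit;

    public BufferedTokenizerSketch(int sizeLimit) {
        this.sizeLimit = sizeLimit;
    }

    public List<String> extract(String data) {
        input.append(data);
        // When the accumulated fragment grows past the limit, the tokenizer
        // throws instead of emitting a token. The input plugin restarts,
        // feeds the same oversized line again, and hits the same exception:
        // this is the loop described above.
        if (input.length() > sizeLimit) {
            throw new IllegalStateException("input buffer full");
        }
        List<String> tokens = new ArrayList<>();
        int nl;
        while ((nl = input.indexOf("\n")) >= 0) {
            tokens.add(input.substring(0, nl));
            input.delete(0, nl + 1);
        }
        return tokens;
    }
}
```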
Proposal
This issue proposes restoring the original behaviour by default and, when the codec does have decode_size_limit_bytes configured and a line exceeds the limit, creating an event containing the partial string data anyway. The event is also tagged so that the pipeline can route and handle the error condition.
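A minimal sketch of that proposed behaviour, in Java for illustration: an oversized line is surfaced as an event carrying the partial data plus a tag the pipeline can route on. The tag name "_jsonparsetoobigfailure" is an assumption here, not a decided name:

```java
import java.util.HashMap;
import java.util.Map;

public class OversizedLineHandling {
    // Sketch: instead of throwing and stalling the pipeline, return an event.
    static Map<String, Object> decodeLine(String line, int decodeSizeLimitBytes) {
        Map<String, Object> event = new HashMap<>();
        if (line.length() > decodeSizeLimitBytes) {
            // Keep the partial string data that fits within the limit and tag
            // the event so a downstream filter/output can handle the error.
            // The tag name below is hypothetical.
            event.put("message", line.substring(0, decodeSizeLimitBytes));
            event.put("tags", new String[] {"_jsonparsetoobigfailure"});
        } else {
            // Normal path: parse the JSON line into the event (omitted here).
            event.put("message", line);
        }
        return event;
    }
}
```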
This can be implemented once BufferedTokenizerExt is fixed to throw an exception also when the offending token is not the first in the fragment (elastic/logstash#17017).
Ideally, the tokenizer should return an iterator that verifies the size limit on every call to its next method.
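A sketch of that idea (assumed API, not the actual elastic/logstash code): wrapping the token stream in an iterator that re-checks the limit on each next() call, so an oversized token is caught wherever it appears in the fragment, not only when it is the first one:

```java
import java.util.Iterator;

// Wraps a token iterator and enforces the size limit per token.
public class SizeCheckingIterator implements Iterator<String> {
    private final Iterator<String> tokens;
    private final int sizeLimit;

    public SizeCheckingIterator(Iterator<String> tokens, int sizeLimit) {
        this.tokens = tokens;
        this.sizeLimit = sizeLimit;
    }

    @Override
    public boolean hasNext() {
        return tokens.hasNext();
    }

    @Override
    public String next() {
        String token = tokens.next();
        // Verify the limit for each token, so the error is raised for the
        // offending token even when it is not the first in the fragment.
        if (token.length() > sizeLimit) {
            throw new IllegalStateException("input buffer full");
        }
        return token;
    }
}
```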