Skip to content

Conversation

@andreidan
Copy link
Contributor

If setNextReader in any sub fetch phase throws we will not stop the profile timer we started. Subphases do various things in setNextReader from validations, throwing Exceptions outright, to reading Lucene specific things like doc values.

This moves the timer.stop call in a finally block to make sure everything timer related is copacetic.

Relates to #132128

If setNextReader in any sub fetch phase throws we will not stop
the profile timer we started. Subphases do various things in
setNextReader from validations, throwing Exceptions outright, to
reading Lucene specific things like doc values.

This moves the timer.stop call in a finally block to make sure
everything timer related is copacetic.
@elasticsearchmachine
Copy link
Collaborator

Hi @andreidan, I've created a changelog YAML for you.

@elasticsearchmachine elasticsearchmachine added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Aug 8, 2025
@andreidan andreidan requested review from nik9000 and piergm August 8, 2025 10:45
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

@andreidan
Copy link
Contributor Author

Reproduced the error, without this fix using:

--- a/server/src/main/java/org/elasticsearch/search/fetch/subphase/FetchFieldsPhase.java
+++ b/server/src/main/java/org/elasticsearch/search/fetch/subphase/FetchFieldsPhase.java
@@ -129,6 +129,9 @@ public final class FetchFieldsPhase implements FetchSubPhase {
         return new FetchSubPhaseProcessor() {
             @Override
             public void setNextReader(LeafReaderContext readerContext) {
+                if (true) {
+                    throw new NullPointerException(" oh no ");
+                }

:

all shards failed
Failed to execute phase [query], all shards failed; shardFailures {[9g6CjqBHTTGLU7gD08QlWw][test][0]: org.elasticsearch.transport.RemoteTransportException: [node_s1][127.0.0.1:13512][indices:data/read/search[phase/query]]
Caused by: java.lang.IllegalStateException: #start call misses a matching #stop call
}
	at __randomizedtesting.SeedInfo.seed([636E8637557889D2:984E65DBA8F8B633]:0)
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:624)
	at org.elasticsearch.action.search.SearchQueryThenFetchAsyncAction.onPhaseFailure(SearchQueryThenFetchAsyncAction.java:81)
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:326)
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:669)
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.finishOneShard(AbstractSearchAsyncAction.java:430)
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:422)
	at org.elasticsearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:279)
	at org.elasticsearch.action.ActionListenerImplementations.safeAcceptException(ActionListenerImplementations.java:78)
	at org.elasticsearch.action.ActionListenerImplementations.safeOnFailure(ActionListenerImplementations.java:89)
	at org.elasticsearch.action.DelegatingActionListener.onFailure(DelegatingActionListener.java:32)
	at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:54)
	at org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:598)
	at org.elasticsearch.transport.TransportService$UnregisterChildTransportResponseHandler.handleException(TransportService.java:1705)
	at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1485)
	at org.elasticsearch.transport.InboundHandler.doHandleException(InboundHandler.java:510)
	at org.elasticsearch.transport.InboundHandler.handleException(InboundHandler.java:497)
	at org.elasticsearch.transport.InboundHandler.handlerResponseError(InboundHandler.java:488)
	at org.elasticsearch.transport.InboundHandler.executeResponseHandler(InboundHandler.java:164)
	at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:134)
	at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:102)
	at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:830)
	at org.elasticsearch.transport.InboundPipeline.forwardFragment(InboundPipeline.java:116)
	at org.elasticsearch.transport.InboundPipeline.lambda$doHandleBytes$1(InboundPipeline.java:78)
	at org.elasticsearch.transport.InboundDecoder.finishMessage(InboundDecoder.java:145)
	at org.elasticsearch.transport.InboundDecoder.internalDecode(InboundDecoder.java:130)
	at org.elasticsearch.transport.InboundDecoder.decode(InboundDecoder.java:59)
	at org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:79)
	at org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:69)
	at org.elasticsearch.transport.netty4.Netty4MessageInboundHandler.channelRead(Netty4MessageInboundHandler.java:54)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at org.elasticsearch.transport.netty4.NettyByteBufSizer.channelRead(NettyByteBufSizer.java:33)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1357)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:868)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:796)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:697)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:660)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at java.base/java.lang.Thread.run(Thread.java:1447)
Caused by: org.elasticsearch.ElasticsearchException$1: #start call misses a matching #stop call
	at org.elasticsearch.ElasticsearchException.guessRootCauses(ElasticsearchException.java:798)
	at org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:324)
	... 45 more
Caused by: java.lang.IllegalStateException: #start call misses a matching #stop call

Once this PR is in we will be able to see the real source of the problem (that's now being masked by this timer thing due to the try - finally block in FetchPhase#execute)

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll help! It looks like there's some other bug hiding behind this one that we were eating by throwing the profiling exception here. Exciting! Let's merge this and see!

@andreidan andreidan added v8.19.2 auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) auto-backport Automatically create backport pull requests when merged labels Aug 8, 2025
@andreidan andreidan merged commit f4a7948 into elastic:main Aug 8, 2025
33 checks passed
@andreidan andreidan deleted the timer branch August 8, 2025 14:56
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherrypicked due to conflicts
9.1

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 132570

andreidan added a commit to andreidan/elasticsearch that referenced this pull request Aug 8, 2025
If setNextReader in any sub fetch phase throws we will not stop
the profile timer we started. Subphases do various things in
setNextReader from validations, throwing Exceptions outright, to
reading Lucene specific things like doc values.

This moves the timer.stop call in a finally block to make sure
everything timer related is copacetic.
elasticsearchmachine pushed a commit that referenced this pull request Aug 8, 2025
If setNextReader in any sub fetch phase throws we will not stop
the profile timer we started. Subphases do various things in
setNextReader from validations, throwing Exceptions outright, to
reading Lucene specific things like doc values.

This moves the timer.stop call in a finally block to make sure
everything timer related is copacetic.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport pending >bug :Search Foundations/Search Catch all for Search Foundations Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch v8.19.2 v9.1.2 v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants