Description
QueryBatcherImpl supports two scenarios: a query or an Iterator.
For an Iterator, QueryBatcherImpl will queue up the values from the Iterator as tasks to process (via listeners) as quickly as possible. This could quickly run us out of memory if the processing is slower than the iterating/queueing (which is most likely the case). Thus we chose ThreadPoolExecutor.CallerRunsPolicy to throttle the queuing when it gets ahead of the processing. This is great for throttling but can get in the way of fully utilizing all threads (#879). What is needed here instead is a policy where the queuing thread blocks until the queue has more space and then continues filling it.
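As a rough illustration of that blocking behavior, a custom RejectedExecutionHandler could block the submitting thread on the executor's bounded queue instead of running the task on the caller. This is a minimal sketch, not the actual QueryBatcherImpl code; the class name BlockUntilSpacePolicy is made up here.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;

/**
 * Sketch of a rejection policy that blocks the queuing thread until the
 * executor's bounded work queue has space, rather than running the task
 * on the caller as CallerRunsPolicy does.
 */
public class BlockUntilSpacePolicy implements RejectedExecutionHandler {
    @Override
    public void rejectedExecution(Runnable task, ThreadPoolExecutor executor) {
        if (executor.isShutdown()) {
            throw new RejectedExecutionException("Executor is shut down");
        }
        try {
            // Block here until the bounded queue accepts the task.
            executor.getQueue().put(task);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RejectedExecutionException("Interrupted while waiting for queue space", e);
        }
    }
}
```

It would be installed on a ThreadPoolExecutor built with a bounded queue, e.g. `new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>(queueSize), new BlockUntilSpacePolicy())`. One caveat: putting directly into the queue bypasses the executor's normal submission path, so a task enqueued this way can be left unexecuted if the executor is shut down concurrently.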
For a query, QueryBatcherImpl starts one thread per forest and only queues up another batch after checking that the first batch wasn't empty. This way it doesn't need to estimate the number of batches to retrieve; it just keeps going until it gets an empty batch, then it stops querying that forest. Immediately after it sees that a batch is not empty, it queues the next batch before running the first batch through the listeners. So the only way the rest of the threads (beyond the number of forests) get used is if the listeners take some time (which is likely).

If listeners take too long and the queue fills, CallerRunsPolicy may be undesirable because the thread trying to queue the next batch (the caller) will immediately be conscripted to retrieve the next batch of URIs rather than finishing its run through the listeners. This makes it temporarily retrieve URIs depth-first before going through all the listeners, which could create stack depth or memory usage complications. A better policy in this scenario would be, once all threads are in use, to always process the listeners first and then try to queue up the next batch, blocking until there is space in the queue for it. Processing the listeners first should keep the queue small, since each thread adds at most one more task to the queue and only when it's ready to run the next task.
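To make that proposed ordering concrete, here is a hypothetical sketch of a per-forest task that runs the listeners on the current batch before blocking to enqueue the next one. QueryForestTask, fetchBatch, and runListeners are illustrative names only, not the actual QueryBatcherImpl internals, and the wiring to a bounded BlockingQueue is an assumption.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;

/**
 * Hypothetical sketch of the "listeners first, then block to enqueue" policy.
 */
class QueryForestTask implements Runnable {
    private final String forest;
    private final long start;
    private final BlockingQueue<Runnable> workQueue;

    QueryForestTask(String forest, long start, BlockingQueue<Runnable> workQueue) {
        this.forest = forest;
        this.start = start;
        this.workQueue = workQueue;
    }

    @Override
    public void run() {
        // Hypothetical retrieval of one batch of URIs from this forest.
        List<String> uris = fetchBatch(forest, start);
        if (uris.isEmpty()) {
            return; // empty batch: stop querying this forest
        }
        // 1. Run the listeners on the current batch first, so this thread
        //    finishes its own work before adding more work to the queue.
        runListeners(uris);
        // 2. Only then enqueue retrieval of the next batch, blocking until
        //    the bounded queue has space; each thread adds at most one task.
        try {
            workQueue.put(new QueryForestTask(forest, start + uris.size(), workQueue));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private List<String> fetchBatch(String forest, long start) {
        return Collections.emptyList(); // placeholder for the real batch retrieval
    }

    private void runListeners(List<String> uris) {
        // placeholder for invoking the registered batch listeners
    }
}
```

The key difference from the depth-first behavior described above is that the listeners run before the next batch is queued, so a full queue stalls a worker at a natural boundary instead of conscripting it to fetch more URIs.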