Description
QueryBatcherImpl supports two scenarios: a query or an Iterator.
For an Iterator, QueryBatcherImpl will queue up the values from the Iterator as tasks to process (via listeners) as quickly as possible. This could quickly run us out of memory if the processing is slower than the iterating/queueing (which is most likely the case). Thus we chose ThreadPoolExecutor.CallerRunsPolicy to throttle the queuing when it gets ahead of the processing. This is great for throttling but can get in the way of fully utilizing all threads (#879). What is needed here instead is a policy where the queuing thread blocks until the queue has more space and then continues filling it.
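As a rough illustration of that blocking behavior, a custom RejectedExecutionHandler could block the submitting thread on the executor's bounded queue instead of running the task on the caller. This is a minimal sketch, not the actual QueryBatcherImpl code; the class name BlockUntilSpacePolicy is made up here.

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;

/**
 * Sketch of a rejection policy that blocks the queuing thread until the
 * executor's bounded work queue has space, rather than running the task
 * on the caller as CallerRunsPolicy does.
 */
public class BlockUntilSpacePolicy implements RejectedExecutionHandler {
    @Override
    public void rejectedExecution(Runnable task, ThreadPoolExecutor executor) {
        if (executor.isShutdown()) {
            throw new RejectedExecutionException("Executor is shut down");
        }
        try {
            // Block here until the bounded queue accepts the task.
            executor.getQueue().put(task);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RejectedExecutionException("Interrupted while waiting for queue space", e);
        }
    }
}
```

It would be installed on a ThreadPoolExecutor built with a bounded queue, e.g. `new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>(queueSize), new BlockUntilSpacePolicy())`. One caveat: putting directly into the queue bypasses the executor's normal submission path, so a task enqueued this way can be left unexecuted if the executor is shut down concurrently.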
For a query, QueryBatcherImpl starts one thread per forest and only queues up another batch after checking that the first batch wasn't empty. This way it doesn't need to estimate the number of batches to retrieve; it just keeps going until it gets an empty batch, then it stops querying that forest. Immediately after it sees that a batch is not empty, it queues the next batch before running the first batch through the listeners. So the only way the rest of the threads (beyond the number of forests) get used is if the listeners take some time (which is likely).

If listeners take too long and the queue fills, CallerRunsPolicy may be undesirable because the thread trying to queue the next batch (the caller) will immediately be conscripted to retrieve the next batch of URIs rather than finishing its run through the listeners. This makes it temporarily retrieve URIs depth-first before going through all the listeners, which could create stack depth or memory usage complications. A better policy in this scenario would be, once all threads are in use, to always process the listeners first and then try to queue up the next batch, blocking until there is space in the queue for it. Processing the listeners first should keep the queue small, since each thread adds at most one more task to the queue and only when it's ready to run the next task.
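To make that proposed ordering concrete, here is a hypothetical sketch of a per-forest task that runs the listeners on the current batch before blocking to enqueue the next one. QueryForestTask, fetchBatch, and runListeners are illustrative names only, not the actual QueryBatcherImpl internals, and the wiring to a bounded BlockingQueue is an assumption.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;

/**
 * Hypothetical sketch of the "listeners first, then block to enqueue" policy.
 */
class QueryForestTask implements Runnable {
    private final String forest;
    private final long start;
    private final BlockingQueue<Runnable> workQueue;

    QueryForestTask(String forest, long start, BlockingQueue<Runnable> workQueue) {
        this.forest = forest;
        this.start = start;
        this.workQueue = workQueue;
    }

    @Override
    public void run() {
        // Hypothetical retrieval of one batch of URIs from this forest.
        List<String> uris = fetchBatch(forest, start);
        if (uris.isEmpty()) {
            return; // empty batch: stop querying this forest
        }
        // 1. Run the listeners on the current batch first, so this thread
        //    finishes its own work before adding more work to the queue.
        runListeners(uris);
        // 2. Only then enqueue retrieval of the next batch, blocking until
        //    the bounded queue has space; each thread adds at most one task.
        try {
            workQueue.put(new QueryForestTask(forest, start + uris.size(), workQueue));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private List<String> fetchBatch(String forest, long start) {
        return Collections.emptyList(); // placeholder for the real batch retrieval
    }

    private void runListeners(List<String> uris) {
        // placeholder for invoking the registered batch listeners
    }
}
```

The key difference from the depth-first behavior described above is that the listeners run before the next batch is queued, so a full queue stalls a worker at a natural boundary instead of conscripting it to fetch more URIs.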