
Very slow to extract predictions from large projects #7976

@Fredrik00


Is your feature request related to a problem? Please describe.
I have a large project in Label Studio for annotating bounding boxes (over 100k tasks). As part of my workflow, I want to extract the annotations for all manually labelled tasks, followed by all predictions with a score above some threshold. I attempted to extract predictions through the task API, but the predictions field in the response was empty. If I use the predictions API (api/predictions) instead, the request times out before I get a response.
My only option right now appears to be using the predictions API with a task filter, fetching the predictions one task at a time, but this takes a very long time, even when running the requests in parallel (90+ minutes in my case).
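The per-task workaround described above can be sketched roughly as follows. This is a minimal sketch, not official Label Studio client code: it assumes that `api/predictions` accepts a `task` query parameter (the "task filter" mentioned above), returns a JSON list, and authenticates with an `Authorization: Token ...` header. `fetch_task_predictions` and `collect_predictions` are hypothetical helper names, and the score threshold is applied client-side to each prediction's `score` field.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def fetch_task_predictions(base_url: str, token: str, task_id: int) -> list:
    """Fetch all predictions for one task.

    ASSUMPTIONS: api/predictions accepts a `task` query parameter, returns a
    JSON list, and accepts `Authorization: Token <key>` authentication.
    """
    query = urllib.parse.urlencode({"task": task_id})
    req = urllib.request.Request(
        f"{base_url}/api/predictions?{query}",
        headers={"Authorization": f"Token {token}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def collect_predictions(
    task_ids: Iterable[int],
    fetch_one: Callable[[int], list],
    min_score: float,
    max_workers: int = 16,
) -> list:
    """Fetch predictions task by task in parallel, keeping score >= min_score."""
    kept = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields per-task results in the same order as task_ids.
        for predictions in pool.map(fetch_one, task_ids):
            for pred in predictions:
                score = pred.get("score")  # may be missing or null
                if score is not None and score >= min_score:
                    kept.append(pred)
    return kept
```

A call would look like `collect_predictions(task_ids, functools.partial(fetch_task_predictions, base_url, token), min_score=0.8)`, where `base_url` and `token` are placeholders. Because every task still costs one HTTP round trip, the total time grows with the number of tasks; the thread pool only overlaps up to `max_workers` requests at once.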

Describe the solution you'd like
I think there are two options for improving prediction extraction:

  1. (Preferred) Make it possible to include prediction/annotation data in responses from the tasks API (api/tasks). This would make it possible to apply filters and paging, which I believe would be much faster than the current workaround.
    Additionally, being able to filter the predictions by model_versions here, to reduce the payload size, would be very useful.
  2. Add paging/filtering to the predictions API (api/predictions), similar to that of the tasks API. This would allow us to fetch predictions for multiple tasks at a time, with a reasonable page size to ensure the request does not time out.
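Either option would let a client read predictions in bounded pages instead of one task at a time. The loop below sketches that client side under stated assumptions: the `page`, `page_size`, `project` and `model_version` query parameters on `api/predictions` are hypothetical (they are what this request asks for, not a confirmed existing API), and each page is assumed to come back as a plain JSON list. `iter_paged` and `make_predictions_page_fetcher` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

# A page fetcher takes (page, page_size) and returns that page's items.
PageFetcher = Callable[[int, int], list]


def iter_paged(fetch_page: PageFetcher, page_size: int = 500) -> Iterator[dict]:
    """Yield items page by page; stop at the first short or empty page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    page = 1
    while True:
        items = fetch_page(page, page_size)
        yield from items
        if len(items) < page_size:
            return
        page += 1


def make_predictions_page_fetcher(
    base_url: str, token: str, project_id: int, model_version: Optional[str] = None
) -> PageFetcher:
    """HYPOTHETICAL: assumes api/predictions supports the paging/filter params below."""

    def fetch_page(page: int, page_size: int) -> list:
        params = {"project": project_id, "page": page, "page_size": page_size}
        if model_version is not None:
            params["model_version"] = model_version  # hypothetical filter
        req = urllib.request.Request(
            f"{base_url}/api/predictions?{urllib.parse.urlencode(params)}",
            headers={"Authorization": f"Token {token}"},
        )
        with urllib.request.urlopen(req, timeout=60) as resp:
            return json.load(resp)

    return fetch_page
```

With a moderate page size, each request stays small enough to avoid the timeout, and the number of round trips drops from one per task to roughly the number of predictions divided by `page_size`.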

Describe alternatives you've considered
Attempted workarounds described above.

Additional context
