Introduce ObjectStore methods that take Session data #7135

@waynr

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

In influxdata/influxdb#25911, we are discussing ways to record trace spans that distinguish between two key code paths in an implementation of ObjectStore that retrieves objects either from an object store or from a local in-memory cache. The problem we have right now is that our ObjectStore implementation has no way to receive span info from the calling context.

Describe the solution you'd like

This issue proposes the following:

  • Add a new Session trait to the object_store crate, similar to the one that exists in datafusion::catalog
    • This would have a single method that returns a session config similar to datafusion::prelude::SessionConfig
      • This session config would have set_extension, with_extension, and get_extension methods, like the DataFusion ones, that allow storing and retrieving Arc<dyn Any + Send + Sync + 'static> instances by type ID
  • Add a new set of methods to the ObjectStore trait that take &dyn Session as a parameter
    • e.g. fn get_with_session(&self, session: &dyn Session, location: &Path)
    • These methods could have default impls that delegate to the corresponding existing method (e.g. get_with_session would ignore the session parameter and call get by default), so existing implementers would not be forced to change
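To make the shape of the proposal concrete, here is a minimal sketch of the pieces listed above. It is not the object_store API: the trait below is a synchronous, simplified stand-in (the real ObjectStore is async and takes a &Path), and `SimpleSession`, the `&str` location, and the `String` error type are assumptions made only to keep the example self-contained.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical session config storing extensions by type ID, loosely
/// modeled on the extension methods of datafusion::prelude::SessionConfig.
#[derive(Default)]
pub struct SessionConfig {
    extensions: HashMap<TypeId, Arc<dyn Any + Send + Sync + 'static>>,
}

impl SessionConfig {
    /// Store `ext`, replacing any previous extension of the same type.
    pub fn set_extension<T: Any + Send + Sync>(&mut self, ext: Arc<T>) {
        let ext = ext as Arc<dyn Any + Send + Sync + 'static>;
        self.extensions.insert(TypeId::of::<T>(), ext);
    }

    /// Builder-style variant of `set_extension`.
    pub fn with_extension<T: Any + Send + Sync>(mut self, ext: Arc<T>) -> Self {
        self.set_extension(ext);
        self
    }

    /// Retrieve the extension of type `T`, if one was stored.
    pub fn get_extension<T: Any + Send + Sync>(&self) -> Option<Arc<T>> {
        let ext = self.extensions.get(&TypeId::of::<T>())?.clone();
        ext.downcast::<T>().ok()
    }
}

/// The proposed trait: a single method returning the session config.
pub trait Session: Send + Sync {
    fn config(&self) -> &SessionConfig;
}

/// Hypothetical trivial Session implementation, for illustration only.
pub struct SimpleSession(pub SessionConfig);

impl Session for SimpleSession {
    fn config(&self) -> &SessionConfig {
        &self.0
    }
}

/// Simplified, synchronous stand-in for the ObjectStore trait.
pub trait ObjectStore: Send + Sync {
    fn get(&self, location: &str) -> Result<Vec<u8>, String>;

    /// Default impl ignores the session and delegates to `get`, so
    /// existing implementers keep compiling unchanged.
    fn get_with_session(
        &self,
        _session: &dyn Session,
        location: &str,
    ) -> Result<Vec<u8>, String> {
        self.get(location)
    }
}
```

With this shape, an implementer that does not care about sessions only defines get, while a caller can always invoke get_with_session and rely on the default delegation.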

This would allow me to refactor https://github.com/influxdata/influxdb/ and https://github.com/influxdata/influxdb3_core/ so that callers pass a &dyn Session carrying a properly parented child span to custom *_with_session methods defined in impl ObjectStore for MemCachedObjectStore. The resulting trace spans would be properly contextualized in the hierarchy of a given query and would differentiate between calls served from the cache and calls that retrieve parquet files from the object store.
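As a rough illustration of that refactoring, the sketch below shows a caching store overriding get_with_session to tag each retrieval as a cache hit or an object store fetch. Everything here is hypothetical: `CachingStore` is a toy stand-in rather than the actual MemCachedObjectStore, `SpanRecorder` stands in for a real trace span type, and the traits are simplified synchronous versions of what the proposal describes.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

type Extension = Arc<dyn Any + Send + Sync + 'static>;

/// Hypothetical type-keyed session config (see the proposal above).
#[derive(Default)]
pub struct SessionConfig {
    extensions: HashMap<TypeId, Extension>,
}

impl SessionConfig {
    pub fn with_extension<T: Any + Send + Sync>(mut self, ext: Arc<T>) -> Self {
        self.extensions.insert(TypeId::of::<T>(), ext as Extension);
        self
    }
    pub fn get_extension<T: Any + Send + Sync>(&self) -> Option<Arc<T>> {
        let ext = self.extensions.get(&TypeId::of::<T>())?.clone();
        ext.downcast::<T>().ok()
    }
}

pub trait Session: Send + Sync {
    fn config(&self) -> &SessionConfig;
}

/// Hypothetical per-query session.
pub struct QuerySession(pub SessionConfig);
impl Session for QuerySession {
    fn config(&self) -> &SessionConfig { &self.0 }
}

/// Simplified, synchronous stand-in for the ObjectStore trait.
pub trait ObjectStore: Send + Sync {
    fn get(&self, location: &str) -> Result<Vec<u8>, String>;
    fn get_with_session(&self, _session: &dyn Session, location: &str) -> Result<Vec<u8>, String> {
        self.get(location)
    }
}

/// Hypothetical stand-in for a child trace span: it just collects events.
#[derive(Default)]
pub struct SpanRecorder { events: Mutex<Vec<String>> }
impl SpanRecorder {
    pub fn record(&self, event: &str) { self.events.lock().unwrap().push(event.to_string()); }
    pub fn events(&self) -> Vec<String> { self.events.lock().unwrap().clone() }
}

/// Toy "remote" store that relies on the default get_with_session.
pub struct RemoteStore(pub HashMap<String, Vec<u8>>);
impl ObjectStore for RemoteStore {
    fn get(&self, location: &str) -> Result<Vec<u8>, String> {
        self.0.get(location).cloned().ok_or_else(|| format!("not found: {location}"))
    }
}

/// Toy caching wrapper; NOT the real MemCachedObjectStore.
pub struct CachingStore {
    inner: Arc<dyn ObjectStore>,
    cache: Mutex<HashMap<String, Vec<u8>>>,
}

impl CachingStore {
    pub fn new(inner: Arc<dyn ObjectStore>) -> Self {
        Self { inner, cache: Mutex::new(HashMap::new()) }
    }
}

impl ObjectStore for CachingStore {
    fn get(&self, location: &str) -> Result<Vec<u8>, String> {
        let cached = self.cache.lock().unwrap().get(location).cloned();
        if let Some(bytes) = cached {
            return Ok(bytes);
        }
        let bytes = self.inner.get(location)?;
        self.cache.lock().unwrap().insert(location.to_string(), bytes.clone());
        Ok(bytes)
    }

    /// Same lookup as `get`, but records which path served the request
    /// on the span carried by the session, if there is one.
    fn get_with_session(&self, session: &dyn Session, location: &str) -> Result<Vec<u8>, String> {
        let span = session.config().get_extension::<SpanRecorder>();
        let cached = self.cache.lock().unwrap().get(location).cloned();
        if let Some(bytes) = cached {
            if let Some(span) = &span { span.record("cache_hit"); }
            return Ok(bytes);
        }
        if let Some(span) = &span { span.record("object_store_get"); }
        let bytes = self.inner.get_with_session(session, location)?;
        self.cache.lock().unwrap().insert(location.to_string(), bytes.clone());
        Ok(bytes)
    }
}
```

Because the span travels inside the session rather than inside the store, the same store instance can serve many queries while each query's spans stay attached to its own hierarchy.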

Describe alternatives you've considered

I've considered:

  • Using simple metrics to capture the high-level (i.e. not contextualized within a trace span hierarchy) difference between object store and cached parquet file retrievals.
    • Doesn't help us identify when poor query performance is caused by cache misses
  • Setting up MemCachedObjectStore with its own root hierarchy of spans
    • Also doesn't help us identify when poor query performance is caused by cache misses

Additional context

Metadata

Labels: question (Further information is requested)