Messed up cache database when multiple samples containing the same file are requested #417

@AlkaidCheng

Description

from servicex import ServiceXSpec, General, Sample, deliver

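def run_query(input_filename=None):
    # Imported inside the function so that uproot is available wherever the function body executes.
    import uproot
    # Read only the event number branch from CollectionTree.
    return uproot.open({input_filename: "CollectionTree"}).arrays(['EventInfoAux.eventNumber'])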

filename = ("root://eosuser.cern.ch//eos/atlas/atlascerngroupdisk/phys-higgs/HSG1/MxAOD/h028/mc16a/Nominal/"
            "mc16a.PowhegPy8_bbH125.MxAODDetailedNoSkim.e6050_s3126_r9364_p4180_h028.root")

spec = ServiceXSpec(
    General=General(
        ServiceX="servicex-uc-af",
        Codegen="python",
        OutputFormat="root-file",
        Delivery="LocalCache",
    ),
    Sample=[
        Sample(
            Name="foo",
            XRootDFiles=[filename],
            Function=run_query
        ),
        Sample(
            Name="bar",
            XRootDFiles=[filename],
            Function=run_query
        )
    ]
)
deliver(spec)

The first run succeeds and returns:

{'foo': ['/tmp/servicex_chlcheng/1e39cdcb-8c78-46ac-9eee-9b23961d3c0e/_fb8d56b26c71d7760000ede59ac8cb59ead5b951364_p4180_h028.root'],
 'bar': ['/tmp/servicex_chlcheng/f62efa75-184a-490d-b77a-d77b5cda3db6/_fb8d56b26c71d7760000ede59ac8cb59ead5b951364_p4180_h028.root']}

Rerunning the same request raises the following error:

File /pscratch/sd/c/chlcheng/local/miniconda/envs/ml-gpu/lib/python3.11/site-packages/servicex/query_cache.py:89, in QueryCache.get_transform_by_hash(self, hash)
     86     return None
     88 if len(records) != 1:
---> 89     raise CacheException("Multiple records found in db for hash")
     90 else:
     91     return TransformedResults(**records[0])

CacheException: Multiple records found in db for hash
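
A possible explanation, sketched below as a toy model rather than the actual servicex code: if the cache hash is computed from the query contents only (files, function, codegen) and not from the sample name, then "foo" and "bar" hash to the same value. On the first run both miss the cache and each writes its own record, so the second run finds two records for one hash. `ToyCache`, `request_hash`, `cache_transform` and `upsert_transform` are hypothetical names introduced for illustration. Only the check in `get_transform_by_hash` mirrors the traceback above.

```python
import hashlib
import json


class CacheException(Exception):
    """Stand-in for the exception named in the traceback."""


def request_hash(files, selection, codegen):
    # Assumption: the hash depends only on the query contents, not on the sample name.
    payload = json.dumps(
        {"files": sorted(files), "selection": selection, "codegen": codegen},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ToyCache:
    """In-memory stand-in for the cache database (hypothetical)."""

    def __init__(self):
        self.records = []

    def cache_transform(self, record):
        # Plain insert: nothing prevents two records with the same hash.
        self.records.append(record)

    def upsert_transform(self, record):
        # One possible remedy: replace any existing record with the same hash.
        self.records = [r for r in self.records if r["hash"] != record["hash"]]
        self.records.append(record)

    def get_transform_by_hash(self, hash_value):
        # Mirrors the check visible in the traceback.
        matches = [r for r in self.records if r["hash"] == hash_value]
        if not matches:
            return None
        if len(matches) != 1:
            raise CacheException("Multiple records found in db for hash")
        return matches[0]


files = ["root://example.invalid//path/to/file.root"]  # placeholder, not the real file
h_foo = request_hash(files, "run_query", "python")
h_bar = request_hash(files, "run_query", "python")

# First run: both samples miss the cache and each stores its own record.
broken = ToyCache()
broken.cache_transform({"hash": h_foo, "name": "foo", "request_id": "req-1"})
broken.cache_transform({"hash": h_bar, "name": "bar", "request_id": "req-2"})

# Second run: the lookup finds two records for a single hash and raises.
try:
    broken.get_transform_by_hash(h_foo)
    second_run_error = None
except CacheException as exc:
    second_run_error = str(exc)

# With an upsert keyed on the hash, only one record survives and the lookup works.
fixed = ToyCache()
fixed.upsert_transform({"hash": h_foo, "name": "foo", "request_id": "req-1"})
fixed.upsert_transform({"hash": h_bar, "name": "bar", "request_id": "req-2"})
fixed_record = fixed.get_transform_by_hash(h_foo)
```

If this is indeed the cause, a fix could either deduplicate identical queries before submission or make the cache write idempotent per hash. Which of the two fits the real code base is not something this sketch can settle.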

Labels: bug
