This repository was archived by the owner on Jun 26, 2025. It is now read-only.

Dummy data files with date columns #686

@iaindillingham

Description


It's now possible to pass a dummy data file when generating a cohort, avoiding some clunkiness associated with the expectations framework.

opensafely/documentation#499 describes how a researcher passed a dummy data file with a date column. However, the corresponding column in the study definition was produced by patients.categorised_as, so if the dummy data file hadn't been passed, that column would have been generated as a string column, not a date column.

When validating dummy data files, cohort-extractor assumes that a date column (query_args["column_type"] == "date") has a date_format argument. That assumption doesn't hold here, so cohort-extractor raised the following error:

Traceback (most recent call last):
  File "/usr/local/bin/cohortextractor", line 33, in <module>
    sys.exit(load_entry_point('opensafely-cohort-extractor', 'console_scripts', 'cohortextractor')())
  File "/app/cohortextractor/cohortextractor.py", line 668, in main
    generate_cohort(
  File "/app/cohortextractor/cohortextractor.py", line 151, in generate_cohort
    _generate_cohort(
  File "/app/cohortextractor/cohortextractor.py", line 202, in _generate_cohort
    study.to_file(
  File "/app/cohortextractor/study_definition.py", line 92, in to_file
    validate_dummy_data(self.covariate_definitions, dummy_data_file)
  File "/app/cohortextractor/validate_dummy_data.py", line 31, in validate_dummy_data
    validator = get_binary_validator(col_name, query_args)
  File "/app/cohortextractor/validate_dummy_data.py", line 174, in get_binary_validator
    if query_args["date_format"] == "YYYY":
KeyError: 'date_format'
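The KeyError comes from subscripting a dict that lacks the key. The minimal sketch below reproduces the failure outside cohort-extractor; the query_args dict is hypothetical and only mimics a column whose column_type is "date" but whose definition carries no date_format, as happens with patients.categorised_as. Subscripting raises KeyError, whereas dict.get returns None and leaves the caller free to choose a fallback.

```python
# Hypothetical query_args for a column typed as a date but defined
# without a date_format argument (as with patients.categorised_as).
query_args = {"column_type": "date"}


def lookup_date_format(args):
    """Mimic the original check: subscripting raises KeyError when the key is absent."""
    return args["date_format"]


try:
    lookup_date_format(query_args)
    raised = False
    missing_key = None
except KeyError as exc:
    raised = True
    missing_key = exc.args[0]  # the name of the missing key

# dict.get returns None instead of raising, so the caller can decide what to do.
safe_format = query_args.get("date_format")
```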

Here's the line, with some context:

column_type = query_args["column_type"]
if column_type == "date":
    if query_args["date_format"] == "YYYY":
        return date_validator_year

Neither the line nor the KeyError tells the researcher much about what went wrong. Instead of raising a KeyError, we could...

  • return date_validator_day when date_format is missing (we might need to replace the call to isinstance with a call to pandas' is_datetime64_any_dtype)
  • raise a more informative error
  • something else 🤔
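The first two options can be combined, as in the minimal sketch below. It is not cohort-extractor's code: get_date_validator, the strict flag, and the regex-based validators are hypothetical stand-ins, and the supported date_format strings ("YYYY", "YYYY-MM", "YYYY-MM-DD") are an assumption. The key change is reading date_format with dict.get, then either falling back to day-level validation or raising an error that names the offending column.

```python
import re


# Hypothetical stand-ins for cohort-extractor's date validators; the
# patterns are assumptions, not copied from validate_dummy_data.py.
def date_validator_year(value):
    return re.fullmatch(r"\d{4}", value) is not None


def date_validator_month(value):
    return re.fullmatch(r"\d{4}-\d{2}", value) is not None


def date_validator_day(value):
    return re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) is not None


# Assumed mapping from date_format strings to validators.
_VALIDATORS = {
    "YYYY": date_validator_year,
    "YYYY-MM": date_validator_month,
    "YYYY-MM-DD": date_validator_day,
}


def get_date_validator(col_name, query_args, strict=False):
    """Pick a validator for a date column (hypothetical helper).

    If date_format is missing, fall back to day-level validation (option 1),
    or, when strict=True, raise an error that names the column (option 2).
    """
    date_format = query_args.get("date_format")
    if date_format is None:
        if strict:
            raise ValueError(
                f"Column {col_name!r} has column_type 'date' but no date_format, "
                "so its dummy data cannot be validated"
            )
        return date_validator_day
    try:
        return _VALIDATORS[date_format]
    except KeyError:
        raise ValueError(
            f"Column {col_name!r} has an unrecognised date_format {date_format!r}"
        ) from None
```

For example, get_date_validator("example_date", {"column_type": "date"}) returns date_validator_day, which accepts "2021-03-01". In the real validator, which works on pandas columns, the same fallback could pair with pandas.api.types.is_datetime64_any_dtype in place of the isinstance check.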

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
