Dummy data files with date columns #686
Description
It's now possible to pass a dummy data file when generating a cohort, avoiding some clunkiness associated with the expectations framework.
opensafely/documentation#499 describes how a researcher passed a dummy data file with a date column. However, the corresponding column in the study definition was the result of calling patients.categorised_as, meaning that if the dummy data file hadn't been passed, then the date column would have been a string column.
When verifying dummy data files, cohort-extractor assumes that a date column (query_args["column_type"] == "date") always has an associated date_format argument. That assumption doesn't hold here, so cohort-extractor raised the following error:
Traceback (most recent call last):
  File "/usr/local/bin/cohortextractor", line 33, in <module>
    sys.exit(load_entry_point('opensafely-cohort-extractor', 'console_scripts', 'cohortextractor')())
  File "/app/cohortextractor/cohortextractor.py", line 668, in main
    generate_cohort(
  File "/app/cohortextractor/cohortextractor.py", line 151, in generate_cohort
    _generate_cohort(
  File "/app/cohortextractor/cohortextractor.py", line 202, in _generate_cohort
    study.to_file(
  File "/app/cohortextractor/study_definition.py", line 92, in to_file
    validate_dummy_data(self.covariate_definitions, dummy_data_file)
  File "/app/cohortextractor/validate_dummy_data.py", line 31, in validate_dummy_data
    validator = get_binary_validator(col_name, query_args)
  File "/app/cohortextractor/validate_dummy_data.py", line 174, in get_binary_validator
    if query_args["date_format"] == "YYYY":
KeyError: 'date_format'
Here's the line, with some context:
cohort-extractor/cohortextractor/validate_dummy_data.py
Lines 172 to 175 in 1514a7a
    column_type = query_args["column_type"]
    if column_type == "date":
        if query_args["date_format"] == "YYYY":
            return date_validator_year
The line and the KeyError aren't especially informative. Instead, we could...
- return date_validator_day (we might need to replace the call to isinstance with a call to is_datetime64_any_dtype)
- raise a more informative error
- something else 🤔
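
To make the first two options concrete, here is a rough sketch of how the lookup could tolerate a missing date_format. It is not the actual cohort-extractor code: get_date_validator, its strict flag, DummyDataValidationError, and the regex-based validator bodies are all hypothetical stand-ins. Only the names date_validator_year and date_validator_day and the "YYYY" format come from this issue.

```python
import re


# Hypothetical stand-ins: the real validators live in validate_dummy_data.py
# and are not shown in this issue. These only check string shapes.
def date_validator_year(value):
    return re.fullmatch(r"\d{4}", value) is not None


def date_validator_day(value):
    return re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) is not None


class DummyDataValidationError(Exception):
    """Hypothetical error type used only in this sketch."""


def get_date_validator(col_name, query_args, strict=False):
    """Pick a validator for a date column without assuming date_format exists."""
    # dict.get returns None instead of raising KeyError when the key is absent.
    date_format = query_args.get("date_format")

    if date_format is None:
        if strict:
            # Option 2: fail, but say which column is affected and why.
            raise DummyDataValidationError(
                f"Column '{col_name}' has column_type 'date' but no "
                "date_format, so its dummy data cannot be validated as a date."
            )
        # Option 1: fall back to the day-level validator.
        return date_validator_day

    if date_format == "YYYY":
        return date_validator_year

    # Assumption for this sketch: every other format is validated at day level.
    return date_validator_day
```

With strict=False, a column produced by patients.categorised_as would get date_validator_day; with strict=True, the researcher would see the column name and the reason instead of a bare KeyError.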