Are you an agency that works with Tyler Tech's Open Data Platform? Do you have a bunch of Census API endpoints with data that needs collecting? Not enough humans to do the downloading? Then this app might be a solution to your data problem. Sorry to hear about the humans, though. Good luck with that.
Designed to work with the Open Data Platform, this app can be configured to look at a table of endpoints on the platform. It pulls and wrangles the data from those endpoints and ships them off as one unified table to a different table on the platform.
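The Census API returns JSON as an array of arrays whose first row holds the column names. A minimal sketch of the "wrangle and unify" step might look like the following, assuming the responses have already been fetched into memory. The function names and the `source_endpoint` column are hypothetical illustrations, not this repo's code, and values are left exactly as received.

```python
from typing import Any

Row = dict[str, Any]


def census_to_records(payload: list[list[str]], endpoint: str) -> list[Row]:
    """Turn a Census API response (header row + data rows) into dicts.

    Each record is tagged with the endpoint it came from (hypothetical
    'source_endpoint' column) so rows stay traceable after unifying.
    """
    if not payload:
        return []
    header, *rows = payload
    return [{**dict(zip(header, row)), "source_endpoint": endpoint} for row in rows]


def unify(responses: dict[str, list[list[str]]]) -> list[Row]:
    """Stack records from several endpoints into one table.

    Columns missing from a given endpoint are filled with None so every
    record shares the same set of keys.
    """
    records: list[Row] = []
    for endpoint, payload in responses.items():
        records.extend(census_to_records(payload, endpoint))

    # Collect the union of columns, preserving first-seen order.
    columns: list[str] = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    return [{col: record.get(col) for col in columns} for record in records]
```

Filling missing columns with `None` keeps the unified table rectangular, which matters when different endpoints request different variables.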
Make your table of endpoints a handy table of contents for your analyses and/or a collection of metadata about your data pipeline sources, and now your endpoints are a public service, a process control table, and a dash of infrastructure as code... err, well, tables at least!
This project manages CT-related data flow from external, public API sources to and from the CT Open Data Portal, but others are welcome to use and adapt it to their specific needs.
This is for public-facing data only. Under no circumstances should content or data containing sensitive or potentially identifiable information be pushed to this repository.
Currently, for ease of reuse, the app expects the following environment variables to exist in order to run:
- CENSUS_API_KEY: Your Census API key.
- SOURCE_ID: The Open Data Platform "Four by Four" identifier of the source table of endpoints.
- TARGET_ID: The identifier of the destination table for the data on the Open Data Platform.
- SOCRATA_USER: Open Data Platform Username - if needed.
- SOCRATA_PASS: Open Data Platform Password - if needed.
- SOCRATA_TOKEN: Open Data Platform Application token.
- DOMAIN: Domain name of the portal, e.g. data.ct.gov.
Take care not to print or commit sensitive information and/or environment variables to your repos and logs.
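One way to honor both the variable list and the caution above is to load everything once, fail fast while reporting only the names of missing variables, and keep secret values out of `repr`. This is a sketch, not the app's actual loader: `Settings`, `load_settings`, and `resource_url` are hypothetical names, and the sketch treats every variable not marked "if needed" as required. The URL pattern `https://{domain}/resource/{id}.json` is the standard Socrata (SODA) resource endpoint.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical split: variables not marked "if needed" are treated as required.
REQUIRED = ("CENSUS_API_KEY", "SOURCE_ID", "TARGET_ID", "SOCRATA_TOKEN", "DOMAIN")


@dataclass(frozen=True)
class Settings:
    # repr=False keeps secrets out of logs if the object is ever printed.
    census_api_key: str = field(repr=False)
    source_id: str
    target_id: str
    socrata_token: str = field(repr=False)
    domain: str
    socrata_user: Optional[str] = field(default=None, repr=False)
    socrata_pass: Optional[str] = field(default=None, repr=False)

    def resource_url(self, four_by_four: str) -> str:
        """SODA resource endpoint for a dataset on this portal."""
        return f"https://{self.domain}/resource/{four_by_four}.json"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast if any are missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        # Report only variable names, never their values.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        census_api_key=env["CENSUS_API_KEY"],
        source_id=env["SOURCE_ID"],
        target_id=env["TARGET_ID"],
        socrata_token=env["SOCRATA_TOKEN"],
        domain=env["DOMAIN"],
        # Empty strings are normalized to None for the optional credentials.
        socrata_user=env.get("SOCRATA_USER") or None,
        socrata_pass=env.get("SOCRATA_PASS") or None,
    )
```

Calling `load_settings()` with no arguments reads the real `os.environ`; passing a plain dict instead makes the loader easy to test without touching your actual secrets.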
This app currently uses uv for managing dependencies and execution. uv is included in the provided Codespace `devcontainer.json`; otherwise, users should have it installed locally. Read the uv installation instructions on GitHub for steps on how to install it in your local environment.
- Locally:
  - Clone the repo.
  - Set the required environment variables.
  - Ensure `uv` is installed.
  - Inside of the terminal, `cd` to the repo folder and either:
    - execute `uv run flow`, or
    - execute `uv run src/flow/main.py`.
- Docker:
  - Clone the repo.
  - Execute the `build.sh` file to build the image.
  - Execute the `run.sh` file to run the container and pull data from the Census API to your destination table.
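Both local commands can run the same code if the package exposes a console-script entry point (presumably named `flow` in the project's `pyproject.toml`, which is an assumption here) and `src/flow/main.py` calls the same function under a `__main__` guard. The skeleton below is hypothetical, not the repo's actual `main.py`: `run_pipeline` and its three injected steps are illustrative names that let the read, fetch, and publish flow be exercised without network access.

```python
from collections.abc import Callable, Iterable
from typing import Any

Row = dict[str, Any]


def run_pipeline(
    read_endpoints: Callable[[], Iterable[str]],
    fetch: Callable[[str], list[Row]],
    publish: Callable[[list[Row]], None],
) -> int:
    """Read the endpoint table, pull each endpoint, publish one unified table.

    Returns the number of rows published.
    """
    unified: list[Row] = []
    for endpoint in read_endpoints():
        unified.extend(fetch(endpoint))
    publish(unified)
    return len(unified)


def main() -> None:
    # Placeholder wiring (hypothetical): a real entry point would read
    # settings from the environment and call the Socrata and Census APIs.
    count = run_pipeline(
        read_endpoints=lambda: [],
        fetch=lambda endpoint: [],
        publish=lambda rows: None,
    )
    print(f"Published {count} rows")


if __name__ == "__main__":
    # Lets `uv run src/flow/main.py` behave like the console-script entry point.
    main()
```

Injecting the steps keeps the orchestration testable; the real implementations would talk to the portal and the Census API using the configured environment variables.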
This repo includes a template `devcontainer.json` for developers who prefer to work inside Codespaces or containers.
Ruff is used to check the project on every push or pull request. If you use a GitHub Codespace with the provided `devcontainer.json`, Ruff will also format code automatically on save.
pre-commit is configured to have Ruff fix and format code on each commit. It blocks the commit until the issues are resolved and/or the suggested fixes are committed.
A `dependabot.yml` is provided that directs GitHub's Dependabot to scan weekly for vulnerabilities and security updates. Dependabot scans the `devcontainers` and `pip` ecosystems.