Add waterdata infrastructure #183
base: develop
…t page not downloading, start to add more function outlines
dataretrieval/waterdata.py
Outdated
    bbox: Optional[List[float]] = None,
    limit: Optional[int] = None,
    max_results: Optional[int] = None,
    convertType: bool = True
How do we pass the api_key API parameter here?
I will add documentation about this (still learning the details myself), but your API key should be passed as a header if it exists as an environment variable. This is the line used to grab the api key in one of the helper functions:
token = os.getenv("API_USGS_PAT")
So you'll want to get your API key, and then set it using:
os.environ["API_USGS_PAT"] = "<your key>"
You may need to restart your session to get it to "register".
And to be clear: all you need to do is have your key in your environment, you don't need to "set it" in the functions anywhere.
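As a rough illustration of the behaviour described above, here is a minimal sketch of a helper that reads the key from the environment and attaches it as a request header. The function name build_headers and the header name X-Api-Key are assumptions made for this example; only the API_USGS_PAT variable and the os.getenv call come from the comment above.

```python
import os


def build_headers(env_var: str = "API_USGS_PAT") -> dict:
    """Hypothetical stand-in for a helper such as _default_headers()."""
    headers = {"Accept": "application/json"}
    # Read the key at call time; it is only attached if the variable is set.
    token = os.getenv(env_var)
    if token:
        # Assumption: the header name may differ in the real helper.
        headers["X-Api-Key"] = token
    return headers
```

In this sketch the key is looked up each time the function runs, so a value assigned through os.environ earlier in the same Python process is picked up on the next call.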
Understood, thanks! I was pulling it out of an environment variable myself
and expecting to set it in the retrieval functions, but the functions
pulling it themselves also works.
I appreciate your hard work on these!
On Thu, Sep 25, 2025, 08:25 Elise Hinman wrote, commenting on this pull request in dataretrieval/waterdata.py:
> + parameter_code: Optional[Union[str, List[str]]] = None,
+ statistic_id: Optional[Union[str, List[str]]] = None,
+ properties: Optional[List[str]] = None,
+ time_series_id: Optional[Union[str, List[str]]] = None,
+ daily_id: Optional[Union[str, List[str]]] = None,
+ approval_status: Optional[Union[str, List[str]]] = None,
+ unit_of_measure: Optional[Union[str, List[str]]] = None,
+ qualifier: Optional[Union[str, List[str]]] = None,
+ value: Optional[Union[str, List[str]]] = None,
+ last_modified: Optional[str] = None,
+ skipGeometry: Optional[bool] = None,
+ time: Optional[Union[str, List[str]]] = None,
+ bbox: Optional[List[float]] = None,
+ limit: Optional[int] = None,
+ max_results: Optional[int] = None,
+ convertType: bool = True
I see that in _default_headers()
Thanks!
dataretrieval/waterdata_helpers.py
Outdated
from datetime import datetime
import pandas as pd
import json
import geopandas as gpd
We want to keep geopandas as an optional dependency, I think. See the nldi module for example.
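One common way to keep a package optional is to attempt the import once and fall back when it is missing. The sketch below shows that pattern in general terms; optional_import and pick_frame_kind are hypothetical names for illustration and are not taken from the nldi module or this PR.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# None when geopandas is absent, so importing this file never fails on it.
gpd = optional_import("geopandas")


def pick_frame_kind(has_geometry: bool) -> str:
    """Report which frame type a caller should expect (illustrative only)."""
    if has_geometry and gpd is not None:
        return "geopandas.GeoDataFrame"
    return "pandas.DataFrame"
```

With this arrangement the hard dependency list stays unchanged, and code paths that need geometry can check whether gpd is None before using it.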
… knows when they will receive a pandas df
jzemmels
left a comment
I've tested out all of the read_waterdata_ functions in various ways and didn't run into errors. Just one comment on the usage of limit and max_results that confuses me. I've also read through the helper functions and everything looks good. I know there are other PRs proposed for the ehinman:add-waterdata-infrastructure branch, so let me know if you'd like me to look again after those changes are resolved.
dataretrieval/waterdata.py
Outdated
    output_id = "daily_id"

    # Build argument dictionary, omitting None values
    args = {
Slick
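For readers who cannot see the full hunk, the general idea of building an argument dictionary while omitting None values can be sketched as follows. The function name build_args is hypothetical, and this is not the PR's actual code.

```python
def build_args(**kwargs) -> dict:
    """Keep only the arguments the caller actually supplied."""
    # Compare against None explicitly so that False and 0 are preserved.
    return {key: value for key, value in kwargs.items() if value is not None}
```

Testing for None rather than truthiness matters for arguments such as skipGeometry, where False is a meaningful value.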
dataretrieval/waterdata.py
Outdated
    return waterdata_helpers.get_ogc_data(args, output_id, service)


def get_monitoring_locations(
    monitoring_location_id: Optional[List[str]] = None,
Do we want to add some argument checking statements, perhaps somewhere in the helpers? Making sure monitoring_location_id is a string, etc. Not sure how difficult this would be to implement, but I'm able to pass seemingly whatever I want to these arguments and a request is still made.
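A minimal sketch of what such a check could look like in a helper, assuming the arguments in question should be a string or a list of strings. The name check_str_or_list is hypothetical and the exact rules would need to follow the API's expectations.

```python
from typing import Any, List, Optional


def check_str_or_list(name: str, value: Any) -> Optional[List[str]]:
    """Validate an argument and normalise it to a list of strings."""
    if value is None:
        return None
    if isinstance(value, str):
        return [value]
    if isinstance(value, (list, tuple)) and all(isinstance(v, str) for v in value):
        return list(value)
    # Fail before any request is made, naming the offending argument.
    raise TypeError(
        f"{name} must be a string or a list of strings, got {type(value).__name__}"
    )
```

Calling a check like this at the top of each public function would reject bad input before a request is sent, at the cost of one extra line per argument.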
dataretrieval/waterdata.py
Outdated
    time: Optional[Union[str, List[str]]] = None,
    bbox: Optional[List[float]] = None,
    limit: Optional[int] = None,
    max_results: Optional[int] = None,
Similar to a question I brought up in the dataRetrieval PR, I'm either not understanding how limit and max_results are supposed to work or they're not working as intended. The max_results argument doesn't seem to impact the number of rows returned in the output. Examples:
# wd.get_monitoring_locations(site_type_code="GW") # fetches all GW ML ids
wd.get_monitoring_locations(site_type_code="GW", max_results = 10) # fetches 10,000 GW ML ids
wd.get_monitoring_locations(site_type_code="GW", max_results = 5, limit = 10) # fetches 10 GW ML ids
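To make the question concrete, here is one possible reading of the two arguments, sketched with a fake data source: limit as the page size and max_results as a cap on the total number of rows. This is an assumption about the intended semantics, not a description of what the PR currently does; collect and fetch_page are hypothetical names, and offset-based paging is a simplification.

```python
from typing import Callable, List, Optional


def collect(
    fetch_page: Callable[[int, int], List[dict]],
    limit: int,
    max_results: Optional[int] = None,
) -> List[dict]:
    """Gather pages of size `limit` until exhausted or `max_results` is hit."""
    rows: List[dict] = []
    offset = 0
    while True:
        page_size = limit
        if max_results is not None:
            remaining = max_results - len(rows)
            if remaining <= 0:
                break
            # Never request more rows than are still allowed.
            page_size = min(limit, remaining)
        page = fetch_page(offset, page_size)
        rows.extend(page)
        offset += len(page)
        if len(page) < page_size:
            break  # A short page means the source has no more rows.
    return rows


# Fake source of 25 rows standing in for the remote service.
_DATA = [{"id": i} for i in range(25)]


def fake_fetch(offset: int, size: int) -> List[dict]:
    return _DATA[offset:offset + size]
```

Under this reading, max_results=5 with limit=10 would return 5 rows, which differs from the 10 rows observed in the third example above.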
Switching up the monitoring location IDs and parameter codes fetched across these tests would be good. For a future PR: consider making lists of potential argument values and randomly selecting one for each of the test runs.
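The suggestion could be sketched roughly as below. The site IDs are placeholders rather than real monitoring locations, and pick_case is a hypothetical helper; passing a seeded random.Random keeps a failing combination reproducible.

```python
import random

# Placeholder IDs for illustration only; substitute real monitoring locations.
SITE_IDS = ["USGS-00000001", "USGS-00000002", "USGS-00000003"]
# Common USGS parameter codes: discharge, gage height, water temperature.
PARAMETER_CODES = ["00060", "00065", "00010"]


def pick_case(rng: random.Random) -> dict:
    """Choose one argument combination for a test run."""
    return {
        "monitoring_location_id": rng.choice(SITE_IDS),
        "parameter_code": rng.choice(PARAMETER_CODES),
    }
```

Logging the seed alongside each test run would let a failure be replayed with the same arguments.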
Waterdata revisions
This PR will add access to the new water data APIs via the waterdata module.
9/26/25: Added some updates to the README.md about the new module and API keys. Ready for testing and review.
EOD 9/25/25: qualifier is a tricky argument, and the product owner suggests against using it as an argument unless you're really confident and restrictive about what you want: it can be a list of multiple qualifiers, and if you pick just one qualifier value, it will only match rows with JUST that one. The default is to return a geopandas dataframe when geometries are returned, but because geopandas is an optional dependency, functions will return pandas dataframes if geopandas is not available. Unit tests have been created, with opportunities for more. I'd say the functions are ready for testing. I need to add in some info on the new functions in the README, etc.
9/25/25: POST calls using the CQL2 query language appear to be working, and documentation for the functions has been added. I'm noticing some inconsistencies in some of the input parameters like qualifier that still need to be addressed/parsed. I also need to create unit tests, and I'd like to have the functions return a geopandas dataframe when skipGeometry=False.
9/19/25: It is currently a work in progress that appears to work for GET calls in which the user requests one parameter (e.g. one site, one pcode, etc.) at a time. Still working out the POST calls in which a user may request multiple parameters (e.g. data from multiple sites, with multiple pcodes), which requires the use of the CQL2 query language. Stay tuned.
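For context on the multi-value case, a CQL2 JSON filter can express "any of these values" with the in operator and combine several such clauses with and. The sketch below builds such a filter body; cql2_in_filter is a hypothetical helper, and how the PR actually assembles its POST body may differ.

```python
import json
from typing import Dict, List


def cql2_in_filter(params: Dict[str, List[str]]) -> dict:
    """Build a CQL2 JSON filter matching any listed value per property."""
    if not params:
        raise ValueError("at least one property is required")
    clauses = [
        {"op": "in", "args": [{"property": name}, list(values)]}
        for name, values in params.items()
    ]
    # A single clause stands alone; several are combined with "and".
    if len(clauses) == 1:
        return clauses[0]
    return {"op": "and", "args": clauses}


# Placeholder site IDs; the parameter codes are discharge and gage height.
body = cql2_in_filter({
    "monitoring_location_id": ["USGS-00000001", "USGS-00000002"],
    "parameter_code": ["00060", "00065"],
})
print(json.dumps(body, indent=2))
```

A body like this would be sent as the JSON payload of the POST request, whereas a single-value query can stay a plain GET with query-string parameters.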