This Project Pythia Cookbook covers using the Open Science Data Federation (OSDF), a service for streaming scientific data across the globe.
Have you ever been frustrated by the complications of accessing scientific data? Why can't it "just work", like watching a Netflix movie?
The OSDF is a service that simplifies the streaming of a wide range of scientific datasets, with the goal that data access "just works". It is meant to improve data availability for researchers working at any scale, from individual laptops to distributed computing services such as the OSG's OSPool.
This cookbook gives motivating use cases from the geoscience community, including using datasets from NSF NCAR's Research Data Archive (RDA) and the datasets of AWS OpenData.
Brian Bockelman, Harsha R. Hampapura, Alexander Hoelzeman, Emma Turetsky, Amandha Wingert Barok, Aashish Panta, Justin Hiemstra, Douglas Schuster, Riley Conroy, Kibiwott Koech
This cookbook is broken up into two pieces: some background knowledge on the OSDF service itself, followed by a series of motivating examples from different repositories accessible via the OSDF.
What is the OSDF? Who supports it? How can it benefit my science? A dive into the infrastructure itself.
The Research Data Archive is NCAR's centrally-run archive, making decades of federally-funded earth systems data available. Learn how to use common data science tools when streaming from the RDA.
Florida International University (FIU) runs the Envistor project, aggregating climate datasets from the south Florida region.
NOAA maintains a copy of its SONAR-based datasets of Atlantic fisheries data in the popular Zarr format. This chapter shows how to load and use the datasets and fuse them with other products.
All of AWS OpenData is connected to the OSDF! This chapter includes examples of streaming Sentinel-2 data, stored in AWS's OpenData program, to your notebook.
You can run the notebooks either using Binder or on your local machine.
The simplest way to interact with a Jupyter Notebook is through Binder, which enables the execution of a Jupyter Book in the cloud. The details of how this works are not important for now. All you need to know is how to launch a Pythia Cookbooks chapter via Binder. Simply navigate your mouse to the top right corner of the book chapter you are viewing, click on the rocket ship icon (see figure below), and be sure to select “launch Binder”. After a moment you should be presented with a notebook that you can interact with. That is, you’ll be able to execute and even change the example programs. You’ll see that the code cells have no output at first, until you execute them by pressing Shift+Enter. Complete details on how to interact with a live Jupyter notebook are described in Getting Started with Jupyter.
Note: not all Cookbook chapters are executable. If you do not see the rocket ship icon, such as on this page, you are not viewing an executable book chapter.
If you are interested in running this material locally on your computer, you will need to follow this workflow:
- Clone the https://github.com/ProjectPythia/osdf-cookbook repository:
    git clone https://github.com/ProjectPythia/osdf-cookbook.git
- Move into the osdf-cookbook directory:
    cd osdf-cookbook
- Create and activate your conda environment from the environment.yml file:
    conda env create -f environment.yml
    conda activate osdf-cookbook
- Move into the notebooks directory and start up JupyterLab:
    cd notebooks/
    jupyter lab
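If JupyterLab does not start after following the steps above, a quick sanity check of the shell environment may help. The sketch below is not part of the cookbook itself: the helper name `check_setup` is hypothetical, and it assumes that conda exposes the active environment's name through the `CONDA_DEFAULT_ENV` variable and that the environment is named `osdf-cookbook`, as in the activate command above.

```python
import os
import shutil

# Environment name taken from the `conda activate` step above.
EXPECTED_ENV = "osdf-cookbook"


def check_setup(environ=None, which=shutil.which):
    """Return a list of problems found; an empty list means both checks passed.

    `check_setup` is a hypothetical helper for illustration only.
    """
    environ = os.environ if environ is None else environ
    problems = []

    # Assumption: conda records the active environment in CONDA_DEFAULT_ENV.
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != EXPECTED_ENV:
        problems.append(
            f"active conda environment is {active!r}, expected {EXPECTED_ENV!r}"
        )

    # `jupyter lab` needs the `jupyter` launcher to be on PATH.
    if which("jupyter") is None:
        problems.append("`jupyter` was not found on PATH")

    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print("WARNING:", problem)
```

Running the script from the same terminal in which you activated the environment prints a warning for each failed check and nothing if both pass. The `environ` and `which` parameters exist only so the checks can be exercised without touching the real environment.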