Collaborator

GL-S commented Aug 8, 2025

Proposed changes

I have updated all land cover notebooks and relevant code to use STAC rather than datacube's dc.load().

The changes made to code in landcover.py are only for the plotting of land cover.
I have tested the DEA_Land_Cover.ipynb notebook that uses dc.load() with these recent changes, and the workflow is not broken.
So both the STAC and the datacube approach are still possible.

I have also made some minor changes to formatting and spelling.

Checklist

If this is a notebook, then have you:

  • Checked the structure of the notebook follows our DEA-notebooks template
  • Removed any unused Python packages from Load packages
  • Removed any unused/empty code cells
  • Removed any guidance cells (e.g. General advice)
  • Ensured that all code cells follow the PEP8 standard for code. The jupyterlab_code_formatter tool can be used to format code cells to a consistent style: select each code cell, then click Edit and then one of the Apply X Formatter options (YAPF or Black are recommended).
  • Included relevant tags in the final notebook cell (refer to the DEA Tags Index, and re-use tags if possible)
  • Tested notebook on the DEA Sandbox
  • Cleared all outputs, run the notebook from start to finish, and saved the notebook in the state where all cells have been sequentially evaluated
  • If applicable, updated the "Notebook currently compatible with" line below the notebook title to reflect the environments the notebook is compatible with
  • Checked for any spelling mistakes using the DEA Sandbox's built-in spellchecker (double click on markdown cells then right-click on pink highlighted words). For example:

[Image: sandbox_spellchecker]

Collaborator Author

GL-S commented Aug 8, 2025

For some reason the figure in the pixel drill is different for the two methods for the same year, while all other plots and animations in the other notebooks are identical. I will need to investigate further to understand what's going on

Collaborator Author

GL-S commented Aug 8, 2025

> For some reason the figure in the pixel drill is different for the two methods for the same year, while all other plots and animations in the other notebooks are identical. I will need to investigate further to understand what's going on

Does odc.stac assign the central date to annual products, or the first day of the year?
datacube uses the central date, so LC for year 2020 is assigned to the 1st or 2nd July 2020.

I think that by filtering on the year only, with nearest as the method (as we do in the pixel drill notebook), I am actually selecting different years depending on whether I use datacube or odc.stac.

EDIT: yes, that's what's happening. The odc.stac result is actually more correct!
I think with datacube we were selecting 2019 instead.
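
To illustrate the mismatch, here is a minimal sketch using plain pandas rather than the notebook code. The timestamps are hypothetical: the mid-year values stand in for datacube's central-date labels, and the start-of-year values are an assumption about how the odc.stac load labels annual layers. `nearest_year` and `exact_year` are illustrative helper names, not functions from dea-tools.

```python
import pandas as pd

# Hypothetical time labels, assumed for illustration only.
# Mid-year labels, standing in for datacube's central-date convention.
mid_year = pd.DatetimeIndex(
    ["2019-07-02T11:59:59.999999", "2020-07-01T23:59:59.999999"]
)
# Start-of-year labels, an assumption about the odc.stac load.
start_of_year = pd.DatetimeIndex(["2019-01-01", "2020-01-01"])

# A year-only selection such as "2020" resolves to 1 January 2020.
target = pd.Timestamp("2020-01-01")


def nearest_year(index, when):
    """Return the year of the timestamp in `index` closest to `when`."""
    position = index.get_indexer([when], method="nearest")[0]
    return int(index[position].year)


def exact_year(index, year):
    """Return the timestamps in `index` whose calendar year equals `year`."""
    return index[index.year == year]


print(nearest_year(mid_year, target))       # nearest mid-year label to 1 Jan 2020
print(nearest_year(start_of_year, target))  # nearest start-of-year label
print(exact_year(mid_year, 2020))           # explicit match on the calendar year
```

With these assumed labels, the mid-year 2019 timestamp is about 182.5 days from 1 January 2020 while the mid-year 2020 timestamp is about 183 days away, so a nearest lookup lands on 2019. Matching on the calendar year, as in `exact_year`, gives the same answer under either labelling convention.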

Collaborator Author

GL-S commented Aug 8, 2025

Also, should we keep the STAC versions of the notebook separate, like in a subfolder? Or is it ok to replace the datacube ones?

I am also thinking that maybe I shouldn't have modified the Microsoft Planetary Computer one, as it was also a comparison of methods for accessing different data for the same region and time period.

Collaborator

cbur24 commented Aug 8, 2025

@GL-S @robbibt @caitlinadams

Maybe we should have some standardised text at the top of each notebook that loads with odc-stac? Something like:

⚠️ Important Note
This notebook loads data into xarray using odc-stac. This means:

  1. The notebook can be run either within the sandbox, or on any computer with an internet connection and the required Python packages installed (e.g. by running pip install dea-tools in your Python environment).
  2. Data loading performance will depend on the available internet bandwidth.

Collaborator

cbur24 commented Aug 8, 2025

> Also, should we keep the STAC versions of the notebook separate, like in a subfolder? Or is it ok to replace the datacube ones?
>
> I am also thinking that maybe I shouldn't have modified the Microsoft Planetary Computer one, as it was also a comparison of methods for accessing different data for the same region and time period.

No, I believe the idea is to eventually replace all notebooks with the odc-stac versions, thereby freeing dea-notebooks from its dependence on the sandbox.

Collaborator Author

GL-S commented Aug 8, 2025

> @GL-S @robbibt @caitlinadams
>
> Maybe we should have some standardised text at the top of each notebook that loads with odc-stac? Something like:
>
> ⚠️ Important Note
> This notebook loads data into xarray using odc-stac. This means:
>
>   1. The notebook can be run either within the sandbox, or on any computer with an internet connection and the required Python packages installed (e.g. by running pip install dea-tools in your Python environment).
>   2. Data loading performance will depend on the available internet bandwidth.

Yes, some kind of warning is a good idea.

Member

robbibt commented Aug 8, 2025

Yeah, I think some standards and templating will be really important. It might even be a good opportunity to refresh the entire top block of the Notebooks! Especially now we have things like Knowledge Hub, and if in the future we want to point to options like Google Colab/Binder etc.

Just a general question: is there a short deadline for getting these changes merged? It just might be good to have a DEA Notebooks catchup soon to discuss how they'll fit with the broader STAC refresh plans, and see if we can make them as standardised and streamlined as possible.

Collaborator

cbur24 commented Aug 8, 2025

> Yeah, I think some standards and templating will be really important. It might even be a good opportunity to refresh the entire top block of the Notebooks! Especially now we have things like Knowledge Hub, and if in the future we want to point to options like Google Colab/Binder etc.
>
> Just a general question: is there a short deadline for getting these changes merged? It just might be good to have a DEA Notebooks catchup at some point to discuss how they'll fit with the broader STAC refresh plans, and see if we can make them as standardised and streamlined as possible.

No pressing deadlines on this; it was just something that came up, so we progressed it. I agree: let's set a task for updating the DEA-Notebooks template with new info on STAC, links to Binder etc.

Member

robbibt commented Aug 8, 2025

> > Also, should we keep the STAC versions of the notebook separate, like in a subfolder? Or is it ok to replace the datacube ones?
> >
> > I am also thinking that maybe I shouldn't have modified the Microsoft Planetary Computer one, as it was also a comparison of methods for accessing different data for the same region and time period.
>
> No, I believe the idea is to eventually replace all notebooks with the odc-stac versions, thereby freeing dea-notebooks from the sandbox

There are pros and cons to both approaches. Personally, I think having duplicates of every notebook will get unwieldy very fast, and make maintenance really difficult at the scale of the entire repo (it's already a lot of work even with our current notebooks). So I'd probably lean towards not duplicating, but nothing is set in stone yet.

Member

robbibt commented Aug 8, 2025

> No pressing deadlines on this, was just something that came up so we progressed it. I agree - let's set a task for updating the DEA-Notebooks template with new info on stac, links to binder etc.

It's super neat, and having some working examples like this will make it so much easier to start doing this for the full repo - very exciting!

GL-S marked this pull request as draft September 29, 2025 03:37