-
Notifications
You must be signed in to change notification settings - Fork 0
merging
For a given region, such as “chfc”, it is conceivable to read the
multi-year data (MY) with the analysis-forecast (ANFC). As long as the
datasets are compatible, for the most part they play well together, but
there are some pitfalls to consider which we’ll try to point out. Here
we simply read the databases and merge them using the multiple = TRUE
argument. Easy!
suppressPackageStartupMessages({
library(andreas)
library(dplyr)
library(stars)
})
path = copernicus_path("chfc")
DB = read_database(path, multiple = TRUE)
dplyr::count(DB, product)## # A tibble: 2 × 2
## product n
## <chr> <int>
## 1 GLOBAL_ANALYSISFORECAST_PHY_001_024 4858
## 2 GLOBAL_MULTIYEAR_PHY_001_030 82845
We can go a bit further and tally the number of variables per product family.
tally = dplyr::count(DB, product, id, variable)
print(tally, n = nrow(tally))## # A tibble: 24 × 4
## product id variable n
## <chr> <chr> <chr> <int>
## 1 GLOBAL_ANALYSISFORECAST_PHY_001_024 cmems_mod_glo_phy-cur_anf… uo 49
## 2 GLOBAL_ANALYSISFORECAST_PHY_001_024 cmems_mod_glo_phy-cur_anf… vo 49
## 3 GLOBAL_ANALYSISFORECAST_PHY_001_024 cmems_mod_glo_phy-so_anfc… so 49
## 4 GLOBAL_ANALYSISFORECAST_PHY_001_024 cmems_mod_glo_phy-thetao_… thetao 49
## 5 GLOBAL_ANALYSISFORECAST_PHY_001_024 cmems_mod_glo_phy-wcur_an… wo 1141
## 6 GLOBAL_ANALYSISFORECAST_PHY_001_024 cmems_mod_glo_phy_anfc_0.… mlotst 49
## 7 GLOBAL_ANALYSISFORECAST_PHY_001_024 cmems_mod_glo_phy_anfc_0.… pbo 1141
## 8 GLOBAL_ANALYSISFORECAST_PHY_001_024 cmems_mod_glo_phy_anfc_0.… sob 1141
## 9 GLOBAL_ANALYSISFORECAST_PHY_001_024 cmems_mod_glo_phy_anfc_0.… tob 1141
## 10 GLOBAL_ANALYSISFORECAST_PHY_001_024 cmems_mod_glo_phy_anfc_0.… zos 49
## 11 GLOBAL_MULTIYEAR_PHY_001_030 cmems_mod_glo_phy_my_0.08… bottomT 10408
## 12 GLOBAL_MULTIYEAR_PHY_001_030 cmems_mod_glo_phy_my_0.08… mlotst 10408
## 13 GLOBAL_MULTIYEAR_PHY_001_030 cmems_mod_glo_phy_my_0.08… so 10408
## 14 GLOBAL_MULTIYEAR_PHY_001_030 cmems_mod_glo_phy_my_0.08… thetao 10408
## 15 GLOBAL_MULTIYEAR_PHY_001_030 cmems_mod_glo_phy_my_0.08… uo 10408
## 16 GLOBAL_MULTIYEAR_PHY_001_030 cmems_mod_glo_phy_my_0.08… vo 10408
## 17 GLOBAL_MULTIYEAR_PHY_001_030 cmems_mod_glo_phy_my_0.08… zos 10408
## 18 GLOBAL_MULTIYEAR_PHY_001_030 cmems_mod_glo_phy_myint_0… bottomT 1427
## 19 GLOBAL_MULTIYEAR_PHY_001_030 cmems_mod_glo_phy_myint_0… mlotst 1427
## 20 GLOBAL_MULTIYEAR_PHY_001_030 cmems_mod_glo_phy_myint_0… so 1427
## 21 GLOBAL_MULTIYEAR_PHY_001_030 cmems_mod_glo_phy_myint_0… thetao 1427
## 22 GLOBAL_MULTIYEAR_PHY_001_030 cmems_mod_glo_phy_myint_0… uo 1427
## 23 GLOBAL_MULTIYEAR_PHY_001_030 cmems_mod_glo_phy_myint_0… vo 1427
## 24 GLOBAL_MULTIYEAR_PHY_001_030 cmems_mod_glo_phy_myint_0… zos 1427
We can filter the database in such a way that we are drawing from two different product suites within the same regional database.
First filter…
db = DB |>
filter(variable == "thetao") |>
group_by(product, id) |>
slice_head(n = 1) |>
arrange(date)
print(select(db, product, id, date))## # A tibble: 3 × 3
## # Groups: product, id [3]
## product id date
## <chr> <chr> <date>
## 1 GLOBAL_MULTIYEAR_PHY_001_030 cmems_mod_glo_phy_my_0.083deg_… 1993-01-01
## 2 GLOBAL_MULTIYEAR_PHY_001_030 cmems_mod_glo_phy_myint_0.083d… 2021-07-01
## 3 GLOBAL_ANALYSISFORECAST_PHY_001_024 cmems_mod_glo_phy-thetao_anfc_… 2025-05-28
Now read…
s = read_andreas(db, path)
s## stars object with 3 dimensions and 1 attribute
## attribute(s):
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## thetao -1.722617 5.689129 12.22019 12.83201 20.24998 28.99976 103364
## dimension(s):
## from to offset delta refsys
## x 1 415 -77.04 0.08333 WGS 84
## y 1 261 56.71 -0.08333 WGS 84
## time 1 3 NA NA POSIXct
## values x/y
## x NULL [x]
## y NULL [y]
## time 1993-01-01 UTC, 2021-07-01 UTC, 2025-05-28 UTC
Plot…
plot(s)
## Issues
to consider
Keeps these issues in mind when trying to work with multiple databases merged into one.
If the product is accidentally dropped then we lack the context to
read files from the databases. Be sure you check that product is
retained.
And astute observer might notice that two variables for sea water
temperature at the floor are tob and bottomT. There are ways to deal
with that in your code, but the variable column in the database must
retain these values.
Currently, every regional database has one or more product databases - and they are compatible. Incompatible databases include those where the database columns don’t match, or the rasters can’t be easily combined into one compound array (stars).
While the tabular databases can be merged form different regions, their rasters cannot - they have different bounding boxes for starters. Just don’t do it; instead script as if they are separate data streams.
You can, but the simplicity of mergign them (almost not time cost) doesn’t warrant saving a special merged database. Why pay all the overhead of duplicative databases for almost no gain?