WyAdapt API Example

Author

Shannon E. Albeke

WyAdapt Future Climate API Example

For this example, we provide some basic workflows for accessing, querying, and downloading climate data available within the WyAdapt cyberinfrastructure. We use REST APIs to query existing climate summary data stored on the ARCC S3 Pathfinder storage server as Cloud Optimized GeoTIFFs (COGs). Currently we have multiple GCMs for which daily data have been aggregated to monthly and annual COGs.

Using R with WyAdapt Climate Data

Within the WyAdapt.org application, you can find a complete listing of API Endpoints here: https://wyadapt.org/swagger/index.html

For the examples below we will use R and the following packages. If you do not have any of them installed, the following code should get you started.

# install.packages(c("sf", "terra", "lubridate", "httr", "jsonlite", "tidyverse", "tidyjson", "plotly", "kableExtra"))

Once you have the correct packages installed, let’s load them into our session so we can do some work.
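
One library() call per package from the install list above:

# Load the packages used throughout this example
library(sf)         # vector data (counties)
library(terra)      # raster data (COGs)
library(lubridate)  # date handling
library(httr)       # REST API requests (GET)
library(jsonlite)   # JSON parsing
library(tidyverse)  # data wrangling and ggplot2
library(tidyjson)   # spread_all() for nested JSON
library(plotly)     # ggplotly() interactive plots
library(kableExtra) # table formatting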

Now that our environment is prepared, let's check out the available datasets, starting with the GCMs:

# Query the available GCM/variant combinations
gcm <- GET("https://wyadapt.org/api/ClimateDataRepo/GcmRcpVariantCombinations") %>% 
  content() %>% 
  spread_all() %>% # flatten the nested JSON response (tidyjson)
  data.frame() %>% 
  select(gcm, variant, description) %>% 
  distinct() %>% 
  arrange(gcm) 
| gcm | variant | description |
|---|---|---|
| access-cm2 | r5i1p1f1 | Commonwealth Scientific and Industrial Research Organization (CSIRO) and Bureau of Meteorology (BOM) Australia |
| canesm5 | r1i1p2f1 | Canadian Centre for Climate Modelling and Analysis |
| cesm2 | r11i1p1f1 | Community Earth System Model Contributors |
| cnrm-esm2-1 | r1i1p1f2 | Centre National de Recherches Météorologiques/ Centre Européen de Recherche et Formation Avancée en Calcul Scientifique |
| ec-earth3 | r1i1p1f1 | EC-Earth consortium |
| ec-earth3-veg | r1i1p1f1 | EC-Earth consortium |
| ensemble | p75 | 75th percentile across all GCMs |
| ensemble | p10 | 10th percentile across all GCMs |
| ensemble | p50 | 50th percentile across all GCMs |
| ensemble | p90 | 90th percentile across all GCMs |
| ensemble | p25 | 25th percentile across all GCMs |
| fgoals-g3 | r1i1p1f1 | Chinese Academy of Sciences |
| giss-e2-1-g | r1i1p1f2 | Goddard Institute for Space Studies |
| miroc6 | r1i1p1f1 | Atmosphere and Ocean Research Institute (The University of Tokyo), National Institute for Environmental Studies, and Japan Agency for Marine-Earth Science and Technology |
| mpi-esm1-2-hr | r7i1p1f1 | Max Planck Institute for Meteorology |
| mpi-esm1-2-hr | r3i1p1f1 | Max Planck Institute for Meteorology |
| mpi-esm1-2-lr | r7i1p1f1 | Max Planck Institute for Meteorology |
| noresm2-mm | r1i1p1f1 | Norwegian Earth System Model |
| taiesm1 | r1i1p1f1 | Research Center for Environmental Changes, Academia Sinica (Taiwan) |
| ukesm1-0-ll | r2i1p1f2 | Met Office, Hadley Centre (UK) |

Admittedly, GCMs can feel a little opaque for individuals not in ‘the know’, but the ensembles have piqued my interest, given they are an aggregation across all of the available GCMs. What are the available variables?

# Query the available climate variables
vars <- GET("https://wyadapt.org/api/ClimateDataRepo/Variables") %>% 
  content() %>% 
  spread_all() %>% 
  data.frame() %>% 
  select(variable, alias, units, scaleFactor, description) 
| variable | alias | units | scaleFactor | description |
|---|---|---|---|---|
| prec | Precipitation | mm | 0.01 | Precipitation for monthly or annual |
| t2 | Average Temperature | °C | 0.01 | Average Temperature for monthly or annual |
| t2max | Maximum Temperature | °C | 0.01 | Maximum Temperature for monthly or annual |
| t2min | Minimum Temperature | °C | 0.01 | Minimum Temperature for monthly or annual |
| cdd | Max Consecutive Days Precipitation <1mm/day | days | 1.00 | Max number of consecutive days with precipitation < 1mm/day (integer) |
| cwd | Max Consecutive Days Precipitation >1mm/day | days | 1.00 | Max number of consecutive days with precipitation >= 1 mm/day (integer) |
| fd | Frost Days | days | 1.00 | Frost days: Annual number of days with Min temp < 0C (integer) |
| id | Icing Days | days | 1.00 | Icing days: Annual number of days with Max temp < 0C (integer) |
| r20mm | Annual Count Precipitation >=20mm/day | days | 1.00 | Annual count of days when precipitation >= 20mm/day (integer) |
| r95p | Annual Total Precipitation When Daily >95th | mm | 0.01 | Annual total precipitation when daily precip exceeds the 95th percentile of wet day precip (float) |
| rx1day | Annual Max 1-day Precipitation | mm | 0.01 | Rx1day - Annual maximum one-day precipitation (float) |
| rx5day | Annual Max 5-day Precipitation | mm | 0.01 | Rx5day - Annual maximum five-day precipitation (float) |
| sdii | Simple Precipitation Intensity Index | mm | 0.01 | Simple Precipitation Intensity Index (Annual mean Wet days >1mm precipitation) (float) |
| snow | Snow | mm | 0.01 | Snow Water Equivalent monthly |
| snowApril | SWE April 1st | mm | 0.01 | Snow Water Equivalent on April 1st for given year |
| snowFeb | SWE February 1st | mm | 0.01 | Snow Water Equivalent on February 1st for given year |
| snowJan | SWE January 1st | mm | 0.01 | Snow Water Equivalent on January 1st for given year |
| snowJune | SWE June 1st | mm | 0.01 | Snow Water Equivalent on June 1st for given year |
| snowMar | SWE March 1st | mm | 0.01 | Snow Water Equivalent on March 1st for given year |
| snowMay | SWE May 1st | mm | 0.01 | Snow Water Equivalent on May 1st for given year |
| tnn | Monthly Minimum of Daily Temperature | °C | 0.01 | Monthly minimum of daily minimum temperature (float) |
| txx | Monthly Maximum of Daily Temperature | °C | 0.01 | Monthly maximum of daily maximum temperature (float) |

There are a lot of different variables, and it is important to note the scaleFactor column: raster values are stored as scaled integers, so we need to multiply them by the scale factor to recover the real values.
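
For instance, a t2 raster stores a cell value like 1274 as an integer; multiplying by its 0.01 scale factor recovers 12.74 °C. A minimal sketch (the stored value here is just illustrative):

# Recover the real value from a stored, scaled integer
storedValue <- 1274  # illustrative integer as stored in the COG
scaleFactor <- 0.01  # scaleFactor for t2 from the Variables endpoint
storedValue * scaleFactor  # 12.74 °C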

I mentioned above that daily data are aggregated into monthly and annual timescales. Here are all of the available options…

# Query the available timescales (wrapping in parentheses prints the result)
( temporal <- GET("https://wyadapt.org/api/ClimateDataRepo/Timescale") %>% 
  content() %>% 
  unlist() )
[1] "30yearmonthlyrefdif" "monthlyrefdif"       "annual"             
[4] "monthly"             "30yearannual"        "30yearmonthly"      
[7] "30yearannualrefdif"  "annualrefdif"       

…and the available spatial resolutions:

# Query the available spatial resolutions
res <- GET("https://wyadapt.org/api/ClimateDataRepo/Resolution") %>% 
  content() %>% 
  spread_all() %>% 
  data.frame() %>% 
  select(-c(`document.id`, `..JSON`))
| gcm | variant | resolution | description |
|---|---|---|---|
| access-cm2 | r5i1p1f1 | d02_9km | 9 kilometer cell size |
| canesm5 | r1i1p2f1 | d02_9km | 9 kilometer cell size |
| cesm2 | r11i1p1f1 | d02_9km | 9 kilometer cell size |
| cnrm-esm2-1 | r1i1p1f2 | d02_9km | 9 kilometer cell size |
| ec-earth3 | r1i1p1f1 | d02_9km | 9 kilometer cell size |
| ec-earth3-veg | r1i1p1f1 | d02_9km | 9 kilometer cell size |

OK, we have a pretty good idea of the available datasets. Let's learn how to interact with them, since there are several methods!

Since the climate data are COGs, we can query them spatially in the cloud without having to download files. To do this, let's first add some polygons to our environment; we will go with the counties of Wyoming from the ESRI Feature Service. Notice that we add an ID column that matches the row.names() of the layer; we will use this later on in our processing.

# Get Wyoming counties
url <- parse_url("https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Boundaries_2022/FeatureServer/2/query")

url$query <- list(where = "STATE_FIPS='56'",
                  outFields = "*",
                  returnGeometry = "true",
                  f = "geojson")
request <- build_url(url)

county <- st_read(request) %>% 
  mutate(ID = as.integer(rownames(.)))
Reading layer `OGRGeoJSON' from data source 
  `https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Boundaries_2022/FeatureServer/2/query?where=STATE_FIPS%3D%2756%27&outFields=%2A&returnGeometry=true&f=geojson' 
  using driver `GeoJSON'
Simple feature collection with 23 features and 14 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -111.0546 ymin: 40.99478 xmax: -104.0523 ymax: 45.00582
Geodetic CRS:  WGS 84
ggplot(county) + 
  geom_sf()

Let's look at the ensemble of Annual Mean Temperature for Albany County for the entire timeseries. First we need to obtain the URLs for the specified variable and timescale (notice the query parameters within the URL). You will notice we are adding the prefix "/vsicurl/" to the URL in a new column, "urlCOG". This tells GDAL (which R uses under the hood) to stream only the ranges of the file it needs from S3, rather than downloading the whole file.

cogs <- GET("https://wyadapt.org/api/ClimateDataRepo/ClimateUrl?timescale=annual&gcm=ensemble&variant=p50&variable=t2") %>% 
  content() %>% 
  spread_all() %>% 
  data.frame() %>% 
  select(-c(document.id, `..JSON`)) %>% 
  mutate(urlCOG = paste0("/vsicurl/", url)) # prefix so GDAL streams from S3
url: https://pathfinder.arcc.uwyo.edu/wyadapt/d02_9km/bias_corrected/cogs/annual/t2/t2_ensemble.p50_ssp370_BC_d02_1981-2099_annual.tif
filename: t2_ensemble.p50_ssp370_BC_d02_1981-2099_annual.tif
resolution: d02_9km
biascorrected: TRUE
timescale: annual
gcm: ensemble
rcp: ssp370
variable: t2
variant: p50
years: 1981-2099
sampMonth: 13
scale: 0.01
units: °C
alias: Average Temperature
statistics: [-620, 3186, 1274.2671, 604.6369]

Now that we know which data we are querying (annual estimates of mean temperature from the median ensemble values) and where the data are (urlCOG), we are all set to conduct some raster analyses using cloud resources! First we stack the rasters, then we extract the mean of all cells within our polygon boundary (filtering for Albany County) for each layer (in this case annual mean temperature). Be sure to use the URL with vsicurl prefix AND to adjust the raster values by the scale factor!

# Stack the rasters directly from S3 via the /vsicurl/ URLs
rs <- rast(cogs$urlCOG)

# extract mean values for only Albany County
e <- terra::extract(rs, county %>% 
                      filter(FIPS %in% c("56001")), fun = mean)

Notice that our output has one row, representing Albany County, and 119 other columns representing the annual mean temperatures for 1981-2099. Let's build on this logic to make the data more user-friendly and meaningful, using the ID column we created when reading in the county layer.
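
As a quick sanity check on the shape of the extraction:

dim(e)  # 1 row (Albany County) by 120 columns (ID plus 119 annual layers)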

# extract mean values for only Albany County, parse metadata from the name and append County info
e <- terra::extract(rs, county %>% 
                      filter(FIPS %in% c("56001")), fun = mean) %>% 
  pivot_longer(2:ncol(.), names_to = "raster", values_to = "value") %>% 
  separate_wider_delim(raster, delim = "_", names = c("gcm", "variable", "waterYear")) %>% 
  separate_wider_delim(gcm, delim = ".", names = c("gcm", "variant")) %>% 
  mutate(sampDate = as.Date(paste(waterYear, "01", "01", sep = "-")),
         waterYear = as.integer(waterYear)) %>% 
  left_join(cogs %>% 
              select(gcm, variant, variable, scale), by = c("gcm", "variant", "variable")) %>% 
  left_join(county %>% 
              select(ID, NAME:FIPS), by = "ID") %>% 
  mutate(value = value * scale) # need to rescale the results

Excellent! Let’s visualize the results:

ggplotly(
e %>% 
  ggplot(aes(x = sampDate, y = value, group = NAME, color = NAME)) +
  geom_line() +
  ylab(paste(cogs$alias[1], cogs$units[1])) +
  theme_classic()
)

Hopefully you can imagine that the above workflow can be expanded to work for all of the counties simultaneously!

# extract mean values for all, parse metadata from the name and append County info
eAll <- terra::extract(rs, county, fun = mean) %>% 
  pivot_longer(2:ncol(.), names_to = "raster", values_to = "value") %>% 
  separate_wider_delim(raster, delim = "_", names = c("gcm", "variable", "waterYear")) %>% 
  separate_wider_delim(gcm, delim = ".", names = c("gcm", "variant")) %>% 
  mutate(sampDate = as.Date(paste(waterYear, "01", "01", sep = "-")),
         waterYear = as.integer(waterYear)) %>% 
  left_join(cogs %>% 
              select(gcm, variant, variable, scale), by = c("gcm", "variant", "variable")) %>% 
  left_join(county %>% 
              select(ID, NAME:FIPS), by = "ID") %>% 
  mutate(value = value * scale) # need to rescale the results

ggplotly(
eAll %>% 
  ggplot(aes(x = sampDate, y = value, group = NAME, color = NAME)) +
  geom_line() +
  ylab(paste(cogs$alias[1], cogs$units[1])) +
  theme_classic()
)

OK, this is great: we can look at the median temperatures through time for each county. But we know there is more than one GCM, hence the ensemble of the models. Perhaps it would be a good idea to incorporate model variance. Luckily for us, the ensemble dataset already includes the 10th, 25th, 50th, 75th, and 90th percentiles. Let's take advantage of this, using our Albany County example.

# need to expand our queries to include all ensemble percentiles
cogs <- GET("https://wyadapt.org/api/ClimateDataRepo/ClimateUrl?timescale=annual&gcm=ensemble&variable=t2") %>% 
  content() %>% 
  spread_all() %>% 
  data.frame() %>% 
  select(-c(document.id, `..JSON`)) %>% 
  mutate(urlCOG = paste0("/vsicurl/", url))

Again we have a list of URLs, albeit a larger one. We can stack this larger dataset and extract values. Notice that the code is exactly the same as when we queried the single GCM (ensemble-p50), even though we are now pulling all of the percentiles!

# stack the rasters
rs <- rast(cogs$urlCOG)

e <- terra::extract(rs, county %>% 
                      filter(FIPS %in% c("56001")), fun = mean) %>% 
  pivot_longer(2:ncol(.), names_to = "raster", values_to = "value") %>% 
  separate_wider_delim(raster, delim = "_", names = c("gcm", "variable", "waterYear")) %>% 
  separate_wider_delim(gcm, delim = ".", names = c("gcm", "variant")) %>% 
  mutate(sampDate = as.Date(paste(waterYear, "01", "01", sep = "-")),
         waterYear = as.integer(waterYear)) %>% 
  left_join(cogs %>% 
              select(gcm, variant, variable, scale), by = c("gcm", "variant", "variable")) %>% 
  left_join(county %>% 
              select(ID, NAME:FIPS), by = "ID") %>% 
  mutate(value = value * scale) # need to rescale the results

Now we have to adjust the shape of the data slightly so we can visualize the temperature range across GCMs.

# flatten the data
dat <- e %>% 
  pivot_wider(id_cols = c(NAME, STATE_ABBR, variable, sampDate), names_from = variant, values_from = value)

ggplotly(
  dat %>% 
  ggplot(aes(sampDate)) +
  geom_ribbon(aes(ymin = p10, ymax = p90), fill = "grey90") +
  geom_ribbon(aes(ymin = p25, ymax = p75), fill = "grey75") +
  geom_line(aes( y = p50, group = NAME, color = NAME)) +
  ylab(paste(cogs$alias[1], cogs$units[1])) +
  theme_classic()
)

We can map one of the layers if we like.

# layer 44 corresponds to 2024 (1981 + 43); remember to rescale the values
mapview::mapview(rs[[44]] * cogs$scale[3], layer.name = "2024 Mean Temp (C)")

Download WyAdapt Future Climate Rasters

If you really want, you can easily download these data to your personal machine. The example below should get you started.

# output location (adjust to a folder on your machine)
odir <- "G:/Test/COG"
dir.create(odir, recursive = TRUE, showWarnings = FALSE) # make sure the folder exists

cogs <- GET("https://wyadapt.org/api/ClimateDataRepo/ClimateUrl?timescale=annual&gcm=ensemble&variant=p50&variable=t2") %>% 
  content() %>% 
  spread_all() %>% 
  data.frame() %>% 
  select(-c(document.id, `..JSON`)) %>% 
  mutate(urlCOG = paste0("/vsicurl/", url)) %>% 
  slice_head(n = 3) # shorten the amount of data as this is an example

# Download the files from the repository
for(i in 1:nrow(cogs)){
  try(utils::download.file(url = cogs$url[i], destfile = paste0(odir, "/", cogs$filename[i]), quiet = TRUE, mode = "wb"), silent = TRUE)
}
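
Once downloaded, the files behave like any local GeoTIFFs. As a quick sketch, you can read them back with terra and apply the scale factor reported by the API:

# Read the downloaded COGs back in from the local folder and rescale
localRs <- rast(list.files(odir, pattern = "\\.tif$", full.names = TRUE))
localRs * cogs$scale[1]  # apply the scale factor to recover °C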