WyAdapt API Example

Author

Shannon E. Albeke

WyAdapt Future Climate API Example

For this example, we provide some basic workflows for accessing, querying, and downloading climate data available within the WyAdapt cyberinfrastructure. We use REST APIs to query existing climate summary data stored on the ARCC S3 Pathfinder storage server as Cloud Optimized GeoTIFFs (COGs). Currently we have multiple GCMs for which daily data have been aggregated to monthly and annual COGs.

Using R with WyAdapt Climate Data

Within the WyAdapt.org application, you can find a complete listing of API Endpoints here: https://wyadapt.org/swagger/index.html

For the examples below we will use R and the following packages. If you do not have any of them installed, the following code should get you started.

# install.packages(c("sf", "terra", "lubridate", "httr", "jsonlite", "tidyverse", "tidyjson", "plotly", "kableExtra"))

Once you have the correct packages installed, let’s load them into our session so we can do some work.
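
One library() call per package from the install list above:

# Load the packages used throughout this example
library(sf)         # vector data (counties)
library(terra)      # raster data (COGs)
library(lubridate)  # date handling
library(httr)       # REST API requests (GET)
library(jsonlite)   # JSON parsing
library(tidyverse)  # data wrangling and ggplot2
library(tidyjson)   # spread_all() for nested JSON
library(plotly)     # ggplotly() interactive plots
library(kableExtra) # table formatting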

Now that our environment is prepared, let's check out the available datasets, starting with the GCMs:

# Query the available GCM/variant combinations
gcm <- GET("https://wyadapt.org/api/ClimateDataRepo/GcmRcpVariantCombinations") %>% 
  content() %>% 
  spread_all() %>% # flatten the nested JSON response (tidyjson)
  data.frame() %>% 
  select(gcm, variant, description) %>% 
  distinct() %>% 
  arrange(gcm) 
| gcm | variant | description |
|---|---|---|
| access-cm2 | r5i1p1f1 | Commonwealth Scientific and Industrial Research Organization (CSIRO) and Bureau of Meteorology (BOM) Australia |
| canesm5 | r1i1p2f1 | Canadian Centre for Climate Modelling and Analysis |
| cesm2 | r11i1p1f1 | Community Earth System Model Contributors |
| cnrm-esm2-1 | r1i1p1f2 | Centre National de Recherches Météorologiques/ Centre Européen de Recherche et Formation Avancée en Calcul Scientifique |
| ec-earth3 | r1i1p1f1 | EC-Earth consortium |
| ec-earth3-veg | r1i1p1f1 | EC-Earth consortium |
| ensemble | p75 | 75th percentile across all GCMs |
| ensemble | p10 | 10th percentile across all GCMs |
| ensemble | p50 | 50th percentile across all GCMs |
| ensemble | p90 | 90th percentile across all GCMs |
| ensemble | p25 | 25th percentile across all GCMs |
| fgoals-g3 | r1i1p1f1 | Chinese Academy of Sciences |
| giss-e2-1-g | r1i1p1f2 | Goddard Institute for Space Studies |
| miroc6 | r1i1p1f1 | Atmosphere and Ocean Research Institute (The University of Tokyo), National Institute for Environmental Studies, and Japan Agency for Marine-Earth Science and Technology |
| mpi-esm1-2-hr | r7i1p1f1 | Max Planck Institute for Meteorology |
| mpi-esm1-2-hr | r3i1p1f1 | Max Planck Institute for Meteorology |
| mpi-esm1-2-lr | r7i1p1f1 | Max Planck Institute for Meteorology |
| noresm2-mm | r1i1p1f1 | Norwegian Earth System Model |
| taiesm1 | r1i1p1f1 | Research Center for Environmental Changes, Academia Sinica (Taiwan) |
| ukesm1-0-ll | r2i1p1f2 | Met Office, Hadley Centre (UK) |

Admittedly, GCMs can feel a little opaque for individuals not in ‘the know’, but the ensembles have piqued my interest, given they are an aggregation across all of the available GCMs. What are the available variables?

# Query the available climate variables
vars <- GET("https://wyadapt.org/api/ClimateDataRepo/Variables") %>% 
  content() %>% 
  spread_all() %>% 
  data.frame() %>% 
  select(variable, alias, units, scaleFactor, description) 
| variable | alias | units | scaleFactor | description |
|---|---|---|---|---|
| prec | Precipitation | mm | 0.01 | Precipitation for monthly or annual |
| t2 | Average Temperature | °C | 0.01 | Average Temperature for monthly or annual |
| t2max | Maximum Temperature | °C | 0.01 | Maximum Temperature for monthly or annual |
| t2min | Minimum Temperature | °C | 0.01 | Minimum Temperature for monthly or annual |
| cdd | Max Consecutive Days Precipitation <1mm/day | days | 1.00 | Max number of consecutive days with precipitation < 1mm/day (integer) |
| cwd | Max Consecutive Days Precipitation >1mm/day | days | 1.00 | Max number of consecutive days with precipitation >= 1 mm/day (integer) |
| fd | Frost Days | days | 1.00 | Frost days: Annual number of days with Min temp < 0C (integer) |
| id | Icing Days | days | 1.00 | Icing days: Annual number of days with Max temp < 0C (integer) |
| r20mm | Annual Count Precipitation >=20mm/day | days | 1.00 | Annual count of days when precipitation >= 20mm/day (integer) |
| r95p | Annual Total Precipitation When Daily >95th | mm | 0.01 | Annual total precipitation when daily precip exceeds the 95th percentile of wet day precip (float) |
| rx1day | Annual Max 1-day Precipitation | mm | 0.01 | Rx1day - Annual maximum one-day precipitation (float) |
| rx5day | Annual Max 5-day Precipitation | mm | 0.01 | Rx5day - Annual maximum five-day precipitation (float) |
| sdii | Simple Precipitation Intensity Index | mm | 0.01 | Simple Precipitation Intensity Index (Annual mean Wet days >1mm precipitation) (float) |
| snow | Snow | mm | 0.01 | Snow Water Equivalent monthly |
| snowApril | SWE April 1st | mm | 0.01 | Snow Water Equivalent on April 1st for given year |
| snowFeb | SWE February 1st | mm | 0.01 | Snow Water Equivalent on February 1st for given year |
| snowJan | SWE January 1st | mm | 0.01 | Snow Water Equivalent on January 1st for given year |
| snowJune | SWE June 1st | mm | 0.01 | Snow Water Equivalent on June 1st for given year |
| snowMar | SWE March 1st | mm | 0.01 | Snow Water Equivalent on March 1st for given year |
| snowMay | SWE May 1st | mm | 0.01 | Snow Water Equivalent on May 1st for given year |
| tnn | Monthly Minimum of Daily Temperature | °C | 0.01 | Monthly minimum of daily minimum temperature (float) |
| txx | Monthly Maximum of Daily Temperature | °C | 0.01 | Monthly maximum of daily maximum temperature (float) |

There are a lot of different variables, and it is important to note the scaleFactor column: raster values are stored as scaled integers, so we need to multiply them by the scale factor to recover the real values.
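
For instance, a t2 raster stores a cell value like 1274 as an integer; multiplying by its 0.01 scale factor recovers 12.74 °C. A minimal sketch (the stored value here is just illustrative):

# Recover the real value from a stored, scaled integer
storedValue <- 1274  # illustrative integer as stored in the COG
scaleFactor <- 0.01  # scaleFactor for t2 from the Variables endpoint
storedValue * scaleFactor  # 12.74 °C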

I mentioned above that daily data are aggregated into monthly and annual timescales. Here are all of the available options…

# Query the available timescales (wrapping in parentheses prints the result)
( temporal <- GET("https://wyadapt.org/api/ClimateDataRepo/Timescale") %>% 
  content() %>% 
  unlist() )
[1] "30yearmonthlyrefdif" "monthlyrefdif"       "annual"             
[4] "monthly"             "30yearannual"        "30yearmonthly"      
[7] "30yearannualrefdif"  "annualrefdif"       

…and the available spatial resolutions:

# Query the available spatial resolutions
res <- GET("https://wyadapt.org/api/ClimateDataRepo/Resolution") %>% 
  content() %>% 
  spread_all() %>% 
  data.frame() %>% 
  select(-c(`document.id`, `..JSON`))
| gcm | variant | resolution | description |
|---|---|---|---|
| access-cm2 | r5i1p1f1 | d02_9km | 9 kilometer cell size |
| canesm5 | r1i1p2f1 | d02_9km | 9 kilometer cell size |
| cesm2 | r11i1p1f1 | d02_9km | 9 kilometer cell size |
| cnrm-esm2-1 | r1i1p1f2 | d02_9km | 9 kilometer cell size |
| ec-earth3 | r1i1p1f1 | d02_9km | 9 kilometer cell size |
| ec-earth3-veg | r1i1p1f1 | d02_9km | 9 kilometer cell size |

OK, we have a pretty good idea of the available datasets. Let's learn how to interact with them, since there are several methods!

Since the climate data are COGs, we can query them spatially in the cloud without having to download files. To do this, let's first add some polygons to our environment; we will go with the counties of Wyoming from the ESRI Feature Service. Notice that we add an ID column that matches the row.names() of the layer; we will use this later on in our processing.

# Get Wyoming counties
url <- parse_url("https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Boundaries_2022/FeatureServer/2/query")

url$query <- list(where = "STATE_FIPS='56'",
                  outFields = "*",
                  returnGeometry = "true",
                  f = "geojson")
request <- build_url(url)

county <- st_read(request) %>% 
  mutate(ID = as.integer(rownames(.)))
Reading layer `OGRGeoJSON' from data source 
  `https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Boundaries_2022/FeatureServer/2/query?where=STATE_FIPS%3D%2756%27&outFields=%2A&returnGeometry=true&f=geojson' 
  using driver `GeoJSON'
Simple feature collection with 23 features and 14 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -111.0546 ymin: 40.99478 xmax: -104.0523 ymax: 45.00582
Geodetic CRS:  WGS 84
ggplot(county) + 
  geom_sf()

Let's look at the ensemble of Annual Mean Temperature for Albany County for the entire timeseries. First we need to obtain the URLs for the specified variable and timescale (notice the query parameters within the URL). You will notice we are adding the prefix "/vsicurl/" to the URL in a new column, "urlCOG". This tells GDAL (which R uses under the hood) to stream only the ranges of the file it needs from S3, rather than downloading the whole file.

cogs <- GET("https://wyadapt.org/api/ClimateDataRepo/ClimateUrl?timescale=annual&gcm=ensemble&variant=p50&variable=t2") %>% 
  content() %>% 
  spread_all() %>% 
  data.frame() %>% 
  select(-c(document.id, `..JSON`)) %>% 
  mutate(urlCOG = paste0("/vsicurl/", url)) # prefix so GDAL streams from S3
url: https://pathfinder.arcc.uwyo.edu/wyadapt/d02_9km/bias_corrected/cogs/annual/t2/t2_ensemble.p50_ssp370_BC_d02_1981-2099_annual.tif
filename: t2_ensemble.p50_ssp370_BC_d02_1981-2099_annual.tif
resolution: d02_9km
biascorrected: TRUE
timescale: annual
gcm: ensemble
rcp: ssp370
variable: t2
variant: p50
years: 1981-2099
sampMonth: 13
scale: 0.01
units: °C
alias: Average Temperature
statistics: [-620, 3186, 1274.2671, 604.6369]

Now that we know which data we are querying (annual estimates of mean temperature from the median ensemble values) and where the data are (urlCOG), we are all set to conduct some raster analyses using cloud resources! First we stack the rasters, then we extract the mean of all cells within our polygon boundary (filtering for Albany County) for each layer (in this case annual mean temperature). Be sure to use the URL with vsicurl prefix AND to adjust the raster values by the scale factor!

# Stack the rasters directly from S3 via the /vsicurl/ URLs
rs <- rast(cogs$urlCOG)

# extract mean values for only Albany County
e <- terra::extract(rs, county %>% 
                      filter(FIPS %in% c("56001")), fun = mean)

Notice that our output has one row, representing Albany County, and 119 other columns representing the annual mean temperatures for 1981-2099. Let's build on this logic to make the data more user-friendly and meaningful, using the ID column we created when reading in the county layer.
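
As a quick sanity check on the shape of the extraction:

dim(e)  # 1 row (Albany County) by 120 columns (ID plus 119 annual layers)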

# extract mean values for only Albany County, parse metadata from the name and append County info
e <- terra::extract(rs, county %>% 
                      filter(FIPS %in% c("56001")), fun = mean) %>% 
  pivot_longer(2:ncol(.), names_to = "raster", values_to = "value") %>% 
  separate_wider_delim(raster, delim = "_", names = c("gcm", "variable", "waterYear")) %>% 
  separate_wider_delim(gcm, delim = ".", names = c("gcm", "variant")) %>% 
  mutate(sampDate = as.Date(paste(waterYear, "01", "01", sep = "-")),
         waterYear = as.integer(waterYear)) %>% 
  left_join(cogs %>% 
              select(gcm, variant, variable, scale), by = c("gcm", "variant", "variable")) %>% 
  left_join(county %>% 
              select(ID, NAME:FIPS), by = "ID") %>% 
  mutate(value = value * scale) # need to rescale the results

Excellent! Let’s visualize the results:

ggplotly(
e %>% 
  ggplot(aes(x = sampDate, y = value, group = NAME, color = NAME)) +
  geom_line() +
  ylab(paste(cogs$alias[1], cogs$units[1])) +
  theme_classic()
)

Hopefully you can imagine that the above workflow can be expanded to work for all of the counties simultaneously!

# extract mean values for all, parse metadata from the name and append County info
eAll <- terra::extract(rs, county, fun = mean) %>% 
  pivot_longer(2:ncol(.), names_to = "raster", values_to = "value") %>% 
  separate_wider_delim(raster, delim = "_", names = c("gcm", "variable", "waterYear")) %>% 
  separate_wider_delim(gcm, delim = ".", names = c("gcm", "variant")) %>% 
  mutate(sampDate = as.Date(paste(waterYear, "01", "01", sep = "-")),
         waterYear = as.integer(waterYear)) %>% 
  left_join(cogs %>% 
              select(gcm, variant, variable, scale), by = c("gcm", "variant", "variable")) %>% 
  left_join(county %>% 
              select(ID, NAME:FIPS), by = "ID") %>% 
  mutate(value = value * scale) # need to rescale the results

ggplotly(
eAll %>% 
  ggplot(aes(x = sampDate, y = value, group = NAME, color = NAME)) +
  geom_line() +
  ylab(paste(cogs$alias[1], cogs$units[1])) +
  theme_classic()
)

OK, this is great: we can look at the median temperatures through time for each county. But we know there is more than one GCM, hence the ensemble of the models. Perhaps it would be a good idea to incorporate model variance. Luckily for us, the ensemble dataset already includes the 10th, 25th, 50th, 75th, and 90th percentiles. Let's take advantage of this, using our Albany County example.

# need to expand our queries to include all ensemble percentiles
cogs <- GET("https://wyadapt.org/api/ClimateDataRepo/ClimateUrl?timescale=annual&gcm=ensemble&variable=t2") %>% 
  content() %>% 
  spread_all() %>% 
  data.frame() %>% 
  select(-c(document.id, `..JSON`)) %>% 
  mutate(urlCOG = paste0("/vsicurl/", url))

Again we have a list of URLs, albeit a larger one. We can stack this larger dataset and extract values. Notice that the code is exactly the same as when we queried the single GCM (ensemble-p50), even though we are now pulling all of the percentiles!

# stack the rasters
rs <- rast(cogs$urlCOG)

e <- terra::extract(rs, county %>% 
                      filter(FIPS %in% c("56001")), fun = mean) %>% 
  pivot_longer(2:ncol(.), names_to = "raster", values_to = "value") %>% 
  separate_wider_delim(raster, delim = "_", names = c("gcm", "variable", "waterYear")) %>% 
  separate_wider_delim(gcm, delim = ".", names = c("gcm", "variant")) %>% 
  mutate(sampDate = as.Date(paste(waterYear, "01", "01", sep = "-")),
         waterYear = as.integer(waterYear)) %>% 
  left_join(cogs %>% 
              select(gcm, variant, variable, scale), by = c("gcm", "variant", "variable")) %>% 
  left_join(county %>% 
              select(ID, NAME:FIPS), by = "ID") %>% 
  mutate(value = value * scale) # need to rescale the results

Now we have to adjust the shape of the data slightly so we can visualize the temperature range across GCMs.

# flatten the data
dat <- e %>% 
  pivot_wider(id_cols = c(NAME, STATE_ABBR, variable, sampDate), names_from = variant, values_from = value)

ggplotly(
  dat %>% 
  ggplot(aes(sampDate)) +
  geom_ribbon(aes(ymin = p10, ymax = p90), fill = "grey90") +
  geom_ribbon(aes(ymin = p25, ymax = p75), fill = "grey75") +
  geom_line(aes( y = p50, group = NAME, color = NAME)) +
  ylab(paste(cogs$alias[1], cogs$units[1])) +
  theme_classic()
)

We can map one of the layers if we like.

# layer 44 corresponds to 2024 (1981 + 43); remember to rescale the values
mapview::mapview(rs[[44]] * cogs$scale[3], layer.name = "2024 Mean Temp (C)")

Download WyAdapt Future Climate Rasters

If you really want, you can easily download these data to your personal machine. The example below should get you started.

# output location (adjust to a folder on your machine)
odir <- "G:/Test/COG"
dir.create(odir, recursive = TRUE, showWarnings = FALSE) # make sure the folder exists

cogs <- GET("https://wyadapt.org/api/ClimateDataRepo/ClimateUrl?timescale=annual&gcm=ensemble&variant=p50&variable=t2") %>% 
  content() %>% 
  spread_all() %>% 
  data.frame() %>% 
  select(-c(document.id, `..JSON`)) %>% 
  mutate(urlCOG = paste0("/vsicurl/", url)) %>% 
  slice_head(n = 3) # shorten the amount of data as this is an example

# Download the files from the repository
for(i in 1:nrow(cogs)){
  try(utils::download.file(url = cogs$url[i], destfile = paste0(odir, "/", cogs$filename[i]), quiet = TRUE, mode = "wb"), silent = TRUE)
}
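
Once downloaded, the files behave like any local GeoTIFFs. As a quick sketch, you can read them back with terra and apply the scale factor reported by the API:

# Read the downloaded COGs back in from the local folder and rescale
localRs <- rast(list.files(odir, pattern = "\\.tif$", full.names = TRUE))
localRs * cogs$scale[1]  # apply the scale factor to recover °C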