R dataset
Need help with stationaRy?
Click the “chat” button below for chat support from the developer who created it, or find similar developers for support.
rich-iannone

Description

Get hourly meteorological data from one of thousands of global stations

216 Stars 34 Forks Other 918 Commits 2 Opened issues

Services available

Need anything else?

stationaRy

CRAN
status R build
status Codecov test
coverage

Overview

Get meteorological data from met stations located all over the world. That’s what you can do with this R package. There are LOTS of stations too (29,729 available in this dataset) and many have data that go pretty far back in time. The data comes from the Integrated Surface Dataset (ISD), which is maintained by the National Oceanic and Atmospheric Administration (NOAA).

Retrieving Met Data with a
station_id

Let’s get some met data from La Guardia Airport in New York City (the station ID value is

"725030-14732"
). This station has a pretty long history (starting operations in 1973) but we’ll just obtain data from the years of 2017 and 2018.
lga_met_data 
lga_met_data
#> # A tibble: 17,520 x 10
#>    id    time                 temp    wd    ws atmos_pres dew_point    rh
#>                                 
#>  1 7250… 2017-01-01 00:00:00   7.2   230   5.7      1012.      -4.4  43.5
#>  2 7250… 2017-01-01 01:00:00   7.8   230   4.6      1012.      -3.9  43.4
#>  3 7250… 2017-01-01 02:00:00   7.2   230   3.6      1012.      -2.2  51.3
#>  4 7250… 2017-01-01 03:00:00   7.8   240   5.7      1013.      -3.3  45.4
#>  5 7250… 2017-01-01 04:00:00   7.8   240   4.6      1013.      -3.9  43.4
#>  6 7250… 2017-01-01 05:00:00   8.3   240   4.6      1014.      -4.4  40.4
#>  7 7250… 2017-01-01 06:00:00   8.3   250   5.1      1015.      -3.9  41.9
#>  8 7250… 2017-01-01 07:00:00   8.3   260   5.7      1016.      -3.3  43.8
#>  9 7250… 2017-01-01 08:00:00   8.3   240   5.1      1017.      -2.8  45.5
#> 10 7250… 2017-01-01 09:00:00   8.3   260   6.2      1019.      -2.8  45.5
#> # … with 17,510 more rows, and 2 more variables: ceil_hgt ,
#> #   visibility 

Discovering Met Stations

At a minimum we need a station’s identifier to obtain its met data. We can start the process of getting an identifier by accessing the entire catalog of station metadata with the

get_station_metadata()
function. The output tibble has station
id
values in the first column. Let’s get a subset of stations from that: those stations that are located in Norway.
stations_norway %
  dplyr::filter(country == "NO")

stations_norway #> # A tibble: 405 x 16 #> id usaf wban name country state icao lat lon elev begin_date #>
#> 1 0100… 0100… 99999 BOGU… NO ENRS NA NA NA 2001-09-27 #> 2 0100… 0100… 99999 JAN … NO ENJA 70.9 -8.67 9 1931-01-01 #> 3 0100… 0100… 99999 ROST NO NA NA NA 1986-11-20 #> 4 0100… 0100… 99999 SORS… NO ENSO 59.8 5.34 48.8 1986-11-20 #> 5 0100… 0100… 99999 BRIN… NO 61.4 5.87 327 1987-01-17 #> 6 0100… 0100… 99999 RORV… NO 64.8 11.2 14 1987-01-16 #> 7 0100… 0100… 99999 FRIGG NO ENFR 60.0 2.25 48 1988-03-20 #> 8 0100… 0100… 99999 VERL… NO 80.0 16.2 8 1986-11-09 #> 9 0100… 0100… 99999 HORN… NO 77 15.5 12 1985-06-01 #> 10 0100… 0100… 99999 NY-A… NO ENAS 78.9 11.9 8 1973-01-01 #> # … with 395 more rows, and 5 more variables: end_date , #> # begin_year , end_year , tz_name , years

This table can be even more greatly reduced to isolate the stations of interest. For example, we could elect to get only high-altitude stations (above 1000 meters) in Norway.

norway_high_elev % 
  dplyr::filter(elev > 1000)

norway_high_elev #> # A tibble: 12 x 16 #> id usaf wban name country state icao lat lon elev begin_date #>
#> 1 0122… 0122… 99999 MANN… NO 62.4 7.77 1294 2010-03-15 #> 2 0123… 0123… 99999 HJER… NO 62.2 9.55 1012 2010-09-07 #> 3 0134… 0134… 99999 MIDT… NO 60.6 7.27 1162 2011-11-25 #> 4 0135… 0135… 99999 FINS… NO 60.6 7.53 1208 2003-03-30 #> 5 0135… 0135… 99999 FINS… NO 60.6 7.5 1224 1973-01-02 #> 6 0135… 0135… 99999 SAND… NO 60.2 7.48 1250 2004-01-07 #> 7 0136… 0136… 99999 JUVV… NO 61.7 8.37 1894 2009-06-26 #> 8 0136… 0136… 99999 SOGN… NO 61.6 8 1413 1979-03-01 #> 9 0137… 0137… 99999 KVIT… NO 61.5 10.1 1028 1973-01-01 #> 10 0143… 0143… 99999 MIDT… NO 59.8 6.98 1081 1973-01-01 #> 11 0144… 0144… 99999 BLAS… NO 59.3 6.87 1105. 1973-01-01 #> 12 0146… 0146… 99999 GAUS… NO 59.8 8.65 1804. 2014-06-05 #> # … with 5 more variables: end_date , begin_year , end_year , #> # tz_name , years

The station IDs from the tibble can be transformed into a vector of station IDs with

dplyr::pull()
.
norway_high_elev %>% dplyr::pull(id)
#>  [1] "012200-99999" "012390-99999" "013460-99999" "013500-99999" "013510-99999"
#>  [6] "013520-99999" "013620-99999" "013660-99999" "013750-99999" "014330-99999"
#> [11] "014400-99999" "014611-99999"

Suppose you’d like to collect several years of met data from a particular station and fetch only the observations that meet some set of conditions. Here’s an example of obtaining temperatures above 15 degrees Celsius from the high-altitude

"JUVVASSHOE"
station in Norway and adding a column with temperatures in degrees Fahrenheit.
station_data %
  dplyr::filter(name == "JUVVASSHOE") %>%
  dplyr::pull(id) %>%
  get_met_data(years = 2011:2019)

high_temp_data % dplyr::select(id, time, wd, ws, temp) %>% dplyr::filter(temp > 16) %>% dplyr::mutate(temp_f = ((temp * (9/5)) + 32) %>% round(1)) %>% dplyr::arrange(dplyr::desc(temp_f))

high_temp_data
#> # A tibble: 50 x 6
#>    id           time                   wd    ws  temp temp_f
#>                              
#>  1 013620-99999 2019-07-26 15:00:00   160     5  18.5   65.3
#>  2 013620-99999 2019-07-26 17:00:00   210     3  18.4   65.1
#>  3 013620-99999 2019-07-26 18:00:00   180     2  18.3   64.9
#>  4 013620-99999 2019-07-26 16:00:00   180     4  18.2   64.8
#>  5 013620-99999 2014-07-23 16:00:00   270     2  17.6   63.7
#>  6 013620-99999 2019-07-26 14:00:00   150     4  17.5   63.5
#>  7 013620-99999 2014-07-23 17:00:00   300     4  17.3   63.1
#>  8 013620-99999 2019-07-28 16:00:00   130     6  17.3   63.1
#>  9 013620-99999 2014-07-23 18:00:00   280     3  17.2   63
#> 10 013620-99999 2018-07-04 15:00:00 340 2 17.2 63
#> # … with 40 more rows

Additional Data Fields

There can be a substantial amount of additional met data beyond wind speed, ambient temperature, etc. However, these additional fields can vary greatly across stations. The nomenclature for the additional categories of data uses ‘two-letter + digit’ identifiers (e.g.,

AA1
,
GA1
, etc.). Within each category are numerous fields, where the variables are coded as
[identifer]_[index]
). More information about these additional data fields can be found in this PDF document.

To find out which categories of additional data fields are available for a station, we can use the

station_coverage()
function. You’ll get a tibble with the available additional categories and their counts over the specified period.
additional_data_fields %
  dplyr::filter(name == "JUVVASSHOE") %>%
  dplyr::pull(id) %>%
  station_coverage(years = 2015)
additional_data_fields
#> # A tibble: 87 x 3
#>    id           category count
#>                
#>  1 013620-99999 AA1          0
#>  2 013620-99999 AB1          0
#>  3 013620-99999 AC1          0
#>  4 013620-99999 AD1          0
#>  5 013620-99999 AE1          0
#>  6 013620-99999 AG1          0
#>  7 013620-99999 AH1          0
#>  8 013620-99999 AI1          0
#>  9 013620-99999 AJ1        194
#> 10 013620-99999 AK1          0
#> # … with 77 more rows

We can use purrr’s

map_df()
function to get additional data field coverage for a subset of stations (those that are near sea level and have data in 2019). With the
station_coverage()
function set to output tibbles in
wide
mode (one row per station, field categories as columns, and counts of observations as values), we can ascertain which stations have the particular fields we need.
stns %
  dplyr::filter(country == "NO", elev <= 5 & end_year == 2019)

coverage_tbl % dplyr::pull(id) %>% .[[x]] %>% station_coverage( years = 2019, wide_tbl = TRUE ) } )

coverage_tbl
#> # A tibble: 1 x 88
#>   id       AA1   AB1   AC1   AD1   AE1   AG1   AH1   AI1   AJ1   AK1   AL1   AM1
#> *              
#> 1 01167…   491     0     0     0     0     0     0     0   167     0     0     0
#> # … with 75 more variables: AN1 , AO1 , AP1 , AU1 ,
#> #   AW1 , AX1 , AY1 , AZ1 , CB1 , CF1 ,
#> #   CG1 , CH1 , CI1 , CN1 , CN2 , CN3 ,
#> #   CN4 , CR1 , CT1 , CU1 , CV1 , CW1 ,
#> #   CX1 , CO1 , CO2 , ED1 , GA1 , GD1 ,
#> #   GF1 , GG1 , GH1 , GJ1 , GK1 , GL1 ,
#> #   GM1 , GN1 , GO1 , GP1 , GQ1 , GR1 ,
#> #   HL1 , IA1 , IA2 , IB1 , IB2 , IC1 ,
#> #   KA1 , KB1 , KC1 , KD1 , KE1 , KF1 ,
#> #   KG1 , MA1 , MD1 , ME1 , MF1 , MG1 ,
#> #   MH1 , MK1 , MV1 , MW1 , OA1 , OB1 ,
#> #   OC1 , OE1 , RH1 , SA1 , ST1 , UA1 ,
#> #   UG1 , UG2 , WA1 , WD1 , WG1 

For the

"KAWAIHAE"
station in Hawaii, some interesting data fields are available. In particular, its
SA1
category provides sea surface temperature data, where the
sa1_1
and
sa1_2
variables represent the sea surface temperature and its quality code.

Combining the use of

get_met_data()
with functions from dplyr, we can create a table of the mean ambient and sea-surface temperatures by month. The additional data is included in the met data table by using the
add_fields
argument and specifying the
"SA1"
category (multiple categories can be included).
kawaihae_sst %
  dplyr::mutate(
    year = lubridate::year(time),
    month = lubridate::month(time)
  ) %>%
  dplyr::filter(sa1_2 == 1) %>%
  dplyr::group_by(year, month) %>%
  dplyr::summarize(
    avg_temp = mean(temp, na.rm = TRUE),
    avg_sst = mean(sa1_1, na.rm = TRUE)
  )
kawaihae_sst
#> # A tibble: 6 x 4
#> # Groups:   year [2]
#>    year month avg_temp avg_sst
#>           
#> 1  2017    12     24.0    25.7
#> 2  2018     1     23.8    25.2
#> 3  2018     2     23.7    25.1
#> 4  2018     3     23.8    25.0
#> 5  2018     4     25.6    26.3
#> 6  2018    12     26.5    25.9

Installation

The stationaRy package can be easily installed from CRAN.

install.packages("stationaRy")

To install the development version of stationaRy, use the following:

install.packages("devtools")
remotes::install_github("rich-iannone/stationaRy")

If you encounter a bug, have usage questions, or want to share ideas to make this package better, feel free to file an issue.

License

MIT © Richard Iannone

We use cookies. If you continue to browse the site, you agree to the use of cookies. For more information on our use of cookies please see our Privacy Policy.