The GNRS package is designed to interact with the Geographic Name Resolution Service API (GNRS; https://gnrs.biendata.org/) of the Botanical Information and Ecology Network (BIEN; https://bien.nceas.ucsb.edu). The GNRS is a tool for resolving, standardizing, and indexing political division names. It resolves political division names against standard world political units in the Geonames (https://www.geonames.org/) and Global Administrative Areas (GADM; https://gadm.org/) databases. Names are resolved to three levels: country, state/province, and county/parish. The GNRS uses both exact and fuzzy matching to match standard and alternative political division names in a variety of languages, as well as abbreviations and codes such as ISO and FIPS codes. Results returned by the GNRS include the original names submitted, the standard names and codes of the political units matched, unique identifiers from the Geonames and GADM databases, and additional fields describing how each name was resolved. An overall match score from 0 to 1 describes how closely the submitted names match the standard names, where 1 is a perfect match.
The current, stable version of the GNRS package is available on CRAN, while the development version can be installed from GitHub using devtools.
# To install the stable version from CRAN
install.packages("GNRS")
# To install the development version from GitHub
library(devtools)
install_github("EnquistLab/RGNRS")
In some cases, we may want to standardize only a single name. Say we’d like to check the standardized name for the United States of America, or get the standardized name for the Canadian province of Quebec. We can use the function GNRS_super_simple for this.
library(GNRS)
# Standardizing a single country
USA_standardized <- GNRS_super_simple(country = "United States of America")
# Take a look at the columns returned
colnames(USA_standardized)
## [1] "poldiv_full" "country_verbatim"
## [3] "state_province_verbatim" "state_province_verbatim_alt"
## [5] "county_parish_verbatim" "county_parish_verbatim_alt"
## [7] "country" "state_province"
## [9] "county_parish" "country_id"
## [11] "state_province_id" "county_parish_id"
## [13] "country_iso" "state_province_iso"
## [15] "county_parish_iso" "geonameid"
## [17] "gid_0" "gid_1"
## [19] "gid_2" "match_method_country"
## [21] "match_method_state_province" "match_method_county_parish"
## [23] "match_score_country" "match_score_state_province"
## [25] "match_score_county_parish" "overall_score"
## [27] "poldiv_submitted" "poldiv_matched"
## [29] "match_status" "user_id"
# The most useful columns in this case are country and overall_score
USA_standardized[c("country", "overall_score", "match_method_country")]
## country overall_score match_method_country
## 1 United States 1.00 exact alternate name
In this case, the standardized name is simply “United States”. We can be confident in this name because “United States of America” matched an alternate name for the country exactly (match_method_country is “exact alternate name” and overall_score = 1.00). Note that even though we didn’t supply any state/province or county/parish names, fields for these levels are still returned. This is because the GNRS always returns the same set of columns; fields for levels that were not submitted are simply left empty.
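You can see this for yourself by selecting the lower-level columns from the result. The sketch below re-runs the single-country query so it stands on its own, which means it needs access to the GNRS API. It does not assume whether the empty fields come back as empty strings or as NA.

```r
library(GNRS)

# Resolve a single country name (requires access to the GNRS API)
USA_standardized <- GNRS_super_simple(country = "United States of America")

# The state/province and county/parish columns are present but unfilled,
# because no names were submitted at those levels
USA_standardized[c("state_province", "county_parish",
                   "match_method_state_province", "match_score_state_province")]
```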
# Standardizing a single state
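The following is a minimal sketch of the Quebec case described earlier. It assumes that GNRS_super_simple accepts a state_province argument alongside country. That argument name is inferred from the output column names, so check ?GNRS_super_simple before relying on it. The object name Quebec_standardized is illustrative.

```r
library(GNRS)

# Assumption: GNRS_super_simple() takes a state_province argument in
# addition to country (requires access to the GNRS API)
Quebec_standardized <- GNRS_super_simple(country = "Canada",
                                         state_province = "Quebec")

# Standardized names plus how the province was matched
Quebec_standardized[c("country", "state_province", "overall_score",
                      "match_method_state_province")]
```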
# First, we'll load the test data included with this package, gnrs_testfile
gnrs_testfile <- gnrs_testfile
head(gnrs_testfile, n = 10)
## user_id country state_province
## 1 1 Russia Lipetsk
## 2 2 Mexico Sonora, Estado de
## 3 3 Guatemala Izabal
## 4 4 USA Arizona
## 5 5 U.S.A Arizona
## 6 6 USA Ilinois
## 7 7 Mexico Quintana Roo
## 8 8 Mexico Quintana Roo
## 9 9 Ukraine Kharkiv
## 10 10 Canada Province of Nova Scotia
## county_parish
## 1 Dobrovskiy rayon
## 2 Hua^sA(C)pac
## 3
## 4 Pima County
## 5 Pima
## 6
## 7 La^sA°zaro Ca^sA°rdenas
## 8 Municipio de La^sA°zaro Ca^sA°rdenas
## 9 Novovodolaz'kyi
## 10
As you can see, the sample data include spelling variants (USA vs. U.S.A), misspellings (Ilinois), and non-standard characters that may cause problems. The GNRS standardizes these variants and resolves them to standard political division names.
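Before submitting the data, a quick tabulation of the verbatim country names makes these variants visible. This sketch uses only base R on the bundled gnrs_testfile data, so it does not call the API.

```r
library(GNRS)

# Count each verbatim country spelling in the test data;
# "USA" and "U.S.A" are tallied separately although they are the same country
country_counts <- table(gnrs_testfile$country)
country_counts

# Show the rows that use either United States variant
gnrs_testfile[gnrs_testfile$country %in% c("USA", "U.S.A"),
              c("user_id", "country", "state_province", "county_parish")]
```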
gnrs_results <- GNRS(gnrs_testfile)
# The standardized names are found in these columns:
head(gnrs_results[c("country","state_province","county_parish")], n = 10)
## country state_province county_parish
## 1 Russia Lipetskaya Oblast' Dobrovskiy Rayon
## 2 Mexico Sonora
## 3 Guatemala Izabal
## 4 United States Arizona Pima
## 5 United States Arizona Pima
## 6 United States Illinois
## 7 Mexico Quintana Roo
## 8 Mexico Quintana Roo
## 9 Ukraine Kharkivs'ka Oblast' Novovodolaz'kyi
## 10 Canada Nova Scotia
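The overall match score (0 to 1) measures how closely each submitted name matched a standard name. A simple follow-up is therefore to pull out rows that scored below 1, or that have no score at all, for manual review. This is a sketch rather than a package feature. The cutoff of 1 and the object name needs_review are illustrative choices, and the score is coerced with as.numeric() in case it is returned as text.

```r
library(GNRS)

# Resolve the bundled test data (requires access to the GNRS API)
gnrs_results <- GNRS(gnrs_testfile)

# Coerce defensively in case overall_score is stored as character
scores <- as.numeric(gnrs_results$overall_score)

# Keep rows with a missing score or a score below a perfect match
needs_review <- gnrs_results[which(is.na(scores) | scores < 1),
                             c("user_id", "poldiv_submitted", "poldiv_matched",
                               "match_status", "overall_score")]
needs_review
```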
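The GNRS function expects four columns as input (user_id, country, state_province and county_parish), although all of them are optional. If you forget the format, the function GNRS_template returns a one-row template that you can use as a quick reference or populate with your own data.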
head(GNRS_template())
## user_id country state_province county_parish
## 1 NA NA NA NA
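One way to use the template is to check your own data frame against it before calling GNRS. The sketch below builds a small, hypothetical input using names taken from examples in this vignette. Blank county_parish values are written as empty strings, following the blank entries in gnrs_testfile. The final call requires access to the GNRS API.

```r
library(GNRS)

# Column names that GNRS() expects, taken from the template
expected_cols <- colnames(GNRS_template())

# Hypothetical input: one row per record to resolve
my_names <- data.frame(
  user_id        = 1:2,
  country        = c("Canada", "USA"),
  state_province = c("Quebec", "Arizona"),
  county_parish  = c("", "Pima County"),
  stringsAsFactors = FALSE
)

# Template columns missing from the input (character(0) if none)
setdiff(expected_cols, colnames(my_names))

my_results <- GNRS(my_names)
my_results[c("user_id", "country", "state_province", "county_parish",
             "overall_score")]
```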