R package: Convert country names and country codes. Assigns region descriptors.
countrycode standardizes country names, converts them into ~40 different coding schemes, and assigns region descriptors. Scroll down for more details or visit the countrycode CRAN page.
If you use countrycode in your research, we would be very grateful if you could cite our paper:
Arel-Bundock, Vincent, Nils Enevoldsen, and CJ Yetman, (2018). countrycode: An R package to convert country names and country codes. Journal of Open Source Software, 3(28), 848, https://doi.org/10.21105/joss.00848
Different data sources use different coding schemes to represent countries (e.g. CoW or ISO). This poses two main problems: (1) some of these coding schemes are less than intuitive, and (2) merging these data requires converting from one coding scheme to another, or from long country names to a coding scheme.
The countrycode function can convert to and from 40+ different country coding schemes, and to 600+ variants of country names in different languages and formats. It uses regular expressions to convert long country names (e.g. Sri Lanka) into any of those coding schemes or country names. It can also create new variables with various regional groupings.
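As a minimal sketch of the two tasks described above (name-to-code conversion and regional grouping), the following assumes the package is installed and uses the `continent` destination code; the three country names are only illustrative.

```r
library(countrycode)

countries <- c("Sri Lanka", "Algeria", "Canada")

# Long English names to ISO 3-letter codes (matched with regular expressions)
iso <- countrycode(countries, origin = "country.name", destination = "iso3c")

# The same names mapped to a regional grouping
continent <- countrycode(countries, origin = "country.name", destination = "continent")

# Combine the results into one table for inspection
print(data.frame(country = countries, iso3c = iso, continent = continent))
```

The same call pattern applies to any other pair of origin and destination codes listed in `?codelist`.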
From the R console, type:
To install the latest development version, you can use the
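A typical installation might look like the sketch below. The GitHub route through the remotes package is an assumption rather than something stated here; the repository path is taken from the custom dictionary URL used later in this README.

```r
# Stable release from CRAN
install.packages("countrycode")

# Development version from GitHub (assumes the remotes package is installed)
remotes::install_github("vincentarelbundock/countrycode")
```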
To get an up-to-date list of supported country codes, install the package and type ?codelist.
Convert single country codes:
# ISO to Correlates of War
countrycode('DZA', origin = 'iso3c', destination = 'cown')
[1] 615
# English to ISO
countrycode('Albania', origin = 'country.name', destination = 'iso3c')
[1] "ALB"
# German to Arabic
countrycode(c('Algerien', 'Albanien'), origin = 'country.name.de', destination = 'un.name.ar')
[1] "الجزائر" "ألبانيا"
Convert a vector of country codes:
> cowcodes <- c("ALG", "ALB", "UKG", "CAN", "USA")
> countrycode(cowcodes, origin = "cowc", destination = "iso3c")
[1] "DZA" "ALB" "GBR" "CAN" "USA"
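Generate vectors and two data frames without a common id (i.e. the two data frames cannot be merged directly):
> isocodes <- c(12, 8, 826, 124, 840)
> var1 <- c(71, 427, 180, 21, 383)
> var2 <- c(238, 329, 463, 437, 26)
> df1 <- data.frame(cowcodes, var1)
> df2 <- data.frame(isocodes, var2)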
Inspect the data:
> df1
  cowcodes var1
1      ALG   71
2      ALB  427
3      UKG  180
4      CAN   21
5      USA  383
> df2
  isocodes var2
1       12  238
2        8  329
3      826  463
4      124  437
5      840   26
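Create a common variable with the iso3c code in each data frame, merge the data, and create a country identifier:
> df1$iso3c <- countrycode(df1$cowcodes, origin = "cowc", destination = "iso3c")
> df2$iso3c <- countrycode(df2$isocodes, origin = "iso3n", destination = "iso3c")
> df3 <- merge(df1, df2, by = "iso3c")
> df3$country <- countrycode(df3$iso3c, origin = "iso3c", destination = "country.name")
> df3
  iso3c cowcodes var1 isocodes var2        country
1   ALB      ALB  427        8  329        Albania
2   CAN      CAN   21      124  437         Canada
3   DZA      ALG   71       12  238        Algeria
4   GBR      UKG  180      826  463 United Kingdom
5   USA      USA  383      840   26  United States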
Flags
countrycode can convert country names and codes to unicode flags. For example, we can use the gt package to draw a table with countries and their corresponding flags:
library(gt)
library(countrycode)
Which produces this file:
Note that embedding unicode characters in R graphics is possible, but it can be tricky. If your output looks like \U0001f1e6\U0001f1f6, then you could try feeding it to this function: utf8::utf8_print(). That should cover a lot of cases without dipping into the complexity of graphics devices. As a rule of thumb, if your output looks like □□□□ (boxes), things tend to get more complicated. In that case, you'll have to think about different output devices, file viewers, and/or file formats (e.g., 'SVG' or 'HTML').
Since inserting unicode symbols into R graphics is not a countrycode-specific issue, we won't be able to offer any more support than this. Good luck!
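The flag conversion described in this section can be sketched without any table package. The example assumes the `unicode.symbol` destination code, and the country list is purely illustrative; whether the flags render depends on your console and fonts, as noted above.

```r
library(countrycode)

countries <- c("Canada", "Germany", "Thailand", "Algeria", "Eritrea")

# Each flag is returned as a pair of regional indicator characters
flags <- countrycode(countries, origin = "country.name", destination = "unicode.symbol")

dat <- data.frame(Countries = countries, Flags = flags)

# If the console prints raw escapes instead of flags, utf8::utf8_print(dat$Flags) may help
print(dat)
```

The resulting data frame can then be handed to a table package such as gt.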
Country names in 600+ different languages and formats
The Unicode organisation hosts the CLDR project, which publishes many variants of country names. For each language/culture locale, there is a full set of names, plus possible 'alt-short' or 'alt-variant' variations of specific country names.
> countrycode('United States of America', origin = 'country.name', destination = 'cldr.name.en')
[1] "United States"
> countrycode('United States of America', origin = 'country.name', destination = 'cldr.short.en')
[1] "US"
To see a full list of country name variants available, inspect this data.frame:
> head(countrycode::cldr_examples)
  Code             Example
1 cldr.name.af     Franse Suidelike Gebiede
2 cldr.name.agq    TF
3 cldr.name.ak     TF
4 cldr.name.am     የፈረንሳይ ደቡባዊ ግዛቶች
5 cldr.name.ar     الأقاليم الجنوبية الفرنسية
6 cldr.name.ar_ly  الأقاليم الجنوبية الفرنسية
custom_dict: American states
Since version 0.19, countrycode accepts user-supplied dictionaries via the custom_dict argument. These dictionaries will override the built-in country code dictionary. For example, the countrycode GitHub repository includes a dictionary of regexes and abbreviations to work with US state names.
Load the library and download the custom dictionary data.frame:
library(countrycode)
url = "https://raw.githubusercontent.com/vincentarelbundock/countrycode/master/data/custom_dictionaries/us_states.csv"
state_dict = read.csv(url, stringsAsFactors=FALSE)
Convert:
countrycode('State of Alabama', origin = 'state', destination = 'abbreviation', custom_dict = state_dict, origin_regex = TRUE)
[1] "AL"
countrycode(c('MI', 'OH', 'Bad'), 'abbreviation', 'state', custom_dict=state_dict)
[1] "Michigan" "Ohio" NA
Note that if you use a custom dictionary with country codes, you could easily merge it into countrycode::codelist or countrycode::codelist_panel to gain access to all other codes.
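The merge mentioned above could look like the following sketch. The `project_id` column and its values are hypothetical; the only assumption about the built-in data is that `countrycode::codelist` has `iso3c` and `country.name.en` columns.

```r
library(countrycode)

# Hypothetical custom dictionary keyed by ISO 3-letter codes
project_dict <- data.frame(
  iso3c = c("CAN", "USA", "MEX"),
  project_id = c("P-001", "P-002", "P-003"),
  stringsAsFactors = FALSE
)

# Attach every built-in code to the custom entries
merged_dict <- merge(project_dict, countrycode::codelist, by = "iso3c")

# The custom column can now be converted to any built-in code
res <- countrycode("P-002", origin = "project_id", destination = "country.name.en",
                   custom_dict = merged_dict)
print(res)
```

Because the merge is an inner join, the resulting dictionary only contains the countries present in the custom table.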
custom_dict: the ISOcodes package
countrycode already supports ISO4217 (currencies) and ISO3166 (country codes). The ISOcodes package supplies other codes, including ISO15924 (language writing systems), ISO639 (languages), and ISO8859 (computer character encodings). Users can convert those codes using countrycode's custom_dict argument.
For example, the ISOcodes::ISO_639_2 dataframe includes 4 columns: Alpha_3_B, Alpha_3_T, Alpha_2, and Name. We can convert language names like this:
> countrycode('abk', 'Alpha_3_B', 'Name', custom_dict = ISOcodes::ISO_639_2)
[1] "Abkhazian"
The ISOcodes::ISO_8859 dataset is a 3-dimensional array where the second dimension represents the character encoding. We take the subset of ISO_8859_1 codes and convert the dict to a dataframe for use in countrycode's custom_dict argument:
library(ISOcodes)
dict
The resulting dataframe has 3 columns: Code, Name, Character. We convert the code 0x00fd like this:
> countrycode("0x00fd", "Code", "Name", custom_dict = dict)
[1] "LATIN SMALL LETTER Y WITH ACUTE"
> countrycode("0x00fd", "Code", "Character", custom_dict = dict)
[1] "ý"
nomatch: Fill in missing codes manually
Use the nomatch argument to specify the value that countrycode inserts where no match was found:
> countrycode(c('DZA', 'USA', '???'), origin = 'iso3c', destination = 'country.name', nomatch = 'BAD CODE')
[1] "Algeria" "United States" "BAD CODE"
> countrycode(c('Canada', 'Fake country'), origin = 'country.name', destination = 'iso3c', nomatch = 'BAD')
[1] "CAN" "BAD"
custom_match: Override default values
Since version 0.19, countrycode accepts a user-supplied named vector of custom matches via the custom_match argument. Any match pairs in the custom_match vector will supersede the default results of the command. This allows the user to convert to an available country code and make minor post-edits all at once. The names of the named vector are used as the origin code, and the values of the named vector are used as the destination code.
For example, Eurostat uses a modified version of iso2c, with Greece (EL instead of GR) and the UK (UK instead of GB) being the only differences. Getting a proper result when converting to Eurostat is easy using the iso2c destination and the custom_match argument. (Note: since version 0.19, countrycode also includes a eurostat origin/destination code, so while this is a good example, doing so for Eurostat is not necessary.)
example: convert from country name to Eurostat code
library(countrycode)
country_names
example: convert from Eurostat code to country name
library(eurostat)
library(countrycode)
df
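A minimal sketch of the name-to-Eurostat conversion described above, assuming an illustrative vector of four country names. The names of the override vector are origin values and its values are the destination codes that should replace the iso2c defaults.

```r
library(countrycode)

country_names <- c("Greece", "United Kingdom", "Germany", "France")

# Overrides for the two cases where Eurostat departs from iso2c
eurostat_overrides <- c("Greece" = "EL", "United Kingdom" = "UK")

codes <- countrycode(country_names,
                     origin = "country.name",
                     destination = "iso2c",
                     custom_match = eurostat_overrides)
print(codes)
```

Entries that are not named in the override vector (here Germany and France) keep their regular iso2c codes.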
warn: Silence warnings
Use warn = TRUE to print out a list of source elements for which no match was found. When the source vector contains long country names that need to be matched using regular expressions, there is always a risk that multiple regexes will match a given string. When this is the case, countrycode assigns a value arbitrarily, but the warn argument allows the user to print a list of all strings that were matched more than once.
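The effect of the argument can be sketched as follows, reusing the 'Fake country' input from the nomatch example. The sketch assumes that warnings are enabled by default in the installed version; tryCatch is only used here to capture the warning text for display.

```r
library(countrycode)

x <- c("Canada", "Fake country")

# With warnings enabled, the unmatched element is reported
warning_text <- tryCatch(
  countrycode(x, origin = "country.name", destination = "iso3c", warn = TRUE),
  warning = function(w) conditionMessage(w)
)
print(warning_text)

# With warn = FALSE, the unmatched element silently becomes NA
quiet <- countrycode(x, origin = "country.name", destination = "iso3c", warn = FALSE)
print(quiet)
```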
countryname: Convert country names from any language
The function countryname tries to convert country names from any language. For example:
> library(countrycode)
> x
> countryname(x)
[1] "Zimbabwe" "Afghanistan" "Barbados" "Sweden" "UK" "South Georgia & South Sandwich Islands"
> countryname(x, 'iso3c')
[1] "ZWE" "AFG" "BRB" "SWE" "GBR" "SGS"
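A self-contained sketch of the same idea, using three illustrative non-English names that are not taken from the example above. Results are not annotated here because the matches depend on the name variants shipped with the installed version.

```r
library(countrycode)

# Illustrative inputs: German, Swedish, and Italian spellings
x <- c("Deutschland", "Sverige", "Italia")

# Default destination: English country names
english <- countryname(x)
print(english)

# Any other destination code can be requested as the second argument
iso <- countryname(x, destination = "iso3c")
print(iso)
```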
Adding a new code
New country codes are created by two files:
- dictionary/get_*.R is an R script which can scrape the code from an original online source (e.g., get_world_bank.R). This script's only side effect is that it writes a CSV file to the dictionary folder.
- dictionary/data_*.csv is a CSV file with 1 column called country, which includes the English country name, and 1 or more columns named after the codes you want to add (e.g., iso3c, un.name.en, continent).
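The CSV layout described in the second bullet could be produced along these lines. The code name `my.code`, its values, and the file name are hypothetical, and a temporary directory stands in for the dictionary folder so that the sketch runs anywhere.

```r
# Hypothetical new code called "my.code"; the values are placeholders
new_code <- data.frame(
  country = c("Algeria", "Canada", "Sri Lanka"),
  my.code = c("A01", "C01", "S01"),
  stringsAsFactors = FALSE
)

# In the repository this file would live in the dictionary folder
out_file <- file.path(tempdir(), "data_my_code.csv")
write.csv(new_code, out_file, row.names = FALSE, na = "", fileEncoding = "UTF-8")

# Read the file back to confirm the expected columns are present
check <- read.csv(out_file, stringsAsFactors = FALSE)
print(check)
```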
After creating those two files, you should:
If you need help with any of these steps, or if you just want to submit a CSV file, feel free to open an issue on GitHub or write an email to Vincent. I'll be happy to help you out!
The countrycode repository holds several custom dictionaries: https://github.com/vincentarelbundock/countrycode/tree/master/data/custom_dictionaries
To add your own custom dictionary, please make sure that:
R commands to produce such a file are shown below.
Using base write.csv:
write.csv(custom_dict, 'custom_dict.csv', quote = TRUE, na = '', row.names = FALSE, qmethod = 'double', fileEncoding = 'UTF-8')
Using readr::write_csv:
readr::write_csv(custom_dict, 'custom_dict.csv', na = '')
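As a quick sanity check, a dictionary written this way can be read back and passed to the custom_dict argument. The two-row dictionary below is a hypothetical stand-in that borrows the column names of the US states example, and the file is written to a temporary directory.

```r
library(countrycode)

# Hypothetical two-row dictionary
custom_dict <- data.frame(
  abbreviation = c("MI", "OH"),
  state = c("Michigan", "Ohio"),
  stringsAsFactors = FALSE
)

# Write with the base R settings shown above
path <- file.path(tempdir(), "custom_dict.csv")
write.csv(custom_dict, path, quote = TRUE, na = "", row.names = FALSE,
          qmethod = "double", fileEncoding = "UTF-8")

# Read the file back and use it for a conversion
reloaded <- read.csv(path, stringsAsFactors = FALSE, encoding = "UTF-8")
states <- countrycode(c("MI", "OH"), origin = "abbreviation", destination = "state",
                      custom_dict = reloaded)
print(states)
```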