Introduction

library(zctaCrosswalk)

This package is designed to help answer common analytical questions that arise when working with US ZIP Codes.

Note: the organization that maintains US ZIP Codes (the US Postal Service) does not release a map or crosswalk of that dataset. As a result, most analysts instead use ZIP Code Tabulation Areas (ZCTAs), which are maintained by the US Census Bureau. The Census Bureau also provides Relationship Files that map ZCTAs to other geographies.

This package provides the Census Bureau’s “2020 ZCTA to County Relationship File” as a tibble, combines it with useful publicly available metadata (such as state names), and provides convenience functions for querying it.

The main functions in this package are:

  • ?get_zctas_by_state
  • ?get_zctas_by_county
  • ?get_zcta_metadata

?get_zctas_by_state

?get_zctas_by_state takes a vector of states and returns the vector of ZCTAs in those states. Here are some examples:

# Not case sensitive when using state names
head(
  get_zctas_by_state("California")
)
#> Using column state_name
#> [1] "89010" "89019" "89060" "89061" "89439" "90001"

# USPS state abbreviations are also OK - but these *are* case sensitive
head(
  get_zctas_by_state("CA")
)
#> Using column state_usps
#> [1] "89010" "89019" "89060" "89061" "89439" "90001"

# Multiple states at the same time are also OK
head(
  get_zctas_by_state(c("CA", "NY"))
)
#> Using column state_usps
#> [1] "06390" "10001" "10002" "10003" "10004" "10005"

# Throws an error - you can't mix types in a single request
# get_zctas_by_state(c("California", "NY"))
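
If your input mixes full names and abbreviations, one workaround is to normalize everything to a single type before calling ?get_zctas_by_state. The sketch below is only an illustration: to_usps is a hypothetical helper, not part of zctaCrosswalk, and it relies on base R's built-in state.name and state.abb vectors, which cover the 50 states only.

```r
# Hypothetical helper (not part of zctaCrosswalk): map full state names to
# USPS abbreviations using base R's built-in state.name / state.abb vectors.
# Values that do not match a state name (e.g. "NY", or "District of Columbia",
# which is absent from state.name) are returned unchanged.
to_usps <- function(x) {
  idx <- match(tolower(x), tolower(state.name))
  ifelse(is.na(idx), x, state.abb[idx])
}

mixed <- c("California", "NY")
usps  <- to_usps(mixed)
print(usps)

# With the package attached, the normalized vector can be passed in one call:
# get_zctas_by_state(usps)
```

Anything the helper cannot match is passed through untouched, so it is still worth checking the result before querying.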

A common problem when doing analytics with states is ambiguity around names. For example, most people write “Washington, DC”, but this dataset uses “District of Columbia”. The most common solution is to use FIPS codes when doing analytics with states, so ?get_zctas_by_state also supports FIPS codes.

Technically, FIPS codes are character strings and can have a leading zero (e.g. California is “06”). In practice, people often use numbers as well (e.g. 6 for California). As a result, ?get_zctas_by_state supports both:

ca1 <- get_zctas_by_state("CA")
#> Using column state_usps
ca2 <- get_zctas_by_state("06")
#> Using column state_fips
ca3 <- get_zctas_by_state(6)
#> Using column state_fips_numeric
all(ca1 == ca2)
#> [1] TRUE
all(ca2 == ca3)
#> [1] TRUE
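
If you would rather store state FIPS codes in their canonical character form, the leading zero can be restored with base R. This is a minimal sketch; pad_state_fips is a hypothetical helper name and is not provided by the package.

```r
# Hypothetical helper (not part of zctaCrosswalk): turn a numeric state FIPS
# code into its two-character, zero-padded form.
pad_state_fips <- function(x) sprintf("%02d", as.integer(x))

print(pad_state_fips(6))          # California
print(pad_state_fips(c(6, 36)))   # California, New York
```

The same call also accepts character input such as "06", since as.integer() drops the leading zero before it is padded again.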

?get_zctas_by_county

?get_zctas_by_county works analogously to ?get_zctas_by_state. The primary difference is that it only accepts FIPS codes. This is because county FIPS codes are unique, but county names are not. (For example, 30 counties in this dataset are named “Washington County”!)

If you need to find the FIPS code for a particular county, I recommend simply googling it (e.g. “FIPS code for San Francisco County California”) or consulting this page.

Note that the FIPS codes can be either character or numeric.

# "06075" is San Francisco County, California
head(
  get_zctas_by_county("06075")
)
#> Using column county_fips
#> [1] "94102" "94103" "94104" "94105" "94107" "94108"

# 6075 (== as.numeric("06075")) works too
head(
  get_zctas_by_county(6075)
)
#> Using column county_fips_numeric
#> [1] "94102" "94103" "94104" "94105" "94107" "94108"

# Multiple counties at the same time are also OK
head(
  get_zctas_by_county(c("06075", "36059"))
)
#> Using column county_fips
#> [1] "11001" "11003" "11010" "11020" "11021" "11023"
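
It can help to remember how a county FIPS code is built: the first two digits are the state FIPS code and the last three identify the county within that state. The sketch below uses only base R and the San Francisco example from above; the variable names are illustrative.

```r
# Sketch only: a 5-digit county FIPS code is the 2-digit state FIPS code
# followed by a 3-digit county code.
county_fips <- sprintf("%05d", 6075L)       # restore the leading zero
state_part  <- substr(county_fips, 1, 2)    # state FIPS (California)
county_part <- substr(county_fips, 3, 5)    # county code within the state
print(c(county_fips, state_part, county_part))
```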

?get_zcta_metadata

?get_zcta_metadata takes a vector of ZCTAs and returns all available metadata on them. The ZCTAs can be either character or numeric.

get_zcta_metadata("90210")
#> # A tibble: 1 × 9
#>   zcta  zcta_numeric state_name state_usps state_fips state_fips_numeric
#>   <chr>        <int> <chr>      <chr>      <chr>                   <int>
#> 1 90210        90210 california CA         06                          6
#> # ℹ 3 more variables: county_name <chr>, county_fips <chr>,
#> #   county_fips_numeric <int>

# Some ZCTAs span multiple counties
get_zcta_metadata(39573)
#> # A tibble: 6 × 9
#>   zcta  zcta_numeric state_name  state_usps state_fips state_fips_numeric
#>   <chr>        <int> <chr>       <chr>      <chr>                   <int>
#> 1 39573        39573 mississippi MS         28                         28
#> 2 39573        39573 mississippi MS         28                         28
#> 3 39573        39573 mississippi MS         28                         28
#> 4 39573        39573 mississippi MS         28                         28
#> 5 39573        39573 mississippi MS         28                         28
#> 6 39573        39573 mississippi MS         28                         28
#> # ℹ 3 more variables: county_name <chr>, county_fips <chr>,
#> #   county_fips_numeric <int>
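
Because the wide tibble hides the county columns when printed, it can be easier to select them explicitly. The sketch below assumes zctaCrosswalk is installed and takes the column names from the printed output above. It also assumes each row corresponds to one ZCTA/county pair, which the output suggests but does not confirm.

```r
# Assumes zctaCrosswalk is installed; skipped otherwise.
if (requireNamespace("zctaCrosswalk", quietly = TRUE)) {
  md <- zctaCrosswalk::get_zcta_metadata(39573)

  # Show only the columns that differ between rows
  print(md[, c("zcta", "county_name", "county_fips")])

  # Number of distinct counties this ZCTA overlaps
  print(length(unique(md$county_fips)))
}
```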