Google's i18n address data packaged for Python.
This package contains a copy of Google’s i18n address metadata repository that contains great data but comes with no uptime guarantees.
Contents of this package will allow you to programmatically build address forms that adhere to rules of a particular region or country, validate local addresses, and format them to produce a valid address label for delivery.
The package also contains a Python interface for address validation.
The normalize_address
function checks the address and either returns its canonical form (suitable for storage and use in addressing envelopes) or raises an InvalidAddressError
exception that contains a list of errors.
Here is the list of recognized fields:
country_code
is a two-letter ISO 3166-1 country codecountry_area
is a designation of a region, province, or state. Recognized values include official names, designated abbreviations, official translations, and Latin transliterationscity
is a city or town name. Recognized values include official names, official translations, and Latin transliterationscity_area
is a sublocality like a district. Recognized values include official names, official translations, and Latin transliterationsstreet_address
is the (possibly multiline) street addresspostal_code
is a postal code or zip codesorting_code
is a sorting codename
is a person’s namecompany_name
is a name of a company or organizationAddress validation with only country code:
from i18naddress import InvalidAddressError, normalize_address
try:
address = normalize_address({'country_code': 'US'})
except InvalidAddressError as e:
print(e.errors)
Output:
{'city': 'required',
'country_area': 'required',
'postal_code': 'required',
'street_address': 'required'}
With correct address:
from i18naddress import normalize_address
address = normalize_address({
'country_code': 'US',
'country_area': 'California',
'city': 'Mountain View',
'postal_code': '94043',
'street_address': '1600 Amphitheatre Pkwy'
})
print(address)
Output:
{'city': 'MOUNTAIN VIEW',
'city_area': '',
'country_area': 'CA',
'country_code': 'US',
'postal_code': '94043',
'sorting_code': '',
'street_address': '1600 Amphitheatre Pkwy'}
Postal code/zip code validation example:
from i18naddress import InvalidAddressError, normalize_address
try:
address = normalize_address({
'country_code': 'US',
'country_area': 'California',
'city': 'Mountain View',
'postal_code': '74043',
'street_address': '1600 Amphitheatre Pkwy'
})
except InvalidAddressError as e:
print(e.errors)
Output:
{'postal_code': 'invalid'}
In some cases, it may be useful to display foreign addresses in a more accessible format. You can use the latinize_address
function to obtain a more verbose, Latinized version of an address.
This version is suitable for display and useful for full-text search indexing, but the normalized form is what should be stored in the database and used when printing address labels.
from i18naddress import latinize_address
address = {
'country_code': 'CN',
'country_area': '云南省',
'postal_code': '677400',
'city': '临沧市',
'city_area': '凤庆县',
'street_address': '中关村东路1号'
}
latinize_address(address)
Output:
{'country_code': 'CN',
'country_area': 'Yunnan Sheng',
'city': 'Lincang Shi',
'city_area': 'Lincang Shi',
'sorting_code': '',
'postal_code': '677400',
'street_address': '中关村东路1号'}
It will also return expanded names for area types that normally use codes and abbreviations such as state names in the US:
from i18naddress import latinize_address
address = {
'country_code': 'US',
'country_area': 'CA',
'postal_code': '94037',
'city': 'Mountain View',
'street_address': '1600 Charleston Rd.'
}
latinize_address(address)
Output:
{'country_code': 'US',
'country_area': 'California',
'city': 'Mountain View',
'city_area': '',
'sorting_code': '',
'postal_code': '94037',
'street_address': '1600 Charleston Rd.'}
You can use the format_address
function to format the address following the destination country’s post office regulations:
address = {
'country_code': 'CN',
'country_area': '云南省',
'postal_code': '677400',
'city': '临沧市',
'city_area': '凤庆县',
'street_address': '中关村东路1号'
}
print(format_address(address))
Output:
677400
云南省临沧市凤庆县
中关村东路1号
CHINA
You can also ask for a Latin-friendly version:
address = {
'country_code': 'CN',
'country_area': '云南省',
'postal_code': '677400',
'city': '临沧市',
'city_area': '凤庆县',
'street_address': '中关村东路1号'
}
print(format_address(address, latin=True))
Output:
中关村东路1号
凤庆县
临沧市
云南省, 677400
CHINA
You can use the get_validation_rules
function to obtain validation data useful for constructing address forms specific for a particular country:
from i18naddress import get_validation_rules
get_validation_rules({'country_code': 'US', 'country_area': 'CA'})
Output:
ValidationRules(
country_code='US',
country_name='UNITED STATES',
address_format='%N%n%O%n%A%n%C, %S %Z',
address_latin_format='%N%n%O%n%A%n%C, %S %Z',
allowed_fields={'street_address', 'company_name', 'city', 'name', 'country_area', 'postal_code'},
required_fields={'street_address', 'city', 'country_area', 'postal_code'},
upper_fields={'city', 'country_area'},
country_area_type='state',
country_area_choices=[('AL', 'Alabama'), ..., ('WY', 'Wyoming')],
city_type='city',
city_choices=[],
city_area_type='suburb',
city_area_choices=[],
postal_code_type='zip',
postal_code_matchers=[re.compile('^(\\d{5})(?:[ \\-](\\d{4}))?$'), re.compile('^9[0-5]|96[01]')],
postal_code_examples=['90000', '96199'],
postal_code_prefix=''
)
You can use the KNOWN_FIELDS
set, to render optional address fields as hidden elements of your form:
from i18naddress import get_validation_rules, KNOWN_FIELDS
rules = get_validation_rules({'country_code': 'US'})
KNOWN_FIELDS - rules.allowed_fields
Output:
{'city_area', 'sorting_code'}
Raw data is stored in a dict:
from i18naddress import load_validation_data
i18n_country_data = load_validation_data()
i18n_country_data['US']
Output:
{'fmt': '%N%n%O%n%A%n%C, %S %Z',
'id': 'data/US',
'key': 'US',
'lang': 'en',
'languages': 'en',
'name': 'UNITED STATES',
'posturl': 'https://tools.usps.com/go/ZipLookupAction!input.action',
'require': 'ACSZ',
'state_name_type': 'state',
'sub_keys': 'AL~AK~AS~AZ~AR~AA~AE~AP~CA~CO~CT~DE~DC~FL~GA~GU~HI~ID~IL~IN~IA~KS~KY~LA~ME~MH~MD~MA~MI~FM~MN~MS~MO~MT~NE~NV~NH~NJ~NM~NY~NC~ND~MP~OH~OK~OR~PW~
PA~PR~RI~SC~SD~TN~TX~UT~VT~VI~VA~WA~WV~WI~WY',
'sub_names': 'Alabama~Alaska~American Samoa~Arizona~Arkansas~Armed Forces (AA)~Armed Forces (AE)~Armed Forces (AP)~California~Colorado~Connecticut~Delaware~District of Columbia~Florida~Georgia~Guam~Hawaii~Idaho~Illinois~Indiana~Iowa~Kansas~Kentucky~Louisiana~Maine~Marshall Islands~Maryland~Massachusetts~Michigan~Micronesia~Minnesota~Mississippi~Missouri~Montana~Nebraska~Nevada~New Hampshire~New Jersey~New Mexico~New York~North Carolina~North Dakota~Northern Mariana Islands~Ohio~Oklahoma~Oregon~Palau~Pennsylvania~Puerto Rico~Rhode Island~South Carolina~South Dakota~Tennessee~Texas~Utah~Vermont~Virgin Islands~Virginia~Washington~West Virginia~Wisconsin~Wyoming',
'sub_zipexs': '35000,36999~99500,99999~96799~85000,86999~71600,72999~34000,34099~09000,09999~96200,96699~90000,96199~80000,81999~06000,06999~19700,19999~20000,20099:20200,20599:56900,56999~32000,33999:34100,34999~30000,31999:39800,39899:39901~96910,96932~96700,96798:96800,96899~83200,83999~60000,62999~46000,47999~50000,52999~66000,67999~40000,42799~70000,71599~03900,04999~96960,96979~20600,21999~01000,02799:05501:05544~48000,49999~96941,96944~55000,56799~38600,39799~63000,65999~59000,59999~68000,69999~88900,89999~03000,03899~07000,08999~87000,88499~10000,14999:06390:00501:00544~27000,28999~58000,58999~96950,96952~43000,45999~73000,74999~97000,97999~96940~15000,19699~00600,00799:00900,00999~02800,02999~29000,29999~57000,57999~37000,38599~75000,79999:88500,88599:73301:73344~84000,84999~05000,05999~00800,00899~20100,20199:22000,24699~98000,99499~24700,26999~53000,54999~82000,83199:83414',
'sub_zips': '3[56]~99[5-9]~96799~8[56]~71[6-9]|72~340~09~96[2-6]~9[0-5]|96[01]~8[01]~06~19[7-9]~20[02-5]|569~3[23]|34[1-9]~3[01]|398|39901~969([1-2]\\d|3[12])~967[0-8]|9679[0-8]|968~83[2-9]~6[0-2]~4[67]~5[0-2]~6[67]~4[01]|42[0-7]~70|71[0-5]~039|04~969[67]~20[6-9]|21~01|02[0-7]|05501|05544~4[89]~9694[1-4]~55|56[0-7]~38[6-9]|39[0-7]~6[3-5]~59~6[89]~889|89~03[0-8]~0[78]~87|88[0-4]~1[0-4]|06390|00501|00544~2[78]~58~9695[0-2]~4[3-5]~7[34]~97~969(39|40)~1[5-8]|19[0-6]~00[679]~02[89]~29~57~37|38[0-5]~7[5-9]|885|73301|73344~84~05~008~201|2[23]|24[0-6]~98|99[0-4]~24[7-9]|2[56]~5[34]~82|83[01]|83414',
'upper': 'CS',
'zip': '(\\d{5})(?:[ \\-](\\d{4}))?',
'zip_name_type': 'zip',
'zipex': '95014,22162-1010'}
Django forms will return only required address fields in form.cleaned_data
dict. So addresses in the database will be normalized.
from django import forms
from i18naddress import InvalidAddressError, normalize_address, get_validation_rules
class AddressForm(forms.Form):
COUNTRY_CHOICES = [
('PL', 'Poland'),
('AE', 'United Arab Emirates'),
('US', 'United States of America')
]
ERROR_MESSAGES = {
'required': 'This field is required',
'invalid': 'Enter a valid name'
}
name = forms.CharField(required=True)
company_name = forms.CharField(required=False)
street_address = forms.CharField(required=False)
city = forms.CharField(required=False)
city_area = forms.CharField(required=False)
country_code = forms.ChoiceField(required=True, choices=COUNTRY_CHOICES)
country_area = forms.CharField(required=False)
postal_code = forms.CharField(required=False)
def clean(self):
clean_data = super(AddressForm, self).clean()
validation_rules = get_validation_rules(clean_data)
try:
valid_address = normalize_address(clean_data)
except InvalidAddressError as e:
errors = e.errors
valid_address = None
for field, error_code in errors.items():
if field == 'postal_code':
examples = validation_rules.postal_code_examples
msg = 'Invalid value, use format like %s' % examples
else:
msg = self.ERROR_MESSAGES[error_code]
self.add_error(field, msg)
return valid_address or clean_data