Make it easier to compare and cross-reference the names of companies and people by applying strong normalisation.
This library helps with the generation of fingerprints for entity data. A fingerprint
in this context is understood as a simplified entity identifier, derived from its
name or address and used for cross-referencing of entities across different datasets.
import fingerprints
fp = fingerprints.generate('Mr. Sherlock Holmes')
assert fp == 'holmes sherlock'
fp = fingerprints.generate('Siemens Aktiengesellschaft')
assert fp == 'ag siemens'
fp = fingerprints.generate('New York, New York')
assert fp == 'new york'
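To illustrate how such identifiers support cross-referencing, here is a minimal sketch that joins two name lists on a shared key. The `toy_key` and `cross_reference` helpers are hypothetical and written only for this example; `toy_key` is a much weaker stand-in for the library's normalisation (it does not strip titles or handle legal forms), so it should not be read as how fingerprints works internally.

```python
import re
from collections import defaultdict


def toy_key(name):
    """Hypothetical stand-in for a fingerprint function (not the library's algorithm)."""
    # Lowercase, then treat every non-word character as a separator
    tokens = re.findall(r"\w+", name.lower())
    # Drop duplicate tokens and sort them so that word order no longer matters
    return " ".join(sorted(set(tokens)))


def cross_reference(left, right, key=toy_key):
    """Return (left_name, right_name) pairs whose keys are identical."""
    index = defaultdict(list)
    for name in right:
        index[key(name)].append(name)
    return [(name, match) for name in left for match in index.get(key(name), [])]


dataset_a = ["Sherlock Holmes", "New York, New York"]
dataset_b = ["HOLMES, Sherlock", "new york", "John Watson"]
matches = cross_reference(dataset_a, dataset_b)
```

In real use, `fingerprints.generate` could presumably be passed as the `key` argument in place of `toy_key`, assuming it returns a comparable string for every name in both lists.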
A significant part of what fingerprints does is to recognize company legal form
names. For example, fingerprints will be able to simplify
Общество с ограниченной ответственностью to ООО, or Aktiengesellschaft to AG.
The required database is based on two different sources:
Wikipedia also maintains an index of types of business entity.
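The legal form simplification described above can be sketched as a lookup table applied with a regular expression. This is only an illustration under stated assumptions: the two-entry `LEGAL_FORMS` table and the `abbreviate_legal_forms` function are hypothetical, built from the two examples in the text, and do not reflect the library's actual database or internal API. The sketch also lowercases its output, which is a choice made here for simplicity.

```python
import re

# Hypothetical two-entry table taken from the examples in the text;
# the real database covers far more legal forms.
LEGAL_FORMS = {
    "общество с ограниченной ответственностью": "ооо",
    "aktiengesellschaft": "ag",
}

# Try longer names first so a long phrase wins over any shorter one it contains
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(
        re.escape(form) for form in sorted(LEGAL_FORMS, key=len, reverse=True)
    )
    + r")\b"
)


def abbreviate_legal_forms(name):
    """Replace each known legal form name with its abbreviation."""
    lowered = name.lower()
    return _PATTERN.sub(lambda match: LEGAL_FORMS[match.group(0)], lowered)


german = abbreviate_legal_forms("Siemens Aktiengesellschaft")
russian = abbreviate_legal_forms("Общество с ограниченной ответственностью Ромашка")
```

The company name "Ромашка" in the usage lines is made up for the example. Matching on whole words avoids rewriting a legal form that merely appears inside a longer token.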