How to Standardize a Person’s Name
At RoboTrackers, we take data quality very seriously. It is what differentiates RoboTrackers from other services.
A name is one of the most critical components of a person’s data. Here are some of the challenges we faced in standardizing names.
Challenge 1: Name Order
Order | Sequence |
---|---|
Western | first, middle, last |
Eastern | last, (middle), first |
For example, let’s look at Jack Ma.
Column | Detail |
---|---|
English first name | Jack |
English middle name | - |
English last name | Ma |
Native first name | 云 (Yun) |
Native middle name | - |
Native last name | 马 (Ma) |
English name order: Jack Ma Yun
Chinese name order: 马云
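Transliterated name: Ma Yun
Chinese names have no middle name. A Chinese person who is not familiar with Western name order may write the name as Ma Yun.
A Westerner may then read Ma as the first name and Yun as the last name, which is wrong. The reverse mistake also happens: a reader used to Eastern order may take the first word of a Western-ordered name as the last name.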
This example illustrates the challenge of standardizing a person’s name.
To avoid this confusion, this is how we store the name:
RoboTrackers name order: Jack Ma 云马
We follow the Western name order for both the English name and the native name.
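The convention above can be sketched in Python by storing each name part in its own field and only choosing an order when the name is displayed. This is a minimal sketch, not a definitive implementation: the `PersonName` class, its field names, and its methods are illustrative assumptions, and joining native parts without spaces is an assumption that suits Chinese names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonName:
    """Hypothetical record that keeps every name part in its own field."""
    english_first: str
    english_last: str
    native_first: str
    native_last: str
    english_middle: str = ""
    native_middle: str = ""

    def english_western_order(self) -> str:
        # first, middle, last; empty parts are skipped
        parts = [self.english_first, self.english_middle, self.english_last]
        return " ".join(p for p in parts if p)

    def native_western_order(self) -> str:
        # Assumption: native parts are joined without spaces (fits Chinese).
        return self.native_first + self.native_middle + self.native_last

    def native_eastern_order(self) -> str:
        # last, (middle), first: the order a native reader expects
        return self.native_last + self.native_middle + self.native_first

    def robotrackers_order(self) -> str:
        # Western order for both the English and the native name
        return f"{self.english_western_order()} {self.native_western_order()}"


jack = PersonName(
    english_first="Jack", english_last="Ma",
    native_first="云", native_last="马",
)
print(jack.robotrackers_order())    # Jack Ma 云马
print(jack.native_eastern_order())  # 马云
```

Because the parts are stored separately, either order can be produced on demand without guessing which word is the last name.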
Challenge 2: Lack of Experts
We hired data entry experts (DEEs) of different origins and observed that they have difficulty with native names from languages they do not know.
For example, our non-Chinese DEE cannot tell the native first name from the native last name. Applying Western name order logic to 马云, she identifies 马 as the native first name and 云 as the native last name, which is wrong.
Challenge 3: Unique Standardization
If you Google 云马 and 马云, you will see that the results are very different. The web mostly uses the native order (马云), while we store the Western order (云马), so the differing standards make it very hard to merge web results with our data.
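One way to soften this mismatch is to compare a web result against both orderings of a stored native name. The sketch below assumes a two-part native name with no middle name; the function names `native_order_variants` and `matches_record` are hypothetical.

```python
def native_order_variants(native_first: str, native_last: str) -> set[str]:
    """Return the native name in both Western and Eastern order."""
    return {native_first + native_last, native_last + native_first}


def matches_record(web_name: str, native_first: str, native_last: str) -> bool:
    """Check whether a name found on the web matches a stored record in either order."""
    # Remove any whitespace so "马 云" and "马云" compare equal.
    normalized = "".join(web_name.split())
    return normalized in native_order_variants(native_first, native_last)


print(matches_record("马云", native_first="云", native_last="马"))    # True
print(matches_record("云马", native_first="云", native_last="马"))    # True
print(matches_record("马化腾", native_first="云", native_last="马"))  # False
```

Matching on both orders trades precision for recall: two different people whose names are reversals of each other would also match, so a check like this should only narrow down candidates, not confirm identity.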
Final Thoughts
Despite these challenges, we do not give up, and we keep doing our best.
Here is our vision for the future. We plan to hire one person from every country to handle the native names that they understand. Their job would be:
- Ensuring all the native names are correct
- Building the world’s most comprehensive name library for every language
Every person in our database has a UUID (universally unique identifier), and we will make an API that everyone can use.
The API will accept any name and return results with the correct name order.
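As a rough illustration of what such a lookup might do for Chinese names, the sketch below uses a surname list to decide which end of the name holds the last name. Everything here is an assumption for illustration: the function `to_western_order` is hypothetical, the surname set is a tiny sample, and compound surnames such as 欧阳 are not handled.

```python
from typing import Optional

# Tiny illustrative sample; a real name library would need a far larger list.
KNOWN_CHINESE_SURNAMES = {"马", "王", "李", "张", "陈"}


def to_western_order(native_name: str) -> Optional[str]:
    """Return the native name in Western order (first name, then last name).

    Returns None when the order cannot be decided from the surname list.
    """
    name = "".join(native_name.split())
    if len(name) < 2:
        return None

    starts_with_surname = name[0] in KNOWN_CHINESE_SURNAMES
    ends_with_surname = name[-1] in KNOWN_CHINESE_SURNAMES

    if starts_with_surname and not ends_with_surname:
        # Eastern order: move the surname to the end.
        return name[1:] + name[0]
    if ends_with_surname and not starts_with_surname:
        # Already in Western order.
        return name
    # Both or neither end looks like a surname: leave it to a human expert.
    return None


print(to_western_order("马云"))  # 云马
print(to_western_order("云马"))  # 云马
print(to_western_order("王马"))  # None
```

Returning `None` for ambiguous input is deliberate: such cases are the ones that still need a native speaker to review.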
We will set the industry standard that everyone follows.