How to Standardize a Person's Name

At RoboTrackers, we take data quality seriously; it is what differentiates RoboTrackers from other services.

A name is one of the most critical parts of a person’s data. Here are some of the challenges we faced in standardizing names.

Challenge 1: Name Order

Order       Sequence
Western     first, middle, last
Eastern     last, (middle), first
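To make the two sequences concrete, here is a minimal sketch. It assumes the name has already been split into parts. `assemble_name` and its `order` and `sep` parameters are illustrative names only, not part of any RoboTrackers system.

```python
from typing import Optional


def assemble_name(first: str, last: str, middle: Optional[str] = None,
                  order: str = "western", sep: str = " ") -> str:
    """Join name parts in Western (first, middle, last) or
    Eastern (last, middle, first) sequence."""
    if order == "western":
        parts = [first, middle, last]
    elif order == "eastern":
        parts = [last, middle, first]
    else:
        raise ValueError(f"unknown order: {order!r}")
    # Skip a missing middle name (Chinese names usually have none).
    return sep.join(p for p in parts if p)


print(assemble_name("Jack", "Ma"))                        # Jack Ma
print(assemble_name("Yun", "Ma", order="eastern"))        # Ma Yun
print(assemble_name("云", "马", order="eastern", sep=""))  # 马云
```

The `sep` parameter exists because Chinese characters are normally written without spaces, while romanized names are not.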

For example, let’s look at Jack Ma:

Field                  Detail
English first name     Jack
English middle name    -
English last name      Ma
Native first name      云 (Yun)
Native middle name     -
Native last name       马 (Ma)
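One way to hold these columns is a single record with optional middle names. This is a hypothetical sketch: the `PersonName` class and its field names simply mirror the table. `native_order` assumes an Eastern-order language such as Chinese.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonName:
    """One person's name, split into the columns of the table."""
    english_first: str
    english_last: str
    native_first: str
    native_last: str
    english_middle: Optional[str] = None
    native_middle: Optional[str] = None

    def native_order(self, sep: str = "") -> str:
        """Render the native name in Eastern order (last, middle, first).
        Assumes an Eastern-order language; Chinese is joined without spaces."""
        parts = [self.native_last, self.native_middle, self.native_first]
        return sep.join(p for p in parts if p)


jack_ma = PersonName(english_first="Jack", english_last="Ma",
                     native_first="云", native_last="马")
print(jack_ma.native_order())  # 马云
```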

English name order: Jack Ma Yun

Chinese name order: 马云

Transliterated name: Ma Yun

Chinese names have no middle name. A Chinese person who is not familiar with Western name order may write his name as Ma Yun.

A Westerner may then identify Ma as the first name and Yun as the last name, which is wrong. The reverse mistake happens too.

This example illustrates the challenge of standardizing a person’s name.

To avoid this confusion, this is how we do it:

RoboTrackers name order: Jack Ma 云马

We follow the Western name order for both the English and the native name.
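A minimal sketch of this rule follows. It assumes the name parts are already correctly labelled as first and last. `robotrackers_name` is a hypothetical helper, and the choice of a space between the English and native parts is an assumption based on the "Jack Ma 云马" example.

```python
from typing import Optional


def robotrackers_name(en_first: str, en_last: str,
                      native_first: str, native_last: str,
                      en_middle: Optional[str] = None,
                      native_middle: Optional[str] = None,
                      native_sep: str = "") -> str:
    """Western order (first, middle, last) for BOTH the English and the
    native name, English part first, separated by a single space."""
    english = " ".join(p for p in (en_first, en_middle, en_last) if p)
    native = native_sep.join(
        p for p in (native_first, native_middle, native_last) if p)
    return f"{english} {native}"


print(robotrackers_name("Jack", "Ma", "云", "马"))  # Jack Ma 云马
```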

Challenge 2: Lack of Expertise

We hired data entry experts (DEEs) of different origins and observed that they have difficulty with native names.

Our non-Chinese DEE is unable to tell the native first name from the native last name. Applying Western name-order logic, she identifies 马 as the native first name and 云 as the native last name, which is wrong.
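The mistake, and one possible remedy, can be sketched as follows. The remedy matches known surnames at the start of a Chinese name. This is an illustration only: `KNOWN_SURNAMES` is a hypothetical, deliberately tiny list, and both function names are made up. A real surname library would be far larger, and some names remain ambiguous.

```python
from typing import Tuple

# Hypothetical, deliberately tiny surname list; a real one would be far larger.
KNOWN_SURNAMES = {"马", "王", "李", "张", "欧阳", "司马"}


def naive_western_split(native: str) -> Tuple[str, str]:
    """The mistaken approach: treat the first character as the first name."""
    return native[:1], native[1:]  # (first, last)


def surname_aware_split(native: str) -> Tuple[str, str]:
    """Match a known surname at the start of a Chinese name (Eastern order),
    then return (first, last) for Western-order storage."""
    for length in (2, 1):  # try compound surnames such as 欧阳 first
        surname = native[:length]
        if surname in KNOWN_SURNAMES and len(native) > length:
            return native[length:], surname
    raise ValueError(f"no known surname at the start of {native!r}")


print(naive_western_split("马云"))    # ('马', '云')  wrong
print(surname_aware_split("马云"))    # ('云', '马')  right
print(surname_aware_split("欧阳修"))  # ('修', '欧阳')
```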

Challenge 3: No Unique Standard

If you search Google for 云马 and then for 马云, you will see that the results are very different. Because web sources and our data follow different standards, it is very hard for us to merge web results with our data.
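One possible workaround when merging is to compare a web name against a stored record in both orders. This is a sketch under an assumption, not how we actually merge data. `match_keys` and `matches` are hypothetical helpers. Accepting both orders also trades away some precision, because 云马 and 马云 could in principle be two different people.

```python
from typing import Set


def match_keys(native_first: str, native_last: str, sep: str = "") -> Set[str]:
    """Return the native name in both orders: our Western-order form (云马)
    and the native Eastern-order form (马云) that web pages usually use."""
    return {native_first + sep + native_last,
            native_last + sep + native_first}


def matches(record_first: str, record_last: str, web_name: str) -> bool:
    """True if a name scraped from the web matches the record in either order."""
    return web_name.strip() in match_keys(record_first, record_last)


print(matches("云", "马", "马云"))    # True
print(matches("云", "马", "云马"))    # True
print(matches("云", "马", "马化腾"))  # False
```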

Final Thoughts

Despite these challenges, we do not give up; we keep doing our best.

Here is our vision for the future. We plan to hire one person from every country to handle the native names that they understand. Their job would be to:

  1. Ensure that all native names are correct
  2. Build the world’s most comprehensive name library for every language

Every person in our database has a UUID (universally unique identifier). We will build an API that everyone can use: it will accept any name and return results in the correct name order.
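The core of such an API could look roughly like the in-memory sketch below. It is entirely hypothetical: `Person`, `NameIndex`, and `lookup` are illustrative names, not a description of our planned service. It assigns each person a random UUID with Python’s standard `uuid.uuid4()`. It also indexes the native name in both orders, so either spelling returns the standardized (Western-order) form.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Person:
    native_first: str
    native_last: str
    # Each person gets a random UUID when the record is created.
    person_id: uuid.UUID = field(default_factory=uuid.uuid4)

    def standardized(self) -> str:
        # Western order (first, then last) for the native name as well.
        return self.native_first + self.native_last


class NameIndex:
    """Hypothetical in-memory stand-in for the planned name API."""

    def __init__(self) -> None:
        self._by_key: Dict[str, List[Person]] = {}

    def add(self, person: Person) -> None:
        # Index the name in both orders so either spelling finds the person.
        for key in {person.native_first + person.native_last,
                    person.native_last + person.native_first}:
            self._by_key.setdefault(key, []).append(person)

    def lookup(self, name: str) -> List[Tuple[uuid.UUID, str]]:
        """Accept a name in either order; return (UUID, standardized name)."""
        return [(p.person_id, p.standardized())
                for p in self._by_key.get(name.strip(), [])]


index = NameIndex()
jack_ma = Person(native_first="云", native_last="马")
index.add(jack_ma)
print(index.lookup("马云")[0][1])  # 云马
```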

We will set the industry standard that everyone follows.
