In 2018 we were doing a small research project and needed to know whether a name was male or female. After hours of Googling for 'baby name list', 'name database' and 'name list', we discovered that no complete list of names and genders existed for all countries. Most of the lists we found differed per country, were incomplete and contained names that are not actually used. That is why we created this list with names, gender and popularity.
Looking back, it took an enormous effort and a lot of patience to create this name list. To compile it we reached out to governments and statistical agencies and gathered open data from different sources. We received all kinds of files with different layouts, and the files we found online were incomplete and had character encoding issues. We restructured the files and imported everything into a single name database.
If you want to use our name list with gender and popularity metric in an online service, research or scientific project, it is important to know how our database was created. In the paragraphs below we explain as clearly as possible how we worked and what we based the list of names and the results on.
A census is an enumeration of people, houses, firms, or other important items in a country or region at a particular time. That is why we called our service Name Census. To get all the names we reached out to governments and statistical agencies and gathered open data from different sources. We used 9.252.364 publicly available social media profiles to cross-reference and count each name per country. This way we could be sure that the names in our name list are actually used, and we could create our popularity metric.
European and North American countries are well organised and have governmental agencies or statistical bureaus that register the names and gender of newborn babies. In most countries these baby name lists are publicly available as open data. During the first year we reached out to many official institutions and requested their lists of names of newborn babies and their gender. Many European and North American countries delivered the data within a week. Besides the governmental agencies and statistical bureaus, there are also many open data initiatives.
Unfortunately, not every country in the world was able to deliver lists of names of newborn babies and their gender. That turned out not to be necessary: we did not need the official name lists from every country. Spanish, for example, is an official language in many other countries, such as Mexico, Colombia and Argentina, so we were able to use the official Spanish name list for the name census in all Spanish-speaking countries. We applied the same logic to the French, English and Portuguese name lists.
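The reuse of one official list across countries that share a language can be sketched as a simple lookup. This is only an illustration: the function name, the mapping table and the choice of "ES" as the code for the Spanish list are assumptions, and only the three countries named above are included.

```python
from typing import Optional

# Hypothetical mapping from a country without its own official list to the
# country whose official list is reused (ISO 3166-1 alpha-2 codes).
# Only the examples named in the text are listed; "ES" as the source of the
# Spanish list is an assumption.
FALLBACK_LIST = {
    "MX": "ES",  # Mexico
    "CO": "ES",  # Colombia
    "AR": "ES",  # Argentina
}


def source_list_for(country: str, official_lists: set) -> Optional[str]:
    """Return the code of the official list to use for `country`, if any."""
    if country in official_lists:
        return country  # the country delivered its own list
    fallback = FALLBACK_LIST.get(country)
    if fallback in official_lists:
        return fallback  # reuse the list of a country with the same language
    return None  # no official list applies
```

For example, with `official_lists = {"ES", "FR"}`, a lookup for `"MX"` resolves to the Spanish list, while a country that is not in the mapping resolves to `None`.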
By combining the official name lists of dozens of European countries with social media profiles from countries for which we did not have official name lists, we were able to derive the name census for many countries.
Validating on Social Media
Our name list is created by taking baby name lists obtained from governments and cross-referencing them with millions of names from publicly available social media profiles. We received the official name list from 30 countries. We took all those names and used 58.474.940 publicly available social media profiles to cross-reference and count each name per country. This meant we needed to know at least three things from each social media profile: a first name, a last name and a location that we could translate to an actual country code.
We only wanted to use social media profiles with complete names, so that we could be certain a profile belonged to a real person. We used our name parser software to split each name into components: first name, middle names and last name.
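As a rough illustration of this step, the sketch below splits a full name on whitespace and rejects anything without at least a first and a last name. It is not our actual name parser; `ParsedName` and `parse_name` are hypothetical names, and the sketch ignores surname prefixes, titles and other cases a real parser has to handle.

```python
from typing import NamedTuple, Optional


class ParsedName(NamedTuple):
    first: str
    middle: tuple  # zero or more middle names
    last: str


def parse_name(full_name: str) -> Optional[ParsedName]:
    """Split a full name into components, or return None if it is incomplete."""
    parts = full_name.split()
    if len(parts) < 2:
        return None  # a single token (or nothing) is not a complete name
    return ParsedName(first=parts[0], middle=tuple(parts[1:-1]), last=parts[-1])
```

A profile whose name parses to `None` would be skipped, since it lacks either a first or a last name.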
Because we do a name census per country, we also needed to know which country a social media profile originated from. To do that we created a "city parser" that takes any location string and matches it to an official location. For example, if a profile had the location "The World", we could not map it to an actual location, so we did not use the name. If a profile had the location string "The bay area", we knew it was somebody from the United States (US), and if it was "Berlin und Frankfurt", we knew it had to be Germany (DE).
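The idea behind such a location lookup can be sketched with a tiny gazetteer. This is a simplified illustration under stated assumptions, not the actual city parser: `PLACE_TO_COUNTRY` and `country_for_location` are hypothetical, and the table holds only the examples from the paragraph above.

```python
import re
from typing import Optional

# Tiny illustrative gazetteer; a real one would hold many thousands of places.
PLACE_TO_COUNTRY = {
    "bay area": "US",
    "berlin": "DE",
    "frankfurt": "DE",
}


def country_for_location(location: str) -> Optional[str]:
    """Map a free-text location to a country code, or None if unclear."""
    text = location.casefold()
    matches = {
        code
        for place, code in PLACE_TO_COUNTRY.items()
        if re.search(r"\b" + re.escape(place) + r"\b", text)
    }
    # Accept the result only when every recognised place points to one country.
    return matches.pop() if len(matches) == 1 else None
```

With this table, "The bay area" resolves to `US`, "Berlin und Frankfurt" resolves to `DE` because both cities agree, and "The World" resolves to `None`, so that profile would be discarded.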
In the end, only 9.252.364 of the 58.474.940 social media profiles (roughly 16%) had a complete name and a valid location.
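The filtering and counting described in this section could look roughly like the sketch below. The profile field names (`first_name`, `last_name`, `country`) and the function `count_names` are assumptions made for the example, as is the choice to count only names that appear in an official list.

```python
from collections import Counter


def count_names(profiles, official_names):
    """Count (country, first name) pairs over usable profiles.

    `profiles` is an iterable of dicts with the hypothetical keys
    "first_name", "last_name" and "country"; `official_names` is a set of
    first names taken from the official lists.
    """
    counts = Counter()
    for profile in profiles:
        first = profile.get("first_name")
        last = profile.get("last_name")
        country = profile.get("country")
        if not (first and last and country):
            continue  # incomplete name or unresolved location
        if first not in official_names:
            continue  # assumption: only names from official lists are counted
        counts[(country, first)] += 1
    return counts
```

The resulting per-country counts are the kind of raw figure a popularity metric could be built on.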
Name list as CSV file
Our name list is available as a CSV file. A Comma Separated Values (CSV) file is a plain text file that contains a list of data. This file type is very often used for exchanging data between applications. Each row contains a name and a few additional columns, such as gender and popularity. We use a semicolon (;) as the separator character in the CSV file. Download the Name Census top 100 from Kaggle to get an idea of our database.
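Because the separator is a semicolon rather than a comma, most CSV readers need to be told about it explicitly. A minimal sketch with Python's standard `csv` module follows. The column names `name`, `gender` and `popularity` and the two sample rows are made up for the example; check the header of the actual file before relying on them.

```python
import csv
import io

# Made-up sample in the same semicolon-separated layout; not real data.
SAMPLE = "name;gender;popularity\nAnna;F;100\nLuca;M;87\n"


def read_name_rows(handle):
    """Read semicolon-separated rows into a list of dicts keyed by header."""
    return list(csv.DictReader(handle, delimiter=";"))


rows = read_name_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row["name"], row["gender"], row["popularity"])
```

For a file on disk, the same function can be called with `open(path, encoding="utf-8", newline="")`. All values are read as strings, so numeric columns need an explicit conversion.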
We received lists of names from different governments and sources in different formats and with different character encodings. It was a real jungle, so we decided to standardize everything. All our data is encoded in UTF-8, and each name is available in its original UTF-8 form and in an ASCII form.
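One common way to derive an ASCII form from a UTF-8 name is Unicode decomposition followed by dropping the combining marks. The sketch below shows that technique; it is not necessarily how our ASCII column was produced, and `to_ascii` is a hypothetical helper.

```python
import unicodedata


def to_ascii(name: str) -> str:
    """Strip diacritics by decomposing characters and dropping non-ASCII parts."""
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("José"))  # prints "Jose"
print(to_ascii("Zoë"))   # prints "Zoe"
```

This approach has a known limitation: letters without a Unicode decomposition, such as "ø" or "ł", are dropped entirely rather than mapped to "o" or "l", so a complete solution needs an additional transliteration table.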
The following table shows an example of the columns we have available in the CSV file.
With 859.257 validated first names from 104 countries, Name Census is the world's most complete name list with gender and popularity that is available for download. Because the name list is a CSV file, it is easy to import into any database. To give an idea of the possibilities of the name list, we have listed a few typical use cases.
Auto complete forms
- Shorten contact, registration and order forms
- Get more conversions (people don't like to fill in forms)
- Get more information out of your form (gender and nationality)
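A form could auto-complete first names with a prefix search over a sorted copy of the name list. The sketch below uses binary search from the standard library. The function `suggest` and the sample names are illustrative assumptions; a production form would more likely query a database or a search index.

```python
import bisect


def suggest(sorted_names, prefix, limit=5):
    """Return up to `limit` names starting with `prefix`.

    `sorted_names` must be a sorted list of casefolded names.
    """
    prefix = prefix.casefold()
    start = bisect.bisect_left(sorted_names, prefix)  # first candidate position
    results = []
    for name in sorted_names[start:start + limit]:
        if not name.startswith(prefix):
            break  # sorted order: no later name can match either
        results.append(name)
    return results


# Made-up sample names, casefolded and sorted once up front.
names = sorted(n.casefold() for n in ["Anna", "Annabel", "Anne", "Bob"])
print(suggest(names, "ann"))  # prints ['anna', 'annabel', 'anne']
```

Sorting once and searching with `bisect` keeps each lookup fast even for a list with hundreds of thousands of names.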
Create a local name parser
Do you have a website where people can sign up, place an order or submit a question via a contact form? You can use our name parsing software to check whether a name exists and is not made up or misspelled. If you don't want to depend on our API, you can always buy our name list and create a local name parser.
- Split up the first and last name of your customers
- Make sure names are not misspelled
- Get more information from your customers (gender and nationality)
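A local check for unknown or misspelled names can be sketched with fuzzy matching from Python's standard library. `check_name`, the similarity cutoff of 0.8 and the sample names are assumptions chosen for illustration; with the real name list you would load the names from the CSV file instead.

```python
import difflib


def check_name(name, known_names):
    """Classify a name as known, a possible typo, or unknown.

    Returns a (status, suggestions) tuple.
    """
    if name in known_names:
        return ("known", [])
    # Look for close spellings; the cutoff of 0.8 is an arbitrary choice.
    suggestions = difflib.get_close_matches(name, known_names, n=3, cutoff=0.8)
    if suggestions:
        return ("possible_typo", suggestions)
    return ("unknown", [])


known = ["Michael", "Anna", "Luca"]  # made-up sample
print(check_name("Micheal", known))  # prints ('possible_typo', ['Michael'])
```

Names flagged as a possible typo can be shown back to the user as a "did you mean" suggestion instead of being rejected outright.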
We keep improving our name list, software and website. This changelog gives a brief overview of all the changes and progress we have made.