In 2018 we were working on a research project and needed to know whether a name was male or female. After hours of Googling for 'baby name lists', 'name databases' and 'name datasets', we discovered that there was no complete name database covering first names and gender for all countries. Most of the name databases we found had layouts that differed per country, were incomplete or contained non-existent names. That is why we created Name Census, the most comprehensive name database in the world!

Looking back, it took an enormous effort and a lot of patience to create this name database. To compile it we reached out to governments and statistical agencies, and gathered open data from different sources. We received all kinds of files with different layouts, and the files we found online were incomplete and had character encoding issues. We restructured the files and imported everything into this standardized first name and surname database.

If you want to use our name database in an online service, research or scientific project, it is important to understand how it was created. In the paragraphs below we explain as clearly as possible how we worked and what the list of names and the results are based on.

Methodology

A census is an enumeration of people, houses, firms, or other important items in a country or region at a particular time. That is why we called our service Name Census. To get all the first names and surnames we reached out to governments and statistical agencies, and gathered open data from different sources. We used 22.055.118 publicly available social media profiles to cross-reference and count each first name and surname per country. This way we could be sure that the names in our name database are actually used, and we could create our popularity metric.

Government agencies

European and North American countries are well organised and have governmental agencies or statistical bureaus that register the names and gender of newborn babies. In many countries these baby name databases are publicly available as open data. During the first year we reached out to many official institutions and requested their lists of names of newborn babies and their gender. Many European and North American countries delivered the data within a week. Besides the governmental agencies and statistical bureaus, there are also many open data initiatives.

Unfortunately, not every country in the world was able to deliver lists of names of newborn babies and their gender. But that turned out not to be necessary. We discovered that we didn't need the official first names and surnames from every country. Spanish, for example, is an official language in many other countries, such as Mexico, Colombia and Argentina. We were able to use the official Spanish first name and surname database for the name census in all Spanish-speaking countries. We applied the same logic to the French, English and Portuguese name databases.

By using the official name databases of dozens of European countries and social media we were able to derive the name census for many other countries.

Validating on Social Media

Our name database was created by taking first names and surnames obtained from governments and cross-referencing them with millions of names from publicly available social media profiles. We received the official name database from 31 countries. We took all those names and used 139.388.346 publicly available social media profiles to cross-reference and count each name per country. This meant we needed to know at least three things from each social media profile. It needed to have a:

  • First name
  • Surname
  • Location (e.g.: Paris, France)

We only wanted to use social media profiles with complete names so we were certain that a profile was from a real person. We used our name parsing software to split the complete name into components like: first name, middle names and surname.
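The splitting step can be illustrated with a deliberately naive sketch. It is not the actual Name Census parser: `ParsedName` and `split_full_name` are hypothetical names, and the rule "first token is the first name, last token is the surname" is an assumption that ignores titles, prefixes and multi-word surnames.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParsedName:
    first_name: str
    middle_names: List[str] = field(default_factory=list)
    surname: str = ""


def split_full_name(full_name: str) -> Optional[ParsedName]:
    """Naive split: first token = first name, last token = surname.

    Returns None when fewer than two tokens are present, mirroring the
    rule that only profiles with a complete name are used.
    """
    parts = full_name.split()
    if len(parts) < 2:
        return None
    return ParsedName(
        first_name=parts[0],
        middle_names=parts[1:-1],  # everything between first and last token
        surname=parts[-1],
    )


parsed = split_full_name("Anna Maria Schmidt")
incomplete = split_full_name("Madonna")  # single token, so no complete name
```

A profile for which the function returns None would simply be skipped.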

Because we compiled a name database per country, we also needed to know which country a social media profile originated from. To do that we created a "city parser" that takes a location string and matches it to an official location and country code. For example, if a profile had the location string "The World", we could not map it to an actual location, so we didn't use the name. If a profile had the location string "The bay area", we knew it was somebody from the United States (US), and if it was "Berlin und Frankfurt", we knew it had to be Germany (DE).
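A minimal sketch of the idea behind such a city parser is shown below. The lookup table `KNOWN_PLACES` and the function `country_from_location` are hypothetical; a real parser would rely on a full gazetteer and much more robust matching than this token lookup.

```python
import re
from typing import Optional

# Hypothetical, tiny lookup table mapping lowercase place names to
# ISO country codes. A real table would hold many thousands of entries.
KNOWN_PLACES = {
    "paris": "FR",
    "france": "FR",
    "berlin": "DE",
    "frankfurt": "DE",
    "the bay area": "US",
}


def country_from_location(location: str) -> Optional[str]:
    """Return a country code, or None if the string cannot be mapped."""
    text = location.strip().lower()
    # Try the whole string first, so multi-word places can match.
    if text in KNOWN_PLACES:
        return KNOWN_PLACES[text]
    # Otherwise look at the individual words.
    tokens = re.split(r"[^\w]+", text)
    codes = {KNOWN_PLACES[token] for token in tokens if token in KNOWN_PLACES}
    # Accept the result only if all recognised words agree on one country.
    return codes.pop() if len(codes) == 1 else None


examples = {
    loc: country_from_location(loc)
    for loc in ["Paris, France", "The bay area", "Berlin und Frankfurt", "The World"]
}
```

Returning None for strings such as "The World" makes it easy to drop profiles that cannot be tied to a country.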

In the end, only 22.055.118 of the 139.388.346 social media profiles had a complete name and a valid location.

Name database in CSV, SQL and JSON format

Our name database is available in CSV, SQL and JSON format. These file types are widely used for exchanging data between applications. The files are encoded using the UTF-8 character encoding standard. You can download the Name Census top 100 from GitHub or Kaggle to get a preview of the format. Below the tables you will find a short description of each column.
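As a rough illustration of reading the CSV variant, the sketch below parses a small in-memory sample with Python's standard `csv` module. The header names (`name`, `ascii`, `country_code` and so on) and the comma delimiter are assumptions made for this example; check the downloaded file for the actual header and delimiter.

```python
import csv
import io
from typing import Dict, Iterable, List


def read_name_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CSV lines into one dict per row, keyed by the header row."""
    return list(csv.DictReader(lines))


# In-memory sample with assumed column names, so the sketch runs as-is.
sample = io.StringIO(
    "name,ascii,country_code,official,gender,unisex,frequency,country_rank\n"
    "Björn,Bjorn,DE,Y,M,N,464,837\n"
    "Hélène,Helene,FR,Y,F,N,11,829\n"
)
rows = read_name_rows(sample)

# For a real file (hypothetical filename), open it explicitly as UTF-8:
# with open("first_names.csv", encoding="utf-8", newline="") as handle:
#     rows = read_name_rows(handle)
```

Opening the file with `encoding="utf-8"` matters, because names such as Björn or Сергей would otherwise be garbled on systems with a different default encoding.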

The following table shows an example of the first name database columns.

Name | ASCII | Country code | Official*1 | Gender*2 | Unisex | Frequency*3 | Country Rank*4
Сергей | Sergej | RU | Y | M | N | 4 | 4.573
Анна | Anna | RU | Y | F | N | 6 | 7.214
Björn | Bjorn | DE | Y | M | N | 464 | 837
Jürgen | Jurgen | DE | Y | M | N | 39 | 1.355
Hélène | Helene | FR | Y | F | N | 11 | 829

The following table shows an example of the surname database columns.

Name | ASCII | Country code | Official*1 | Gender*2 | Frequency*3 | Country Rank*4
Nuñez | nunez | ES | Y | | 1.903 | 45
Bhardwaj | bhardwaj | IN | Y | | 1.826 | 48
иванов | ivanov | RU | Y | M | 1.882 | 1
Jónsdóttir | jonsdottir | IS | N | | 63 | 2
Genç | genc | TR | Y | | 1.773 | 54

*1 We receive official lists of first and last names from governments and statistical agencies in many countries. If we find a name on social media in a country, and it matches a name on an official country list, we mark it as official in our records. However, there are some countries where we don't have access to the official name list. In those cases, if we find a name on social media that is an official name in another country, we count it but mark it as 'unofficial'.
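The rule in this note can be sketched as a small function. `official_flag` and the example lists are hypothetical. The note only describes the 'unofficial' case for countries without an official list; applying it to every country, as done here, is an assumption of this sketch.

```python
from typing import Dict, Optional, Set


def official_flag(
    name: str, country: str, official_lists: Dict[str, Set[str]]
) -> Optional[str]:
    """Return "Y", "N", or None for a name found on social media.

    "Y": the name is on the official list of the profile's own country.
    "N": the name is only on the official list of another country.
    None: the name is on no official list and is not counted.
    """
    if name in official_lists.get(country, set()):
        return "Y"
    if any(name in names for names in official_lists.values()):
        return "N"
    return None


# Illustrative data only: official lists for two countries.
lists = {"ES": {"Lucía"}, "DE": {"Jürgen"}}
flag_spain = official_flag("Lucía", "ES", lists)
flag_mexico = official_flag("Lucía", "MX", lists)  # no official list for MX here
```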

*2 The gender field indicates whether a name is male (M) or female (F). Name-gender associations vary across countries and cultures. Some names are gender-specific, while others are gender-neutral. For example, Robin is seen as a male name in some countries, a female name in others, and gender-neutral in yet others. In some countries surnames can provide a clue about the gender of the person. In Russian, for instance, many last names have distinct male and female forms.

*3 The frequency column shows how often we found a particular first name or surname on social media in a country. We determined this by checking 22.055.118 social media profiles for which we knew the country. Every time we found the name, we increased its frequency count for that country by one. Frequencies are therefore country-specific: the name "John" has a different frequency in the United States than in Germany.
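The counting described in this note can be sketched with `collections.Counter`, keyed by name and country. The function name `count_frequencies` and the sample profiles are illustrative only.

```python
from collections import Counter
from typing import Iterable, Tuple


def count_frequencies(profiles: Iterable[Tuple[str, str]]) -> Counter:
    """Count how often each (name, country_code) pair occurs."""
    return Counter(profiles)


# Illustrative profiles as (first name, country code) pairs.
profiles = [
    ("John", "US"),
    ("John", "US"),
    ("John", "DE"),
    ("Anna", "DE"),
]
frequency = count_frequencies(profiles)
john_us = frequency[("John", "US")]
john_de = frequency[("John", "DE")]
```

Because the country is part of the key, the same name keeps a separate count per country.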

*4 When you sort all the names from a country by frequency, you may notice large gaps between the names. This is because frequency is not a linear scale. To address this, we created the country rank to indicate how popular a name is in a particular country, based on its frequency. The most popular name in a country is assigned a rank of 1, while the least popular name is assigned the last rank.
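Deriving a rank from the frequencies of one country can be sketched as follows. The function `country_ranks` and the sample numbers are illustrative. How ties are handled is not described here, so breaking ties alphabetically is an assumption of this sketch.

```python
from typing import Dict


def country_ranks(frequencies: Dict[str, int]) -> Dict[str, int]:
    """Map each name to its rank; rank 1 is the most frequent name."""
    ordered = sorted(
        frequencies.items(),
        # Highest frequency first; ties broken alphabetically (assumption).
        key=lambda item: (-item[1], item[0]),
    )
    return {name: rank for rank, (name, _) in enumerate(ordered, start=1)}


# Illustrative frequencies for a single country.
ranks = country_ranks({"Anna": 50, "Maria": 900, "Olga": 7})
```

The gap between 900 and 50 is large, yet the ranks are simply 1, 2 and 3, which is what makes the rank easier to compare than the raw frequency.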

Use cases

With 1.507.690 validated first names and 3.251.185 validated surnames from 139 countries, Name Census is the world's most comprehensive name database available for download. Because the name database is available in CSV, SQL and JSON format, it is easy to import into any database. To give you an idea of what you can do with the name list, we have listed a few typical use cases.

Auto complete forms

Almost every website has some kind of form where you have to enter your name. By using our name database, a memory table and some JavaScript you can create a 'Google-like' auto complete. When a visitor enters the first letters of their name, you can show the most popular matching names. This reduces the number of misspelled names and tells you the gender.

  • Shorten contact, registration and order forms
  • Get more conversions (people don't like to fill in forms)
  • Get more information out of your form (gender and nationality)
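The lookup behind such an auto complete can be sketched as a prefix search ordered by country rank. The sketch is written in Python to show the logic only; in a browser the same idea would be implemented in JavaScript. The function `suggest` is hypothetical, and the ranks for Julia and Jonas are made-up values for illustration.

```python
from typing import List, Tuple

# (name, ASCII form, country rank). Julia and Jonas use invented ranks.
NAMES: List[Tuple[str, str, int]] = [
    ("Jürgen", "Jurgen", 1355),
    ("Björn", "Bjorn", 837),
    ("Julia", "Julia", 12),
    ("Jonas", "Jonas", 40),
]


def suggest(prefix: str, names: List[Tuple[str, str, int]], limit: int = 5) -> List[str]:
    """Return up to `limit` names starting with `prefix`, most popular first."""
    needle = prefix.casefold()
    matches = [
        (rank, name)
        for name, ascii_form, rank in names
        # Match on either spelling, so typing "ju" also finds "Jürgen".
        if name.casefold().startswith(needle)
        or ascii_form.casefold().startswith(needle)
    ]
    return [name for _, name in sorted(matches)[:limit]]


suggestions = suggest("ju", NAMES)
```

Matching against the ASCII column as well lets visitors without special characters on their keyboard still find names with diacritics.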

Create a local name parser

Do you have a website where people can sign up, order or submit a question via a contact form? You can use our name parsing software to check whether a name exists and is not made up or misspelled. If you don't want to depend on our API, you can always buy our name list and create a local name parser.

  • Split up the first and last name of your customers
  • Make sure names are not misspelled
  • Get more information from your customers (gender and nationality)
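A local check for unknown or misspelled names could look roughly like the sketch below, which uses `difflib.get_close_matches` from the Python standard library. The function `check_name`, the status labels and the similarity cutoff of 0.8 are assumptions for illustration, and the comparison is case-sensitive.

```python
import difflib
from typing import List, Tuple


def check_name(name: str, known_names: List[str]) -> Tuple[str, List[str]]:
    """Classify a name as known, a possible typo, or unknown."""
    if name in known_names:
        return ("known", [])
    # Look for similar known names; 0.8 is an arbitrary cutoff.
    close = difflib.get_close_matches(name, known_names, n=3, cutoff=0.8)
    if close:
        return ("possible_typo", close)
    return ("unknown", [])


# A handful of ASCII forms taken from the example table.
known = ["Helene", "Jurgen", "Anna", "Bjorn"]
exact = check_name("Anna", known)
typo = check_name("Jurgan", known)
```

With a full name list, a set for the exact lookup and a per-country subset for the similarity search would keep this fast.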

FAQ

What is a maiden name?

A maiden name is the surname or last name that a woman has at birth, and before she marries. It is often used to refer to a woman's family name before she takes her husband's surname upon marriage.

What is a surname?

A surname is a family name that is passed down from generation to generation and is typically the last name that appears in a person's full name. It is used to identify a person's family and lineage. In many cultures, a person's surname is inherited from the father, but in some cases, it may be inherited from their mother or a combination of both.

Is surname the same as last name?

Yes, a surname is typically the same as a last name. It is the family name that is passed down from generation to generation and is usually the last name that appears in a person's full name.

What is the difference between surname and last name?

Surname and last name are often used interchangeably. Technically, a surname is a family name that is passed down from generation to generation. The last name is simply the name that appears last in a person's full name.

Can you change your surname?

Surnames can be changed through marriage, adoption, or legal name changes. Legally changing your surname involves filing a petition with the court and paying a fee. The process varies depending on your country and state.

Changelog

We are always improving our databases, software and website. This changelog gives a brief overview of the changes and the progress we have made.

Version Date Changes
5 2023-03-13
  • Added SQL and JSON file formats in addition to CSV.
  • Improved the content of all pages to make the name database easier to find.
  • Improved the order process.
  • Implemented reCAPTCHA on every page that uses a form to block bots.
4 2023-02-27
  • Updated first name database containing 1.507.690 names from 139 countries.
  • Started exporting the surname database and adding it to our service.
  • Surname database created containing 3.251.185 names from 139 countries.
3 2022-10-23
  • Updated first name database containing 1.467.445 names from 141 countries.
2 2021-01-30
  • Improved the delivery of databases via visible link in confirmation email.
  • Updated first name database containing 995.718 names from 113 countries.
1 2020-08-31
  • Launched first version of the website offering official name lists.
  • Added Stripe as a payment method to offer an easy payment flow.
  • Updated first name database containing 859.257 names from 104 countries.