Database of diminutives (nicknames and shortened forms) of given names

HaJongler/diminutives.db

This project is a database of common English diminutives (nicknames and shortened forms) of formal given names. It is useful whenever you need to search lists of people's names for matches in a way that is tolerant of common colloquial variation. For example, "Daniel" may appear in databases as "Danny" or "Dan", and "Catherine" as "Cathie" or "Kate".
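
As a rough illustration of that kind of tolerant matching, the sketch below treats two given names as equivalent when they are identical or when they can stand for the same formal name. The `SAMPLE` mapping contains only the two examples from the paragraph above, and the function names `formal_names` and `names_match` are hypothetical; none of this code is part of the project.

```python
# Hypothetical sample data: only the two examples mentioned above.
SAMPLE = {
    "Daniel": ["Danny", "Dan"],
    "Catherine": ["Cathie", "Kate"],
}


def formal_names(name, table):
    """Return the set of formal names that `name` could stand for."""
    key = name.strip().casefold()
    result = set()
    for formal, diminutives in table.items():
        variants = {formal.casefold()} | {d.casefold() for d in diminutives}
        if key in variants:
            result.add(formal)
    return result


def names_match(a, b, table=SAMPLE):
    """True if the names are equal or share at least one formal name."""
    if a.strip().casefold() == b.strip().casefold():
        return True
    return bool(formal_names(a, table) & formal_names(b, table))


print(names_match("Danny", "Daniel"))  # True
print(names_match("Kate", "Cathie"))   # True
print(names_match("Dan", "Kate"))      # False
```

Comparing sets of formal names, rather than looking up a single formal name, allows for a diminutive that belongs to more than one formal name.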

Methodology

The databases of diminutives, male_diminutives.csv and female_diminutives.csv, are manually edited versions of data extracted automatically from Wiktionary by the PHP script bin/generate_diminutives_csv.php. Manual editing is necessary because the script, although it extracts most of the information well, cannot process every page on diminutives and mistakes some capitalized words (e.g. "Sometimes" and "Popular") for proper nouns. In addition, when the script detects irregularities in an article, it is designed to stop processing that article and simply print the article's title to the console.

The output of generate_diminutives_csv.php is stored in the gen folder for reference. Each time the script is run, any resulting changes to the files in gen need to be applied to male_diminutives.csv and female_diminutives.csv.
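
One possible way to see which changes need to be carried over is to compare the previous and the newly generated output line by line. The sketch below is only an illustration: it assumes the generated files are line-oriented text in which line order does not matter, the helper name `gen_changes` is hypothetical, and the "old" sample line is made up for the example.

```python
def gen_changes(old_lines, new_lines):
    """Return (added, removed) lines between two versions of a generated file.

    Assumes line order is irrelevant, so the comparison is set-based.
    """
    old, new = set(old_lines), set(new_lines)
    added = sorted(new - old)
    removed = sorted(old - new)
    return added, removed


# Made-up "before" and "after" contents standing in for a file in gen.
before = ["Nathaniel,Nat"]
after = ["Nathaniel,Nat,Nate"]

added, removed = gen_changes(before, after)
print("added:  ", added)    # ['Nathaniel,Nat,Nate']
print("removed:", removed)  # ['Nathaniel,Nat']
```

Each reported line would then be reviewed by hand before the corresponding entry in male_diminutives.csv or female_diminutives.csv is updated.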

Format of the CSV files

Each line of male_diminutives.csv and female_diminutives.csv consists of a formal given name followed by common diminutives of that name. For example, the following line from male_diminutives.csv indicates that "Nat" and "Nate" are common diminutives of the given name "Nathaniel":

Nathaniel,Nat,Nate

The CSV files are encoded in UTF-8.
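
A minimal sketch of reading this format with Python's standard `csv` module follows. The function name `load_diminutives` is hypothetical, and the code assumes only what is described above: one formal name per line followed by its diminutives, encoded in UTF-8. The example parses the sample line from an in-memory string; the commented-out lines show how a real file could be opened.

```python
import csv
import io


def load_diminutives(source):
    """Parse rows of 'Formal,Dim1,Dim2,...' into {formal: [diminutives]}."""
    table = {}
    for row in csv.reader(source):
        cells = [cell.strip() for cell in row]
        if not cells or not cells[0]:
            continue  # skip blank lines
        formal, *diminutives = cells
        # Assumption: if a formal name appears on several lines, merge them.
        table.setdefault(formal, []).extend(d for d in diminutives if d)
    return table


# In-memory stand-in for one line of male_diminutives.csv.
sample = io.StringIO("Nathaniel,Nat,Nate\n")
print(load_diminutives(sample))  # {'Nathaniel': ['Nat', 'Nate']}

# Reading the real file would look like this:
# with open("male_diminutives.csv", encoding="utf-8", newline="") as f:
#     table = load_diminutives(f)
```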

Special exceptions

Be aware of the following special case, which cannot be added to the databases:

  • When a man's initials are J.E.B., he may go by Jeb.
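
Because this case depends on a person's full name rather than on a single given name, it has to be handled in code. The sketch below shows one way to do that; the function name `initials_nickname` is hypothetical, the example names are made up, and the code assumes that first, middle, and last names are available as separate fields.

```python
def initials_nickname(first, middle, last):
    """Return 'Jeb' if the three initials spell J.E.B., otherwise None."""
    parts = (first, middle, last)
    # Every part must be a non-empty string to form three initials.
    if not all(p and p.strip() for p in parts):
        return None
    initials = "".join(p.strip()[0].upper() for p in parts)
    return "Jeb" if initials == "JEB" else None


print(initials_nickname("James", "Edward", "Brown"))  # Jeb
print(initials_nickname("James", "Edward", "Smith"))  # None
```

The check says nothing about gender, so a caller would apply it only to records already known to refer to men.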

License

Scripts are licensed under the terms of the GNU General Public License version 3 or any later version.

The data in male_diminutives.csv and female_diminutives.csv are in the public domain.
