Did you say your name is “GHOSH”?

Photo by Jon Tyson on Unsplash

“Okay, what is your name?”, “My name is GAUSS!”, “Did you say GHOSH?”, “No. GAUSS!!!”, “GHOST?”, “No. GAUSS$%#!”, … and the telephone is hung up!

Within many applications, we search for names, but we often misspell them. So, my name will often come out as “Buchan”, “Buchann”, and “Buckanin”. People hear the word in their head, and then try to spell it phonetically. It is a method that has stayed with us since we learned it as children. So “Castle” becomes “k-a-s-e-l”, or more formally “kɑːs(ə)l”. These are the phonemes that we use in English:

Soundex is a phonetic algorithm that classifies a name by how it is pronounced rather than how it is spelled. It focuses on matching names which have minor spelling errors. A Soundex code has a letter followed by three digits, such as C253. The letter is the first letter of the surname. The digits encode the consonants that follow, using these codes:

Number  Letters
1       B, F, P, V
2       C, G, J, K, Q, S, X, Z
3       D, T
4       L
5       M, N
6       R

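We disregard the letters A, E, I, O, U, H, W, and Y. Adjacent letters with the same number are coded only once, and a code with fewer than three digits is padded with zeros. For example, “Buchanan” becomes [here]: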

B255 = "B" ... "C" ... "N" ... "N"

The name “Lee” becomes:

L000 = "L"

We can write a Go program to generate these codes [here]:
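
The linked program is not reproduced here. As a rough illustration only, a minimal sketch of the common American Soundex variant might look like the following. The names `Soundex` and `soundexDigit` are illustrative assumptions, not taken from the linked code. The sketch also assumes ASCII input and the usual convention that H and W do not separate two letters with the same number.

```go
package main

import (
	"fmt"
	"strings"
)

// soundexDigit returns the Soundex digit for an upper-case letter,
// or 0 for letters that carry no code (A, E, I, O, U, H, W, Y).
func soundexDigit(c byte) byte {
	switch c {
	case 'B', 'F', 'P', 'V':
		return '1'
	case 'C', 'G', 'J', 'K', 'Q', 'S', 'X', 'Z':
		return '2'
	case 'D', 'T':
		return '3'
	case 'L':
		return '4'
	case 'M', 'N':
		return '5'
	case 'R':
		return '6'
	}
	return 0
}

// Soundex returns a four-character code (letter plus three digits)
// for an ASCII name, or "" if the name contains no letters.
func Soundex(name string) string {
	name = strings.ToUpper(name)
	code := make([]byte, 0, 4)
	var prev byte // digit of the previous coded letter, 0 if none
	for i := 0; i < len(name) && len(code) < 4; i++ {
		c := name[i]
		if c < 'A' || c > 'Z' {
			continue // skip spaces, hyphens and non-ASCII bytes
		}
		d := soundexDigit(c)
		if len(code) == 0 {
			code = append(code, c) // the first letter is kept as is
			prev = d
			continue
		}
		if c == 'H' || c == 'W' {
			continue // H and W do not break a run of equal digits
		}
		if d != 0 && d != prev {
			code = append(code, d)
		}
		prev = d // a vowel resets prev, so a repeated digit is coded again
	}
	if len(code) == 0 {
		return ""
	}
	for len(code) < 4 {
		code = append(code, '0') // pad short codes with zeros
	}
	return string(code)
}

func main() {
	for _, name := range []string{"Buchanan", "Lee", "Miller", "Muller", "GAUSS", "GHOSH"} {
		fmt.Printf("%-8s %s\n", name, Soundex(name))
	}
}
```

Under these assumptions, “GAUSS” and “GHOSH” both map to G200, which is one way to see why the two names are so easily confused. “Buchann” maps to B250 rather than B255, because its two adjacent Ns are coded only once.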

Here are some examples:

  • word1 = “Miller”, word2 = “Muller”
  • word1 = “Buchanan”, word2 = “Buchann”
  • word1 = “ASHCRAFT”, word2 = “ASHCROFT”
  • word1 = “ROBERT”, word2 = “RUPERT”
  • word1 = “GAUSS”, word2 = “GHOSH”

We can also add other similarity metrics, such as the Jaro-Winkler index and the edit distance. A sample run for “Mayer” and “Mire” is [here]:

Mayer: M600
Mire:  M600
==Metrics==
String  String  Jaro   Distance
Mayer   Mire    67.00  40.00
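
To make the two metrics concrete, here is a minimal sketch of the textbook Jaro-Winkler similarity and the Levenshtein edit distance. The function names are illustrative assumptions, and the linked tool may compute or scale its figures differently. The sketch uses the customary prefix scaling factor of 0.1 with a prefix of at most four characters, and it treats empty input as having no similarity.

```go
package main

import "fmt"

// Levenshtein counts the single-character insertions, deletions and
// substitutions needed to turn a into b.
func Levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			best := prev[j-1] // keep the rune, or substitute it
			if ra[i-1] != rb[j-1] {
				best++
			}
			if prev[j]+1 < best { // delete a rune from a
				best = prev[j] + 1
			}
			if cur[j-1]+1 < best { // insert a rune into a
				best = cur[j-1] + 1
			}
			cur[j] = best
		}
		prev = cur
	}
	return prev[len(rb)]
}

// Jaro returns the Jaro similarity of a and b, between 0 and 1.
func Jaro(a, b string) float64 {
	ra, rb := []rune(a), []rune(b)
	if len(ra) == 0 || len(rb) == 0 {
		return 0
	}
	window := len(ra) // equal runes match only within this distance
	if len(rb) > window {
		window = len(rb)
	}
	window = window/2 - 1
	if window < 0 {
		window = 0
	}
	ma, mb := make([]bool, len(ra)), make([]bool, len(rb))
	matches := 0
	for i := range ra {
		for j := i - window; j <= i+window && j < len(rb); j++ {
			if j >= 0 && !mb[j] && ra[i] == rb[j] {
				ma[i], mb[j] = true, true
				matches++
				break
			}
		}
	}
	if matches == 0 {
		return 0
	}
	// Walk the matched runes in order; each mismatch is half a transposition.
	halfT, k := 0, 0
	for i := range ra {
		if !ma[i] {
			continue
		}
		for !mb[k] {
			k++
		}
		if ra[i] != rb[k] {
			halfT++
		}
		k++
	}
	m := float64(matches)
	return (m/float64(len(ra)) + m/float64(len(rb)) + (m-float64(halfT)/2)/m) / 3
}

// JaroWinkler boosts the Jaro score for a shared prefix of up to four runes.
func JaroWinkler(a, b string) float64 {
	j := Jaro(a, b)
	ra, rb := []rune(a), []rune(b)
	l := 0
	for l < 4 && l < len(ra) && l < len(rb) && ra[l] == rb[l] {
		l++
	}
	return j + float64(l)*0.1*(1-j)
}

func main() {
	a, b := "Mayer", "Mire" // a is the longer string in this sample
	fmt.Printf("Jaro-Winkler: %.2f\n", 100*JaroWinkler(a, b))
	sim := 1 - float64(Levenshtein(a, b))/float64(len(a))
	fmt.Printf("Edit similarity: %.2f\n", 100*sim)
}
```

For “Mayer” and “Mire” this sketch gives a Jaro-Winkler score of about 0.67 and an edit distance of 3, or 1 - 3/5 = 0.40 when scaled by the longer name. That agrees with the sample figures if they are read as percentages, although this is an inference from the numbers and not something the sample output states.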

Conclusions

If you are interested in learning more about similarity matching, read on here: