Soundex
Outline
Soundex uses a phonetic algorithm to classify a word by the way it is pronounced. It focuses on matching phrases which have minor spelling errors. A Soundex code has a letter followed by three digits, such as C253. The letter is the first letter of the surname, and the digits encode the consonants that follow it:
Number   Represents the letters
1        B, F, P, V
2        C, G, J, K, Q, S, X, Z
3        D, T
4        L
5        M, N
6        R
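The letters A, E, I, O, U, H, W and Y are disregarded. If fewer than three digits result, the code is padded with zeros, and any digits beyond the third are dropped.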
For example, "Buchanan" becomes:
B255 = "B", "C" (2), "N" (5), "N" (5)
The name "Lee" becomes:
L000 = "L" (no coded consonants follow, so the code is padded with zeros)
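The encoding steps above can be sketched in a few lines of Go using only the standard library. This is a minimal illustration and not the code of the soundex package used later on this page. The function name Soundex and the map codes are hypothetical. The handling of repeated codes (adjacent letters with the same digit are coded once, and H and W do not break such a run while vowels do) is an assumption taken from the common American Soundex rules, since the text above does not state it.

```go
package main

import (
	"fmt"
	"strings"
)

// codes maps each coded consonant to its Soundex digit, following the table above.
var codes = map[rune]byte{
	'B': '1', 'F': '1', 'P': '1', 'V': '1',
	'C': '2', 'G': '2', 'J': '2', 'K': '2', 'Q': '2', 'S': '2', 'X': '2', 'Z': '2',
	'D': '3', 'T': '3',
	'L': '4',
	'M': '5', 'N': '5',
	'R': '6',
}

// Soundex returns the first letter of name followed by three digits.
// It returns an empty string if name contains no ASCII letters.
func Soundex(name string) string {
	// Keep only the letters A-Z, in upper case.
	var letters []rune
	for _, r := range strings.ToUpper(name) {
		if r >= 'A' && r <= 'Z' {
			letters = append(letters, r)
		}
	}
	if len(letters) == 0 {
		return ""
	}

	out := []byte{byte(letters[0])}
	prev := codes[letters[0]] // zero value if the first letter is not coded
	for _, r := range letters[1:] {
		if len(out) == 4 {
			break
		}
		code, coded := codes[r]
		if !coded {
			// Assumption: vowels and Y end a run of equal codes, H and W do not.
			if r != 'H' && r != 'W' {
				prev = 0
			}
			continue
		}
		if code != prev {
			out = append(out, code)
		}
		prev = code
	}

	// Pad short codes with zeros, as in "Lee" -> L000.
	for len(out) < 4 {
		out = append(out, '0')
	}
	return string(out)
}

func main() {
	for _, name := range []string{"Buchanan", "Lee"} {
		fmt.Printf("%s\t%s\n", name, Soundex(name))
	}
}
```

Running the sketch prints B255 for "Buchanan" and L000 for "Lee", matching the two examples above.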
package main

import (
	"flag"
	"fmt"
	"math"

	"github.com/agnivade/levenshtein"
	"github.com/jjhendricks/nysiis"
	textdistance "github.com/masatana/go-textdistance"
	jwd "github.com/toldjuuso/go-jaro-winkler-distance"
	"github.com/umahmood/soundex"
)

func main() {
	// Default strings, overridden by the first two command-line arguments.
	s1 := "coconut"
	s2 := "chocolate"
	flag.Parse()
	args := flag.Args()
	if len(args) > 0 {
		s1 = args[0]
	}
	if len(args) > 1 {
		s2 = args[1]
	}

	// Phonetic codes for each string.
	fmt.Printf("Soundex code for %v:\t%v\n", s1, soundex.Code(s1))
	fmt.Printf("Soundex code for %v:\t%v\n\n", s2, soundex.Code(s2))
	fmt.Printf("NYSIIS for %v:\t%v\n\n", s1, nysiis.NYSIIS(s1))
	fmt.Printf("NYSIIS for %v:\t%v\n\n", s2, nysiis.NYSIIS(s2))

	// Edit distances count operations, so scale them against the longer
	// string to get a percentage similarity.
	largest := math.Max(float64(len(s1)), float64(len(s2)))
	dist := float64(levenshtein.ComputeDistance(s1, s2))
	editSim := 100 * (largest - dist) / largest
	damerau := float64(textdistance.DamerauLevenshteinDistance(s1, s2))
	damerauSim := 100 * (largest - damerau) / largest

	// Jaro and Jaro-Winkler are already similarities between 0 and 1,
	// so they only need multiplying by 100.
	jaro, _ := textdistance.JaroDistance(s1, s2)
	jaroSim := 100 * jaro

	fmt.Printf("==Metrics==\nString\tString\tJaro W\tDistance\tDamerau\tJaro\n%s\t%s\t%.2f\t%.2f\t\t%.2f\t%.2f\n",
		s1, s2, jwd.Calculate(s1, s2)*100, editSim, damerauSim, jaroSim)
}

Alongside the phonetic codes, the program reports other similarity metrics: the Jaro-Winkler similarity, the edit (Levenshtein) distance, the Damerau-Levenshtein distance and the Jaro similarity. Each is shown as a percentage, where 100 means the two strings are identical. A sample run for Mayer and Mire is:

Soundex code for Mayer:  M600
Soundex code for Mire:   M600

NYSIIS for Mayer:  MAYAR

NYSIIS for Mire:   MAR

==Metrics==
String  String  Jaro W  Distance  Damerau  Jaro
Mayer   Mire    67.00   40.00     40.00    63.33
In this case we have also added the NYSIIS (New York State Identification and Intelligence System) code for each string.
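The "Distance" column scales the edit distance against the length of the longer string. The following standard-library sketch shows that calculation for Mayer and Mire. The function names levenshtein, min3 and similarity are hypothetical, and the distance routine is a plain dynamic-programming version written for illustration, not the implementation of the levenshtein package imported above. Unlike the program above, it measures length in characters (runes) and not bytes, which gives the same result for plain ASCII names.

```go
package main

import "fmt"

// min3 returns the smallest of three integers.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the number of single-character insertions,
// deletions and substitutions needed to turn a into b.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)

	// prev holds the distances for the previous row of the table.
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(rb)]
}

// similarity scales the edit distance to a percentage of the longer
// string's length, so identical strings score 100.
func similarity(a, b string) float64 {
	largest := len([]rune(a))
	if n := len([]rune(b)); n > largest {
		largest = n
	}
	if largest == 0 {
		return 100 // two empty strings are treated as identical
	}
	return 100 * float64(largest-levenshtein(a, b)) / float64(largest)
}

func main() {
	fmt.Printf("distance:   %d\n", levenshtein("Mayer", "Mire"))
	fmt.Printf("similarity: %.2f\n", similarity("Mayer", "Mire"))
}
```

For Mayer and Mire the distance is 3 and the longer string has 5 characters, so the similarity is 100 × (5 − 3) / 5 = 40.00, which agrees with the sample run.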