Damerau-Levenshtein distance
The Damerau-Levenshtein distance is a string metric used to measure the distance between two sequences of characters, typically strings. It is similar to the Levenshtein distance, which calculates the minimum number of single-character insertions, deletions, and substitutions required to transform one string into another. However, the Damerau-Levenshtein distance also allows for transpositions (i.e., swapping adjacent characters) as a valid operation.
The distance is computed by finding the minimum number of operations required to transform one string into the other, where each operation is either a deletion, insertion, substitution, or transposition. The distance is often used in spelling correction and approximate string matching applications.
The formula for calculating the Damerau-Levenshtein distance between two strings, a and b, is as follows:
D(a, b) = m - t
where m is the total number of operations required to transform string a into string b, and t is the number of transpositions. The operations allowed are deletion, insertion, substitution, and transposition. The transposition operation involves swapping two adjacent characters.
The Damerau-Levenshtein distance is a useful metric in various fields, including computer science, linguistics, and natural language processing. It has applications in spell checking, OCR, and DNA sequencing.
def damerau_levenshtein_distance(s1, s2): d = {} lenstr1 = len(s1) lenstr2 = len(s2) for i in range(-1,lenstr1+1): d[(i,-1)] = i+1 for j in range(-1,lenstr2+1): d[(-1,j)] = j+1 for i in range(lenstr1): for j in range(lenstr2): if s1[i] == s2[j]: cost = 0 else: cost = 1 d[(i,j)] = min( d[(i-1,j)] + 1, # deletion d[(i,j-1)] + 1, # insertion d[(i-1,j-1)] + cost, # substitution ) if i and j and s1[i] == s2[j-1] and s1[i-1] == s2[j]: d[(i,j)] = min(d[(i,j)], d[i-2,j-2] + 1) # transposition return d[lenstr1-1,lenstr2-1]
Additional links:
Расстояние Левенштейна в MySQL и алгоритмы нечёткого поиска средствами PHP

Комментарии
Отправить комментарий