EDIT DISTANCE AND LEVENSHTEIN EDIT DISTANCE
What is Edit Distance ?
The minimum edit distance between two string is the minimum number of editing operations needed to transform one string into another string. In edit distance each operation has cost of 1. There are two strings given. The first string is the source string and the second string is the target string.
Editing Operations are :
- Insert a character
- Delete a character
- Replace a character
The time complexity for all three of these operations is O(1).
Different types of edit distance
- Levenshtein Distance
- Operations Allowed : Deletion, Insertion, Substitution
- Longest Common Subsequence
- Operations Allowed : Deletion, Insertion
- Demerau Levenshtein Distance
- Operations Allowed : Insertion, Deletion, Substitution, Transportation
- Hamming Distance
- Operation Allowed : Substitution
- Jaro Distance
- Operation Allowed : Transportation
SOURCE CODE
Edit Distance
def editdistance(str1, str2, m, n):
if m==0:
return n
if n==0:
return m
if str1[m-1] == str2[n-1]:
return editdistance(str1, str2, m-1, n-1)
return 1+ min(editdistance(str1, str2, m, n-1),
editdistance(str1, str2, m-1, n),
editdistance(str1, str2, m-1, n-1)
)
str1 = input("Enter string1: ")
str2 = input("Enter string2: ")
print("Edit Distance is: ",editdistance(str1, str2, len(str1), len(str2)))
OUTPUT
Levenshtein Edit Distance
import numpy as np
def levenshteinDistance(seq1, seq2):
size_x = len(seq1) + 1
size_y = len(seq2) + 1
matrix = np.zeros((size_x, size_y))
for x in range(size_x):
matrix[x, 0] = x
for y in range(size_y):
matrix[0, y] = y
for x in range(1, size_x):
for y in range(1, size_y):
if seq1[x-1]==seq2[y-1]:
matrix[x, y]=min(
matrix[x-1, y] + 1,
matrix[x-1, y-1],
matrix[x, y-1] + 1
)
else:
matrix[x, y]=min(
matrix[x-1, y] + 1,
matrix[x-1, y-1] + 1,
matrix[x, y-1] + 1
)
print(matrix)
return (matrix[size_x - 1, size_y - 1])
levenshteinDistance("cat","dog")
Comments
Post a Comment