I want to calculate the number of overlapping characters when comparing two strings. Suppose you have these comparisons:
boel <-> baal
boel <-> bol
beestenboel <-> boelsten
beestenboel <-> baastenb
hallo <-> hello
The results must be like these:
BoeL          } b matches, o does not match,
BaaL          } e does not match, l matches.
                Result: overlap = 2

BOeL          } b matches, o matches, l matches
BO L          } e does not match (it's not present in the lower string).
                Result: overlap = 3

B EeSTENboel  } b matches, e matches (because o is only present in the lower
BoElSTEN      } string), the second e is no longer present (since we have
                already consumed an e from the lower string), l does not match,
                s, t, e, n match successively. (Notice that b, e, o and l from
                the upper string will be ignored, since all characters from the
                lower string have already been consumed.)
                Result: overlap = 6

BeeSTENBoel   } b matches, the two e's do not match with the two a's, and again,
BaaSTENB      } s, t, e, n match.
                Result: overlap = 6

HaLLO         } h matches, a doesn't match
HeLLO         } l, l and o match.
                Result: overlap = 4
I suspect I'm overcomplicating this. How can I achieve the above results in MySQL or PHP?
(I guess the Levenshtein algorithm is related to this question.)
This description reminds me of the DNA alignment algorithms I learned during my studies. I'm not sure you need everything they do, but have a look at the Needleman-Wunsch and Smith-Waterman algorithms.
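As far as I can tell, the five expected results in the question are exactly the length of the longest common subsequence (LCS) of each pair, which is the simplest form of the alignment these algorithms compute. That reading is my assumption, not something the question states. Here is a minimal sketch under that assumption; the function name `overlap` is made up for illustration:

```php
<?php
// Hypothetical helper (not from the thread): length of the longest common
// subsequence of two strings, computed one row of the DP table at a time.
function overlap(string $a, string $b): int
{
    $m = strlen($a);
    $n = strlen($b);
    // $prev[$j] = LCS length of the processed prefix of $a and the first $j bytes of $b
    $prev = array_fill(0, $n + 1, 0);
    for ($i = 1; $i <= $m; $i++) {
        $curr = [0];
        for ($j = 1; $j <= $n; $j++) {
            if ($a[$i - 1] === $b[$j - 1]) {
                // Characters match: extend the best result without them
                $curr[$j] = $prev[$j - 1] + 1;
            } else {
                // No match: skip a character from one string or the other
                $curr[$j] = max($prev[$j], $curr[$j - 1]);
            }
        }
        $prev = $curr;
    }
    return $prev[$n];
}

echo overlap('boel', 'baal'), "\n";            // 2
echo overlap('boel', 'bol'), "\n";             // 3
echo overlap('beestenboel', 'boelsten'), "\n"; // 6
echo overlap('beestenboel', 'baastenb'), "\n"; // 6
echo overlap('hallo', 'hello'), "\n";          // 4
```

Note that this compares bytes and is case-sensitive, so lowercase the input first if needed. Multibyte text would need splitting into characters beforehand.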
Maybe this will work for you: http://pl.php.net/manual/en/function.levenshtein.php
Yes, you can use Levenshtein distance, but since your column is in English, and assuming each value is a single word, you can also use Soundex to find matching strings. Note that DIFFERENCE is a SQL Server function; MySQL only offers SOUNDEX() and the SOUNDS LIKE operator. See this:
SOUNDEX(Word) AS SoundTest,
DIFFERENCE(Word, 'textentered') AS DiffTest
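To get a feel for what Soundex gives you before wiring it into a query, you can try PHP's built-in `soundex()` on the words from the question. This is only a quick illustration: Soundex tells you whether two words sound alike, it does not count matching characters.

```php
<?php
// soundex() is a PHP built-in; the words are taken from the question.
foreach (['boel', 'baal', 'bol'] as $word) {
    echo $word, ' => ', soundex($word), "\n"; // each line ends in B400
}

// Identical codes mean "sounds alike"; they carry no per-character count.
var_dump(soundex('hallo') === soundex('hello')); // bool(true)
```

So all of boel, baal and bol collapse to the same code, which may or may not be what you want.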
As you mentioned yourself, the Levenshtein algorithm is probably what you need, so I'd suggest trying it out. I'm not sure whether it will return exactly the results you expect, but you should look through the comments on the page. There is a lot of gold to be harvested in the comment section.
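Plain Levenshtein distance counts edits rather than matching characters, so it will not give the overlap directly. If the overlap you want is the longest common subsequence length (my reading of the examples, not something the question states), you can derive it from `levenshtein()` by making a replacement cost 2, so it is never cheaper than a delete plus an insert. A rough sketch, with a made-up function name:

```php
<?php
// Hypothetical helper: derive the overlap from PHP's built-in levenshtein().
// With insert = 1, replace = 2, delete = 1 the distance equals the number of
// characters that are NOT part of the common subsequence, i.e.
// distance = strlen($a) + strlen($b) - 2 * overlap.
function overlapViaLevenshtein(string $a, string $b): int
{
    $distance = levenshtein($a, $b, 1, 2, 1); // insertion, replacement, deletion cost
    return intdiv(strlen($a) + strlen($b) - $distance, 2);
}

echo levenshtein('hallo', 'hello'), "\n";           // 1 (plain edit distance)
echo overlapViaLevenshtein('hallo', 'hello'), "\n"; // 4
echo overlapViaLevenshtein('boel', 'bol'), "\n";    // 3
echo overlapViaLevenshtein('boel', 'baal'), "\n";   // 2
```

Keep in mind that before PHP 8.0 `levenshtein()` returns -1 for strings longer than 255 characters, so this only suits short values on older versions.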
If you have root access to your server, you can also install this on your MySQL server, thanks to Matthieu Aubry.