Get number of overlapping characters in PHP or MySQL?

I want to calculate the number of overlapping characters when comparing two strings. Suppose you have these comparisons:

    boel        <-> baal
    boel        <-> bol
    beestenboel <-> boelsten
    beestenboel <-> baastenb
    hallo       <-> hello

The results must be like these:

    BoeL          } b matches, o does not match,
    BaaL          } e does not match, l matches.
    Result: overlap = 2

    BOeL          } b matches, o matches, l matches
    BO L          } e does not match (it's not present in the lower string).
    Result: overlap = 3

    B EeSTENboel  } b matches, e matches (because o is only present in the lower
    BoElSTEN      } string), the second e is no longer present (since we have
                    already consumed an e from the lower string), l does not match,
                    s, t, e, n match successively. (Notice that b, e, o and l from
                    the upper string will be ignored, since all characters from the
                    lower string have already been consumed.)
    Result: overlap = 6

    BeeSTENBoel   } b matches, the two e's do not match with the two a's, and again,
    BaaSTENB      } s, t, e, n match.
    Result: overlap = 6

    HaLLO         } h matches, a doesn't match
    HeLLO         } l, l and o match.
    Result: overlap = 4

I suspect I'm overcomplicating this... How can I achieve the above results in MySQL or PHP?

(I guess the Levenshtein algorithm is related to this question.)

-------------Problems Reply------------

This description reminds me of the DNA alignment algorithms I learned during my studies. I'm not sure you need everything they do, but have a look at Needleman-Wunsch and Smith-Waterman.
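For what it's worth, all five expected results in the question equal the length of the longest common subsequence (LCS) of the two strings, which is the simplest case of what those alignment algorithms compute. Below is a minimal PHP sketch of the standard dynamic-programming approach, not a definitive implementation. The function name `overlapCount` is made up for illustration, and the code compares bytes, so it assumes single-byte (ASCII) input.

```php
<?php
// Hypothetical helper: length of the longest common subsequence of $a and $b.
// Assumes single-byte characters; multibyte input would need splitting first.
function overlapCount(string $a, string $b): int
{
    $n = strlen($b);
    // $prev[$j] = LCS length of the prefix of $a processed so far
    // and the first $j bytes of $b.
    $prev = array_fill(0, $n + 1, 0);

    for ($i = 0, $m = strlen($a); $i < $m; $i++) {
        $curr = [0];
        for ($j = 1; $j <= $n; $j++) {
            if ($a[$i] === $b[$j - 1]) {
                // Characters match: extend the best alignment of both prefixes.
                $curr[$j] = $prev[$j - 1] + 1;
            } else {
                // No match: skip a character from one string or the other.
                $curr[$j] = max($prev[$j], $curr[$j - 1]);
            }
        }
        $prev = $curr;
    }

    return $prev[$n];
}

// The sample pairs from the question.
$pairs = [
    ['boel', 'baal'],
    ['boel', 'bol'],
    ['beestenboel', 'boelsten'],
    ['beestenboel', 'baastenb'],
    ['hallo', 'hello'],
];
foreach ($pairs as [$x, $y]) {
    printf("%s <-> %s: overlap = %d\n", $x, $y, overlapCount($x, $y));
}
```

This keeps only two rows of the table in memory, so it needs time proportional to the product of the string lengths and memory proportional to the shorter dimension.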

Maybe that will work for you:

Yes, you can use Levenshtein distance, but since your column is in English, and assuming each value is a single word, you can also use Soundex to find matching strings. Note that DIFFERENCE is a SQL Server function; MySQL itself provides SOUNDEX() and the SOUNDS LIKE operator. See this:

SOUNDEX(Word) AS SoundTest,
DIFFERENCE(Word, 'textentered') As DiffTest

As you mentioned yourself, the Levenshtein algorithm is probably what you need, so I'd suggest you try that out. I'm not sure whether it will return exactly the results you are expecting, but you should look into all the comments on the page. There is a lot of gold to be harvested in the comment section.

If you have root access to your server, you can also install this on your MySQL server, thanks to Matthieu Aubry.
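PHP's built-in `levenshtein()` returns an edit distance, not an overlap count, but it accepts custom costs. As a rough sketch: if a replacement is priced the same as a deletion plus an insertion, the distance d counts only the unmatched characters, so the overlap works out to (strlen(a) + strlen(b) - d) / 2. The helper name `overlapViaLevenshtein` is hypothetical, and the code assumes single-byte input (and, before PHP 8.0, strings of at most 255 characters).

```php
<?php
// Hypothetical helper: derive the overlap from a Levenshtein distance
// in which substitutions are no cheaper than delete + insert.
function overlapViaLevenshtein(string $a, string $b): int
{
    // Argument order: insertion cost, replacement cost, deletion cost.
    $distance = levenshtein($a, $b, 1, 2, 1);

    // Every matched character is counted once in each string length.
    return intdiv(strlen($a) + strlen($b) - $distance, 2);
}

echo overlapViaLevenshtein('hallo', 'hello'), "\n";
```

With the default costs (all 1), `levenshtein()` treats a substitution as a single edit, so the plain distance cannot be converted to the overlap this way.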

Category:php Views:0 Time:2011-11-30
