org.basex.util
Class Levenshtein

java.lang.Object
  extended by org.basex.util.Levenshtein

public final class Levenshtein
extends java.lang.Object

This class provides methods for fuzzy (approximate) token matching.

Author:
Workgroup DBIS, University of Konstanz 2005-08, ISC License, Christian Gruen

Constructor Summary
Levenshtein()
          Constructor.
 
Method Summary
 boolean contains(byte[] tok, byte[] sub)
          Checks if the first token approximately contains the second fulltext term.
 boolean ftChar(byte ch)
          Checks if the specified character is a letter; special characters are converted to the standard ASCII charset.
 int ftNorm(int ch)
          Returns the lowercase ASCII equivalent of the specified fulltext character.
 int ls(byte[] tok, int ts, int tl, byte[] sub, int k)
          Calculates a Levenshtein distance.
 boolean similar(byte[] tok, byte[] sub)
          Compares two tokens (byte arrays) for similarity.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Levenshtein

public Levenshtein()
Constructor.

Method Detail

similar

public boolean similar(byte[] tok,
                       byte[] sub)
Compares two tokens (byte arrays) for similarity.

Parameters:
tok - first token to be compared
sub - second token to be compared
Returns:
true if the arrays are similar
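The page does not say how "similar" is decided. A common approach is to accept two tokens when their Levenshtein distance stays below a length-dependent bound. The sketch below assumes a bound of max(length) / 4 edits; the class name SimilarSketch and that threshold are hypothetical and not taken from BaseX.

```java
/** Hypothetical similarity check based on a plain Levenshtein distance. */
final class SimilarSketch {
  // Assumption: tokens may differ by at most max(length) / 4 edits.
  static boolean similar(byte[] tok, byte[] sub) {
    int k = Math.max(tok.length, sub.length) / 4;
    // The length difference alone is a lower bound on the distance.
    return Math.abs(tok.length - sub.length) <= k && distance(tok, sub) <= k;
  }

  // Classic dynamic-programming Levenshtein distance, keeping only two rows.
  static int distance(byte[] a, byte[] b) {
    int[] prev = new int[b.length + 1];
    int[] curr = new int[b.length + 1];
    for (int j = 0; j <= b.length; j++) prev[j] = j; // j insertions
    for (int i = 1; i <= a.length; i++) {
      curr[0] = i; // i deletions
      for (int j = 1; j <= b.length; j++) {
        int cost = a[i - 1] == b[j - 1] ? 0 : 1;
        curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
      }
      int[] t = prev; prev = curr; curr = t; // reuse arrays
    }
    return prev[b.length];
  }
}
```

For example, "levenshtein" and "levenstein" differ by one deletion, which is within the assumed bound of 2 edits, so they count as similar.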

ls

public int ls(byte[] tok,
              int ts,
              int tl,
              byte[] sub,
              int k)
Calculates the Levenshtein distance between the tl bytes of tok starting at position ts and the token sub.

Parameters:
tok - token to be compared
ts - start position in token
tl - token length to be checked
sub - second token to be compared
k - maximum number of accepted errors
Returns:
the calculated Levenshtein distance
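The parameter k suggests a bounded computation that can stop early once more than k errors are certain. The sketch below assumes that convention and returns k + 1 in that case; the cutoff value, the class name BoundedLevenshtein and the early-exit rule are illustrative assumptions, not the documented behavior of the BaseX method.

```java
/** Hypothetical bounded Levenshtein distance on a slice of a token. */
final class BoundedLevenshtein {
  // Distance between tok[ts, ts + tl) and sub.
  // Assumption: returns k + 1 as soon as the distance is known to exceed k.
  static int ls(byte[] tok, int ts, int tl, byte[] sub, int k) {
    int n = sub.length;
    if (Math.abs(tl - n) > k) return k + 1; // length gap alone exceeds k
    int[] prev = new int[n + 1];
    int[] curr = new int[n + 1];
    for (int j = 0; j <= n; j++) prev[j] = j;
    for (int i = 1; i <= tl; i++) {
      curr[0] = i;
      int rowMin = curr[0];
      for (int j = 1; j <= n; j++) {
        int cost = tok[ts + i - 1] == sub[j - 1] ? 0 : 1;
        curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
        rowMin = Math.min(rowMin, curr[j]);
      }
      // Row minima never decrease, so the final distance is at least rowMin.
      if (rowMin > k) return k + 1;
      int[] t = prev; prev = curr; curr = t;
    }
    return Math.min(prev[n], k + 1);
  }
}
```

With tok = "xxkittenxx", ts = 2 and tl = 6, the checked slice is "kitten"; against "sitting" the true distance is 3, so k = 5 yields 3 while k = 1 stops early and yields 2.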

contains

public boolean contains(byte[] tok,
                        byte[] sub)
Checks if the first token approximately contains the second fulltext term.

Parameters:
tok - first token
sub - second token
Returns:
true if the first token approximately contains the second token
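Approximate containment can be tested with Sellers' algorithm, a variant of the Levenshtein recurrence in which a match may start at any position of the text at no cost. The sketch below uses that technique under the assumption that up to sub.length / 4 edits are tolerated; the class name ApproxContains and the threshold are hypothetical, and the BaseX method may decide containment differently.

```java
/** Hypothetical approximate substring test (Sellers' algorithm). */
final class ApproxContains {
  // Assumption: sub may occur in tok with at most sub.length / 4 edits.
  static boolean contains(byte[] tok, byte[] sub) {
    int m = sub.length;
    int k = m / 4;
    if (m == 0) return true; // the empty token is always contained
    // col[i]: cheapest match of sub[0, i) ending at the current text position.
    int[] col = new int[m + 1];
    for (int i = 0; i <= m; i++) col[i] = i;
    for (byte c : tok) {
      int diag = col[0]; // previous column, row i - 1
      col[0] = 0;        // a match may start anywhere in tok
      for (int i = 1; i <= m; i++) {
        int up = col[i]; // previous column, row i
        int cost = sub[i - 1] == c ? 0 : 1;
        col[i] = Math.min(Math.min(col[i - 1] + 1, up + 1), diag + cost);
        diag = up;
      }
      if (col[m] <= k) return true; // sub matched, ending at this byte
    }
    return false;
  }
}
```

Because only one column is kept, the test runs in O(|tok| * |sub|) time and O(|sub|) memory, and it stops at the first position where a close enough match ends.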

ftChar

public boolean ftChar(byte ch)
Checks if the specified character is a letter; special characters are converted to the standard ASCII charset. Note that this method does not support Unicode characters.

Parameters:
ch - character to be checked
Returns:
true if the character is a letter
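A minimal sketch of the letter check for single bytes, assuming that only the ASCII letters a-z and A-Z qualify. It deliberately omits the conversion of special characters mentioned above, and the class name FtCharSketch is hypothetical.

```java
/** Hypothetical ASCII-only letter check for a single byte. */
final class FtCharSketch {
  // Assumption: only ASCII letters qualify. Bytes >= 0x80 are negative in
  // Java and therefore fall outside both ranges.
  static boolean ftChar(byte ch) {
    return ch >= 'a' && ch <= 'z' || ch >= 'A' && ch <= 'Z';
  }
}
```

Since Java bytes are signed, a Latin-1 byte such as 0xE9 (é) arrives as a negative value and is rejected, which is consistent with the note that Unicode characters are not supported.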

ftNorm

public int ftNorm(int ch)
Returns the lowercase ASCII equivalent of the specified fulltext character. Note that this method does not support Unicode characters.

Parameters:
ch - character to be converted
Returns:
converted character
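One way to map accented Latin characters to lowercase ASCII is to decompose them and keep the base letter. The sketch below swaps in java.text.Normalizer (NFD decomposition) for brevity, which is an assumption: the documented method does not support Unicode and likely uses its own table. The class name FtNormSketch is hypothetical, and characters without an ASCII base letter are returned lowercased but otherwise unchanged.

```java
import java.text.Normalizer;

/** Hypothetical normalization of a character to lowercase ASCII. */
final class FtNormSketch {
  static int ftNorm(int ch) {
    if (ch < 0x80) return Character.toLowerCase(ch); // plain ASCII
    // Decompose, e.g. U+00C9 ('É') becomes 'E' followed by U+0301.
    String d = Normalizer.normalize(new String(Character.toChars(ch)), Normalizer.Form.NFD);
    int base = d.codePointAt(0);
    // Keep the base letter only if decomposition produced an ASCII character.
    return base < 0x80 ? Character.toLowerCase(base) : Character.toLowerCase(ch);
  }
}
```

For instance, 'É' (0xC9) and 'ç' (0xE7) map to 'e' and 'c', while 'ß' (0xDF) has no decomposition and is returned as 0xDF.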