org.basex.util
Class Token

java.lang.Object
  extended by org.basex.util.Token

public final class Token
extends java.lang.Object

This class provides convenience operations for handling so-called 'tokens'. In BaseX, a token is simply a UTF-8 encoded string stored in a byte array. To guarantee a consistent string representation, all string conversions should be done via the methods of this class.
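
As a rough illustration of what a token is, the sketch below converts a string to a UTF-8 byte array and back using only the JDK. The class TokenRoundTrip and its two helpers are hypothetical stand-ins for token(String) and string(byte[]); the BaseX implementation may handle the conversion differently.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for token(String) and string(byte[]); not the BaseX source.
class TokenRoundTrip {
  // Encodes a string as UTF-8 bytes.
  static byte[] token(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  // Decodes UTF-8 bytes back into a string.
  static String string(byte[] text) {
    return new String(text, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String s = "Gr\u00fcn";              // four characters, one of them non-ASCII
    byte[] tok = token(s);
    System.out.println(tok.length);      // prints 5: U+00FC needs two bytes
    System.out.println(string(tok).equals(s));
  }
}
```

The byte length of a token can therefore exceed the number of characters it represents, which matters for every method that takes a position argument.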

Author:
Workgroup DBIS, University of Konstanz 2005-08, ISC License, Christian Gruen

Field Summary
static byte[] AMP
          Ampersand Entity.
static byte[] APOS
          Apostrophe Entity.
static byte[] EMPTY
          Empty token.
static byte[] FALSE
          False token.
static byte[] GT
          GreaterThan Entity.
static byte[] INF
          Infinity.
static java.text.DecimalFormatSymbols LOC
          Decimal format symbols for the US locale.
static byte[] LT
          LessThan Entity.
static int MAXLEN
          Maximum length for hash calculation and index terms.
static byte[] MZERO
          Negative zero token.
static byte[] NAN
          Not-a-number (NaN).
static byte[] NINF
          Negative infinity.
static byte[] NORM
          Normalize special characters.
static byte[] NULL
          Dots.
static byte[] ONE
          One token.
static byte[] QU
          Quote Entity.
static byte[] SPACE
          Space token.
static byte[] TRUE
          True token.
static java.lang.String UTF16BE
          UTF16 encoding string.
static java.lang.String UTF16LE
          UTF16 encoding string.
static java.lang.String UTF8
          UTF8 encoding string.
static java.lang.String UTF82
          UTF8 encoding string (variant).
static byte[] XML
          XML Token.
static byte[] XMLNS
          XMLNS Token.
static byte[] XMLNSC
          XMLNS Token with colon.
static byte[] ZERO
          Zero token.
 
Method Summary
static boolean ascii(byte[] text)
          Checks if the specified token only consists of ASCII characters.
static byte[] chopNumber(byte[] t)
          Finishes the numeric token, removing trailing zeroes.
static int cl(byte v)
          Returns the expected codepoint length of the specified byte.
static byte[] concat(byte[]... t)
          Concatenates the specified tokens.
static boolean contains(byte[] tok, byte[] sub)
          Checks if the first token contains the second token.
static boolean contains(byte[] tok, int c)
          Checks if the first token contains the specified character.
static boolean containslc(byte[] tok, byte[] sub)
          Checks if the first token contains the second token in lowercase.
static int cp(byte[] t, int p)
          Returns the codepoint (unicode value) of the specified token, starting at the specified position.
static byte[] dc(byte[] t)
          Removes diacritics from the specified token.
static byte[] delete(byte[] t, byte[] c)
          Deletes the specified characters out of the token.
static byte[] delete(byte[] t, int c)
          Deletes the specified character out of the token.
static int diff(byte[] tok, byte[] tok2)
          Calculates the difference of two character arrays.
static int diff(int tok, int tok2)
          Calculates the difference of two characters.
static boolean digit(int c)
          Checks if the specified character is a digit.
static boolean endsWith(byte[] tok, byte[] sub)
          Checks if the first token ends with the second token.
static boolean endsWith(byte[] tok, int c)
          Checks if the first token ends with the specified character.
static boolean eq(byte[] tok, byte[] tok2)
          Compares two character arrays for equality.
static boolean eq(byte tok, byte tok2)
          Compares two bytes for equality.
static int hash(byte[] tok)
          Calculates a hash code for the specified token.
static int indexOf(byte[] tok, byte[] sub)
          Returns the position of the specified token or -1.
static int indexOf(byte[] tok, byte[] sub, int p)
          Returns the position of the specified token or -1.
static int indexOf(byte[] tok, int c)
          Returns the position of the specified character or -1.
static byte[] lc(byte[] t)
          Converts the specified token to lower case.
static int lc(int ch)
          Converts a character to lower case.
static int len(byte[] text)
          Returns the token length.
static boolean letter(int c)
          Checks if the specified character is a letter.
static boolean letterOrDigit(int c)
          Checks if the specified character is a letter or digit.
static byte[] ln(byte[] name)
          Returns the local name of the specified name.
static byte[] norm(byte[] tok)
          Normalizes all whitespace occurrences in the specified token.
static int numDigits(int x)
          Returns the number of digits of the specified integer.
static byte[] pre(byte[] name)
          Returns the prefix of the specified token.
static byte[] replace(byte[] t, int s, int r)
          Replaces the specified character and returns the result token.
static byte[][] split(byte[] tok, int sep)
          Splits the token at the specified separator character and returns an array with all tokens.
static boolean startsWith(byte[] tok, byte[] sub)
          Checks if the first token starts with the second token.
static boolean startsWith(byte[] tok, int c)
          Checks if the first token starts with the specified character.
static java.lang.String string(byte[] text)
          Returns the specified token as string.
static java.lang.String string(byte[] text, int s, int l)
          Returns the specified token as string.
static byte[] substring(byte[] tok, int s)
          Returns a subtoken of the specified token.
static byte[] substring(byte[] tok, int s, int e)
          Returns a substring of the specified token.
static double toDouble(byte[] to)
          Converts the specified token into a double value.
static int toInt(byte[] to)
          Converts the specified token into an integer value.
static int toInt(byte[] to, int ts, int te)
          Converts the specified token into an integer value.
static int toInt(java.lang.String to)
          Converts the specified string into an integer value.
static byte[] token(boolean b)
          Creates a byte array representation of the specified boolean value.
static byte[] token(double d)
          Creates a byte array representation from the specified double value; inspired by Xavier Franc's Qizx.
static byte[] token(float f)
          Creates a byte array representation from the specified float value.
static byte[] token(int i)
          Creates a byte array representation of the specified integer value.
static byte[] token(long i)
          Creates a byte array representation from the specified long value, using Java's standard method.
static byte[] token(java.lang.String s)
          Converts a string to a byte array.
static long toLong(byte[] to)
          Converts the specified token into a long value.
static long toLong(byte[] to, int ts, int te)
          Converts the specified token into a long value.
static long toLong(java.lang.String to)
          Converts the specified string into a long value.
static int toSimpleInt(byte[] to)
          Converts the specified token into a positive integer value.
static byte[] translate(byte[] tok, byte[] srch, byte[] rep)
          Performs a translation on the specified token.
static byte[] trim(byte[] t)
          Removes leading and trailing whitespaces from the specified token.
static byte[] uc(byte[] t)
          Converts the specified token to upper case.
static int uc(int ch)
          Converts a character to upper case.
static byte[] utf8(byte[] s, java.lang.String enc)
          Converts a token from the input encoding to UTF8.
static boolean ws(byte[] tok)
          Checks if the specified token has only whitespaces.
static boolean ws(int ch)
          Checks if the specified character is a whitespace.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MAXLEN

public static final int MAXLEN
Maximum length for hash calculation and index terms.

See Also:
Constant Field Values

XML

public static final byte[] XML
XML Token.


XMLNS

public static final byte[] XMLNS
XMLNS Token.


XMLNSC

public static final byte[] XMLNSC
XMLNS Token with colon.


TRUE

public static final byte[] TRUE
True token.


FALSE

public static final byte[] FALSE
False token.


NAN

public static final byte[] NAN
Not-a-number (NaN).


INF

public static final byte[] INF
Infinity.


NINF

public static final byte[] NINF
Negative infinity.


NULL

public static final byte[] NULL
Dots.


EMPTY

public static final byte[] EMPTY
Empty token.


SPACE

public static final byte[] SPACE
Space token.


ZERO

public static final byte[] ZERO
Zero token.


MZERO

public static final byte[] MZERO
Negative zero token.


ONE

public static final byte[] ONE
One token.


QU

public static final byte[] QU
Quote Entity.


AMP

public static final byte[] AMP
Ampersand Entity.


APOS

public static final byte[] APOS
Apostrophe Entity.


GT

public static final byte[] GT
GreaterThan Entity.


LT

public static final byte[] LT
LessThan Entity.


UTF8

public static final java.lang.String UTF8
UTF8 encoding string.

See Also:
Constant Field Values

UTF82

public static final java.lang.String UTF82
UTF8 encoding string (variant).

See Also:
Constant Field Values

UTF16LE

public static final java.lang.String UTF16LE
UTF16 encoding string.

See Also:
Constant Field Values

UTF16BE

public static final java.lang.String UTF16BE
UTF16 encoding string.

See Also:
Constant Field Values

LOC

public static final java.text.DecimalFormatSymbols LOC
Decimal format symbols for the US locale.


NORM

public static final byte[] NORM
Normalize special characters. To be extended for UTF8 support.

Method Detail

string

public static java.lang.String string(byte[] text)
Returns the specified token as string.

Parameters:
text - token
Returns:
string

string

public static java.lang.String string(byte[] text,
                                      int s,
                                      int l)
Returns the specified token as string.

Parameters:
text - token
s - start position
l - length
Returns:
string

ascii

public static boolean ascii(byte[] text)
Checks if the specified token only consists of ASCII characters.

Parameters:
text - token
Returns:
result of check

token

public static byte[] token(java.lang.String s)
Converts a string to a byte array. All strings should be converted by this function to guarantee a consistent character conversion.

Parameters:
s - string to be converted
Returns:
byte array

utf8

public static byte[] utf8(byte[] s,
                          java.lang.String enc)
Converts a token from the input encoding to UTF8.

Parameters:
s - token to be converted
enc - input encoding
Returns:
byte array
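
A minimal sketch of such a re-encoding step, using only JDK classes. The class name ReEncode is hypothetical, and the use of Charset.forName (which throws an unchecked exception for unknown encoding names) is an assumption; the source does not say how BaseX reports an unsupported encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a re-encoding step; not the BaseX source.
class ReEncode {
  // Decodes s using the named input encoding, then encodes the text as UTF-8.
  static byte[] utf8(byte[] s, String enc) {
    return new String(s, Charset.forName(enc)).getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] latin1 = { (byte) 0xFC };          // u-umlaut (U+00FC) in ISO-8859-1
    byte[] out = utf8(latin1, "ISO-8859-1");
    System.out.println(out.length);           // prints 2: the UTF-8 form is C3 BC
  }
}
```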

cp

public static int cp(byte[] t,
                     int p)
Returns the codepoint (unicode value) of the specified token, starting at the specified position.

Parameters:
t - token
p - character position
Returns:
current character

cl

public static int cl(byte v)
Returns the expected codepoint length of the specified byte.

Parameters:
v - first character byte
Returns:
character length
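
The relationship between cl and cp can be sketched as follows. This is an illustration of standard UTF-8 decoding, not the BaseX source: the class name Utf8Sketch is hypothetical, p is treated as a byte offset (an assumption), and the input is assumed to be well-formed UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of UTF-8 decoding; assumes well-formed input.
class Utf8Sketch {
  // Number of bytes in the sequence introduced by lead byte v (1 to 4).
  static int cl(byte v) {
    int b = v & 0xFF;
    if (b < 0x80) return 1;   // 0xxxxxxx: ASCII
    if (b < 0xE0) return 2;   // 110xxxxx
    if (b < 0xF0) return 3;   // 1110xxxx
    return 4;                 // 11110xxx
  }

  // Decodes the code point that starts at byte offset p of t.
  static int cp(byte[] t, int p) {
    int len = cl(t[p]);
    if (len == 1) return t[p];
    // keep the payload bits of the lead byte ...
    int c = t[p] & (0xFF >> (len + 1));
    // ... then append six payload bits per continuation byte
    for (int i = 1; i < len; i++) c = (c << 6) | (t[p + i] & 0x3F);
    return c;
  }

  public static void main(String[] args) {
    byte[] t = "a\u20ac".getBytes(StandardCharsets.UTF_8);  // 'a' plus the euro sign
    System.out.println(cl(t[1]));                            // prints 3
    System.out.println(Integer.toHexString(cp(t, 1)));       // prints 20ac
  }
}
```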

len

public static int len(byte[] text)
Returns the token length.

Parameters:
text - token
Returns:
length

token

public static byte[] token(boolean b)
Creates a byte array representation of the specified boolean value.

Parameters:
b - boolean value to be converted
Returns:
boolean value in byte array

token

public static byte[] token(int i)
Creates a byte array representation of the specified integer value.

Parameters:
i - int value to be converted
Returns:
integer value in byte array

numDigits

public static int numDigits(int x)
Returns the number of digits of the specified integer.

Parameters:
x - number to be checked
Returns:
number of digits

token

public static byte[] token(long i)
Creates a byte array representation from the specified long value, using Java's standard method.

Parameters:
i - long value to be converted
Returns:
byte array

token

public static byte[] token(double d)
Creates a byte array representation from the specified double value; inspired by Xavier Franc's Qizx.

Parameters:
d - double value to be converted
Returns:
byte array

token

public static byte[] token(float f)
Creates a byte array representation from the specified float value.

Parameters:
f - float value to be converted
Returns:
byte array

chopNumber

public static byte[] chopNumber(byte[] t)
Finishes the numeric token, removing trailing zeroes.

Parameters:
t - token to be modified
Returns:
token

toDouble

public static double toDouble(byte[] to)
Converts the specified token into a double value. Double.NaN is returned if the input is invalid.

Parameters:
to - character array to be converted
Returns:
converted double value
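
Because invalid input is signalled through Double.NaN, callers have to test the result with Double.isNaN; a comparison with == never matches NaN. The sketch below shows the pattern. The class DoubleParse is hypothetical, and it delegates to Double.parseDouble, whose accepted grammar may well differ from the one BaseX uses.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a NaN-as-error conversion; not the BaseX source.
class DoubleParse {
  // Returns the parsed value, or Double.NaN if the token is not a number.
  static double toDouble(byte[] to) {
    try {
      return Double.parseDouble(new String(to, StandardCharsets.UTF_8));
    } catch (NumberFormatException e) {
      return Double.NaN;
    }
  }

  public static void main(String[] args) {
    double d = toDouble("abc".getBytes(StandardCharsets.UTF_8));
    System.out.println(d == Double.NaN);   // prints false: NaN never equals itself
    System.out.println(Double.isNaN(d));   // prints true
  }
}
```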

toLong

public static long toLong(java.lang.String to)
Converts the specified string into a long value. Long.MIN_VALUE is returned when the input is invalid.

Parameters:
to - string to be converted
Returns:
converted long value

toLong

public static long toLong(byte[] to)
Converts the specified token into a long value. Long.MIN_VALUE is returned when the input is invalid.

Parameters:
to - character array to be converted
Returns:
converted long value

toLong

public static long toLong(byte[] to,
                          int ts,
                          int te)
Converts the specified token into a long value. Long.MIN_VALUE is returned when the input is invalid.

Parameters:
to - character array to be converted
ts - first byte to be parsed
te - last byte to be parsed (exclusive)
Returns:
converted long value

toInt

public static int toInt(java.lang.String to)
Converts the specified string into an integer value. Integer.MIN_VALUE is returned when the input is invalid.

Parameters:
to - string to be converted
Returns:
converted integer value

toInt

public static int toInt(byte[] to)
Converts the specified token into an integer value. Integer.MIN_VALUE is returned when the input is invalid.

Parameters:
to - character array to be converted
Returns:
converted integer value

toInt

public static int toInt(byte[] to,
                        int ts,
                        int te)
Converts the specified token into an integer value. Integer.MIN_VALUE is returned when the input is invalid.

Parameters:
to - character array to be converted
ts - first byte to be parsed
te - last byte to be parsed (exclusive)
Returns:
converted integer value
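
The range form can be pictured with the sketch below, which parses the bytes from ts up to, but not including, te. The class IntParse is hypothetical; how BaseX treats whitespace, a leading sign, or overflow is not stated in the source, so the choices made here (optional leading '-', no whitespace, overflow counts as invalid) are assumptions.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of range-based integer parsing; not the BaseX source.
class IntParse {
  // Parses the ASCII digits in to[ts, te) with an optional leading '-'.
  // Returns Integer.MIN_VALUE for empty, non-numeric, or overflowing input.
  static int toInt(byte[] to, int ts, int te) {
    int p = ts;
    boolean neg = p < te && to[p] == '-';
    if (neg) p++;
    if (p >= te) return Integer.MIN_VALUE;
    long v = 0;
    for (; p < te; p++) {
      int d = to[p] - '0';
      if (d < 0 || d > 9) return Integer.MIN_VALUE;
      v = v * 10 + d;
      if (v > Integer.MAX_VALUE) return Integer.MIN_VALUE;
    }
    return (int) (neg ? -v : v);
  }

  public static void main(String[] args) {
    byte[] t = "x42y".getBytes(StandardCharsets.US_ASCII);
    System.out.println(toInt(t, 1, 3));   // prints 42: only bytes 1 and 2 are parsed
    System.out.println(toInt(t, 0, 4) == Integer.MIN_VALUE);   // prints true
  }
}
```

Note that Integer.MIN_VALUE is itself a representable int, so a sentinel of this kind cannot distinguish that value from an error.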

toSimpleInt

public static int toSimpleInt(byte[] to)
Converts the specified token into a positive integer value. Integer.MIN_VALUE is returned if non-digits are found or if the input is longer than nine characters.

Parameters:
to - character array to be converted
Returns:
converted integer value
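
The nine-character limit has a simple rationale: the largest nine-digit number, 999999999, is below Integer.MAX_VALUE (2147483647), so no overflow check is needed while accumulating. A minimal sketch under that reading follows; the class SimpleIntParse is hypothetical, and rejecting the empty token is an assumption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a digits-only conversion; not the BaseX source.
class SimpleIntParse {
  // Accepts one to nine ASCII digits; anything else yields Integer.MIN_VALUE.
  static int toSimpleInt(byte[] to) {
    if (to.length == 0 || to.length > 9) return Integer.MIN_VALUE;
    int v = 0;
    for (byte b : to) {
      if (b < '0' || b > '9') return Integer.MIN_VALUE;
      v = v * 10 + (b - '0');   // cannot overflow with at most nine digits
    }
    return v;
  }

  public static void main(String[] args) {
    System.out.println(toSimpleInt("123".getBytes(StandardCharsets.US_ASCII)));   // prints 123
  }
}
```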

hash

public static int hash(byte[] tok)
Calculates a hash code for the specified token.

Parameters:
tok - specified token
Returns:
hash code

eq

public static boolean eq(byte[] tok,
                         byte[] tok2)
Compares two character arrays for equality.

Parameters:
tok - token to be compared
tok2 - second token to be compared
Returns:
true if the arrays are equal

eq

public static boolean eq(byte tok,
                         byte tok2)
Compares two bytes for equality.

Parameters:
tok - byte to be compared
tok2 - second byte to be compared
Returns:
true if the bytes are equal

diff

public static int diff(byte[] tok,
                       byte[] tok2)
Calculates the difference of two character arrays.

Parameters:
tok - token to be compared
tok2 - second token to be compared
Returns:
0 if tokens are equal, negative if first token is smaller, positive if first token is bigger
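
The return contract matches a lexicographic comparison, which could be sketched as below. The class TokenDiff is hypothetical; comparing bytes as unsigned values and ranking a shorter token before any longer token it prefixes are assumptions not confirmed by the source.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a lexicographic token comparison; not the BaseX source.
class TokenDiff {
  static int diff(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      // compare as unsigned so that bytes >= 0x80 sort after ASCII (assumption)
      int d = (a[i] & 0xFF) - (b[i] & 0xFF);
      if (d != 0) return d;
    }
    // equal up to the shorter length: the shorter token is the smaller one
    return a.length - b.length;
  }

  public static void main(String[] args) {
    byte[] x = "abc".getBytes(StandardCharsets.UTF_8);
    byte[] y = "abd".getBytes(StandardCharsets.UTF_8);
    System.out.println(diff(x, y) < 0);   // prints true
  }
}
```

With unsigned comparison, the byte order of well-formed UTF-8 coincides with code point order, which is one reason to prefer it over comparing Java's signed bytes directly.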

diff

public static int diff(int tok,
                       int tok2)
Calculates the difference of two characters.

Parameters:
tok - character to be compared
tok2 - second character to be compared
Returns:
0 if characters are equal, negative if first character is smaller, positive if first character is bigger

containslc

public static boolean containslc(byte[] tok,
                                 byte[] sub)
Checks if the first token contains the second token in lowercase.

Parameters:
tok - first token
sub - second token
Returns:
result of test

contains

public static boolean contains(byte[] tok,
                               byte[] sub)
Checks if the first token contains the second token.

Parameters:
tok - first token
sub - second token
Returns:
result of test

contains

public static boolean contains(byte[] tok,
                               int c)
Checks if the first token contains the specified character.

Parameters:
tok - first token
c - character
Returns:
result of test

indexOf

public static int indexOf(byte[] tok,
                          int c)
Returns the position of the specified character or -1.

Parameters:
tok - first token
c - character
Returns:
position of the character, or -1 if it is not found

indexOf

public static int indexOf(byte[] tok,
                          byte[] sub)
Returns the position of the specified token or -1.

Parameters:
tok - first token
sub - second token
Returns:
position of the second token, or -1 if it is not found

indexOf

public static int indexOf(byte[] tok,
                          byte[] sub,
                          int p)
Returns the position of the specified token or -1.

Parameters:
tok - first token
sub - second token
p - start position
Returns:
position of the second token, or -1 if it is not found
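
A straightforward way to realize a search with a start position is sketched below. The class TokenSearch is hypothetical, positions are byte offsets (an assumption), and the result for an empty sub is whatever this naive loop yields rather than documented BaseX behaviour.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a naive byte-sequence search; not the BaseX source.
class TokenSearch {
  // Returns the first byte offset >= p at which sub occurs in tok, or -1.
  static int indexOf(byte[] tok, byte[] sub, int p) {
    int last = tok.length - sub.length;
    for (int i = Math.max(p, 0); i <= last; i++) {
      int j = 0;
      while (j < sub.length && tok[i + j] == sub[j]) j++;
      if (j == sub.length) return i;
    }
    return -1;
  }

  public static void main(String[] args) {
    byte[] tok = "hello world".getBytes(StandardCharsets.UTF_8);
    byte[] sub = "o".getBytes(StandardCharsets.UTF_8);
    System.out.println(indexOf(tok, sub, 0));   // prints 4
    System.out.println(indexOf(tok, sub, 5));   // prints 7
  }
}
```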

startsWith

public static boolean startsWith(byte[] tok,
                                 int c)
Checks if the first token starts with the specified character.

Parameters:
tok - first token
c - character
Returns:
result of test

startsWith

public static boolean startsWith(byte[] tok,
                                 byte[] sub)
Checks if the first token starts with the second token.

Parameters:
tok - first token
sub - second token
Returns:
result of test

endsWith

public static boolean endsWith(byte[] tok,
                               int c)
Checks if the first token ends with the specified character.

Parameters:
tok - first token
c - character
Returns:
result of test

endsWith

public static boolean endsWith(byte[] tok,
                               byte[] sub)
Checks if the first token ends with the second token.

Parameters:
tok - first token
sub - second token
Returns:
result of test

substring

public static byte[] substring(byte[] tok,
                               int s)
Returns a subtoken of the specified token.

Parameters:
tok - token
s - start position
Returns:
subtoken

substring

public static byte[] substring(byte[] tok,
                               int s,
                               int e)
Returns a substring of the specified token.

Parameters:
tok - token
s - start position
e - end position
Returns:
substring

split

public static byte[][] split(byte[] tok,
                             int sep)
Splits the token at the specified separator character and returns an array with all tokens.

Parameters:
tok - token to be split
sep - separation character
Returns:
array
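
Assuming sep is the single character at which the token is cut, as the parameter list suggests, the operation could look like the sketch below. The class TokenSplit is hypothetical, it handles ASCII separators only, and keeping empty parts between adjacent separators is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of splitting at a separator byte; not the BaseX source.
class TokenSplit {
  static byte[][] split(byte[] tok, int sep) {
    List<byte[]> parts = new ArrayList<>();
    int start = 0;
    for (int i = 0; i <= tok.length; i++) {
      // cut at every separator and once more at the end of the token
      if (i == tok.length || tok[i] == sep) {
        parts.add(Arrays.copyOfRange(tok, start, i));
        start = i + 1;
      }
    }
    return parts.toArray(new byte[0][]);
  }

  public static void main(String[] args) {
    byte[][] parts = split("a,b,,c".getBytes(StandardCharsets.US_ASCII), ',');
    System.out.println(parts.length);   // prints 4: "a", "b", "", "c"
  }
}
```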

ws

public static boolean ws(byte[] tok)
Checks if the specified token has only whitespaces.

Parameters:
tok - token
Returns:
true if all characters are whitespaces

replace

public static byte[] replace(byte[] t,
                             int s,
                             int r)
Replaces the specified character and returns the result token.

Parameters:
t - token to be checked
s - the character to be replaced
r - the new character
Returns:
resulting token

trim

public static byte[] trim(byte[] t)
Removes leading and trailing whitespaces from the specified token.

Parameters:
t - token to be checked
Returns:
chopped array

concat

public static byte[] concat(byte[]... t)
Concatenates the specified tokens.

Parameters:
t - tokens
Returns:
resulting array

delete

public static byte[] delete(byte[] t,
                            int c)
Deletes the specified character out of the token.

Parameters:
t - token to be checked
c - character to be removed
Returns:
new instance

delete

public static byte[] delete(byte[] t,
                            byte[] c)
Deletes the specified characters out of the token.

Parameters:
t - token to be checked
c - characters to be removed
Returns:
new instance

norm

public static byte[] norm(byte[] tok)
Normalizes all whitespace occurrences in the specified token.

Parameters:
tok - token
Returns:
normalized token.
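
The source does not spell out what normalization means here. One plausible reading, modelled on XPath's normalize-space, is to drop leading and trailing whitespace and collapse each inner run into a single space. The sketch below implements that reading only; the class TokenNorm and its ws helper (space, tab, line feed, carriage return) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of whitespace normalization; not the BaseX source.
class TokenNorm {
  static boolean ws(int ch) {
    return ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r';
  }

  static byte[] norm(byte[] tok) {
    ByteArrayOutputStream out = new ByteArrayOutputStream(tok.length);
    boolean pending = false;   // a whitespace run follows already written output
    for (byte b : tok) {
      if (ws(b)) {
        pending = out.size() > 0;       // leading whitespace is ignored
      } else {
        if (pending) out.write(' ');    // emit one space per inner run
        pending = false;
        out.write(b);
      }
    }
    return out.toByteArray();           // a trailing run is never flushed
  }

  public static void main(String[] args) {
    byte[] in = "  a \t b  ".getBytes(StandardCharsets.US_ASCII);
    System.out.println(new String(norm(in), StandardCharsets.US_ASCII));   // prints a b
  }
}
```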

translate

public static byte[] translate(byte[] tok,
                               byte[] srch,
                               byte[] rep)
Performs a translation on the specified token.

Parameters:
tok - token
srch - characters to be found
rep - replacement characters
Returns:
translated token.
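
If the method follows the semantics of XPath's translate function (an assumption; the source only says "translation"), each character of tok that occurs in srch is replaced by the character at the same position in rep, and dropped when rep is too short to supply one. The class TokenTranslate below is a hypothetical, byte-wise sketch of that behaviour for ASCII input.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch modelled on XPath translate; not the BaseX source.
class TokenTranslate {
  static byte[] translate(byte[] tok, byte[] srch, byte[] rep) {
    ByteArrayOutputStream out = new ByteArrayOutputStream(tok.length);
    for (byte b : tok) {
      int i = 0;
      while (i < srch.length && srch[i] != b) i++;
      if (i == srch.length) out.write(b);           // not in the search list: keep
      else if (i < rep.length) out.write(rep[i]);   // mapped to its counterpart
      // otherwise there is no counterpart and the byte is dropped
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    java.nio.charset.Charset cs = StandardCharsets.US_ASCII;
    byte[] r = translate("bar".getBytes(cs), "abc".getBytes(cs), "ABC".getBytes(cs));
    System.out.println(new String(r, cs));   // prints BAr
  }
}
```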

ws

public static boolean ws(int ch)
Checks if the specified character is a whitespace.

Parameters:
ch - the character to be checked
Returns:
result of comparison

letter

public static boolean letter(int c)
Checks if the specified character is a letter. Note that this method does not support unicode characters.

Parameters:
c - the character to be checked
Returns:
result of comparison

digit

public static boolean digit(int c)
Checks if the specified character is a digit.

Parameters:
c - the character to be checked
Returns:
result of comparison

letterOrDigit

public static boolean letterOrDigit(int c)
Checks if the specified character is a letter or digit. Note that this method does not support unicode characters.

Parameters:
c - the character to be checked
Returns:
result of comparison

uc

public static byte[] uc(byte[] t)
Converts the specified token to upper case.

Parameters:
t - token to be converted
Returns:
the converted token

uc

public static int uc(int ch)
Converts a character to upper case. Note that this method does not support unicode characters.

Parameters:
ch - character to be converted
Returns:
converted character

lc

public static byte[] lc(byte[] t)
Converts the specified token to lower case.

Parameters:
t - token to be converted
Returns:
the converted token

lc

public static int lc(int ch)
Converts a character to lower case. Note that this method does not support unicode characters.

Parameters:
ch - character to be converted
Returns:
converted character
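
The practical effect of the missing Unicode support can be seen in a sketch of ASCII-only case mapping. The class AsciiCase is hypothetical and only illustrates the limitation described above; it is not the BaseX source.

```java
// Hypothetical sketch of ASCII-only case conversion; not the BaseX source.
class AsciiCase {
  // Maps 'a'..'z' to 'A'..'Z'; every other value is returned unchanged.
  static int uc(int ch) {
    return ch >= 'a' && ch <= 'z' ? ch - 0x20 : ch;
  }

  // Maps 'A'..'Z' to 'a'..'z'; every other value is returned unchanged.
  static int lc(int ch) {
    return ch >= 'A' && ch <= 'Z' ? ch + 0x20 : ch;
  }

  public static void main(String[] args) {
    System.out.println((char) uc('a'));    // prints A
    System.out.println(uc(0xFC) == 0xFC);  // prints true: U+00FC is left as is
  }
}
```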

dc

public static byte[] dc(byte[] t)
Removes diacritics from the specified token. Note that this method only supports the first 256 Unicode characters.

Parameters:
t - token to be converted
Returns:
converted token

pre

public static byte[] pre(byte[] name)
Returns the prefix of the specified token.

Parameters:
name - name
Returns:
prefix or empty token if no prefix exists

ln

public static byte[] ln(byte[] name)
Returns the local name of the specified name.

Parameters:
name - name
Returns:
local name
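
For a qualified name such as xs:string, pre and ln can be pictured as the parts before and after the colon. The sketch below assumes the name is cut at its first colon; the class QNameParts and its colon helper are hypothetical and not taken from BaseX.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of splitting a qualified name; not the BaseX source.
class QNameParts {
  static final byte[] EMPTY = {};

  // Byte offset of the first ':' in name, or -1 if there is none.
  static int colon(byte[] name) {
    for (int i = 0; i < name.length; i++) if (name[i] == ':') return i;
    return -1;
  }

  // Prefix, or the empty token if no prefix exists.
  static byte[] pre(byte[] name) {
    int i = colon(name);
    return i == -1 ? EMPTY : Arrays.copyOfRange(name, 0, i);
  }

  // Local name: everything after the colon, or the whole name without one.
  static byte[] ln(byte[] name) {
    int i = colon(name);
    return i == -1 ? name : Arrays.copyOfRange(name, i + 1, name.length);
  }

  public static void main(String[] args) {
    byte[] name = "xs:string".getBytes(StandardCharsets.US_ASCII);
    System.out.println(new String(pre(name), StandardCharsets.US_ASCII));   // prints xs
    System.out.println(new String(ln(name), StandardCharsets.US_ASCII));    // prints string
  }
}
```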