java.lang.Object
  org.basex.util.Tokenizer
public final class Tokenizer implements IndexToken
Full-text tokenizer.
Nested Class Summary | |
---|---|
static class | Tokenizer.FTUnit: Units. |
Field Summary | |
---|---|
boolean | cs: Case-sensitivity flag. |
boolean | dc: Diacritics flag. |
boolean | fast: Fast evaluation flag. |
boolean | fz: Fuzzy flag. |
boolean | lc: Lowercase flag. |
int | p: Current character position. |
int | para: Current paragraph. |
int | pm: Last punctuation mark. |
int | pos: Current token position. |
StemDir | sd: Stemming dictionary. |
int | sent: Current sentence. |
boolean | st: Stemming flag. |
byte[] | text: Text to be tokenized. |
boolean | uc: Uppercase flag. |
boolean | wc: Wildcard flag. |
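The stemming flag st and the stemming dictionary sd suggest a lookup step applied to each token. The following is a minimal sketch under stated assumptions: the class StemSketch and its methods are hypothetical (this is not the BaseX StemDir API), and the dictionary is assumed to map complete word forms to their stems, returning the token unchanged when stemming is off or the word is unknown.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical dictionary-based stemmer illustrating the st/sd fields.
 * It is not the BaseX StemDir class; all names here are assumptions.
 */
final class StemSketch {
    // word form -> stem, e.g. "running" -> "run"
    private final Map<String, String> dict = new HashMap<>();

    /** Adds a dictionary entry mapping a word form to its stem. */
    void add(String word, String stem) {
        dict.put(word, stem);
    }

    /**
     * Returns the stem of the token if stemming is enabled (st) and the
     * word is in the dictionary; otherwise returns the token unchanged.
     */
    byte[] stem(byte[] token, boolean st) {
        if (!st) return token;
        String stem = dict.get(new String(token, StandardCharsets.UTF_8));
        return stem == null ? token : stem.getBytes(StandardCharsets.UTF_8);
    }
}
```

A plain map lookup keeps the sketch short; a real stemmer may combine a dictionary with algorithmic suffix stripping.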
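Constructor Summary | |
---|---|
Tokenizer(byte[] txt, FTOpt fto, boolean f, Prop pr) | Constructor for the given text, full-text options, fast-evaluation flag and database properties. |
Tokenizer(byte[] txt, Prop pr) | Constructor for the given text and (optional) database properties. |
Tokenizer(Prop pr) | Empty constructor. |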
Method Summary | |
---|---|
int | count(): Counts the number of tokens. |
byte[] | get(): Returns the current token. |
byte[] | get(byte[] tok): Returns a normalized version of the specified token. |
static int[][] | getInfo(byte[] t): Gets full-text info for the specified token; needed for visualizations. |
void | init(): Initializes the iterator. |
void | init(byte[] txt): Sets the text. |
boolean | more(): Checks if more tokens are to be returned. |
byte[] | orig(): Returns the original token. |
int | pos(int w, Tokenizer.FTUnit u): Calculates a position value, dependent on the specified unit. |
String | toString() |
Data.Type | type(): Returns the index type. |
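The methods init(), more() and get() form an iterator over the tokens of the text. To illustrate that protocol, here is a self-contained, hypothetical MiniTokenizer (not the BaseX implementation). It assumes ASCII text, treats every non-letter/digit byte as a separator, and starts a new sentence after '.', '!' or '?' and a new paragraph after a newline, loosely mirroring the pos, sent and para fields.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the init()/more()/get()/count() protocol.
 * ASCII-only; it is not the BaseX Tokenizer.
 */
final class MiniTokenizer {
    private final byte[] text;
    private int p;          // current character position
    private int start, end; // byte range of the current token
    int pos = -1;           // current token position (-1 before the first token)
    int sent;               // current sentence
    int para;               // current paragraph

    MiniTokenizer(byte[] text) {
        this.text = text;
    }

    /** Resets the iterator to the start of the text. */
    void init() {
        p = 0;
        pos = -1;
        sent = 0;
        para = 0;
    }

    /** Advances to the next token; returns false when no token is left. */
    boolean more() {
        boolean endSent = false, endPara = false;
        // skip separators, remembering sentence and paragraph boundaries
        while (p < text.length && !isWordChar(text[p])) {
            byte b = text[p++];
            if (b == '.' || b == '!' || b == '?') endSent = true;
            else if (b == '\n') endPara = true;
        }
        if (p == text.length) return false;
        // counters only advance between two tokens
        if (pos >= 0) {
            if (endPara) {
                para++;
                sent++; // a new paragraph also starts a new sentence
            } else if (endSent) {
                sent++;
            }
        }
        start = p;
        while (p < text.length && isWordChar(text[p])) p++;
        end = p;
        pos++;
        return true;
    }

    /** Returns a copy of the current token. */
    byte[] get() {
        return Arrays.copyOfRange(text, start, end);
    }

    /** Counts all tokens; rewinds the iterator as a side effect. */
    int count() {
        init();
        int c = 0;
        while (more()) c++;
        init();
        return c;
    }

    // ASCII letters and digits only; negative (non-ASCII) bytes count as separators
    private static boolean isWordChar(byte b) {
        return b >= 0 && Character.isLetterOrDigit(b);
    }
}
```

A typical loop calls init() once and then get() after every successful more(). In this sketch count() rewinds the iterator; that is a design choice of the sketch, not documented behavior of the class above.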
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
public StemDir sd
public boolean st
public boolean dc
public boolean cs
public boolean uc
public boolean lc
public boolean wc
public boolean fz
public boolean fast
public int sent
public int para
public int pos
public byte[] text
public int p
public int pm
Constructor Detail |
---|
public Tokenizer(Prop pr)
Parameters:
pr - (optional) database properties

public Tokenizer(byte[] txt, Prop pr)
Parameters:
txt - text
pr - (optional) database properties

public Tokenizer(byte[] txt, FTOpt fto, boolean f, Prop pr)
Parameters:
txt - text
fto - full-text options
f - fast evaluation
pr - database properties

Method Detail |
---|
public Data.Type type()
Specified by: type in interface IndexToken

public void init(byte[] txt)
Parameters:
txt - text

public void init()

public boolean more()

public byte[] get()
Specified by: get in interface IndexToken

public byte[] get(byte[] tok)
Parameters:
tok - input token
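Returning a normalized token depends on the case and diacritics flags listed in the field summary. A minimal sketch, assuming that uc forces upper case, lc forces lower case, a case-insensitive setting (cs false) also lower-cases, and a diacritics-insensitive setting (dc false) strips combining marks after Unicode NFD decomposition. NormalizeSketch and its parameter order are hypothetical; stemming and wildcard handling are left out.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.Locale;

/**
 * Hypothetical sketch of token normalization driven by the cs, lc, uc and dc
 * flags. The flag semantics are assumptions, not taken from BaseX.
 */
final class NormalizeSketch {
    static byte[] normalize(byte[] tok, boolean cs, boolean lc, boolean uc, boolean dc) {
        String s = new String(tok, StandardCharsets.UTF_8);
        if (!dc) {
            // diacritics-insensitive: decompose (U+00E9 -> 'e' + U+0301), then drop combining marks
            s = Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        }
        if (uc) {
            s = s.toUpperCase(Locale.ROOT);   // assumed "uppercase" option
        } else if (lc || !cs) {
            s = s.toLowerCase(Locale.ROOT);   // assumed "lowercase" option or case-insensitive
        }
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

Locale.ROOT keeps case mapping independent of the JVM's default locale, which matters for tokens such as the Turkish dotted/dotless i.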

public byte[] orig()

public int count()

public int pos(int w, Tokenizer.FTUnit u)
Parameters:
w - word position
u - unit
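The unit argument lets the same word position be expressed in words, sentences or paragraphs, the units used by XQuery Full Text distance, window and scope selections. The sketch below is hypothetical: the Unit enum stands in for Tokenizer.FTUnit, whose actual constants are not listed on this page, and sentence and paragraph numbers are assumed to be recorded per word while tokenizing (for instance from the sent and para counters).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: maps word positions to sentence and paragraph numbers. */
final class PosSketch {
    /** Hypothetical stand-in for Tokenizer.FTUnit. */
    enum Unit { WORD, SENTENCE, PARAGRAPH }

    // one {sentence, paragraph} pair per word position
    private final List<int[]> marks = new ArrayList<>();

    /** Records the sentence and paragraph of the next word position. */
    void record(int sent, int para) {
        marks.add(new int[] { sent, para });
    }

    /** Returns the position of word w, measured in the given unit. */
    int pos(int w, Unit u) {
        switch (u) {
            case SENTENCE:
                return marks.get(w)[0];
            case PARAGRAPH:
                return marks.get(w)[1];
            default:
                return w; // WORD: the word position itself
        }
    }
}
```

With these values, two words lie in the same sentence when their SENTENCE positions are equal, and their sentence distance is the difference of the two values.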

public static int[][] getInfo(byte[] t)
Parameters:
t - text to be parsed

public String toString()
Overrides: toString in class Object