org.deepfs.fsml
Class DeepFile

java.lang.Object
  extended by org.deepfs.fsml.DeepFile

public final class DeepFile
extends Object

Storage for metadata information and contents for a single file.

A DeepFile can represent a regular file in the file system or a "subfile" that is stored inside another file, e.g. a file in a ZIP archive or a picture embedded in an ID3 tag.

Author:
Workgroup DBIS, University of Konstanz 2005-10, ISC License, Bastian Lemke

Nested Class Summary
static class DeepFile.Content
          File content.
 
Constructor Summary
DeepFile(BufferedFileChannel f)
           Constructor.
DeepFile(File file)
           Constructor.
DeepFile(ParserRegistry parserRegistry, BufferedFileChannel bufferedFileChannel, Context ctx)
           Constructor.
DeepFile(String file)
           Constructor.
 
Method Summary
 void addMeta(MetaElem elem, byte[] value)
          Adds a metadata key-value pair for the current file.
 void addMeta(MetaElem elem, double value)
          Adds a metadata key-value pair for the current file.
 void addMeta(MetaElem elem, Duration value)
          Adds a metadata key-value pair for the current file.
 void addMeta(MetaElem elem, int value)
          Adds a metadata key-value pair for the current file.
 void addMeta(MetaElem elem, long value)
          Adds a metadata key-value pair for the current file.
 void addMeta(MetaElem elem, short value)
          Adds a metadata key-value pair for the current file.
 void addMeta(MetaElem elem, String value)
          Adds a metadata key-value pair for the current file.
 void addMeta(MetaElem elem, XMLGregorianCalendar xgc)
          Adds a metadata key-value pair for the current file.
 void addText(long position, int byteCount, String text)
           Adds a text section.
 void addXML(long position, int byteCount, Data data)
          Adds an XML document or fragment to the DeepFile.
 void addXML(long pos, int byteCount, String xml)
          Adds an XML document or fragment to the DeepFile.
 void debug(String str, Object... ext)
          Verbose debug message.
 void extract()
          Extracts metadata and text/xml contents from the associated file.
 boolean extractMeta()
          Returns true if metadata should be extracted.
 boolean extractText()
          Returns true if text contents should be extracted.
 boolean extractXML()
          Returns true if xml contents should be extracted.
 void fallback()
          Calls the fallback parser for the associated file to extract text contents.
 void finish()
          Finishes the deep file.
 void finishMetaExtraction()
          Finishes the extraction of metadata and extracts the file system attributes.
 BufferedFileChannel getBufferedFileChannel()
          Returns the associated BufferedFileChannel that links this DeepFile with a file in the file system.
 DeepFile[] getContent()
          Returns all subfiles.
 Context getContext()
          Returns the database context.
 Atts getFSAtts()
          Returns the file system attributes for the deep file.
 TreeMap<MetaElem,ArrayList<String>> getMeta()
          Returns all metadata key-value pairs or null if metadata extraction is disabled.
 long getOffset()
          Returns the offset of the deep file inside the regular file in the file system.
 long getSize()
          Returns the size of the deep file.
 DeepFile.Content[] getTextContents()
          Returns all text sections or null if text content extraction is disabled.
 String[] getValues(MetaElem elem)
          Returns the string values for the MetaElem.
 DeepFile.Content[] getXMLContents()
          Returns all xml sections or null if xml extraction is disabled.
 boolean isFileTypeSet()
          Returns true if the file type is set for the current deep file.
 boolean isMetaSet(MetaElem elem)
          Returns true if a value is set for the given metadata element.
 int maxTextSize()
          Returns the number of bytes that should be extracted from text and xml contents.
 DeepFile newContentSection(long position)
           Creates a new content section for the current file, beginning at the given position with an unknown size.
 DeepFile newContentSection(String title, long position, int contentSize)
           Creates a new content section for the current file, beginning at the given position with the given size.
 void setFileFormat(MimeType format)
          Sets the MIME type of the file.
 void setFileType(FileType type)
          Sets the type of the file (e.g. audio, video, ...).
 void setSize(long contentSize)
          Sets the size value for the DeepFile.
 DeepFile subfile(int contentSize)
          Clones the DeepFile to map only a part of the file.
 DeepFile subfile(String fileName, int fileSize, String... suffix)
           Creates a new "subfile" inside the current DeepFile with the given size, beginning at the current position of the file channel.
 String toString()
           
 String toXML()
          Returns the xml representation for this deep file.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

DeepFile

public DeepFile(String file)
         throws IOException

Constructor.

Creates a DeepFile object for a file that can be used to extract metadata and text/xml contents. By default, metadata and all contents will be extracted.

This constructor should only be used to parse a single file. For better performance when parsing several files, use DeepFile(ParserRegistry, BufferedFileChannel, Context).

Parameters:
file - the name of the associated file in the file system
Throws:
IOException - if any I/O error occurs
See Also:
IFileParser.extract(DeepFile)

DeepFile

public DeepFile(File file)
         throws IOException

Constructor.

Creates a DeepFile object for a file that can be used to extract metadata and text/xml contents. By default, metadata and all contents will be extracted.

This constructor should only be used to parse a single file. For better performance when parsing several files, use DeepFile(ParserRegistry, BufferedFileChannel, Context).

Parameters:
file - the associated file in the file system
Throws:
IOException - if any I/O error occurs
See Also:
IFileParser.extract(DeepFile)

DeepFile

public DeepFile(BufferedFileChannel f)
         throws IOException

Constructor.

Creates a DeepFile object for a buffered file channel that can be used to extract metadata and text/xml contents. By default, metadata and all contents will be extracted.

This constructor should only be used to parse a single file. For better performance when parsing several files, use DeepFile(ParserRegistry, BufferedFileChannel, Context).

Parameters:
f - the BufferedFileChannel
Throws:
IOException - if any error occurs

DeepFile

public DeepFile(ParserRegistry parserRegistry,
                BufferedFileChannel bufferedFileChannel,
                Context ctx)
         throws IOException

Constructor.

Uses the given parser registry to retrieve the corresponding parser for the file that is represented by the buffered file channel. Depending on the properties of the context, metadata, text content and xml content are extracted.

The properties can be set as follows:
ctx.prop.set(Prop.FSMETA, true); // extract metadata
ctx.prop.set(Prop.FSCONT, true); // extract text content
ctx.prop.set(Prop.FSXML, true); // extract xml content
ctx.prop.set(Prop.FSTEXTMAX, 10240); // amount of text/xml content to extract (in bytes)

Parameters:
parserRegistry - a reference to the parser registry that can be used to parse file fragments
bufferedFileChannel - BufferedFileChannel to access the file
ctx - the database context
Throws:
IOException - if any error occurs
Method Detail

extract

public void extract()
             throws IOException
Extracts metadata and text/xml contents from the associated file.

Throws:
IOException - if any error occurs while reading from the file

fallback

public void fallback()
              throws IOException,
                     ParserException
Calls the fallback parser for the associated file to extract text contents.

Throws:
IOException - if any error occurs while reading from the file
ParserException - if the fallback parser could not be loaded

extractMeta

public boolean extractMeta()
Returns true if metadata should be extracted.

Returns:
true if metadata should be extracted

extractText

public boolean extractText()
Returns true if text contents should be extracted.

Returns:
true if text contents should be extracted

extractXML

public boolean extractXML()
Returns true if xml contents should be extracted.

Returns:
true if xml contents should be extracted

maxTextSize

public int maxTextSize()
Returns the number of bytes that should be extracted from text and xml contents.

Returns:
the maximum number of bytes to extract
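
As an illustration of how a parser might honor this limit, the following is a minimal sketch that trims a string to a UTF-8 byte budget without cutting a multi-byte character in half. TextLimit and truncateUtf8 are hypothetical names that are not part of DeepFile, and the sketch makes no claim about how the library itself truncates content.

```java
/** Hypothetical helper, not part of DeepFile: trims text to a UTF-8 byte budget. */
final class TextLimit {
    private TextLimit() { }

    /**
     * Returns the longest prefix of text whose UTF-8 encoding needs at most
     * maxBytes bytes, without splitting a code point.
     */
    static String truncateUtf8(String text, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            // Number of bytes this code point occupies in UTF-8.
            int len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + len > maxBytes) break;
            bytes += len;
            i += Character.charCount(cp);
        }
        return text.substring(0, i);
    }
}
```

A parser could then pass the result of truncateUtf8(text, deepFile.maxTextSize()) on to addText, assuming deepFile is the DeepFile it is currently filling.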

getContext

public Context getContext()
Returns the database context.

Returns:
the database context

getOffset

public long getOffset()
Returns the offset of the deep file inside the regular file in the file system.

Returns:
the offset

getSize

public long getSize()
Returns the size of the deep file.

Returns:
the size

getFSAtts

public Atts getFSAtts()
Returns the file system attributes for the deep file. The file system attributes are extracted after finishing the metadata extraction.

Returns:
the file system attributes or null if the metadata extraction was not finished yet
See Also:
finishMetaExtraction()

getMeta

public TreeMap<MetaElem,ArrayList<String>> getMeta()
Returns all metadata key-value pairs or null if metadata extraction is disabled.

Returns:
the metadata
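
Because the returned map is multi-valued and may be null, callers need a null check and a nested loop. The sketch below shows one way to flatten such a map into "key=value" lines. MetaDump is a hypothetical helper, and the key type is left generic because MetaElem is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper, not part of DeepFile. */
final class MetaDump {
    private MetaDump() { }

    /**
     * Flattens a multi-valued metadata map into "key=value" lines in key order.
     * A null map (extraction disabled) yields an empty list.
     */
    static <K> List<String> lines(TreeMap<K, ArrayList<String>> meta) {
        List<String> out = new ArrayList<>();
        if (meta == null) return out;
        for (Map.Entry<K, ArrayList<String>> e : meta.entrySet()) {
            for (String value : e.getValue()) {
                out.add(e.getKey() + "=" + value);
            }
        }
        return out;
    }
}
```

With a real DeepFile, the call would presumably look like MetaDump.lines(deepFile.getMeta()).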

getContent

public DeepFile[] getContent()
Returns all subfiles.

Returns:
the subfiles

getTextContents

public DeepFile.Content[] getTextContents()
Returns all text sections or null if text content extraction is disabled.

Returns:
all text sections

getXMLContents

public DeepFile.Content[] getXMLContents()
Returns all xml sections or null if xml extraction is disabled.

Returns:
all xml sections

getBufferedFileChannel

public BufferedFileChannel getBufferedFileChannel()
Returns the associated BufferedFileChannel that links this DeepFile with a file in the file system.

Returns:
the BufferedFileChannel

setFileType

public void setFileType(FileType type)
Sets the type of the file (e.g. audio, video, ...).

Parameters:
type - the file type

setFileFormat

public void setFileFormat(MimeType format)
Sets the MIME type of the file. A previously set value will be replaced.

Parameters:
format - the MIME type

addMeta

public void addMeta(MetaElem elem,
                    byte[] value)
Adds a metadata key-value pair for the current file.

Parameters:
elem - metadata element (the key). Must be a string attribute
value - string value as byte array

addMeta

public void addMeta(MetaElem elem,
                    String value)
Adds a metadata key-value pair for the current file.

Parameters:
elem - metadata element (the key)
value - string value

addMeta

public void addMeta(MetaElem elem,
                    short value)
Adds a metadata key-value pair for the current file.

Parameters:
elem - metadata element (the key)
value - short value

addMeta

public void addMeta(MetaElem elem,
                    int value)
Adds a metadata key-value pair for the current file.

Parameters:
elem - metadata element (the key)
value - integer value

addMeta

public void addMeta(MetaElem elem,
                    long value)
Adds a metadata key-value pair for the current file.

Parameters:
elem - metadata element (the key)
value - long value

addMeta

public void addMeta(MetaElem elem,
                    double value)
Adds a metadata key-value pair for the current file.

Parameters:
elem - metadata element (the key)
value - double value

addMeta

public void addMeta(MetaElem elem,
                    Duration value)
Adds a metadata key-value pair for the current file.

Parameters:
elem - metadata element (the key)
value - duration value

addMeta

public void addMeta(MetaElem elem,
                    XMLGregorianCalendar xgc)
Adds a metadata key-value pair for the current file.

Parameters:
elem - metadata element (the key)
xgc - calendar value
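
The Duration and XMLGregorianCalendar overloads need values that are usually built through a factory. The sketch below assumes both types are the ones from javax.xml.datatype (the page does not state the package) and uses the standard DatatypeFactory to create them from lexical strings. MetaValues is a hypothetical helper name.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.Duration;
import javax.xml.datatype.XMLGregorianCalendar;

/** Hypothetical helper, not part of DeepFile: builds values for addMeta. */
final class MetaValues {
    private MetaValues() { }

    private static DatatypeFactory factory() {
        try {
            return DatatypeFactory.newInstance();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException("no DatatypeFactory available", e);
        }
    }

    /** Parses an xs:duration lexical form such as "PT3M25S". */
    static Duration duration(String lexical) {
        return factory().newDuration(lexical);
    }

    /** Parses an xs:dateTime lexical form such as "2010-05-17T12:30:00Z". */
    static XMLGregorianCalendar dateTime(String lexical) {
        return factory().newXMLGregorianCalendar(lexical);
    }
}
```

The results could then be passed to addMeta(elem, value) together with a suitable MetaElem, which is not shown here.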

finishMetaExtraction

public void finishMetaExtraction()
                          throws IOException
Finishes the extraction of metadata and extracts the file system attributes.

Throws:
IOException - if any error occurs while extracting the file system attributes

getValues

public String[] getValues(MetaElem elem)
Returns the string values for the MetaElem.

Parameters:
elem - the metadata element
Returns:
the metadata values as Strings

isMetaSet

public boolean isMetaSet(MetaElem elem)
Returns true if a value is set for the given metadata element.

Parameters:
elem - the metadata element
Returns:
true if a value is set

isFileTypeSet

public boolean isFileTypeSet()
Returns true if the file type is set for the current deep file.

Returns:
true if the file type is set

addText

public void addText(long position,
                    int byteCount,
                    String text)

Adds a text section. text MUST contain only valid UTF-8 characters! Otherwise the generated XML document may not be well-formed.

Parameters:
position - the absolute position of the first byte of the file fragment represented by this content element inside the current file. A negative value stands for an unknown offset
byteCount - the size of the content element
text - the text to add
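
One way a caller could guard against invalid input is to decode the raw bytes strictly before calling addText. The following is a minimal sketch that uses only the standard java.nio.charset API. Utf8Check is a hypothetical helper, and it checks the encoding only, not whether every decoded character is allowed in XML.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of DeepFile. */
final class Utf8Check {
    private Utf8Check() { }

    /** Decodes strictly; returns null if the bytes are not well-formed UTF-8. */
    static String decodeStrict(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }
}
```

A parser could skip the text section, or use the fallback parser, whenever decodeStrict returns null.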

addXML

public void addXML(long position,
                   int byteCount,
                   Data data)
            throws IOException
Adds an XML document or fragment to the DeepFile.

Parameters:
position - offset of the xml document/fragment inside the file
byteCount - number of bytes of the xml document/fragment
data - the xml document/fragment
Throws:
IOException - if any error occurs

addXML

public void addXML(long pos,
                   int byteCount,
                   String xml)
Adds an XML document or fragment to the DeepFile.

Parameters:
pos - offset of the xml document/fragment inside the file
byteCount - number of bytes of the xml document/fragment
xml - the xml document/fragment

finish

public void finish()
            throws IOException
Finishes the deep file.

Throws:
IOException - if any error occurs
See Also:
BufferedFileChannel.finish()

subfile

public DeepFile subfile(String fileName,
                        int fileSize,
                        String... suffix)
                 throws IOException

Creates a new "subfile" inside the current DeepFile with the given size, beginning at the current position of the file channel. This method is intended to be used if the content of the subfile has to be parsed with a different parser implementation than the "main" file. The name of the subfile is set as title metadata.

If the content can be parsed with the same parser and this parser can use the same BufferedFileChannel, then the method newContentSection(long) can be used instead.

Parameters:
fileName - the name of the subfile
fileSize - the size of the subfile
suffix - the file suffix(es). More than one suffix means that the file type is unknown. All given suffixes will be tested
Returns:
the subfile
Throws:
IOException - if any error occurs
See Also:
newContentSection(long)

subfile

public DeepFile subfile(int contentSize)
                 throws IOException
Clones the DeepFile to map only a part of the file. The returned DeepFile uses an underlying BufferedFileChannel that starts at the current byte position. The cloned DeepFile must be finished after usage.

Parameters:
contentSize - the size of the file fragment (the size of the BufferedFileChannel)
Returns:
the new DeepFile
Throws:
IOException - if any error occurs
See Also:
finish()

newContentSection

public DeepFile newContentSection(String title,
                                  long position,
                                  int contentSize)
                           throws IOException

Creates a new content section for the current file, beginning at the given position with the given size.

The returned DeepFile instance uses a subchannel to read from the file. The subchannel has to be finished after usage (BufferedFileChannel.finish()).

Parameters:
title - the title of the content section
position - the offset in the regular file where the section starts
contentSize - the size of the content section
Returns:
the DeepFile instance representing the content section
Throws:
IOException - if any error occurs
See Also:
subfile(String, int, String...), BufferedFileChannel.subChannel(String, int)

newContentSection

public DeepFile newContentSection(long position)

Creates a new content section for the current file, beginning at the given position with an unknown size. The size must be set afterwards with setSize(long).

The returned DeepFile instance uses the same underlying BufferedFileChannel as the current DeepFile.

Parameters:
position - the offset in the regular file where the section starts
Returns:
the DeepFile instance representing the content section
See Also:
subfile(String, int, String...), setSize(long)

setSize

public void setSize(long contentSize)
Sets the size value for the DeepFile. If the current DeepFile instance is not a content section, or if the size value was set before, this method does nothing.

Parameters:
contentSize - the size value to set for the content section
See Also:
newContentSection(long)
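
The set-once behaviour described above can be modelled in a few lines. The sketch below is a stand-in for illustration only: SectionSize, its contentSection flag and the -1 "unknown" marker are assumptions, not the actual fields of DeepFile.

```java
/** Hypothetical stand-in that models the documented setSize contract. */
final class SectionSize {
    private final boolean contentSection;
    private long size = -1; // assumption: -1 marks a size that is still unknown

    SectionSize(boolean contentSection) {
        this.contentSection = contentSection;
    }

    /** Sets the size once; later calls and calls on non-sections are ignored. */
    void setSize(long contentSize) {
        if (!contentSection || size != -1) return;
        size = contentSize;
    }

    long getSize() {
        return size;
    }
}
```

In this model, a second call to setSize leaves the first value in place, which matches the "does nothing" wording above.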

toXML

public String toXML()
             throws IOException
Returns the xml representation for this deep file.

Returns:
the xml representation as string
Throws:
IOException - if any error occurs

toString

public String toString()
Overrides:
toString in class Object

debug

public void debug(String str,
                  Object... ext)
Verbose debug message.

Parameters:
str - debug string
ext - optional text extensions