9. Assignment - XML Technologies - Winter Term 2015 (Release date: Jan 07 - Date due: Jan 13, 8:00 am)

We will build a basic XML processing infrastructure with Java.
We start by scanning and parsing XML documents into an internal storage representation (shredding). From this internal representation (our XML database), we then reconstruct a well-formed XML file (serialization).

  • First, download the project files: xmldb.zip.
  • Make yourself familiar with the different classes by taking a look at the code and comments.
  • The TODO tags show you where you have to append code to solve the individual exercises.
  • input.xml is the default input file (see Constants.INPUT) that you should use for testing; ultimately, your finished implementation should be able to handle the factbook.xml document.
1. Task - Parsing XML

The lecture introduced you to the concept of separating scanning from parsing to simplify the interpretation of computer languages. While the project already comes with a fully functional Scanner, the implementation of the Parser is not yet complete (see the TODO: Exercise 1 comments in Parser.java). It should support the following (simplified) LL(1) grammar:

document       : element
element        : element1  element2
element1       : L_BR  NAME  attributeList1
element2       : R_BR  content  endTag | CLOSE_R_BR
attributeList1 : ε | SPACE  attributeList2
attributeList2 : ε | attribute  attributeList1
attribute      : NAME  optionalSpace  EQ  optionalSpace  QUOTE  ATT_VAL  QUOTE
endTag         : L_BR_CLOSE  NAME  optionalSpace  R_BR
content        : ε | TEXT  content | element  content
optionalSpace  : ε | SPACE
      
  • All names written in CAPITALS are the tokens (so-called terminal symbols) which are returned by the Scanner.
  • The tokens are processed in the consume(…) function, while each non-terminal symbol has to be implemented as a function of its own.
  • Don’t forget to include some error handling; the method optionalSpace() shows you how to react to errors.
  • The Parser class has a main method that you can use to test your code. Parser_input.log shows the expected result for the default input document (set VERBOSE = true). Modify your XML test files and add errors to see whether your parser reacts to them!
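To see how the grammar maps onto code, here is a stand-alone recursive-descent sketch. It is not the project's Parser: the token enum T, the Tok record and the list-based input are hypothetical stand-ins for the Scanner, and consume() and optionalSpace() only mimic the roles described above. Each non-terminal becomes one method, and a single lookahead token (peek()) decides between alternatives, which is what makes the grammar LL(1). Note that the grammar itself does not check that start and end tag names match; that check belongs to Exercise 2. The sketch assumes Java 16 or newer (records).

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone recursive-descent sketch for the grammar above.
// Hypothetical: the token enum T, the Tok record and the list-based input
// are illustrations, not the project's Scanner or Parser API (Java 16+).
class GrammarSketch {
    enum T { L_BR, NAME, SPACE, EQ, QUOTE, ATT_VAL, R_BR, CLOSE_R_BR, L_BR_CLOSE, TEXT, EOF }

    record Tok(T type, String text) {}

    private final List<Tok> toks;
    private int pos = 0;

    GrammarSketch(List<Tok> input) {
        toks = new ArrayList<>(input);
        toks.add(new Tok(T.EOF, "")); // sentinel so peek() never runs past the end
    }

    // One token of lookahead decides between alternatives (LL(1)).
    private T peek() { return toks.get(pos).type(); }

    // Terminal symbols: accept the expected token or report an error.
    private void consume(T expected) {
        T found = peek();
        if (found != expected)
            throw new IllegalStateException("Expected " + expected + " but found " + found + " at token " + pos);
        pos++;
    }

    // document : element (then end of input)
    void document() { element(); consume(T.EOF); }

    // element : element1 element2
    void element() { element1(); element2(); }

    // element1 : L_BR NAME attributeList1
    void element1() { consume(T.L_BR); consume(T.NAME); attributeList1(); }

    // element2 : R_BR content endTag | CLOSE_R_BR
    void element2() {
        if (peek() == T.CLOSE_R_BR) { consume(T.CLOSE_R_BR); return; }
        consume(T.R_BR);
        content();
        endTag();
    }

    // attributeList1 : <empty> | SPACE attributeList2
    void attributeList1() { if (peek() == T.SPACE) { consume(T.SPACE); attributeList2(); } }

    // attributeList2 : <empty> | attribute attributeList1
    void attributeList2() { if (peek() == T.NAME) { attribute(); attributeList1(); } }

    // attribute : NAME optionalSpace EQ optionalSpace QUOTE ATT_VAL QUOTE
    void attribute() {
        consume(T.NAME); optionalSpace(); consume(T.EQ); optionalSpace();
        consume(T.QUOTE); consume(T.ATT_VAL); consume(T.QUOTE);
    }

    // endTag : L_BR_CLOSE NAME optionalSpace R_BR
    void endTag() { consume(T.L_BR_CLOSE); consume(T.NAME); optionalSpace(); consume(T.R_BR); }

    // content : <empty> | TEXT content | element content (tail recursion written as a loop)
    void content() {
        while (true) {
            if (peek() == T.TEXT) consume(T.TEXT);
            else if (peek() == T.L_BR) element();
            else return; // empty case; FOLLOW(content) = { L_BR_CLOSE }
        }
    }

    // optionalSpace : <empty> | SPACE
    void optionalSpace() { if (peek() == T.SPACE) consume(T.SPACE); }

    static boolean accepts(List<Tok> input) {
        try {
            new GrammarSketch(input).document();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // Tokens for <a x="1"><b/>hi</a>; how the real Scanner splits input is an assumption.
    static List<Tok> sampleTokens() {
        return List.of(
            new Tok(T.L_BR, "<"), new Tok(T.NAME, "a"), new Tok(T.SPACE, " "),
            new Tok(T.NAME, "x"), new Tok(T.EQ, "="), new Tok(T.QUOTE, "\""),
            new Tok(T.ATT_VAL, "1"), new Tok(T.QUOTE, "\""), new Tok(T.R_BR, ">"),
            new Tok(T.L_BR, "<"), new Tok(T.NAME, "b"), new Tok(T.CLOSE_R_BR, "/>"),
            new Tok(T.TEXT, "hi"),
            new Tok(T.L_BR_CLOSE, "</"), new Tok(T.NAME, "a"), new Tok(T.R_BR, ">"));
    }

    public static void main(String[] args) {
        System.out.println(accepts(sampleTokens())); // true
        List<Tok> broken = new ArrayList<>(sampleTokens());
        broken.remove(broken.size() - 1);            // drop the final '>'
        System.out.println(accepts(broken));         // false
    }
}
```

Writing content() as a loop rather than a recursive call is equivalent for this right-recursive rule and avoids deep call stacks on elements with many children.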
2. Task - Creating an XML Database

In this exercise you will add some basic functionality to our main memory XML database. Here is a quick review of some important classes:

  • XMLDB is the main database class, storing all XML nodes in a flat table.
  • A Node represents a single table entry with the fields content, kind, atts and parent. The pre value is given implicitly by the table position and equals the position of the node in a document-order (pre-order) traversal. The parent value is the pre value of the parent node.
  • The ArrayIndex stores all textual content in an index structure. Index provides some methods for alternative index implementations.
  • The ParserEvents interface is implemented by XMLDB and defines the XML callback functions.
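The relationship between table position, pre value and parent can be illustrated with a small hand-built table. This sketch assumes a simplified Node record with the four fields named above and a hypothetical Kind enum with only ELEM and TEXT; the project's actual Node class, its field types, and whether a document node occupies pre 0 are not assumed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a flat pre-order node table; field names mirror the
// description of Node (content, kind, atts, parent), but the types are assumptions.
class FlatTableSketch {
    enum Kind { ELEM, TEXT }

    // pre is not stored: it is the index of the node in the table.
    record Node(String content, Kind kind, Map<String, String> atts, int parent) {}

    // Table for <a x="1"><b>hi</b><c/></a>, listed in document order (pre-order).
    static List<Node> sampleTable() {
        return List.of(
            new Node("a", Kind.ELEM, Map.of("x", "1"), -1), // pre 0: root, no parent
            new Node("b", Kind.ELEM, Map.of(), 0),          // pre 1
            new Node("hi", Kind.TEXT, Map.of(), 1),         // pre 2
            new Node("c", Kind.ELEM, Map.of(), 0));         // pre 3
    }

    // Children of a node are the later entries whose parent equals its pre value.
    static List<Integer> children(List<Node> table, int pre) {
        List<Integer> result = new ArrayList<>();
        for (int p = pre + 1; p < table.size(); p++)
            if (table.get(p).parent() == pre) result.add(p);
        return result;
    }

    // Depth follows the parent chain up to the root.
    static int depth(List<Node> table, int pre) {
        int d = 0;
        for (int p = table.get(pre).parent(); p != -1; p = table.get(p).parent()) d++;
        return d;
    }

    public static void main(String[] args) {
        List<Node> table = sampleTable();
        for (int pre = 0; pre < table.size(); pre++) {
            Node n = table.get(pre);
            System.out.println(pre + "\t" + n.kind() + "\t" + n.content() + "\tparent=" + n.parent());
        }
        System.out.println(children(table, 0)); // [1, 3]
        System.out.println(depth(table, 2));    // 2
    }
}
```

Because descendants always follow their ancestor in pre-order, a child lookup only has to scan the entries after the node itself.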

To solve Exercise 2, locate all TODO: Exercise 2 comments and add your code:

  • Interpret the tokens in the Parser and call the ParserEvents methods for all opening and closing elements and text nodes.
  • Complete the ParserEvents methods in XMLDB and build the table structure. Check if opening and closing tags match and if all attribute names are unique. XMLDB_input.log shows the expected result if you set VERBOSE = true.
  • You will find additional hints as comments directly in the code.
  • The main method in XMLDB dumps the XML table representation to disk to simplify debugging. In the file XMLDB_input.xml.table you can find the expected table content for the input.xml document.
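One common way to build the table from the callbacks is a stack of open elements: the top of the stack is the parent of each new node, and popping it on a closing tag lets you compare start and end tag names. The sketch below assumes hypothetical callback names startElement, endElement and text, and passes attributes as a list of name/value pairs so that duplicates are still visible; the actual ParserEvents methods in the project may look different.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the callback names below are assumptions and need not
// match the project's ParserEvents interface (Java 16+ for records).
class EventTableSketch {
    enum Kind { ELEM, TEXT }

    record Node(String content, Kind kind, Map<String, String> atts, int parent) {}

    final List<Node> table = new ArrayList<>();
    // pre values of the currently open elements; the top is the parent of the next node.
    private final Deque<Integer> open = new ArrayDeque<>();

    private int currentParent() { return open.isEmpty() ? -1 : open.peek(); }

    // Opening tag; atts are raw (name, value) pairs in the order they appear in the tag.
    void startElement(String name, List<Map.Entry<String, String>> atts) {
        Map<String, String> unique = new LinkedHashMap<>(); // keeps attribute order
        for (Map.Entry<String, String> a : atts)
            if (unique.put(a.getKey(), a.getValue()) != null)
                throw new IllegalStateException("Duplicate attribute '" + a.getKey() + "' in <" + name + ">");
        table.add(new Node(name, Kind.ELEM, unique, currentParent()));
        open.push(table.size() - 1); // the new node's pre value
    }

    // Closing tag (for <name/>, call it right after startElement).
    void endElement(String name) {
        if (open.isEmpty())
            throw new IllegalStateException("Unexpected </" + name + ">");
        String expected = table.get(open.pop()).content();
        if (!expected.equals(name))
            throw new IllegalStateException("Expected </" + expected + "> but found </" + name + ">");
    }

    void text(String content) {
        table.add(new Node(content, Kind.TEXT, Map.of(), currentParent()));
    }

    // Replays the events for <a x="1"><b>hi</b><c/></a>.
    static List<Node> sample() {
        EventTableSketch db = new EventTableSketch();
        db.startElement("a", List.of(Map.entry("x", "1")));
        db.startElement("b", List.of());
        db.text("hi");
        db.endElement("b");
        db.startElement("c", List.of());
        db.endElement("c");
        db.endElement("a");
        return db.table;
    }

    public static void main(String[] args) {
        List<Node> table = sample();
        for (int pre = 0; pre < table.size(); pre++)
            System.out.println(pre + ": " + table.get(pre));
        EventTableSketch bad = new EventTableSketch();
        bad.startElement("a", List.of());
        try {
            bad.endElement("b");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Expected </a> but found </b>
        }
    }
}
```

Since every node is appended at the end of the table as its event arrives, the table position automatically equals the pre value.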
3. Task - Printing XML

The writeDoc() method in XMLDB is used to output the table’s document nodes in the well-known XML representation. Implement this method and

  • Make sure that the output is well-formed
  • Add some indentation to subordinate nodes
  • Print empty tags in their compact representation (e.g., <br/> instead of <br></br>)

Some more comments on how to accomplish this are inside the source templates.
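A possible approach is to collect the children of every node in one pass over the table and then write the tree recursively, adding one indentation level per depth. The sketch below is stand-alone and uses a simplified, hypothetical Node record; the real writeDoc() signature and the project's Node fields may differ. It escapes &, < and > in text (plus " in attribute values) so the output stays well-formed, and prints elements without children in the compact /> form.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of serializing a flat pre-order table; Node mirrors the
// fields described for XMLDB, but this is not the project's writeDoc() (Java 16+).
class WriterSketch {
    enum Kind { ELEM, TEXT }

    record Node(String content, Kind kind, Map<String, String> atts, int parent) {}

    // Escape the characters that would otherwise break well-formedness.
    static String escape(String s, boolean inAttribute) {
        String r = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return inAttribute ? r.replace("\"", "&quot;") : r;
    }

    static String write(List<Node> table) {
        // One pass collects the children of every node, in document order.
        List<List<Integer>> kids = new ArrayList<>();
        for (int i = 0; i < table.size(); i++) kids.add(new ArrayList<>());
        for (int pre = 0; pre < table.size(); pre++) {
            int parent = table.get(pre).parent();
            if (parent >= 0) kids.get(parent).add(pre);
        }
        StringBuilder out = new StringBuilder();
        for (int pre = 0; pre < table.size(); pre++)
            if (table.get(pre).parent() < 0) write(table, kids, pre, 0, out);
        return out.toString();
    }

    private static void write(List<Node> table, List<List<Integer>> kids, int pre, int level, StringBuilder out) {
        Node n = table.get(pre);
        String indent = "  ".repeat(level);
        if (n.kind() == Kind.TEXT) {
            out.append(indent).append(escape(n.content(), false)).append('\n');
            return;
        }
        out.append(indent).append('<').append(n.content());
        // Map.of has no defined iteration order; a real table should keep
        // attributes in document order (e.g. in a LinkedHashMap).
        for (Map.Entry<String, String> a : n.atts().entrySet())
            out.append(' ').append(a.getKey()).append("=\"").append(escape(a.getValue(), true)).append('"');
        if (kids.get(pre).isEmpty()) {
            out.append("/>\n"); // compact form for empty elements
            return;
        }
        out.append(">\n");
        for (int child : kids.get(pre)) write(table, kids, child, level + 1, out);
        out.append(indent).append("</").append(n.content()).append(">\n");
    }

    static List<Node> sampleTable() {
        return List.of(
            new Node("a", Kind.ELEM, Map.of("x", "1"), -1),
            new Node("b", Kind.ELEM, Map.of(), 0),
            new Node("hi & bye", Kind.TEXT, Map.of(), 1),
            new Node("c", Kind.ELEM, Map.of(), 0));
    }

    public static void main(String[] args) {
        System.out.print(write(sampleTable()));
    }
}
```

For the sample table, the empty element c is written as <c/> and the text node as hi &amp; bye, indented one level below b. Placing text on its own indented line changes whitespace in mixed content; that is usually acceptable for pretty-printing, but keep it in mind when comparing the output with the input document.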
