11. Assignment - XML Technologies - Winter Term 2015 (Release date: Jan 21 - Date due: Jan 27, 8:00 am) 1. Task - Value Index Bit Compression: compress values bitwise Compressed Distances: 1. order values 2. calculate distances between values 3. compress distances bitwise 499 Bit Compression: 0100 0100 1001 1001, 0x4499 Compressed Distance: • 499-0=0x499 • Compressed: 0100 0100 1001 1001, 0x4499 3FF7AB Bit Compression: 1000 0000 0011 1111 1111 0111 1010 1011, 0x803FF7AB Compressed Distance: • 3FF7AB - 499 = 0x3FF312 • Compressed: 1000 0000 0011 1111 1111 0011 0001 0010, 0x803FF312 35ABC283 Bit Compression: 1011 0101 1010 1011 1100 0010 1000 0011, 0xB5ABC283 Compressed Distance: • 35ABC283 - 3FF7AB = 0x356BCAD8 • Compressed: 1011 0101 0110 1011 1100 1010 1101 1000, 0xB56BCAD8 77A9E3AC Bit Compression: 1100 0000 0111 0111 1010 1001 1110 0011 1010 1100, 0xC077A9E3AC Compressed Distance: • 77A9E3AC - 35ABC283 = 0x41FE2129 • Compressed: 1100 0000 0100 0001 1111 1110 0010 0001 0010 1001, 0xC041FE2129 77A9E3CC Bit Compression: 1100 0000 0111 0111 1010 1001 1110 0011 1100 1100, 0xC077A9E3CC Compressed Distance: • 77A9E3CC - 77A9E3AC = 0x20 • Compressed: 0010 0000, 0x20 Memory Saves naive: 5 * 4 bytes = 20 bytes bit-compressed: 2+4+4+5+5=20bytes compresseddistances: 2+4+4+5+1=16bytes Why does Bit Compression save us memory? • FF FF FF FF -> 4 294 967 295, 4.2 billion –> too large for integer • if even larger, store it as string -> UTF8 encoded! 8bit/character 2. Task - Index Application 2.1. Which index structures can be utilized? 2.1.1. //city [x] Name Index [-] Path Summary (find out if there are city elements and where they are. information can be used to optimize query. for instance rewrite to /region/city ) [ ] Value Index [ ] Full-Text Index 2.1.2 //city[@name = "Munich"] [x] Name [x] Path [xx] Value [ ] Full-Text Index 2.1.2. //chapter/title[. contains text "foo"] [ ] Name Index [x] Path Summary [ ] Value Index [x] Full-Text Index 2.1.3. /book/chapter[1]/section/title [ ] Name Index [x] Path Summary [ ] Value Index [ ] Full-Text Index 2.1.4. count(//chapter) [ ] Name Index [x] Path Summary (if metadata is attached, e.g., references to nodes that could be counted) [ ] Value Index [ ] Full-Text Index -> database statistics could be leveraged 2.2. Which indexes get outdated by the following updates? 2.2.1. replace node //chapter[@author = "ben"]/@modification-date with current-dateTime() [ ] Name Index [ ] Path Summary [x] Value Index [ ] Full-Text Index 2.2.2. rename node /book as "article" [x] Name Index [x] Path Summary [ ] Value Index [ ] Full-Text Index 2.2.3. delete node //paragraph[. contains text "bar"] [x] Name Index [x] Path Summary [x] Value Index [x] Full-Text Index