3. Assignment - XML Technologies - Winter Term 2015
(Release date: Nov 5 - Date due: Nov 11, 8:00 am)

1. Task - Unicode

Unicode encodes the characters of most languages, plus symbols etc.
UTF (Unicode Transformation Format) describes how a Unicode code point is mapped to 8-, 16- or 32-bit code units.
000000-00FFFF: BMP, basic multilingual plane, most frequently used characters

1.1 bit string -> Unicode code points

UTF8
000000-00007F: 0xxxxxxx                              7 bits
000080-0007FF: 110xxxxx 10xxxxxx                    11 bits
000800-00FFFF: 1110xxxx 10xxxxxx 10xxxxxx           16 bits
010000-10FFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx  21 bits

01010011 -> 53
01101101 -> 6D
11000011 10110110 -> F6
01110010 -> 72
01100111 -> 67
11000011 10100101 -> E5
01110011 -> 73
01100010 -> 62
01101111 -> 6F
01110010 -> 72
01100100 -> 64
00100000 -> 20
11110000 10011101 10000100 10011110 -> 1D11E

53 6D F6 72 67 E5 73 62 6F 72 64 20 1D11E

1.2 code points -> UTF16

UTF16BE
000000-00FFFF: xxxxxxxxxxxxxxxx (basic multilingual plane, 16 bits)
010000-10FFFF: 110110xxxxxxxxxx 110111xxxxxxxxxx (supplementary planes, 20 bits; subtract 0x010000 from the code point first)

53 -> 00000000 01010011
6D -> 00000000 01101101
F6 -> 00000000 11110110
72 -> 00000000 01110010
67 -> 00000000 01100111
E5 -> 00000000 11100101
73 -> 00000000 01110011
62 -> 00000000 01100010
6F -> 00000000 01101111
72 -> 00000000 01110010
64 -> 00000000 01100100
20 -> 00000000 00100000
1D11E -> 11011000 00110100 11011101 00011110 (subtract 0x010000 first!) / clef

Big Endian / Little Endian, BOM (byte order mark)
0x1A2B BE: 00011010 00101011
0x1A2B LE: 00101011 00011010

1.3 encoded -> string

Smörgåsbord 𝄞

2. Task - XPath Rewritings

(each item: original expression, then its rewriting)

2.1. ./child::jamie/child::jerry/child::jeremie[text() = "joe"]/../..
     ./jamie[jerry/jeremie[text() = "joe"]]

2.2. /descendant-or-self::node()/child::james
     /descendant::james

2.3. ./child::node()/parent::node()
     .[child::node()]

2.4. ./child::jim/preceding-sibling::jack
     ./child::jack[following-sibling::jim]

2.5. ./descendant::jason/preceding::jasper
     - why is it not possible to rewrite?
     simple answer: if only forward axes are allowed and jasper is the result item, there is no way to express it
     additional: two tests are needed
       1) a jason follows the jasper
       2) that same jason is a descendant of the given context node
     ... partly possible by using a for loop, but the identity of the jason nodes can't be tested (only a name test)

3. Task - XQuery Semantics

3.1 100 + 1{max((1,2))}
    112
    - the arithmetic operation '+' expects numbers (or dates/durations)
    - nodes are reduced to atomic values: 100, 12

3.2 (1, 2) > (3, 2, 1)
    true
    - a general comparison compares every item of one sequence with every item of the other; if at least one pair yields true, the result is true (here 2 > 1)

3.3 () != 1
    false
    - no item in the first sequence is unequal to 1 -> false (the first sequence is empty, so no pair can satisfy the comparison)

3.4 /descendant-or-self::a
    all a nodes ... 3 results

3.5 (0,2,1,2)[.]
    2
    - here: positional predicate, equivalent to (0,2,1,2)[position() = .]; only the second item (2) equals its position

3.6 "foo"["false"]
    foo
    - "foo" is the result; the predicate evaluates to true() because "false" is a non-empty string

3.7 eq
    true
    - compared as "" eq ""

3.8 ('a','bc') ! string-length(.) ! (. = 1)
    true false
    - map operator: '!' evaluates the right-hand expression once per item of the left-hand sequence (string lengths 1 and 2, each then compared with 1)
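
The conversions in Task 1 can be cross-checked mechanically. The following is a minimal Python sketch, not part of the assignment: it decodes the UTF-8 bit strings from 1.1 into code points and builds the UTF-16BE surrogate pair from 1.2 by hand. The helper names (bits_to_bytes, to_bits, utf16be_units) are made up for this illustration.

```python
# UTF-8 byte sequence from section 1.1, written as space-separated bit strings.
UTF8_BITS = (
    "01010011 01101101 11000011 10110110 01110010 01100111 "
    "11000011 10100101 01110011 01100010 01101111 01110010 "
    "01100100 00100000 11110000 10011101 10000100 10011110"
)


def bits_to_bytes(bits: str) -> bytes:
    """Turn '01010011 01101101 ...' into the corresponding bytes."""
    return bytes(int(octet, 2) for octet in bits.split())


def to_bits(data: bytes) -> str:
    """Render bytes as space-separated 8-bit groups."""
    return " ".join(f"{b:08b}" for b in data)


def utf16be_units(code_point: int) -> bytes:
    """Encode one code point as UTF-16BE (hypothetical helper for this sketch)."""
    if code_point < 0x10000:
        # Basic multilingual plane: one 16-bit unit, high byte first.
        return code_point.to_bytes(2, "big")
    offset = code_point - 0x10000       # 20-bit value after the subtraction
    high = 0xD800 | (offset >> 10)      # 110110 + upper 10 bits
    low = 0xDC00 | (offset & 0x3FF)     # 110111 + lower 10 bits
    return high.to_bytes(2, "big") + low.to_bytes(2, "big")


text = bits_to_bytes(UTF8_BITS).decode("utf-8")
code_points = [f"{ord(ch):X}" for ch in text]

print(" ".join(code_points))
# 53 6D F6 72 67 E5 73 62 6F 72 64 20 1D11E
print(to_bits(utf16be_units(0x1D11E)))
# 11011000 00110100 11011101 00011110
```

Python's built-in codec gives the same bytes: text.encode("utf-16-be") equals the concatenation of utf16be_units(ord(ch)) over all characters of text.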