3. Assignment - XML Technologies - Winter Term 2015 (Release date: Nov 5 - Date due: Nov 11, 8:00 am)
1. Task - Unicode
UNICODE encodes characters of most languages + symbols etc.
UTF (Unicode Transformation Format)
describes how a Unicode code point is mapped to 8-, 16-, or 32-bit code units
000000-00FFFF: BMP, basic multi-lingual plane, most frequently used characters
1.1 bit string -> Unicode code points
UTF8
000000-00007F: 0xxxxxxx 7bits
000080-0007FF: 110xxxxx 10xxxxxx 11bits
000800-00FFFF: 1110xxxx 10xxxxxx 10xxxxxx 16bits
010000-10FFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx 21bits
01010011 -> 53
01101101 -> 6D
11000011 10110110 -> F6
01110010 -> 72
01100111 -> 67
11000011 10100101 -> E5
01110011 -> 73
01100010 -> 62
01101111 -> 6F
01110010 -> 72
01100100 -> 64
00100000 -> 20
11110000 10011101 10000100 10011110 -> 1D11E
53 6D F6 72 67 E5 73 62 6F 72 64 20 1D11E
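The table lookup in 1.1 can be sketched in Python as follows. The function name decode_utf8_bits is made up for this illustration, and the sketch assumes well-formed input: it checks the prefixes but does not reject overlong encodings or surrogate values.

```python
def decode_utf8_bits(bits: str) -> int:
    """Decode ONE UTF-8 sequence (space-separated octets) to a code point."""
    octets = bits.split()
    lead = octets[0]
    # The prefix of the lead octet determines the length of the sequence.
    if lead.startswith("0"):
        length, payload = 1, lead[1:]
    elif lead.startswith("110"):
        length, payload = 2, lead[3:]
    elif lead.startswith("1110"):
        length, payload = 3, lead[4:]
    elif lead.startswith("11110"):
        length, payload = 4, lead[5:]
    else:
        raise ValueError(f"invalid lead octet: {lead}")
    if len(octets) != length:
        raise ValueError(f"expected {length} octets, got {len(octets)}")
    # Every continuation octet contributes its low 6 bits.
    for octet in octets[1:]:
        if not octet.startswith("10"):
            raise ValueError(f"invalid continuation octet: {octet}")
        payload += octet[2:]
    return int(payload, 2)


o_umlaut = decode_utf8_bits("11000011 10110110")
clef = decode_utf8_bits("11110000 10011101 10000100 10011110")
print(f"{o_umlaut:X} {clef:X}")  # F6 1D11E
```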
1.2 code points -> UTF16
UTF16BE
000000-00FFFF: xxxxxxxxxxxxxxxx (basic multilingual plane, 16bits)
010000-10FFFF: 110110xxxxxxxxxx 110111xxxxxxxxxx (subtract 0x010000 from code point first) (supplementary planes) (20bits)
53 -> 00000000 01010011
6D -> 00000000 01101101
F6 -> 00000000 11110110
72 -> 00000000 01110010
67 -> 00000000 01100111
E5 -> 00000000 11100101
73 -> 00000000 01110011
62 -> 00000000 01100010
6F -> 00000000 01101111
72 -> 00000000 01110010
64 -> 00000000 01100100
20 -> 00000000 00100000
1D11E -> 11011000 00110100 11011101 00011110 (subtract 0x010000 first!) / clef
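The UTF16BE rules above, including the surrogate pair for the clef, can be sketched in Python. The helper name utf16be_bits is hypothetical, and the sketch assumes a valid code point (it does not reject the surrogate range D800-DFFF as input).

```python
def utf16be_bits(cp: int) -> str:
    """Return the UTF-16BE encoding of one code point as a bit string."""
    if cp < 0x10000:
        # BMP: one 16-bit code unit holding the code point itself.
        units = [cp]
    else:
        # Supplementary planes: subtract 0x10000, then split the remaining
        # 20 bits into a high (110110...) and a low (110111...) surrogate.
        v = cp - 0x10000
        units = [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]
    # Big endian: most significant byte of each unit first.
    return " ".join(f"{u >> 8:08b} {u & 0xFF:08b}" for u in units)


print(utf16be_bits(0xE5))     # 00000000 11100101
print(utf16be_bits(0x1D11E))  # 11011000 00110100 11011101 00011110
```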
Big Endian / Little Endian, BOM (byte order mark)
0x1A2B BE: 00011010 00101011
0x1A2B LE: 00101011 00011010
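A small Python sketch of the byte-order example, using only the standard library. The helper to_bits is made up for this illustration. The BOM is the code point U+FEFF placed at the start of the data, so a decoder can tell from the byte sequence FE FF or FF FE which order was used.

```python
import codecs


def to_bits(data: bytes) -> str:
    """Render bytes as space-separated 8-bit strings."""
    return " ".join(f"{b:08b}" for b in data)


value = 0x1A2B
big = value.to_bytes(2, "big")        # most significant byte first
little = value.to_bytes(2, "little")  # least significant byte first
print(to_bits(big))     # 00011010 00101011
print(to_bits(little))  # 00101011 00011010

# With a BOM in front, the generic "utf-16" decoder picks the byte order itself.
be_data = codecs.BOM_UTF16_BE + "S".encode("utf-16-be")
le_data = codecs.BOM_UTF16_LE + "S".encode("utf-16-le")
decoded_be = be_data.decode("utf-16")
decoded_le = le_data.decode("utf-16")
```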
1.3 encoded -> string
Smörgåsbord 𝄞
2. Task - XPath Rewritings
2.1. ./child::jamie/child::jerry/child::jeremie[text() = "joe"]/../..
./jamie[jerry/jeremie[text() = "joe"]]
2.2. /descendant-or-self::node()/child::james
/descendant::james
2.3. ./child::node()/parent::node()
.[child::node()]
2.4. ./child::jim/preceding-sibling::jack
./child::jack[following-sibling::jim]
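The equivalence in 2.4 can be checked by emulating both expressions by hand in Python. This is only a sketch: the functions original and rewritten, the sample document, and the id attributes are made up for the illustration, and the axes are simulated on the child list instead of being evaluated by an XPath engine.

```python
import xml.etree.ElementTree as ET


def original(ctx):
    """Emulates ./child::jim/preceding-sibling::jack"""
    kids = list(ctx)
    hits = set()
    for i, kid in enumerate(kids):
        if kid.tag == "jim":
            # all jack siblings located before this jim
            hits.update(j for j in range(i) if kids[j].tag == "jack")
    # XPath results are duplicate-free and in document order
    return [kids[j].get("id") for j in sorted(hits)]


def rewritten(ctx):
    """Emulates ./child::jack[following-sibling::jim]"""
    kids = list(ctx)
    return [kid.get("id") for i, kid in enumerate(kids)
            if kid.tag == "jack" and any(s.tag == "jim" for s in kids[i + 1:])]


doc = ET.fromstring(
    '<ctx><jack id="a"/><jim/><jack id="b"/><jim/><jack id="c"/></ctx>')
print(original(doc), rewritten(doc))  # ['a', 'b'] ['a', 'b']
```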
2.5. ./descendant::jason/preceding::jasper - why not possible to rewrite?
simple answer:
if only forward axes are allowed and jasper is the result item, there is no way to express it
additional:
need two tests
1) jason following jasper
2) same jason descendant of given context
... partly possible by using a for loop, but the identity of the jason nodes can't be tested (only a name test)
3. Task - XQuery Semantics
3.1 100 + 1{max((1,2))}
112
- arithmetic operation '+' expects numbers (or date/duration)
- nodes are reduced to atomic values (atomization): 100 and 12 (the literal 1 followed by max((1,2)) = 2)
3.2 (1, 2) > (3, 2, 1)
true
- compares all items of both sequences to each other, if one pair yields true -> true (here: 2 > 1)
3.3 () != 1
false
- there is no item in seq1 that is not equal to 1 -> false (because seq1 is empty, no pair can be compared)
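The existential semantics of general comparisons (3.2 and 3.3) can be mimicked in Python. The function general_compare is a made-up helper, not part of any XQuery library, and the sketch ignores atomization and type promotion.

```python
import operator
from itertools import product


def general_compare(op, left, right):
    """True if at least ONE pair (a, b) from left x right satisfies op."""
    return any(op(a, b) for a, b in product(left, right))


gt_result = general_compare(operator.gt, (1, 2), (3, 2, 1))
ne_result = general_compare(operator.ne, (), (1,))
print(gt_result)  # True, because 2 > 1
print(ne_result)  # False, an empty sequence yields no pair at all
```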
3.4 /descendant-or-self::a
all a nodes ... 3 results
3.5 (0,2,1,2)[.]
2
- here: positional predicate (0,2,1,2)[position() = .]
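The positional reading of the predicate can be emulated in Python. The helper numeric_predicate is hypothetical and assumes that every item is a number, as in the example.

```python
def numeric_predicate(seq):
    """Emulates seq[.] for numeric items: keep item if position() = item."""
    return [item for pos, item in enumerate(seq, start=1) if pos == item]


kept = numeric_predicate((0, 2, 1, 2))
print(kept)  # [2], only the item at position 2 has the value 2
```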
3.6 "foo"["false"]
foo
- "foo" is the result, predicate evaluates to true() as "false" is a non-empty string
3.7 eq
true
- both operands atomize to the empty string: "" eq ""
3.8 ('a','bc') ! string-length(.) ! (. = 1)
true false
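- simple map operator '!': the right-hand expression is evaluated once for each item of the left-hand sequence, with that item as context item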
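The chaining in 3.8 can be sketched in Python. The helper simple_map is made up for this illustration: it applies a function to each item and concatenates the results, which is the behaviour assumed here for '!'.

```python
def simple_map(seq, fn):
    """Emulates seq ! fn(.): apply fn per item, concatenate the results."""
    out = []
    for item in seq:
        result = fn(item)
        # a sequence result is flattened into the output sequence
        out.extend(result if isinstance(result, (list, tuple)) else [result])
    return out


lengths = simple_map(("a", "bc"), len)         # string-length(.)
flags = simple_map(lengths, lambda n: n == 1)  # (. = 1)
print(lengths, flags)  # [1, 2] [True, False]
```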