Answer the following queries using the XQuery Update Facility. All queries are based on the addressbook.xml document, which you can find in the repository. The results should be persistent. However, each query may start from the original document as it is stored in the repository.
Example:
delete node fn:doc("addressbook")//address
(: after :) <address state="sync"> <myID>address0</myID> <name>Johannes Schmid</name> ... </address>
<org>
  <address id="address0" state="sync">
    <name>Johannes Schmid</name>
    <street>Badstrasse 13</street>
    <code>80327</code>
    <city>80327 Munich</city>
    <country>Germany</country>
    <twitternames>
      <twitter>jo</twitter>
      <twitter>mo</twitter>
      <twitter>momo</twitter>
    </twitternames>
  </address>
</org>
<cpy>
  <address id="address0" state="sync">
    <name>Johannes Schmid</name>
    <country>Germany</country>
  </address>
</cpy>
...
Take a look at the lecture slides about document ranking and the TF-IDF measure. Consider the following three tweets as our input documents with di in D:
For the query Q=(concert, berlin, live) and t in Q:
====================
TASK 1 XQuery Update
====================
- Error messages important!

--------------------
1.1
--------------------
(: replace each id attribute by a myID child element with the same value :)
for $idattr in doc("addressbook")//address/@id
return (
  delete node $idattr,
  insert node element myID { string($idattr) } as first into $idattr/..
)

--------------------
1.2
--------------------
(: add a twitter name to the address with the given id, unless it is already present :)
declare updating function local:insert-twitter($pid as xs:string, $tw as xs:string) {
  for $a in doc("addressbook")//address[@id = $pid]
  return
    if (empty($a/twitternames)) then
      (: no container yet: create twitternames together with the first entry :)
      insert node element twitternames { element twitter { $tw } } into $a
    else if ($a/twitternames/twitter/text() = $tw) then
      (: name already stored: nothing to update :)
      ()
    else
      insert node <twitter>{ $tw }</twitter> as last into $a/twitternames
};
local:insert-twitter("address3", "jo2")

--------------------
1.3
--------------------
(: for every synced address, emit the original in org and a reduced copy in cpy :)
let $new :=
  for $e in doc("addressbook")//address[@state = "sync"]
  return
    copy $je := $e
    modify delete node ($je/street union $je/code union $je/city union $je/twitternames)
    return (element org { $e }, element cpy { $je })
return put(document { $new }, "/synced.xml")

=======================
TASK 2 Precision/Recall
=======================

-----------------------
2.1
-----------------------
Precision = |{relevant documents} intersection {retrieved documents}| / |{retrieved documents}|
Recall    = |{relevant documents} intersection {retrieved documents}| / |{relevant documents}|

For the given example:
Precision = 8/(8+2) = 8/10 = 0.8
Recall    = 8/(8+5) = 8/13 = 0.62 (rounded)

-----------------------
2.2
-----------------------
Precision 1.0, recall approaches 0.0:
  falseNeg=unlimited, truePos=1, falsePos=0: pure but incomplete result
Recall 1.0, precision approaches 0.0:
  falseNeg=0, truePos=1, falsePos=unlimited: result resembles complete db content

================================
TASK 3 Ranking Documents: TF/IDF
================================
Term Frequency (Tf)              = f(t,d)
Normalized Term Frequency (NTf)  = f(t,d) / max{f(w,d) : w in d}
Inverse Document Frequency (IDf) = log(N / (1 + |{d in D : t in d}|)), with N = |D|
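The three definitions above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the whitespace tokenization, and the toy corpus are hypothetical (the corpus is not the set of tweets from the exercise), and the natural logarithm is assumed because the sheet does not fix a base.

```python
import math
from collections import Counter


def tf(term, doc):
    """Raw term frequency f(t, d): occurrences of term in the token list doc."""
    return Counter(doc)[term]


def ntf(term, doc):
    """Normalized term frequency: f(t, d) / max{f(w, d) : w in d}."""
    counts = Counter(doc)
    return counts[term] / max(counts.values()) if counts else 0.0


def idf(term, docs):
    """Inverse document frequency: log(N / (1 + |{d in D : t in d}|))."""
    df = sum(1 for d in docs if term in d)  # number of documents containing term
    return math.log(len(docs) / (1 + df))   # natural log is an assumption


# Hypothetical toy corpus, NOT the tweets from the exercise.
docs = [
    "a b a c".split(),
    "b c d".split(),
    "d d e".split(),
]

print(tf("a", docs[0]))   # raw count of 'a' in the first document
print(ntf("b", docs[0]))  # count of 'b' divided by the highest count in that document
print(idf("a", docs))     # 'a' occurs in one of three documents
print(idf("b", docs))     # 'b' occurs in two of three documents
```

Because of the 1 added in the denominator, a term that occurs in all but one document gets IDf = log(1) = 0 and no longer contributes to a score.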
--------------------------------
3.1
--------------------------------
For 'concert':
  f(concert,d1) = 0 => NTf = 0
  f(concert,d2) = 1 => NTf = 1/2   (max{f(w,d2) : w in d2} = 2 for 'soo' or 'is')
  f(concert,d3) = 0 => NTf = 0
For 'berlin':
  f(berlin,d1) = 2 => NTf = 2/2 = 1   (max{f(w,d1) : w in d1} = 2 for 'berlin')
  f(berlin,d2) = 0 => NTf = 0
  f(berlin,d3) = 1 => NTf = 1/2   (max{f(w,d3) : w in d3} = 2 for 'of')
For 'live':
  f(live,d1) = 0 => NTf = 0
  f(live,d2) = 1 => NTf = 1/2   (max{f(w,d2) : w in d2} = 2 for 'soo' or 'is')
  f(live,d3) = 1 => NTf = 1/2   (max{f(w,d3) : w in d3} = 2 for 'of')

--------------------------------
3.2
--------------------------------
IDf(concert,D) = log(3/(1+1)) = log(3/2)
IDf(berlin,D)  = log(3/(1+2)) = log(1) = 0
IDf(live,D)    = log(3/(1+2)) = log(1) = 0

--------------------------------
3.3
--------------------------------
score(Q,d1) = Tf(concert,d1)*IDf(concert,D) + Tf(berlin,d1)*IDf(berlin,D) + Tf(live,d1)*IDf(live,D)
            = 0*log(3/2) + 2*0 + 0*0
            = 0
score(Q,d2) = Tf(concert,d2)*IDf(concert,D) + Tf(berlin,d2)*IDf(berlin,D) + Tf(live,d2)*IDf(live,D)
            = 1*log(3/2) + 0*0 + 1*0
            = log(3/2)
score(Q,d3) = Tf(concert,d3)*IDf(concert,D) + Tf(berlin,d3)*IDf(berlin,D) + Tf(live,d3)*IDf(live,D)
            = 0*log(3/2) + 1*0 + 1*0
            = 0

Only d2 has a non-zero score, so it ranks first; d1 and d3 tie at 0.
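As a cross-check of 3.2 and 3.3, the scores can be recomputed from the raw term frequencies listed in 3.1. This is a minimal sketch, not part of the original solution: the variable names are hypothetical, and the natural logarithm is an assumption (the choice of base scales the scores but does not change the ranking).

```python
import math

N = 3  # number of documents d1..d3

# Raw term frequencies f(t, d), copied from the values in 3.1.
counts_by_doc = {
    "d1": {"concert": 0, "berlin": 2, "live": 0},
    "d2": {"concert": 1, "berlin": 0, "live": 1},
    "d3": {"concert": 0, "berlin": 1, "live": 1},
}
query = ["concert", "berlin", "live"]


def idf(term):
    """IDf(t, D) = log(N / (1 + number of documents containing t))."""
    df = sum(1 for counts in counts_by_doc.values() if counts[term] > 0)
    return math.log(N / (1 + df))  # natural log is an assumption


# score(Q, d) = sum over t in Q of Tf(t, d) * IDf(t, D)
scores = {
    doc: sum(counts[t] * idf(t) for t in query)
    for doc, counts in counts_by_doc.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)

print(scores)
print(ranking)
```

With these inputs only d2 receives a positive score, because 'concert' is the only query term whose IDf is non-zero.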