Abstract ........................................................ 1
1 Introduction ................................................. 3
1.1 Basic XML retrieval concepts ............................ 4
1.2 INEX .................................................... 8
1.3 Challenges in XML retrieval ............................ 13
1.3.1 How can information retrieval and database
techniques be combined for effective XML
retrieval? ...................................... 13
1.3.2 What does user and assessor experience suggest
about how relevance should be defined in XML
retrieval? ...................................... 14
1.3.3 How should the effectiveness of XML retrieval
be evaluated? ................................... 15
1.3.4 How effective is XML retrieval in different
application scenarios? .......................... 17
1.4 Book structure ......................................... 13
2 XML Information Retrieval ................................... 23
2.1 XML retrieval approaches ............................... 24
2.1.1 Query languages ................................. 24
2.1.2 Full-text information retrieval approaches ...... 25
2.1.3 Native XML database approaches .................. 29
2.1.4 Scoring approaches .............................. 30
2.2 Relevance in information retrieval ..................... 38
2.2.1 Definitions and dimensions ...................... 38
2.2.2 INEX relevance .................................. 39
2.3 Evaluation approaches .................................. 44
2.3.1 Assumptions ..................................... 44
2.3.2 Metrics and measures ............................ 45
2.3.3 Significance, fidelity, and reliability ......... 64
2.4 Methodology of XML element retrieval ................... 66
2.5 Summary ................................................ 70
3 Hybrid XML Retrieval ........................................ 73
3.1 Technological aspects .................................. 74
3.1.1 A full-text information retrieval approach ...... 74
3.1.2 A native XML database approach .................. 76
3.1.3 A hybrid approach to XML retrieval .............. 77
3.2 Retrieval modelling aspects ............................ 78
3.2.1 Identifying the appropriate answer
granularity ..................................... 78
3.2.2 Ranking the final answers ....................... 80
3.2.3 Tuning the retrieval parameters ................. 82
3.3 Experiments on INEX 2003 and 2004 test collections ..... 87
3.3.1 Evaluation methodology .......................... 87
3.3.2 INEX 2003 experiments ........................... 94
3.3.3 INEX 2004 experiments .......................... 105
3.4 Summary ............................................... 113
4 Relevance in XML Retrieval ................................. 115
4.1 Analysis of INEX 2004 relevance ....................... 116
4.1.1 INEX 2004 relevance dimensions ................. 116
4.1.2 Methodology .................................... 116
4.1.3 Assessor behaviour analysis for INEX 2004 CO
topics ......................................... 120
4.1.4 User behaviour analysis for INEX 2004
Interactive topics ............................. 123
4.1.5 Analysis of the level of agreement ............. 125
4.1.6 Concluding remarks on INEX 2004 relevance ...... 130
4.2 Analysis of INEX 2005 relevance ....................... 132
4.2.1 INEX 2005 relevance dimensions ................. 132
4.2.2 Assessor behaviour analysis for INEX 2005
topics ......................................... 132
4.2.3 Analysis of the level of agreement ............. 135
4.2.4 Concluding remarks on INEX 2005 relevance ...... 139
4.3 A topical-hierarchical relevance definition ........... 140
4.3.1 Relevance dimensions ........................... 140
4.3.2 Relevance scale ................................ 142
4.3.3 User satisfaction .............................. 142
4.4 Experiments with the new relevance definition ......... 144
4.4.1 Comparison to the INEX 2004 relevance
definition ..................................... 144
4.4.2 Comparison to the INEX 2005 relevance
definition ..................................... 147
4.5 Summary ............................................... 153
5 Evaluation of XML Retrieval ................................ 157
5.1 A taxonomy of retrieval tasks ......................... 158
5.1.1 Retrieval answers .............................. 158
5.1.2 Task assumptions ............................... 160
5.2 HiXEval: Highlighting XML retrieval evaluation ........ 160
5.2.1 Evaluation assumptions ......................... 161
5.2.2 Measures for linear result presentation ........ 162
5.2.3 Measures for group result presentation ......... 165
5.3 Fidelity tests ........................................ 170
5.3.1 Linear result presentation ..................... 170
5.3.2 Discussion and concluding comments ............. 183
5.3.3 Group result presentation ...................... 188
5.3.4 Discussion and concluding comments ............. 195
5.4 HiXEval versus XCG in XML retrieval experiments ....... 197
5.4.1 Comparison of run orderings .................... 197
5.4.2 Reliability tests .............................. 202
5.5 Summary ............................................... 207
6 Scenarios of XML Retrieval ................................. 209
6.1 Ad-hoc retrieval scenario ............................. 210
6.1.1 XML retrieval approach ......................... 210
6.1.2 INEX 2005 CO and +S sub-tasks .................. 216
6.1.3 INEX 2005 CAS sub-task ......................... 221
6.2 Multimedia retrieval scenario ......................... 223
6.2.1 XML retrieval approach ......................... 226
6.2.2 INEX 2005 MM task .............................. 230
6.3 Summary ............................................... 234
7 Conclusions and Future Work ................................ 237
7.1 Hybrid approach for effective XML retrieval ........... 237
7.2 New relevance definition for XML retrieval ............ 239
7.3 Highlighting XML retrieval evaluation ................. 241
7.4 Application scenarios of XML retrieval ................ 243
7.5 Conclusion summary .................................... 245
A A similarity framework for XML retrieval ................... 247
A.1 Motivation ............................................ 247
A.2 TF/IDF ranking model .................................. 248
A.3 Similarity framework .................................. 252
A.4 Modelling scoring approaches .......................... 263
B Measuring overlap .......................................... 265
B.1 Level of overlap for background topic B1 .............. 268
B.2 Level of overlap for comparison topic C2 .............. 269
C INEX 2005 MM track experiments ............................. 271
C.1 TRECeval analysis ..................................... 272
C.2 HiXEval analysis ...................................... 272
C.3 Comparison of run orderings ........................... 276
Bibliography .................................................. 279