MWAND: A New Early Termination Algorithm for Fast and Efficient Query Evaluation

Authors

DOI:

https://doi.org/10.9781/ijimai.2019.04.002

Keywords:

Evaluation, Information Retrieval, Inverted List, WAND, Query Processing, Top-k

Abstract

Nowadays, current information systems are so large and maintain huge amount of data. At every time, they process millions of documents and millions of queries. In order to choose the most important responses from this amount of data, it is well to apply what is so called early termination algorithms. These ones attempt to extract the Top-K documents according to a specified increasing monotone function. The principal idea behind is to reach and score the most significant less number of documents. So, they avoid fully processing the whole documents. WAND algorithm is at the state of the art in this area. Despite it is efficient, it is missing effectiveness and precision. In this paper, we propose two contributions, the principal proposal is a new early termination algorithm based on WAND approach, we call it MWAND (Modified WAND). This one is faster and more precise than the first. It has the ability to avoid unnecessary WAND steps. In this work, we integrate a tree structure as an index into WAND and we add new levels in query processing. In the second contribution, we define new fine metrics to ameliorate the evaluation of the retrieved information. The experimental results on real datasets show that MWAND is more efficient than the WAND approach.

Downloads

Download data is not yet available.

References

J. Zobel and A. Moffat, “Inverted Files for Text Search Engines,” ACM Computing Surveys, vol. 38, no. 2, article 6, July 2006.

J. Zobel, A. Moffat and R. S. Davis, “An Efficient Indexing Technique for Full-Text Database Systems,” Proceedings of the 18th VLDB Conference, Vancouver, British Columbia, Canada, 1992.

C. Dimopoulos, S. Nepomnyachiy and T. Suel, “Optimizing Top-k Document Retrieval Strategies for Block-Max Indexes,” WSDM’13, Rome, Italy, February 4–8, 2012.

M. Persin, “Document filtering for fast ranking,” in W. Croft and C. Van Rijsbergen (eds.), Proceedings of the ACM-SIGIR International Conference on Research and Development in Information Retrieval, Dublin, Ireland, 1994, pp. 339–348.

M. Persin, J. Zobel and R. Sacks-Davis, “Filtered document retrieval with frequency-sorted indexes,” Journal of the American Society for Information Science, vol. 47, no. 10, pp. 749–764, 1996.

V. Anh and A. Moffat, “Index Compression using Fixed Binary Codewords,” Proceedings of the Fifteenth Australasian Database Conference (ADC 2004), Dunedin, New Zealand, 2004.

I. H. Witten, A. Moffat and T. C. Bell, Managing Gigabytes: Compressing and Indexing Documents and Images, Morgan Kaufmann Publishers, Los Altos, CA, USA, 2nd edition, 1999.

S. Ding and T. Suel, “Faster Top-k Document Retrieval Using Block-Max Indexes,” SIGIR’11, 2011.

H. Turtle and J. Flood, “Query evaluation: Strategies and optimizations,” Information Processing and Management, vol. 31, no. 6, pp. 831–850, 1995.

V. Anh and A. Moffat, “Pruned query evaluation using pre-computed impacts,” Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2006, pp. 372–379.

T. Strohman and W. Bruce Croft, “Efficient Document Retrieval in Main Memory,” SIGIR’07, Amsterdam, The Netherlands, July 23–27, 2007.

H. Wu and H. Fang, “An Incremental Approach to Efficient Pseudo-Relevance Feedback,” SIGIR’13, Dublin, Ireland, July 28–August 1, 2013.

C. Buckley and A. F. Lewit, “Optimization of Inverted Searches,” Proceedings of the 8th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’85), 1985, pp. 97–110.

R. Fagin, A. Lotem and M. Naor, “Optimal aggregation algorithms for middleware,” Symposium on Principles of Database Systems (PODS), 2001.

O. Rojas, V. Gil-Costa and M. Marin, “Distributing efficiently the Block-Max WAND algorithm,” Proceedings of the Conference on Computational Science, 2005.

A. Moffat and J. Zobel, “Self-indexing inverted files for fast text retrieval,” ACM Transactions on Information Systems, vol. 14, no. 4, pp. 349–379, October 1996.

R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Addison-Wesley, 1st edition, 1999.

C. D. Manning, P. Raghavan and H. Schütze, An Introduction to Information Retrieval, Cambridge University Press, Cambridge, 2009.

O. Rojas, V. Gil-Costa and M. Marin, “Efficient Parallel Block-Max WAND Algorithm,” in Euro-Par 2013 Parallel Processing: 19th International Conference, Aachen, Germany, 2013, pp. 26–30. Lecture Notes in Computer Science, vol. 8097, Springer, 2013.

A. Z. Broder, M. Herscovici, A. Soffer and J. Zien, “Efficient query evaluation using a two-level retrieval process,” CIKM’03, 2003.

S. E. Robertson, S. Walker and M. Beaulieu, “Okapi at TREC–7: automatic ad hoc, filtering, VLC and filtering tracks,” in Proceedings of the Seventh Text REtrieval Conference (TREC-7), NIST Special Publication 500-242, July 1999, pp. 253–264.

V. Anh and A. Moffat, “Simplified Similarity Scoring Using Term Ranks,” SIGIR’05, Salvador, Brazil, August 15–19, 2005.

F. Scholer, H. Williams, J. Yiannis and J. Zobel, “Compression of inverted indexes for fast query evaluation,” Proceedings of the 25th Annual SIGIR, Tampere, Finland, 2002, pp. 222–229.

P. Elias, “Universal codeword sets and representations of the integers,” IEEE Transactions on Information Theory, vol. 21, no. 2, pp. 194–203, 1975.

S. W. Golomb, “Run-length encoding,” IEEE Transactions on Information Theory, vol. 12, no. 3, pp. 399–401, 1966.

H. E. Williams and J. Zobel, “Compressing integers for fast file access,” The Computer Journal, vol. 42, no. 3, pp. 193–201, 1999.

V. N. Anh and A. Moffat, “Inverted index compression using word-aligned binary codes,” Information Retrieval, vol. 8, no. 1, pp. 151–166, 2005.

A. Moffat and L. Stuiver, “Binary interpolative coding for effective index compression,” Information Retrieval, vol. 3, no. 1, pp. 25–47, 2000.

M. Zukowski, S. Heman, N. Nes and P. A. Boncz, “Super-scalar ram-cpu cache compression,” ICDE, 2006, p. 59.

H. Yan, S. Ding and T. Suel, “Inverted index compression and query processing with optimized document ordering,” WWW, 2009, pp. 401–410.

H. Yan, S. Ding and T. Suel, “Compressing term positions in web indexes,” SIGIR, 2009, pp. 147–154.

A. Trotman, “Compressing Inverted Files,” Information Retrieval, vol. 6, pp. 5–19, 2003.

R. Fagin, “Combining fuzzy information: an overview,” SIGMOD Record, vol. 31, 2002.

H. Bast, D. Majumdar, R. Schenkel, M. Theobald and G. Weikum, “IOTop-K: Index-access optimized top-k query processing,” Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB), 2006.

P. Cao and Z. Wang, “Efficient Top-k Query Calculation in Distributed Networks,” Proceedings of the 23rd PODC, 2004.

H. W. Wong and D. Lee, “Implementations of partial document ranking using inverted files,” Information Processing and Management, vol. 29, no. 5, pp. 647–669, 1993.

R. Blanco and A. Barreiro, “Probabilistic static pruning of inverted files,” ACM Transactions on Information Systems, vol. 28, no. 1, 2010.

A. Soffer, D. Carmel, D. Cohen, R. Fagin, E. Farchi, M. Herscovici and Y. S. Maarek, “Static index pruning for information retrieval systems,” SIGIR, 2001, pp. 43–50.

A. Ntoulas and J. Cho, “Pruning policies for two-tiered inverted index with correctness guarantee,” SIGIR, 2007, pp. 191–198.

C. Buckley and A. F. Lewit, “Optimization of inverted vector searches,” Proceedings of the 8th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1985.

T. Strohman, H. Turtle and W. B. Croft, “Optimization strategies for complex queries,” Proceedings of SIGIR, 2005, pp. 219–225.

M. Fontoura, V. Josifovski, J. Liu, S. Venkatesan, X. Zhu and J. Zien, “Evaluation strategies for top-k queries over memory-resident inverted indexes,” Proceedings of the VLDB Endowment, vol. 4, no. 12, pp. 1213–1224, 2011.

M. Kaszkiel and J. Zobel, “Term-ordered query evaluation versus document-ordered query evaluation for large document databases,” Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Melbourne, Australia, August 1998, pp. 343–344.

F. Zhang, S. Shi, H. Yan and J. Wen, “Revisiting Globally Sorted Indexes for Efficient Document Retrieval,” WSDM’10, New York City, New York, USA, February 4–6, 2010.

J. Mackenzie, F. Scholer and J. Shane, “Early Termination Heuristics for Score-at-a-Time Index Traversal,” ADCS 2017, Brisbane, QLD, Australia, December 7–8, 2017.

J. Lin and A. Trotman, “Anytime Ranking for Impact-Ordered Indexes,” ICTIR’15, Northampton, MA, USA, September 27–30, 2015.

A. Elbagoury, M. Crane and J. Lin, “Rank-at-a-Time Query Processing,” ICTIR’16, Newark, DE, USA, September 12–16, 2016.

H. He and A. K. Singh, “Graphs-at-a-time: Query Language and Access Methods for Graph Databases,” SIGMOD’08, Vancouver, BC, Canada, June 9–12, 2008.

A. Anagnostopoulos, A. Z. Broder and D. Carmel, “Sampling Search-Engine Results,” WWW 2005, Chiba, Japan, May 10–14, 2005.

N. Tonellotto, C. Macdonald and I. Ounis, “Efficient and Effective Retrieval using Selective Pruning,” WSDM’13, Rome, Italy, February 4–8, 2013.

N. Tonellotto, C. Macdonald and I. Ounis, “Efficient Dynamic Pruning with Proximity Support,” LSDS-IR 2010, Geneva, Switzerland, July 23, 2010.

N. Tonellotto, C. Macdonald and I. Ounis, “Effect of Different Docid Orderings on Dynamic Pruning Retrieval Strategies,” SIGIR’11, Beijing, China, July 24–28, 2011.

N. Asadi and J. Lin, “Effectiveness/Efficiency Tradeoffs for Candidate Generation in Multi-Stage Retrieval Architectures,” SIGIR’13, Dublin, Ireland, July 28–August 1, 2013.

J. S. Culpepper and A. Moffat, “Efficient set intersection for inverted indexing,” ACM Transactions on Information Systems (TOIS), vol. 29, no. 1, 2010.

N. Asadi and J. Lin, “Fast Candidate Generation for Real-Time Tweet Search with Bloom Filter Chains,” ACM Transactions on Information Systems, vol. 31, no. 3, article 13, July 2013.

M. Petri, J. S. Culpepper and A. Moffat, “Exploring the Magic of WAND,” ADCS ’13, Brisbane, QLD, Australia, December 05–06, 2013.

Yubin Kim, Jamie Callan, J. Shane Culpepper and Alistair Moffat, “Does Selective Search Benefit from WAND Optimization,” ECIR 2016, Lecture Notes in Computer Science, vol. 9626, pp. 145–158, 2016.

A. Mallia, G. Ottaviano, E. Porciani, N. Tonellotto and R. Venturini, “Faster BlockMax WAND with Variable-sized Blocks,” SIGIR’17, Shinjuku, Tokyo, Japan, 2017.

E. Bortnikov, D. Carmel and G. Golan-Gueta, “Top-k Query Processing with Conditional Skips,” WWW 2017 Companion, Perth, Australia, April 3–7, 2017.

C. M. Daoud, E. Silva de Moura, D. Fernandes, A. Soares da Silva, C. Rossi and A. Carvalho, “Waves: a fast multi-tier top-k query processing algorithm,” Information Retrieval Journal, vol. 20, no. 3, March 2017.

D. Shan, S. Ding, J. He, H. Yan and X. Li, “Optimized Top-K Processing with Global Page Scores on Block-Max Indexes,” WSDM’12, Seattle, Washington, USA, February 8–12, 2012.

M. Petri, J. S. Culpepper and A. Moffat, “Exploring the Magic of WAND,” ADCS ’13, Brisbane, QLD, Australia, December 05–06, 2013.

A. Broder, L. Garcia-Pueyo and V. Josifovski, “Scalable K-Means by Ranked Retrieval,” WSDM’14, New York, New York, USA.

A. Kane and F. W. Tompa, “Split-Lists and Initial Thresholds for WAND-based Search,” SIGIR ’18, Ann Arbor, MI, USA, July 8–12, 2018.

M. Crane, J. S. Culpepper, J. Lin, J. Mackenzie and A. Trotman, “A Comparison of Document-at-a-Time and Score-at-a-Time Query Evaluation,” WSDM 2017, Cambridge, United Kingdom, February 06–10, 2017.

J. Mackenzie, J. S. Culpepper, R. Blanco, M. Crane, C. L. A. Clarke and J. Lin, “Query Driven Algorithm Selection in Early Stage Retrieval,” WSDM 2018, Marina Del Rey, CA, USA, February 5–9, 2018.

L. Wang, J. Lin and D. Metzler, “A cascade ranking model for efficient ranked retrieval,” Proceedings of SIGIR, 2011.

D. Gusfield, “An increment-by-one approach to suffix arrays and trees,” Technical Report CSE-90-39, UC Davis, Department of Computer Science, 1990.

R. Akbarinia, E. Pacitti and P. Valduriez, “Best position algorithms for top-k queries,” Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB ’07), 2007, pp. 495–506.

H. Yu, H. Li, P. Wu, D. Agrawal and A. Abbadi, “Efficient Processing of Distributed Top-k Queries,” Proceedings of the 16th DEXA, 2005.

S. Büttcher, C. L. A. Clarke and G. V. Cormack, Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010, p. 68.

D. Lewis, Y. Yang, G. Rose and F. Li, “RCV1: A new benchmark collection for text categorization,” Journal of Machine Learning Research, 2004, pp. 361–397.

K. Mechach, L. Zekri and M. K. Abdi, “Collection and Selection Based Relevant Degrees Of Documents,” Journal of Digital Information Management (JDIM), vol. 13, no. 2, pp. 110–119, 2015.

Downloads

Published

2019-12-01
Metrics
Views/Downloads
  • Abstract
    61
  • PDF
    18

How to Cite

Mansouria, Z. I., Lougmiri, Z., and Mohamed, S. (2019). MWAND: A New Early Termination Algorithm for Fast and Efficient Query Evaluation. International Journal of Interactive Multimedia and Artificial Intelligence, 5(7), 57–69. https://doi.org/10.9781/ijimai.2019.04.002