Implementing Block-Max Pruning in Rust: Faster Learned Sparse Retrieval for Modern Search
- Track: Search
- Room: UB4.136
- Day: Sunday
- Start: 13:20
- End: 13:50
- Video only: ub4136
Learned sparse retrieval models such as SPLADE, uniCOIL, and other transformer-based sparse encoders have become popular because they deliver neural-level relevance while preserving the efficiency of inverted indexes. But the indexes these models produce have statistical properties radically different from those of classic BM25 indexes: longer queries, compressed vocabularies, and posting lists with unusual score distributions. As a result, traditional dynamic pruning algorithms like WAND and Block-Max WAND often fail to reach their full potential on them.
This talk presents Block-Max Pruning (BMP) from a systems and Rust-engineering perspective. We will walk through how BMP restructures query processing by partitioning the document space into small, contiguous blocks and maintaining lightweight, on-the-fly score upper bounds that guide safe or approximate early termination.
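As a rough illustration of the idea described above, the following is a minimal sketch, not the code from the BMP repository. It assumes integer (quantized) impact scores, fixed-size blocks of consecutive document IDs, and a single `alpha` parameter in (0, 1] that trades safety for speed (with `alpha = 1.0` meaning safe termination). All names here (`TermIndex`, `Index`, `build`, `search`, `alpha`) are hypothetical and chosen for illustration only.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical per-term data; not the layout used by the BMP repository.
struct TermIndex {
    /// (doc_id, quantized impact) pairs, sorted by doc_id.
    postings: Vec<(u32, u32)>,
    /// block_max[b] = largest impact of this term inside block b.
    block_max: Vec<u32>,
}

struct Index {
    terms: Vec<TermIndex>,
    block_size: u32,
    num_blocks: usize,
}

impl Index {
    /// Builds block maxima for blocks of `block_size` consecutive doc IDs.
    fn build(term_postings: Vec<Vec<(u32, u32)>>, num_docs: u32, block_size: u32) -> Self {
        let num_blocks = ((num_docs + block_size - 1) / block_size) as usize;
        let terms = term_postings
            .into_iter()
            .map(|postings| {
                let mut block_max = vec![0u32; num_blocks];
                for &(doc, impact) in &postings {
                    let b = (doc / block_size) as usize;
                    block_max[b] = block_max[b].max(impact);
                }
                TermIndex { postings, block_max }
            })
            .collect();
        Index { terms, block_size, num_blocks }
    }

    /// Returns up to k (score, doc_id) pairs, best first.
    /// `query` holds (term_id, weight); `alpha` = 1.0 is assumed to be the safe setting.
    fn search(&self, query: &[(usize, u32)], k: usize, alpha: f32) -> Vec<(u32, u32)> {
        if k == 0 {
            return Vec::new();
        }
        // 1. Upper bound per block: weighted sum of the query terms' block maxima.
        let mut bounds: Vec<(u32, usize)> = (0..self.num_blocks)
            .map(|b| {
                let ub: u32 = query.iter().map(|&(t, w)| w * self.terms[t].block_max[b]).sum();
                (ub, b)
            })
            .collect();
        // 2. Visit the most promising blocks first.
        bounds.sort_unstable_by(|x, y| y.0.cmp(&x.0));

        // Min-heap over (score, doc_id) holding the current top-k.
        let mut heap: BinaryHeap<Reverse<(u32, u32)>> = BinaryHeap::new();
        let mut acc = vec![0u32; self.block_size as usize];
        for &(ub, b) in &bounds {
            // 3. Early termination: bounds are non-increasing, so once one block
            //    cannot beat the k-th best score, no later block can either.
            if heap.len() == k {
                if let Some(&Reverse((threshold, _))) = heap.peek() {
                    if ub as f32 * alpha <= threshold as f32 {
                        break;
                    }
                }
            }
            // 4. Exhaustively score the documents of this block.
            let start = b as u32 * self.block_size;
            let end = start + self.block_size;
            acc.fill(0);
            for &(t, w) in query {
                let p = &self.terms[t].postings;
                let lo = p.partition_point(|&(d, _)| d < start);
                for &(d, impact) in p[lo..].iter().take_while(|&&(d, _)| d < end) {
                    acc[(d - start) as usize] += w * impact;
                }
            }
            for (i, &score) in acc.iter().enumerate() {
                if score == 0 {
                    continue;
                }
                let doc = start + i as u32;
                if heap.len() < k {
                    heap.push(Reverse((score, doc)));
                } else if let Some(&Reverse((min_score, _))) = heap.peek() {
                    if score > min_score {
                        heap.pop();
                        heap.push(Reverse((score, doc)));
                    }
                }
            }
        }
        let mut top: Vec<(u32, u32)> = heap.into_iter().map(|Reverse(x)| x).collect();
        top.sort_unstable_by(|x, y| y.cmp(x));
        top
    }
}
```

In this sketch, calling `search` with `alpha = 1.0` returns the same top-k scores as exhaustive scoring (ties may resolve to different documents), while smaller values stop earlier and may miss some results. A production implementation would differ in many respects, such as index layout and compression.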
The talk is aimed at developers building retrieval engines, Rust-based data systems, or ML-powered search pipelines who want to push sparse retrieval performance further. Attendees will leave with a clear understanding of how BMP works, why learned sparse models require new pruning strategies, and how to integrate these ideas into modern, high-performance Rust codebases.
Code and resources:
- BMP GitHub repository: https://github.com/pisa-engine/BMP/
- Paper (SIGIR 2024): https://www.antoniomallia.it/uploads/SIGIR24.pdf
Speakers
- Ferdinand Schlatt
- Antonio Mallia