Efficient Approximate Nearest Neighbor (ANN) Search
26 Mar 2025
Background and Traditional SOTA
Approximate Nearest Neighbor (ANN) search aims to efficiently retrieve data points from a high-dimensional space that are closest to a given query point. Formally, given a dataset $\mathcal{X} = \{x_i \in \mathbb{R}^d\}_{i=1}^N$ and a query $q \in \mathbb{R}^d$, the task is to find $x^* \in \mathcal{X}$ such that:

$$x^* = \arg\min_{x \in \mathcal{X}} \|x - q\|_2$$
However, due to the curse of dimensionality, exact search is often impractical at scale. ANN trades small inaccuracies for drastic improvements in speed.
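To make the definition concrete, here is a brute-force exact search in NumPy (array names and sizes are illustrative); it computes all $N$ distances for a single query, which is exactly the $O(Nd)$ per-query cost that ANN methods try to avoid.

```python
import numpy as np

# Illustrative toy data: N database vectors and one query in R^d.
rng = np.random.default_rng(0)
N, d = 100_000, 128
X = rng.standard_normal((N, d)).astype("float32")   # dataset X
q = rng.standard_normal(d).astype("float32")        # query q

# Exact search: argmin over all N Euclidean distances, O(N * d) per query.
dists = np.linalg.norm(X - q, axis=1)
nn_id = int(np.argmin(dists))
x_star = X[nn_id]
print("nearest neighbor index:", nn_id, "distance:", float(dists[nn_id]))
```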
Traditional SOTA Methods (pre-2020)
- Hashing-based Methods
- Locality Sensitive Hashing (LSH) [Indyk and Motwani, 1998]
- Projects points using hash functions $h(x) = \lfloor \frac{a^T x + b}{r} \rfloor$, where $a \sim \mathcal{N}(0, I)$ and $b \sim U[0, r]$ (a minimal sketch follows this list).
- Guarantees: High probability of collision for close points.
- Graph-based Methods
- Hierarchical Navigable Small World (HNSW) [Malkov & Yashunin, 2018]
- Builds a multi-layer proximity graph.
- High recall and fast search due to small-world properties.
- Complexity: $O(\log N)$ per query.
- Quantization-based Methods
- Product Quantization (PQ) [Jégou et al., 2011]
- The vector $x \in \mathbb{R}^d$ is split into $m$ sub-vectors, and each sub-vector is quantized with its own codebook.
- Storage-efficient; used in FAISS.
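As a concrete illustration of the LSH bullet above, here is a minimal sketch of Euclidean (p-stable) LSH in NumPy. It implements $h(x) = \lfloor (a^T x + b)/r \rfloor$ directly; for simplicity each table uses a single hash function (real systems concatenate several per table), and the bucket width and table count are illustrative.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
d, r, n_tables = 128, 4.0, 8            # dimension, bucket width, number of hash tables (illustrative)

# One p-stable LSH function per table: h(x) = floor((a^T x + b) / r),
# with a ~ N(0, I) and b ~ U[0, r], as in the formula above.
A = rng.standard_normal((n_tables, d))
b = rng.uniform(0.0, r, size=n_tables)

def lsh_keys(x: np.ndarray) -> np.ndarray:
    """Return one integer bucket id per hash table for vector x."""
    return np.floor((A @ x + b) / r).astype(int)

# Build the index: map each bucket key to the ids of vectors that fall into it.
X = rng.standard_normal((10_000, d))
tables = [defaultdict(list) for _ in range(n_tables)]
for i, x in enumerate(X):
    for t, key in enumerate(lsh_keys(x)):
        tables[t][key].append(i)

def candidates(q: np.ndarray) -> set:
    """Union of colliding buckets; exact distances are then computed only over these ids."""
    return {j for t, key in enumerate(lsh_keys(q)) for j in tables[t][key]}
```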
Key Challenges
- Scalability to billions of vectors.
- Adaptability to dynamic datasets.
- Latency vs. Recall trade-off.
- Hardware-accelerated search (e.g., GPU/TPU optimization).
Recent Trends and Innovations (2024 and beyond)
🔥 1. Neural ANN Indexes: Neural-Augmented Search
Key Works:
- “Zoom: SSD-based vector search for billion-scale datasets” (Xiao et al., 2023)
- “DSPANN: Distilled Small Parallel ANN” (NeurIPS 2024)
Technical Overview
Neural networks are used to learn the partitioning or routing of the vector space, in combination with traditional indexing methods.
Mathematics: Given a query $q$, a learned function $f(q)$ predicts a subset $C \subset \mathcal{X}$ where the true nearest neighbor is likely to be. ANN search is then performed within $C$.
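As a generic illustration of this routing idea (not the method of any specific paper), the sketch below partitions the corpus into cells and uses a small MLP as the learned router $f(q)$. The cell assignments, network shape, and `n_probe` are all assumptions made for the example; in practice the router is trained so the cell holding the true neighbor scores highest.

```python
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)
d, K, N = 64, 16, 10_000                        # dim, number of cells, corpus size (illustrative)
xb = rng.standard_normal((N, d)).astype("float32")
cell_of = rng.integers(0, K, size=N)            # stand-in for e.g. k-means cell assignments

# Learned router f(q): scores each cell; training (omitted here) would teach it
# to rank the cell containing q's true nearest neighbor first.
router = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, K))

def route_and_search(q: np.ndarray, n_probe: int = 2) -> int:
    """Search only the n_probe cells ranked highest by the router."""
    with torch.no_grad():
        scores = router(torch.from_numpy(q))
    probe = torch.topk(scores, n_probe).indices.numpy()
    cand = np.where(np.isin(cell_of, probe))[0]          # candidate subset C
    dists = np.linalg.norm(xb[cand] - q, axis=1)         # exact search inside C
    return int(cand[np.argmin(dists)])
```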
Architectural Changes:
- Learned hash functions or routing networks.
- Often use transformer or MLP encoders for learning index mappings.
Training Paradigm:
- Supervised or self-supervised learning on pairwise similarity data.
- Contrastive loss (e.g., InfoNCE):
$$\mathcal{L}_{\text{contrastive}} = -\log \frac{\exp(\mathrm{sim}(q, x^+)/\tau)}{\sum_{x \in \mathcal{X}} \exp(\mathrm{sim}(q, x)/\tau)}$$
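A minimal PyTorch rendering of this InfoNCE objective, assuming in-batch negatives and cosine similarity (the encoders themselves are left abstract):

```python
import torch
import torch.nn.functional as F

def info_nce(q_emb: torch.Tensor, x_emb: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """q_emb, x_emb: (B, d) embeddings where x_emb[i] is the positive for q_emb[i];
    every other row of the batch serves as a negative."""
    q_emb = F.normalize(q_emb, dim=-1)
    x_emb = F.normalize(x_emb, dim=-1)
    sim = q_emb @ x_emb.t() / tau                  # (B, B) cosine similarity matrix
    labels = torch.arange(q_emb.size(0), device=q_emb.device)
    return F.cross_entropy(sim, labels)            # -log softmax probability of the positive pair
```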
Evaluation:
- Datasets: Deep1B, GIST1M, MS MARCO, LAION-5B
- Metrics: Recall@k, QPS (queries per second), Latency, Index Size
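For concreteness, Recall@k here means the fraction of queries whose true nearest neighbor appears among the top-k returned candidates; a small helper (names are illustrative):

```python
import numpy as np

def recall_at_k(retrieved: np.ndarray, ground_truth: np.ndarray, k: int) -> float:
    """retrieved: (n_queries, >=k) ranked ids returned by the index;
    ground_truth: (n_queries,) id of each query's true nearest neighbor."""
    hits = [gt in retrieved[i, :k] for i, gt in enumerate(ground_truth)]
    return float(np.mean(hits))
```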
⚙️ 2. Vector Databases with Learned Compression
Key Work:
- “ColBERTv2” (Santhanam et al., 2023)
- “Q-Mix: Fine-grained Vector Compression with Quantized Mixtures” (ICLR 2024)
Innovations:
- Combine neural encoders (BERT/ViT) with learned quantization (e.g., codebooks trained end-to-end).
- Supports asymmetric distance computation (ADC): the query remains in float while database vectors are quantized.
Compression: each vector $x$ is represented as

$$x \approx \sum_{j=1}^{m} c_{j,k_j}, \quad \text{where } c_{j,k_j} \in \mathcal{C}_j$$

- $m$: number of codebooks
- $k_j$: index selected in the $j$-th codebook $\mathcal{C}_j$
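A compact sketch of this idea using plain (non-learned) product quantization with asymmetric distance computation; the codebooks here come from per-block k-means rather than end-to-end training, and all sizes are illustrative rather than taken from the papers above.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
d, m, ks = 64, 8, 256                      # dim, sub-vectors, codewords per codebook (illustrative)
sub = d // m
X = rng.standard_normal((5_000, d)).astype("float32")

# Train one codebook per sub-vector block and encode the database as uint8 codes.
codebooks = []
codes = np.empty((len(X), m), dtype=np.uint8)
for j in range(m):
    km = KMeans(n_clusters=ks, n_init=2, random_state=0).fit(X[:, j*sub:(j+1)*sub])
    codebooks.append(km.cluster_centers_)
    codes[:, j] = km.labels_

def adc_distances(q: np.ndarray) -> np.ndarray:
    """Asymmetric distance computation: the query stays in float, the database stays quantized.
    Precompute a (m, ks) lookup table of squared distances, then sum table entries per vector."""
    lut = np.stack([((codebooks[j] - q[j*sub:(j+1)*sub]) ** 2).sum(axis=1) for j in range(m)])
    return lut[np.arange(m)[:, None], codes.T].sum(axis=0)   # (N,) approximate squared distances

q = rng.standard_normal(d).astype("float32")
print("approx NN id:", int(np.argmin(adc_distances(q))))
```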
⚡ 3. GPU-native ANN and In-memory Vector Search
Key Projects:
- FAISS (Facebook AI Similarity Search) – updated with GPU kernels.
- KNN-Accelerator in NVIDIA RAPIDS
Trends:
- Vector search libraries optimized for CUDA and cuBLAS.
- Use of mixed precision (FP16 / INT8) for faster dot products.
- Fast quantization/dequantization pipelines.
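For example, with the faiss-gpu build and a CUDA-capable device, an index can be cloned onto the GPU and stored in FP16 for faster dot products (dataset sizes here are illustrative):

```python
import numpy as np
import faiss   # requires the faiss-gpu build and a CUDA-capable device

d = 128
xb = np.random.rand(200_000, d).astype("float32")   # database vectors
xq = np.random.rand(10, d).astype("float32")        # query vectors

res = faiss.StandardGpuResources()                  # GPU scratch memory and streams
co = faiss.GpuClonerOptions()
co.useFloat16 = True                                # store/search in FP16 (mixed precision)

cpu_index = faiss.IndexFlatIP(d)                    # exact inner-product index built on CPU
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index, co)
gpu_index.add(xb)
D, I = gpu_index.search(xq, 10)                     # top-10 neighbors per query
```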
🧠 4. Multi-modal ANN Search
Key Work:
- “Dual-Encoder + Cross-Encoder Reranking Pipelines” (OpenAI, LAION)
- “SEED-BERT: Efficient Cross-modal Search” (CVPR 2024)
Architecture:
- Dual encoder maps query and document/image/audio into a shared space.
- Use ANN to retrieve candidates, followed by a reranking model.
$$\text{ANN: } \arg\max_{x \in \mathcal{X}} \langle f_q(q), f_d(x) \rangle, \quad \text{then rerank with } \text{score}(q, x) = \text{CrossEncoder}(q, x)$$
Training Paradigm:
- Massive pretraining on noisy multi-modal pairs (e.g., CLIP, ALIGN).
- Contrastive learning with hard negatives.
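Putting the two stages together, here is a schematic retrieve-then-rerank pipeline. `encode_query`, `encode_doc`, and `cross_encoder_score` are hypothetical placeholders for whatever dual encoder and cross-encoder are used (here they just produce random but deterministic embeddings), while FAISS handles the inner-product ANN step.

```python
import numpy as np
import faiss

d = 256  # shared embedding dimension (illustrative)

def encode_query(q_text: str) -> np.ndarray:
    """Placeholder for the query tower of a dual encoder (e.g. a text transformer)."""
    return np.random.default_rng(abs(hash(q_text)) % 2**32).standard_normal(d).astype("float32")

def encode_doc(doc: str) -> np.ndarray:
    """Placeholder for the document/image/audio tower."""
    return np.random.default_rng(abs(hash(doc)) % 2**32).standard_normal(d).astype("float32")

def cross_encoder_score(q_text: str, doc: str) -> float:
    """Placeholder for a cross-encoder that scores the (query, candidate) pair jointly."""
    return float(encode_query(q_text) @ encode_doc(doc))

def build_index(docs):
    """Stage 1 index: L2-normalized document embeddings searched by inner product."""
    embs = np.stack([encode_doc(x) for x in docs])
    faiss.normalize_L2(embs)
    index = faiss.IndexFlatIP(d)
    index.add(embs)
    return index

def search(q_text, docs, index, k=100, top=10):
    """ANN retrieval in the shared embedding space, then cross-encoder reranking."""
    q = encode_query(q_text)[None, :].copy()
    faiss.normalize_L2(q)
    _, ids = index.search(q, min(k, len(docs)))
    cands = [docs[i] for i in ids[0] if i != -1]
    return sorted(cands, key=lambda x: cross_encoder_score(q_text, x), reverse=True)[:top]
```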
Comparison Table
| Method | Index Type | Hardware Friendly | Recall | Latency | Notes |
|---|---|---|---|---|---|
| HNSW | Graph | CPU | 95%+ | ~ms | SOTA classical |
| PQ + IVF | Quantization | GPU (FAISS) | 90%+ | ~ms | Memory-efficient |
| Zoom / DSPANN | Learned Routing | SSD + CPU | 98%+ | <1 ms | End-to-end learnable |
| Q-Mix | Learned Compression | GPU | 96%+ | Low | High compression ratio |
| SEED-BERT | Dual Encoder | GPU | Varies | Varies | Multimodal search |
🔭 Future Directions and Open Challenges
- Training-Free / Few-Shot Adaptation:
- Can we dynamically adapt ANN structures to new domains without retraining?
- Unified Multi-modal Indexes:
- Indexes that seamlessly support text, image, video, and audio.
- Edge & On-device ANN:
- Optimizing vector search for limited hardware (e.g., WebAssembly, iOS).
- Privacy-preserving ANN:
- Incorporating techniques like Federated Learning, Differential Privacy, or Homomorphic Encryption into vector retrieval.
- Probabilistic Index Verification:
- Building confidence intervals for nearest neighbor results; trust scores.
📚 Key References
- Malkov, Yu. A., & Yashunin, D. A. (2018). “Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs.” IEEE TPAMI.
- Jégou, H., Douze, M., & Schmid, C. (2011). “Product quantization for nearest neighbor search.” IEEE TPAMI.
- Santhanam, K., et al. (2023). “ColBERTv2: Efficient and Effective Retrieval via Lightweight Late Interaction.” arXiv.
- Xiao, Y. et al. (2023). “Zoom: SSD-based vector search for billion-scale datasets.” VLDB.
- Q-Mix: Fine-grained Vector Compression with Quantized Mixtures (ICLR 2024). Blog summary by Meta AI.
- FAISS: https://github.com/facebookresearch/faiss