Efficient Approximate Nearest Neighbor (ANN) Search

Background and Traditional SOTA

Approximate Nearest Neighbor (ANN) search aims to efficiently retrieve data points from a high-dimensional space that are closest to a given query point. Formally, for a dataset $\mathcal{X} = \{x_i \in \mathbb{R}^d\}_{i=1}^N$ and a query $q \in \mathbb{R}^d$, the task is to find $x^* \in \mathcal{X}$ such that:

$$x^* = \arg\min_{x \in \mathcal{X}} \|x - q\|_2$$

However, due to the curse of dimensionality, exact search is often impractical at scale. ANN trades a small loss in accuracy for drastic improvements in speed.


Traditional SOTA Methods (pre-2020)

  1. Hashing-based Methods
    • Locality Sensitive Hashing (LSH) [Indyk and Motwani, 1998]
      • Projects points using hash functions $h(x) = \lfloor \frac{a^T x + b}{r} \rfloor$, where $a \sim \mathcal{N}(0, I)$ and $b \sim U[0, r]$ (see the sketch after this list).
      • Guarantee: nearby points collide with high probability.
  2. Graph-based Methods
    • Hierarchical Navigable Small World (HNSW) [Malkov & Yashunin, 2018]
      • Builds a multi-layer proximity graph.
      • High recall and fast search due to small-world properties.
      • Complexity: roughly $O(\log N)$ per query.
  3. Quantization-based Methods
    • Product Quantization (PQ) [Jégou et al., 2011]
      • A vector $x \in \mathbb{R}^d$ is split into $m$ sub-vectors, and each sub-vector is quantized against its own codebook.
      • Storage-efficient; used in FAISS.
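
A minimal NumPy sketch of the random-projection LSH hash above; the bucket width `r`, the dimensionality, and the single-hash setup are illustrative (a practical LSH index concatenates several such hashes into multiple tables).

```python
import numpy as np

def make_lsh_hash(d, r, rng):
    """One random-projection hash h(x) = floor((a^T x + b) / r),
    with a ~ N(0, I) and b ~ U[0, r], as in the formula above."""
    a = rng.standard_normal(d)      # random projection direction
    b = rng.uniform(0.0, r)         # random offset
    return lambda x: int(np.floor((a @ x + b) / r))

rng = np.random.default_rng(42)
h = make_lsh_hash(d=128, r=4.0, rng=rng)

x = rng.standard_normal(128)
x_near = x + 0.01 * rng.standard_normal(128)   # a nearby point
x_far = rng.standard_normal(128)               # an unrelated point

# Nearby points fall into the same bucket with high probability.
print(h(x), h(x_near), h(x_far))
```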

Key Challenges

  • Scalability to billions of vectors.
  • Adaptability to dynamic datasets.
  • Latency vs. Recall trade-off.
  • Hardware-accelerated search (e.g., GPU/TPU optimization).

🧠 1. Learned Index Structures and Routing

Key Works:

  • “Zoom: SSD-based vector search for billion-scale datasets” (Xiao et al., 2023)
  • “DSPANN: Distilled Small Parallel ANN” (NeurIPS 2024)

Technical Overview

Neural networks are used to learn the partitioning or routing of the vector space, typically combined with traditional indexing methods.

Mathematics: Given a query $q$, a learned function $f(q)$ predicts a subset $C \subset \mathcal{X}$ where the true nearest neighbor is likely to be. ANN search is then performed within $C$.
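
A toy Python sketch of this route-then-search pattern. Centroid scoring stands in for the learned router $f(q)$; in a real system the scorer would be a trained MLP or transformer, and the per-partition search would use an ANN index rather than brute force.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 32)).astype(np.float32)   # toy dataset

# Partition the dataset with a few Lloyd (k-means) iterations.
K = 32
centroids = X[rng.choice(len(X), K, replace=False)].copy()
for _ in range(10):
    assign = np.argmin(((X[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    for k in range(K):
        if np.any(assign == k):
            centroids[k] = X[assign == k].mean(axis=0)

def route_and_search(q, n_probe=4):
    """f(q): score every partition for the query, keep the top `n_probe`,
    then run exact search only inside the resulting candidate set C."""
    scores = -((centroids - q) ** 2).sum(-1)              # stand-in for a learned router
    probe = np.argsort(scores)[-n_probe:]
    candidates = np.flatnonzero(np.isin(assign, probe))   # C, a subset of X
    best = candidates[np.argmin(((X[candidates] - q) ** 2).sum(-1))]
    return best

q = rng.standard_normal(32).astype(np.float32)
print("approximate NN id:", route_and_search(q))
```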

Architectural Changes:

  • Learned hash functions or routing networks.
  • Often use transformer or MLP encoders for learning index mappings.

Training Paradigm:

  • Supervised or self-supervised learning on pairwise similarity data.
  • Contrastive loss (e.g., InfoNCE):

$$\mathcal{L}_{\text{contrastive}} = -\log \frac{\exp(\mathrm{sim}(q, x^{+})/\tau)}{\sum_{x \in \mathcal{X}} \exp(\mathrm{sim}(q, x)/\tau)}$$
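
A PyTorch sketch of this loss using in-batch negatives: the sum over all of $\mathcal{X}$ in the formula is approximated by the other items in the minibatch, which is common practice. The encoders producing `q_emb` and `pos_emb` are assumed to exist elsewhere.

```python
import torch
import torch.nn.functional as F

def info_nce(q_emb, pos_emb, temperature=0.07):
    """In-batch InfoNCE: row i of pos_emb is the positive for query i;
    every other row in the batch serves as a negative."""
    q = F.normalize(q_emb, dim=-1)
    p = F.normalize(pos_emb, dim=-1)
    logits = q @ p.T / temperature                 # sim(q, x) / tau for every pair
    targets = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, targets)        # -log softmax at the positive

# Toy usage with random embeddings standing in for encoder outputs.
q_emb, pos_emb = torch.randn(8, 128), torch.randn(8, 128)
print(info_nce(q_emb, pos_emb).item())
```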

Evaluation:

  • Datasets: Deep1B, GIST1M, MS MARCO, LAION-5B
  • Metrics: Recall@k, QPS (queries per second), latency, index size (a Recall@k helper is sketched below).
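
For reference, a small NumPy helper for one common definition of Recall@k in ANN benchmarks: the fraction of queries whose exact nearest neighbor appears among the top-k returned ids. The array shapes below are assumptions.

```python
import numpy as np

def recall_at_k(retrieved_ids, true_nn_ids, k):
    """retrieved_ids: (n_queries, >=k) ids returned by the index, best first.
    true_nn_ids:     (n_queries,) id of the exact nearest neighbor per query."""
    hits = np.any(retrieved_ids[:, :k] == true_nn_ids[:, None], axis=1)
    return float(hits.mean())

# Toy usage: the first query's true neighbor (7) is found, the second's (4) is not.
retrieved = np.array([[3, 7, 1], [5, 2, 9]])
truth = np.array([7, 4])
print(recall_at_k(retrieved, truth, k=3))   # 0.5
```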

⚙️ 2. Vector Databases with Learned Compression

Key Work:

  • “ColBERTv2” (Santhanam et al., 2023)
  • “Q-Mix: Fine-grained Vector Compression with Quantized Mixtures” (ICLR 2024)

Innovations:

  • Combine neural encoders (BERT/ViT) with learned quantization (e.g., codebooks trained end-to-end).
  • Supports asymmetric distance computation (ADC): the query remains in float while database vectors are quantized.

Compression: each vector $x$ is represented as a sum of one centroid per codebook (a worked sketch follows the definitions below):

$$x \approx \sum_{j=1}^{m} c_{j,k_j}, \quad \text{where } c_{j,k_j} \in \mathcal{C}_j$$

  • $m$: number of codebooks
  • $k_j$: index of the selected centroid in the $j$-th codebook $\mathcal{C}_j$
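
A NumPy sketch of this codebook decomposition together with the asymmetric distance computation (ADC) mentioned above. The codebooks here are simply sampled from data sub-vectors for brevity, whereas real systems train them (with k-means or end to end); sizes ($m$, codebook size, dimensions) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, ksub = 64, 8, 256              # vector dim, codebooks, centroids per codebook
dsub = d // m
X = rng.standard_normal((5000, d)).astype(np.float32)

# Illustrative codebooks C_j: sampled data sub-vectors standing in for trained centroids.
codebooks = np.stack([
    X[rng.choice(len(X), ksub, replace=False), j * dsub:(j + 1) * dsub]
    for j in range(m)
])                                    # shape (m, ksub, dsub)

def encode(x):
    """Pick the nearest centroid index k_j in each codebook (the PQ code of x)."""
    return np.array([
        np.argmin(((codebooks[j] - x[j * dsub:(j + 1) * dsub]) ** 2).sum(-1))
        for j in range(m)
    ])

codes = np.stack([encode(x) for x in X[:1000]])   # compressed database, shape (1000, m)

def adc_search(q):
    """Asymmetric distance computation: the float query is compared against the
    quantized codes through per-codebook lookup tables; database vectors are
    never reconstructed."""
    tables = np.stack([
        ((codebooks[j] - q[j * dsub:(j + 1) * dsub]) ** 2).sum(-1)
        for j in range(m)
    ])                                # (m, ksub): distance of q's sub-vectors to every centroid
    dists = tables[np.arange(m), codes].sum(axis=1)
    return int(np.argmin(dists))

q = rng.standard_normal(d).astype(np.float32)
print("approximate NN among encoded vectors:", adc_search(q))
```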

⚡ 3. Hardware-Accelerated Vector Search

Key Projects:

  • FAISS (Facebook AI Similarity Search) – updated with GPU kernels (an IVF-PQ usage sketch follows this list).
  • KNN-Accelerator in NVIDIA RAPIDS
  • Vector search libraries optimized for CUDA and cuBLAS.
  • Use of mixed precision (FP16 / INT8) for faster dot products.
  • Fast quantization/dequantization pipelines.
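
As a point of reference, a minimal FAISS sketch of the classical IVF-PQ recipe these libraries accelerate. Parameter choices (`nlist`, `m`, `nprobe`) are illustrative, and the commented GPU lines only apply to a `faiss-gpu` build.

```python
import numpy as np
import faiss   # pip install faiss-cpu (or faiss-gpu)

d, nlist, m = 128, 1024, 16
xb = np.random.rand(100_000, d).astype("float32")   # database vectors
xq = np.random.rand(10, d).astype("float32")        # query vectors

# IVF-PQ: a coarse inverted-file structure plus product quantization
# (m sub-quantizers, 8 bits each).
quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, 8)
index.train(xb)
index.add(xb)
index.nprobe = 16                                    # inverted lists probed per query

# Optional: move the index onto a GPU when faiss-gpu is installed.
# res = faiss.StandardGpuResources()
# index = faiss.index_cpu_to_gpu(res, 0, index)

D, I = index.search(xq, 10)                          # top-10 distances and ids
print(I[0])
```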

🔍 4. Multi-modal Retrieval with Dual Encoders

Key Work:

  • “Dual-Encoder + Cross-Encoder Reranking Pipelines” (OpenAI, LAION)
  • “SEED-BERT: Efficient Cross-modal Search” (CVPR 2024)

Architecture:

  • Dual encoder maps query and document/image/audio into a shared space.
  • Use ANN to retrieve candidates, followed by a reranking model.

$$\text{ANN: } \arg\max_{x \in \mathcal{X}} \langle f_q(q), f_d(x) \rangle, \quad \text{then rerank with } \mathrm{score}(q, x) = \mathrm{CrossEncoder}(q, x)$$
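
A schematic Python sketch of this two-stage pipeline. `f_q`, `f_d`, and `cross_encoder` below are hypothetical stubs standing in for trained dual-encoder towers and a trained reranker, and brute-force inner products stand in for the ANN index.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stubs: real systems use trained encoders (e.g. BERT/ViT towers)
# and a trained cross-encoder; here they just return random embeddings/scores.
def f_q(query):            return rng.standard_normal(128)
def f_d(doc):              return rng.standard_normal(128)
def cross_encoder(q, doc): return float(rng.random())

corpus = [f"doc {i}" for i in range(10_000)]
doc_emb = np.stack([f_d(x) for x in corpus])         # embedded offline

def search(query, k_retrieve=100, k_final=10):
    """Stage 1: retrieve a shortlist by inner product in the shared space
    (brute force here, an ANN index in practice).
    Stage 2: rerank the shortlist with the slower, more accurate cross-encoder."""
    scores = doc_emb @ f_q(query)                    # <f_q(q), f_d(x)> for all x
    shortlist = np.argsort(scores)[-k_retrieve:][::-1]
    reranked = sorted(shortlist, key=lambda i: cross_encoder(query, corpus[i]),
                      reverse=True)
    return [corpus[i] for i in reranked[:k_final]]

print(search("example query"))
```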

Training Paradigm:

  • Massive pretraining on noisy multi-modal pairs (e.g., CLIP, ALIGN).
  • Contrastive learning with hard negatives.

Comparison Table

| Method | Index Type | Hardware | Recall | Latency | Notes |
|---|---|---|---|---|---|
| HNSW | Graph | CPU | 95%+ | ~ms | SOTA classical |
| PQ + IVF | Quantization | GPU (FAISS) | 90%+ | ~ms | Memory-efficient |
| Zoom / DSPANN | Learned routing | SSD + CPU | 98%+ | <1 ms | End-to-end learnable |
| Q-Mix | Learned compression | GPU | 96%+ | Low | High compression ratio |
| SEED-BERT | Dual encoder | GPU | Varies | Varies | Multimodal search |

🔭 Future Directions and Open Challenges

  1. Training-Free / Few-Shot Adaptation:
    • Can we dynamically adapt ANN structures to new domains without retraining?
  2. Unified Multi-modal Indexes:
    • Indexes that seamlessly support text, image, video, and audio.
  3. Edge & On-device ANN:
    • Optimizing vector search for limited hardware (e.g., WebAssembly, iOS).
  4. Privacy-preserving ANN:
    • Incorporating techniques like Federated Learning, Differential Privacy, or Homomorphic Encryption into vector retrieval.
  5. Probabilistic Index Verification:
    • Building confidence intervals for nearest neighbor results; trust scores.

📚 Key References

  • Malkov, Yu. A., & Yashunin, D. A. (2018). “Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs.” IEEE TPAMI.
  • Jégou, H., Douze, M., & Schmid, C. (2011). “Product quantization for nearest neighbor search.” IEEE TPAMI.
  • Santhanam, K., et al. (2023). “ColBERTv2: Efficient and Effective Retrieval via Lightweight Late Interaction.” arXiv.
  • Xiao, Y. et al. (2023). “Zoom: SSD-based vector search for billion-scale datasets.” VLDB.
  • Q-Mix (ICLR 2024): Blog summary by Meta AI
  • FAISS: https://github.com/facebookresearch/faiss