The Renaissance of Information Retrieval

Intelligence Summit 2026
10-12 February 2026, SwissTech Convention Center in Lausanne. Photos by Samuel Devantery
I came back from AMLD Intelligence Summit 2026 (February 10–12, 2026, at the SwissTech Convention Center / EPFL in Lausanne) with a take that’s simple to say, but hard to execute well: we’re in an Information Retrieval renaissance, and “having vector search” is no longer a differentiator. 
Search has improved enormously over the last decade: what used to be a single search box is now the retrieval engine behind multiple product surfaces.
Retrieval-Augmented Generation (RAG), the technique of enhancing a language model with external context such as business documents, is the clearest example of this trend, but it is far from the only one.
Take recommendations: real-world recommender systems often use a two-stage process. As detailed in Google’s YouTube recommender paper, they first perform candidate generation by retrieving a small subset from a massive corpus, and then rank that subset. 
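The two-stage pattern can be sketched in a few lines. This is a toy illustration with random vectors, not YouTube’s actual system: `rank_score` is a hypothetical stand-in for the heavier ranking model that runs only on the candidates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: 10,000 items with 32-dim embeddings (stand-ins for learned vectors).
item_vecs = rng.normal(size=(10_000, 32))
user_vec = rng.normal(size=32)

# Stage 1 - candidate generation: cheap dot-product retrieval of the top 100 items.
scores = item_vecs @ user_vec
candidates = np.argpartition(scores, -100)[-100:]

# Stage 2 - ranking: a (hypothetical) more expensive scorer, applied only to candidates.
def rank_score(item_vec, user_vec):
    # Placeholder for a heavier model (cross features, a neural ranker, etc.).
    return float(item_vec @ user_vec) - 0.1 * float(np.linalg.norm(item_vec))

ranked = sorted(candidates, key=lambda i: rank_score(item_vecs[i], user_vec), reverse=True)
top_10 = ranked[:10]
```

The point of the split is cost: the cheap dot product prunes 10,000 items down to 100, so the expensive scorer only ever sees 1% of the corpus.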
This change is even visible in government projects, such as in the Swiss federal law ecosystem, where Fedlex is piloting an "AI search on Fedlex data" project.
So here’s the main takeaway I got from the conference:
In 2026, the moat isn’t “we have embeddings” or “we use vector search”. It’s whether your retrieval stack is domain-specialized, evaluation-driven, and cost-efficient enough to scale. 
The research community has been telling us this in plain language: even large benchmarks like MTEB exist because no single embedding method dominates across all tasks. 
That’s also why “Domain Focus” is where real state-of-the-art behavior shows up: legal language is not just “English with longer sentences”; domain-specific pretraining and adaptation repeatedly outperform generic models in these domains.

The evolution of Information Retrieval

Traditional Information Retrieval methods, such as TF-IDF (Term Frequency–Inverse Document Frequency) and the particularly successful BM25, rely on lexical overlap.
BM25 became one of the most widely deployed and effective ranking functions in classical search. It powered many real-world search engines in the 2010s: if you used a search bar ten years ago, it was very likely running these techniques (or a close variation of them).
Research moved beyond lexical search because it has a structural problem: vocabulary mismatch (also known as the “lexical gap”). 
The simplest example is still the most intuitive: a query containing the word “automobile” will likely miss documents that only contain the word “car”, unless the search engine adds an extra layer of complexity to handle pre-defined synonyms. And research has repeatedly shown that this is not a corner case. 
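To make the gap concrete, here is a minimal BM25 implementation (whitespace tokenization, standard k1/b defaults; illustrative, not production code) showing that a document about a “car” scores exactly zero for the query “automobile repair”:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Minimal BM25 over whitespace-tokenized documents (illustrative only)."""
    tokenized = [d.lower().split() for d in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    df = Counter(t for d in tokenized for t in set(d))  # document frequencies
    scores = []
    for d in tokenized:
        tf = Counter(d)
        s = 0.0
        for t in query.lower().split():
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = ["the car broke down on the highway",
        "automobile insurance rates are rising"]
print(bm25_scores("automobile repair", docs))
# First score is 0.0: "car" never lexically matches "automobile".
```

No amount of tuning k1 or b fixes this: with zero term overlap, the score is structurally zero.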
Focus on Switzerland alone and the lexical gap becomes painfully concrete. Switzerland’s federal legislation is published in German, French, and Italian, and those language versions are treated as equally authoritative. 
That means a naive keyword engine can be legally blind: searching for a German legal concept may miss the French phrasing in the federal statute (or vice versa) unless you build translation, multilingual indexing, or meaning-aware retrieval into the system.
So, even if BM25 is interpretable and cheap, it fundamentally does not “understand” meaning; in multilingual, citation-heavy, abbreviation-heavy domains (law, finance), meaning is precisely what decides whether results are usable.

How a search engine can really capture the meaning of a query

A few years ago, the field of Information Retrieval (IR) witnessed an important paradigm shift with the introduction of the "Dense Retrieval" concept. 
It revolutionized the game by focusing on learned representations rather than treating text as a simple "bag of words", where the order and semantic relationships are largely ignored.
For example, consider the sentences “The cat chased the mouse” and “The mouse chased the cat”. They contain exactly the same words and refer to the same entities and actions. However, the meaning is completely different because word order determines who is performing the action and who is receiving it.
In a modern setting, a user searching for “Which animal chased the mouse?” expects to retrieve the first sentence (“the cat”). A system that ignores word order or semantic roles might consider both sentences highly similar and return the wrong one (since both are about cats, mice, and chasing). 
The way dense retrieval methods solve this issue is by embedding both queries and documents into a shared, continuous, high-dimensional vector space.
Here is the high-level idea: imagine a space populated by thousands of vectors, each representing a document. When we run a search, we convert the query into its own vector, map it into that same space, and retrieve the documents whose vectors are closest to the query vector because they are the most similar.
This vector-based approach allows the model to capture deeper semantic meanings and relationships inherent in the text, something that term-matching methods like TF-IDF or BM25 often struggle with.
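A toy sketch of the idea. The three-dimensional “embeddings” here are hand-picked for illustration; real embeddings come from a trained encoder and have hundreds of dimensions:

```python
import numpy as np

# Toy "embeddings": in a real system these come from an encoder model.
doc_vecs = np.array([
    [0.9, 0.1, 0.0],   # "The cat chased the mouse"
    [0.1, 0.9, 0.0],   # "The mouse chased the cat"
    [0.0, 0.1, 0.9],   # "Stock markets fell on Monday"
])
query_vec = np.array([0.8, 0.2, 0.1])  # "Which animal chased the mouse?"

# Cosine similarity: the closest vectors are the most semantically similar documents.
sims = doc_vecs @ query_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
best = int(np.argmax(sims))  # index 0 -> "The cat chased the mouse"
```

Note that the two cat/mouse sentences get different vectors despite sharing every word: the encoder, not term overlap, decides where each sentence lands in the space.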
Dense Passage Retrieval (DPR) is a prime example of the modern "dense retrieval" framework. It utilizes a dual-encoder neural network architecture, featuring two separate encoders: one for generating the query embedding and another for the document passage embedding. 
The key innovation is that these two encoders are trained together. This joint training maximizes the similarity (proximity) of vector representations for relevant question-passage pairs, while simultaneously increasing the distance for irrelevant pairs. 
Essentially, this process teaches the model to anticipate the appropriate types of documents for any given query. And this is what makes large-scale retrieval highly efficient. 
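The training objective can be sketched with in-batch negatives, the trick DPR popularized: every other passage in the batch serves as a negative for a given query. This numpy version is illustrative only; the real system uses BERT-based encoders trained by gradient descent:

```python
import numpy as np

def in_batch_contrastive_loss(q_emb, p_emb):
    """Negative log-likelihood of each query's true passage, with the
    other passages in the batch serving as in-batch negatives."""
    sims = q_emb @ p_emb.T                         # (B, B) similarity matrix
    sims = sims - sims.max(axis=1, keepdims=True)  # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # true pairs sit on the diagonal

q = np.eye(4, 8)  # 4 toy query embeddings (orthogonal unit vectors)
loss_good = in_batch_contrastive_loss(q, 5.0 * q)                     # positives aligned
loss_bad = in_batch_contrastive_loss(q, 5.0 * np.roll(q, 1, axis=0))  # positives misplaced
# Training pushes the encoders from the second situation toward the first:
# relevant pairs end up close, irrelevant pairs end up far apart.
```

The loss is low exactly when each query’s vector is closest to its own passage, which is the geometric property that later makes nearest-neighbor search return relevant results.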
Once the document embeddings are pre-calculated and indexed, the retrieval process for a new query is extremely fast: the query is encoded once, and its resulting vector is used to perform a rapid nearest-neighbor search against the massive static index of document vectors. 
This ability to perform retrieval at scale in vector space is a key advantage, often allowing DPR and similar dense retrieval models to significantly outperform strong, established baselines like BM25 in terms of top-k retrieval accuracy, especially on Question Answering (QA) benchmarks where understanding context is vital. 
The architecture combining embeddings and top-k retrieval has become the standard for implementing "LLMs with business knowledge". To efficiently implement a RAG system:
  • You don’t paste an entire document library into an LLM prompt.
  • You retrieve the top‑k chunks (or pages, or sections) of documents that are most relevant and feed only those into the model.
This technique offers multiple benefits: it reduces costs by using fewer tokens for answer generation, increases speed due to less context being processed, and improves accuracy by mitigating the effects of long context decay.
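A minimal end-to-end sketch of the retrieve-then-prompt step. The `embed` function is a hypothetical stand-in (a hashed bag-of-words) so the example runs without a model; a real system would call an embedding model instead:

```python
import zlib
import numpy as np

def embed(text):
    """Hypothetical stand-in embedder: hashes words into a 64-bucket
    bag-of-words vector. A real system would call an encoder model."""
    vec = np.zeros(64)
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % 64] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

chunks = [
    "Refunds are processed within 14 days of the request.",
    "Our offices are closed on Swiss federal holidays.",
    "Refund requests must include the original order number.",
]
chunk_vecs = np.stack([embed(c) for c in chunks])   # precomputed offline, indexed once

def build_prompt(question, k=2):
    sims = chunk_vecs @ embed(question)             # encode the query once
    top_k = np.argsort(sims)[::-1][:k]              # keep only the k best chunks
    context = "\n".join(chunks[i] for i in top_k)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("How long do refunds take?")
```

Only k chunks reach the model regardless of how large the document library grows, which is where the cost and latency savings come from.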
However, retrieving data from a vector database is not trivial, especially at scale: exact nearest-neighbor search becomes too slow, so you are constantly navigating the trade-off between speed and recall, a balancing act managed by Approximate Nearest Neighbor (ANN) algorithms.
While older CPU-based standards like HNSW used to be the default, the massive scale of GenAI in 2026 requires specialized approaches. Two critical techniques dominating modern retrieval stacks are:
  • CAGRA (CUDA ANNS Graph-based): A state-of-the-art graph algorithm built by NVIDIA from the ground up for extreme GPU acceleration. Instead of bottlenecking on a CPU, it leverages massive parallelization to achieve ultra-high throughput on billion-scale datasets.
  • DiskANN: A storage-based approach designed to break the "memory wall." Instead of forcing you to store all your heavy vector indexes in expensive RAM, it cleverly utilizes modern, fast NVMe SSDs to search billion-scale datasets with barely any latency penalty.

Credits