Build a search engine, not a vector DB

If you want to make a good RAG tool that uses your documentation, you should start by making a search engine over those documents that would be good enough for a human to use themselves. This is exactly what I’ve been trying to communicate in my org in the past few months. It’s 2024 and we still can’t have a proper search engine in organizations to find relevant information from various sources. While this problem remains to be solved, organizations are adapting RAG and AI into their tooling, but are missing the important R of the RAG: Retrieval. I’ve been an advocate of prioritizing search engines over any AI related tool in the past few months, and I found it refreshing to read about this somewhere else: ...

2024-12-04 · 2 min

To Chunk or Not to Chunk With the Long Context Single Embedding Models

In his excellent write up on state of the art embedding models, Aapo Tanskanen compares the retrieval score for when the source documents are split into chunks and when they’re not: Transformer-based single embedding models have traditionally had 512 token context windows because of the usage of the original BERT encoder. Newer models, like the BGE-M3, have expanded the token window to much larger scales. It could be tempting to forget chunking and just embed long texts as they are. However, that would mean mashing many topics and entities you might want to search for into a single vector representation, which doesn’t sound like a good idea. ...

2024-06-02 · 2 min