Embeddings are compression. We have collectively forgotten this, and our retrieval pipelines have gotten worse for it.
This is a short article arguing that the field’s enthusiasm for ever-larger embedding models has obscured the more interesting question: how much information is actually packed into a 1024-dimensional float vector, and how much can we get away with throwing away?
Using measurements on a 200M-document news corpus, we show that aggressive dimensionality reduction (PCA to 128 dimensions) costs less retrieval quality than most teams assume: about 2 nDCG points at the top of the ranking, and almost nothing in the tail.
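For readers who want to see the mechanics, here is a minimal sketch of PCA reduction to 128 dimensions using NumPy. It is not the pipeline behind the measurements: the input is random synthetic data standing in for real embeddings, the 1024-dimensional input size is borrowed from the question above, and the helper names `fit_pca` and `reduce_embeddings` are hypothetical. Re-normalizing after projection is also an assumption, made so that dot products behave like cosine similarity.

```python
import numpy as np


def fit_pca(embeddings: np.ndarray, n_components: int):
    """Return (mean, components) for the top principal directions."""
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def reduce_embeddings(embeddings: np.ndarray, mean: np.ndarray,
                      components: np.ndarray) -> np.ndarray:
    """Project onto the principal directions, then L2-normalize each row."""
    reduced = (embeddings - mean) @ components.T
    norms = np.linalg.norm(reduced, axis=1, keepdims=True)
    return reduced / np.maximum(norms, 1e-12)


# Synthetic stand-in for a sample of document embeddings (assumption:
# 1024 input dimensions, float32). Random Gaussian data has no
# low-dimensional structure, so this shows the mechanics only.
rng = np.random.default_rng(0)
sample = rng.standard_normal((2000, 1024)).astype(np.float32)

mean, components = fit_pca(sample, n_components=128)
reduced = reduce_embeddings(sample, mean, components)
```

In practice one would fit `mean` and `components` on a sample of the corpus and apply the same projection to both documents and queries. Assuming float32 storage, a vector shrinks from 4096 bytes at 1024 dimensions to 512 bytes at 128, an 8x reduction in index size.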