Towards Data Science · 0 · 0 comments · 21 min read · 2 days ago

Retrieval Is Filtering, Not Search: A Mental Model for Enterprise RAG

A mental model for enterprise RAG that reframes retrieval as a filtering problem over two structured DataFrames, line_df (document text) and toc_df (table of contents), rather than as a vector similarity search. The core insight is to separate anchor scope (small and precise: where the signal is detected) from context scope (large enough to be sufficient: what is passed to the LLM for generation). The article walks through four question types that each require a different retrieval strategy, explains why cosine top-k fails on most of them, and introduces three context expansion strategies: paragraph, section, and window expansion. It also covers how the two DataFrames work together through joins on section_id, and when to use LLM calls versus deterministic rules to detect section boundaries.
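To make the anchor/context split concrete, here is a minimal sketch, not the author's implementation. It is written in plain Python rather than pandas so it runs without dependencies; with pandas, each filter would be a boolean mask on line_df and the TOC lookup a merge on section_id. The row types Line and TocEntry, the fields line_id and paragraph_id, the numbered-heading regex used as the deterministic section-boundary rule, and all function names are hypothetical. Anchors are found by a keyword filter, and the three expansion strategies then widen that anchor set into the context handed to the LLM.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-ins for one row of line_df and one row of toc_df.
@dataclass(frozen=True)
class Line:
    line_id: int       # position in document order
    paragraph_id: int  # changes at every blank line or heading
    section_id: str    # join key shared with the table of contents
    text: str

@dataclass(frozen=True)
class TocEntry:
    section_id: str
    title: str

# Assumed deterministic boundary rule: a line such as "2. Termination" opens a section.
HEADING = re.compile(r"^(\d+)\.\s+\S")

def parse(raw):
    """Split raw text into line rows and TOC rows linked by section_id."""
    lines, toc = [], []
    section_id, paragraph_id = "s0", 0
    for row in raw.splitlines():
        row = row.strip()
        if not row:
            paragraph_id += 1
            continue
        match = HEADING.match(row)
        if match:
            section_id = f"s{match.group(1)}"
            toc.append(TocEntry(section_id, row))
            paragraph_id += 1
            continue
        lines.append(Line(len(lines), paragraph_id, section_id, row))
    return lines, toc

def find_anchors(lines, keyword):
    """Anchor scope: only the lines where the signal (here, a keyword) appears."""
    needle = keyword.lower()
    return [ln for ln in lines if needle in ln.text.lower()]

def filter_by_section_title(lines, toc, keyword):
    """Filter lines through the TOC: keep lines whose section title matches."""
    wanted = {e.section_id for e in toc if keyword.lower() in e.title.lower()}
    return [ln for ln in lines if ln.section_id in wanted]

def expand_paragraph(lines, anchors):
    """Context scope = every line in the anchors' paragraphs."""
    keys = {a.paragraph_id for a in anchors}
    return [ln for ln in lines if ln.paragraph_id in keys]

def expand_section(lines, anchors):
    """Context scope = every line in the anchors' sections."""
    keys = {a.section_id for a in anchors}
    return [ln for ln in lines if ln.section_id in keys]

def expand_window(lines, anchors, k=1):
    """Context scope = anchors plus k neighboring lines, ignoring section boundaries."""
    centers = [a.line_id for a in anchors]
    return [ln for ln in lines if any(abs(ln.line_id - c) <= k for c in centers)]

def build_context(lines, toc):
    """Render the selected lines for the LLM, labelling each section once."""
    titles = {e.section_id: e.title for e in toc}
    out, current = [], None
    for ln in lines:
        if ln.section_id != current:
            current = ln.section_id
            out.append(f"[{titles.get(current, current)}]")
        out.append(ln.text)
    return "\n".join(out)

RAW = """1. Scope
This policy covers all contractors.

2. Termination
Termination requires 30 days notice.
Notice must be given in writing.

Fees already paid are non-refundable.

3. Disputes
Disputes go to arbitration.
"""

lines, toc = parse(RAW)
anchors = find_anchors(lines, "notice")
print([a.line_id for a in anchors])                             # [1, 2]
print([ln.line_id for ln in expand_paragraph(lines, anchors)])  # [1, 2]
print([ln.line_id for ln in expand_section(lines, anchors)])    # [1, 2, 3]
print([ln.line_id for ln in expand_window(lines, anchors)])     # [0, 1, 2, 3]
print(build_context(expand_section(lines, anchors), toc))
```

The anchors stay small (two lines), while each strategy chooses a different context scope. Section expansion pulls in the related fee clause, and window expansion crosses into the "Scope" section because it only counts lines. That trade-off is one reason to pick the strategy per question type rather than fixing a single top-k.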

Read the original article

#llm #rag #pandas

Source: https://towardsdatascience.com/retrieval-is-filtering-not-search-a-mental-model-for-enterprise-rag. 8sync News only summarizes and links to articles; copyright in the content belongs to the author and the original source.

Recommended for you

Rust · 2 · 22 min read · 2 hours ago · AI

The many journeys of learning Rust

A qualitative study by the Rust team, based on interviews and surveys, of how developers learn Rust. It highlights the main learning paths (curiosity, job changes, organizational adoption), common difficulties (unlearning OOP habits, "clone guilt"), the role of the borrow checker and of AI assistants (LLMs), and strategies for training teams. The article also discusses "quiet quitting" (learners who silently give up) and the community's influence on long-term retention, and offers recommendations for improving learning materials.

Real-world experiences from developers learning Rust can help you understand how to overcome the challenges posed by the language's unfamiliar design and how to build an effective learning strategy.

#llm


Letting an LLM Pick the Right RAG Page: The Arbiter Pattern at the End of Retrieval

The Mirror You Trained

Anthropic’s Mythos found flaws in classified US systems during a government test

Don’t Let the Model Grade its Own Homework

Knowledge graph RAG: structured retrieval for AI agents

My 7-year-old GPU runs local AI perfectly, and I don't need my cloud subscriptions anymore

Sub-agents: splitting context across specialized AI agents