모키
December 3, 2025
AI research systems: today's AI is apparently an expert at failing? Automated report writing is said to be impossible
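A paper that deserves to be called a combined "death sentence and resuscitation manual" for AI research agents has been published.
Title: "Truly 'useful' deep research agents do not exist anywhere yet"
What happened?
- Seven of the world's strongest Deep Research agents (Gemini 2.0, OpenAI
Quoted tweet: What's missing to build useful deep research agents?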
Deep research agents promise analyst-level reports through automated search and synthesis. However, current systems fall short of genuinely useful research.
The question is: where exactly do they fail?
This new paper introduces FINDER, a benchmark of 100 human-curated research tasks with 419 structured checklist items for evaluating report quality. Unlike QA benchmarks, FINDER focuses on comprehensive report generation.
The researchers analyzed approximately 1,000 reports from mainstream deep research agents. Their findings challenge assumptions about where these deep research systems struggle.
Current agents don't struggle with task comprehension. They fail at evidence integration, verification, and reasoning-resilient planning. They understand what you're asking. They just can't synthesize the answer reliably.
The paper introduces DEFT, the first failure taxonomy for deep research agents. It identifies 14 distinct failure modes across three categories: reasoning failures, retrieval failures, and generation failures.
This systematic breakdown reveals that the gap between current capabilities and useful research isn't about smarter search or better language models. It's about the reasoning architecture that connects retrieval to synthesis.
(bookmark it)
Paper: https://t.co/gAA7feYHm1
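For readers wondering what "structured checklist items" could look like in practice, here is a minimal sketch of checklist-based report scoring. It is not FINDER's actual scoring method, which the tweet does not describe. The `ChecklistItem` class, the `checklist_pass_rate` function, and the sample items are all hypothetical, and the pass/fail judgments are assumed to come from a human or model grader.

```python
from dataclasses import dataclass


@dataclass
class ChecklistItem:
    """One requirement a report is checked against (hypothetical structure)."""
    description: str
    satisfied: bool  # assumed to be decided by a grader outside this sketch


def checklist_pass_rate(items: list[ChecklistItem]) -> float:
    """Return the fraction of checklist items the report satisfies.

    An empty checklist scores 0.0 so the function never divides by zero.
    """
    if not items:
        return 0.0
    return sum(item.satisfied for item in items) / len(items)


# Made-up checklist for a single research task (illustration only).
report_items = [
    ChecklistItem("Cites at least one primary source", True),
    ChecklistItem("States the time period covered", True),
    ChecklistItem("Compares at least two alternatives", False),
    ChecklistItem("Flags claims that could not be verified", True),
]

score = checklist_pass_rate(report_items)
print(f"Checklist pass rate: {score:.2f}")
```

A per-item pass rate like this makes it visible which requirements a report misses, rather than collapsing everything into one right-or-wrong answer as a QA benchmark would.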