AI research systems: today's AI is an "expert at failing"? Paper says automated reports aren't workable yet

모키

December 3, 2025

Education, Anthropic, OpenAI, Gemini, ChatGPT

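A striking study just landed in the AI world! The paper, whose title roughly says that genuinely useful deep research agents don't exist anywhere yet, is being called something like a death sentence for AI research assistants. The team analyzed seven of the world's top deep research agents, such as Gemini 2.0 and OpenAI, and the results are pretty rough.

Today's AIs understand what we're asking just fine, but they can't reliably pull the information they find together into a proper answer. They stumble on evidence integration, verification, and planning.

The study built a benchmark called 'FINDER', with 100 research tasks and 419 checklist items, to evaluate the quality of AI-written reports, and it sorted the ways the AIs fail into 14 types. The conclusion: what we need isn't smarter search or a better language model, but a reasoning architecture that connects search to synthesis. Go study a bit more and come back, AI systems... 😅 🦉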


@happylife119225

December 3, 2025

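A paper that deserves to be called a "death sentence and resuscitation manual" for AI research agents has been published.

Title: "Genuinely 'useful' deep research agents do not yet exist anywhere"

What happened?

Seven of the world's strongest Deep Research agents (Gemini 2.0, OpenAI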

Quoted tweet: What's missing to build useful deep research agents?

Deep research agents promise analyst-level reports through automated search and synthesis. However, current systems fall short of genuinely useful research.

The question is: where exactly do they fail?

This new paper introduces FINDER, a benchmark of 100 human-curated research tasks with 419 structured checklist items for evaluating report quality. Unlike QA benchmarks, FINDER focuses on comprehensive report generation.
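For scale, 419 checklist items over 100 tasks works out to about 4.2 items per task on average. The tweet doesn't say how FINDER turns checklist items into a score, so the following is only a rough sketch that assumes a plain pass rate (the share of items a report satisfies). The names `ChecklistItem` and `checklist_pass_rate` and the sample items are made up for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ChecklistItem:
    description: str  # what the report is expected to contain
    satisfied: bool   # grader's judgment (human or model; an assumption here)


def checklist_pass_rate(items: list[ChecklistItem]) -> float:
    """Fraction of checklist items the report satisfies (0.0 if there are none)."""
    if not items:
        return 0.0
    return sum(item.satisfied for item in items) / len(items)


# Hypothetical task with made-up checklist items, not real FINDER data.
items = [
    ChecklistItem("States the research question", True),
    ChecklistItem("Cites at least one primary source", False),
    ChecklistItem("Compares competing findings", True),
    ChecklistItem("Lists open limitations", True),
]
print(checklist_pass_rate(items))  # 0.75
```

A per-item checklist like this is what separates report evaluation from QA-style benchmarks: a report can be partly right, and the score shows which expectations it missed.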

The researchers analyzed approximately 1,000 reports from mainstream deep research agents. Their findings challenge assumptions about where these deep research systems struggle.

Current agents don't struggle with task comprehension. They fail at evidence integration, verification, and reasoning-resilient planning. They understand what you're asking. They just can't synthesize the answer reliably.

The paper introduces DEFT, the first failure taxonomy for deep research agents. It identifies 14 distinct failure modes across three categories: reasoning failures, retrieval failures, and generation failures.
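To make the taxonomy idea concrete, here is a small sketch of how failures labeled with the three categories could be tallied. Only the three category names come from the tweet. The 14 individual failure modes are not listed there, so they are left out, and the helper `category_shares` and the sample labels are hypothetical.

```python
from collections import Counter
from enum import Enum


class FailureCategory(Enum):
    # The three top-level categories named in the tweet.
    REASONING = "reasoning"
    RETRIEVAL = "retrieval"
    GENERATION = "generation"


def category_shares(labels: list[FailureCategory]) -> dict[FailureCategory, float]:
    """Share of labeled failures that fall into each category."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in FailureCategory}


# Made-up annotations for illustration only, not results from the paper.
labels = [
    FailureCategory.REASONING,
    FailureCategory.REASONING,
    FailureCategory.RETRIEVAL,
    FailureCategory.GENERATION,
]
shares = category_shares(labels)
print(shares[FailureCategory.REASONING])  # 0.5
```

Tallying by category is one simple way to see whether failures cluster in reasoning, retrieval, or generation across a set of reports.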

This systematic breakdown reveals that the gap between current capabilities and useful research isn't about smarter search or better language models. It's about the reasoning architecture that connects retrieval to synthesis.

(bookmark it)

Paper: https://t.co/gAA7feYHm1