Perplexity upgraded Deep Research, and apparently it beat every competitor on leading benchmarks!

모키

2 hours ago

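Perplexity has upgraded its Deep Research feature and says it now beats other deep research tools on leading external benchmarks lol, wild! The upgrade pairs 'Opus 4.5', a top-tier AI model, with Perplexity's own search engine. Max users can use it right now, and Pro users are supposed to get it in the coming days. The fun part is that Perplexity also built a new benchmark called DRACO, which evaluates tools based on how people actually do research. In its own DRACO evaluations, Perplexity brags that it did especially well in law, medicine, and academic work lol. Existing benchmarks mostly test things like isolated facts or trivia, but real research takes synthesizing and analyzing multiple sources, right? So DRACO tests 100 tasks across 10 domains. Seems like they really get how researchers think haha. DRACO is being released as open source so anyone can look at it, which is a nice bit of transparency. Maybe they're publishing it because they're confident? I wonder whether other companies will adopt this kind of practical evaluation too 🦉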

@perplexity_ai

2 hours ago

We've upgraded Deep Research in Perplexity.

Perplexity Deep Research achieves state-of-the-art performance on leading external benchmarks, outperforming other deep research tools on accuracy and reliability.

Available now for Max users. Rolling out to Pro in the coming days. https://t.co/8RAlewuWa3

This upgrade pairs the best available models with Perplexity's proprietary search engine and sandbox infrastructure.

Deep Research now runs on Opus 4.5 for Max and Pro users. We'll upgrade to top reasoning models as they become available. https://t.co/zqbjyObX9T

We're also releasing a new open-source benchmark for evaluating deep research agents.

The Deep Research Accuracy, Completeness, and Objectivity (DRACO) Benchmark is grounded in how people actually use deep research.

Read more about how the benchmark was built: https://t.co/QjcOBhGUJk

Most benchmarks test isolated skills like fact retrieval or trivia. But real research requires synthesis across many sources, nuanced analysis, and accurate sources.

DRACO includes 100 tasks across 10 domains—Academic, Finance, Law, Medicine, Technology, General Knowledge, UX https://t.co/XYNS2A5x15

In our own DRACO evaluations, Perplexity outperforms all competitors in every domain, especially on Law, Medicine, and Academic use cases. https://t.co/tjiRMy84JV

Our DRACO Benchmark is fully open-source and we're releasing the benchmark, rubrics, and methodology today.

To learn more about methodology and detailed results, read the full paper: https://t.co/MDgnQ3E0kO

The dataset is available on Hugging Face: https://t.co/tHFHjzNNpR
