LLM Council: several AIs debate each other and hand you only the best answer

부키

13 hours ago

Google Grok Anthropic Gemini ChatGPT

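A new AI service called 'LLM Council' just showed up, and it's a clever idea. It looks exactly like ChatGPT, but under the hood it works completely differently. The mechanism is simple: when you ask a question, it sends that question to several AIs at once, such as GPT-5.1, Gemini, Claude, and Grok. Then each AI reads the others' answers anonymously, evaluates them, and ranks them. The fun part is that at the end a 'Chairman AI' steps in, pulls all of that information together, and writes the final answer. So you effectively get the best parts of several AIs in one place lol.

When the developer used it to read book chapters, the AIs were surprisingly willing to rate another model's answer above their own, and they consistently ranked GPT-5.1 as the best and Claude as the worst. The developer isn't fully convinced this matches their own take, though: in their view GPT-5.1 is too wordy and Gemini is more concise. Ensembles that combine the strengths of multiple AIs are still an under-explored area, so there looks to be a lot of room to grow. You can try it right now at llm-council.com 🦉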


@EchoWave_AI

13 hours ago

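This LLM council is interesting. When you use AI to help read books or research reports, you always end up comparing the results of different LLMs anyway, but AK has automated that.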

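Borrowing 机器之心 (Synced)'s description of the process, every user question actually goes through the following flow: 1) The question is dispatched to multiple models on the council (via OpenRouter), e.g. currently: openai/gpt-5.1, google/gemini-3-pro-preview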

Quoted tweet: As a fun Saturday vibe code project and following up on this tweet earlier, I hacked up an llm-council web app. It looks exactly like ChatGPT except each user query is 1) dispatched to multiple models on your council using OpenRouter, e.g. currently:

"openai/gpt-5.1", "google/gemini-3-pro-preview", "anthropic/claude-sonnet-4.5", "x-ai/grok-4",

Then 2) all models get to see each other's (anonymized) responses and they review and rank them, and then 3) a "Chairman LLM" gets all of that as context and produces the final response.
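The three-step flow described above (dispatch to the council, anonymized peer review and ranking, chairman synthesis) can be sketched in Python roughly as follows. This is a minimal sketch, not the app's actual code: `ask(model, prompt)` is a hypothetical stand-in for a real model call (for example a request through OpenRouter), and the prompt wording, the "Response A/B/C" labels, and the `FINAL RANKING:` line format are illustrative assumptions.

```python
import random
import string
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical model-call signature: ask(model_id, prompt) -> reply text.
# A real app would send the prompt to a provider such as OpenRouter; here it is injected.
AskFn = Callable[[str, str], str]


@dataclass
class CouncilResult:
    responses: Dict[str, str]       # stage 1: model id -> its own answer
    label_to_model: Dict[str, str]  # anonymous label ("Response A", ...) -> model id
    reviews: Dict[str, str]         # stage 2: reviewer model id -> review/ranking text
    final: str                      # stage 3: chairman's synthesized answer


def ask_all(models: List[str], prompt: str, ask: AskFn) -> Dict[str, str]:
    """Send the same prompt to every model using parallel threads."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        replies = list(pool.map(lambda m: ask(m, prompt), models))
    return dict(zip(models, replies))


def run_council(query: str, council: List[str], chairman: str,
                ask: AskFn, seed: int = 0) -> CouncilResult:
    # Stage 1: each council member answers the query independently.
    responses = ask_all(council, query, ask)

    # Stage 2: hide authorship behind shuffled labels, then let every member
    # review and rank all answers, including its own (it can't tell which one).
    order = list(council)
    random.Random(seed).shuffle(order)
    label_to_model = {f"Response {string.ascii_uppercase[i]}": m
                      for i, m in enumerate(order)}
    answers = "\n\n".join(f"{label}:\n{responses[m]}"
                          for label, m in label_to_model.items())
    review_prompt = (f"Question: {query}\n\n{answers}\n\n"
                     "Critique each response, then finish with a line "
                     "'FINAL RANKING:' listing the labels best to worst.")
    reviews = ask_all(council, review_prompt, ask)

    # Stage 3: the chairman gets the anonymized answers and all reviews as context.
    review_text = "\n\n".join(f"Review {i + 1}:\n{r}"
                              for i, r in enumerate(reviews.values()))
    chair_prompt = (f"Question: {query}\n\nCandidate answers:\n{answers}\n\n"
                    f"Peer reviews:\n{review_text}\n\n"
                    "Write the single best final answer.")
    final = ask(chairman, chair_prompt)
    return CouncilResult(responses, label_to_model, reviews, final)


if __name__ == "__main__":
    # Offline stub in place of real model calls (placeholder model names).
    def fake_ask(model: str, prompt: str) -> str:
        return f"[{model}] saw {len(prompt)} chars"

    result = run_council("What is an LLM ensemble?",
                         ["model-a", "model-b", "model-c"], "chair", fake_ask)
    print(result.label_to_model)
    print(result.final)
```

Because `ask` is injected, the skeleton runs offline with a stub. Changing what the chairman sees, or whether reviewers see each other's reviews, is one concrete way to explore the data-flow choices the tweet mentions.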

It's interesting to see the results from multiple models side by side on the same query, and even more amusingly, to read through their evaluation and ranking of each other's responses.

Quite often, the models are surprisingly willing to select another LLM's response as superior to their own, making this an interesting model evaluation strategy more generally. For example, reading book chapters together with my LLM Council today, the models consistently praise GPT 5.1 as the best and most insightful model, and consistently select Claude as the worst model, with the other models floating in between. But I'm not 100% convinced this aligns with my own qualitative assessment. For example, qualitatively I find GPT 5.1 a little too wordy and sprawled and Gemini 3 a bit more condensed and processed. Claude is too terse in this domain.
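The peer rankings described above suggest two simple statistics: each model's average rank across reviewers, and how often a reviewer puts its own answer first. The sketch below computes both. It is one simple aggregation choice, not necessarily what the app does, and it assumes (hypothetically) that each review ends with a line like `FINAL RANKING: Response A, Response C, ...` and that a `label_to_model` mapping from anonymous labels back to model IDs was kept. The model names and rankings in the example are placeholders, not real results.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

MARKER = "FINAL RANKING:"  # assumed review format, not taken from the app


def parse_ranking(review: str) -> List[str]:
    """Return the labels listed after the last MARKER, best first ([] if absent)."""
    idx = review.rfind(MARKER)
    if idx == -1:
        return []
    tail = review[idx + len(MARKER):].strip()
    first_line = tail.splitlines()[0] if tail else ""
    return [label.strip() for label in first_line.split(",") if label.strip()]


def average_ranks(reviews: Dict[str, str],
                  label_to_model: Dict[str, str]) -> List[Tuple[str, float]]:
    """Mean 1-based rank per model across all reviewers, best (lowest) first."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for review in reviews.values():
        for pos, label in enumerate(parse_ranking(review), start=1):
            if label in label_to_model:
                positions[label_to_model[label]].append(pos)
    means = {m: sum(p) / len(p) for m, p in positions.items()}
    return sorted(means.items(), key=lambda kv: kv[1])


def self_preference_rate(reviews: Dict[str, str],
                         label_to_model: Dict[str, str]) -> float:
    """Fraction of reviewers whose top-ranked answer is their own."""
    tops = []
    for reviewer, review in reviews.items():
        ranking = parse_ranking(review)
        if ranking and ranking[0] in label_to_model:
            tops.append(label_to_model[ranking[0]] == reviewer)
    return sum(tops) / len(tops) if tops else 0.0


if __name__ == "__main__":
    # Placeholder data: three hypothetical models reviewing each other.
    label_to_model = {"Response A": "model-b", "Response B": "model-c",
                      "Response C": "model-a"}
    reviews = {
        "model-a": "...\nFINAL RANKING: Response A, Response C, Response B",
        "model-b": "...\nFINAL RANKING: Response A, Response B, Response C",
        "model-c": "...\nFINAL RANKING: Response C, Response A, Response B",
    }
    print(average_ranks(reviews, label_to_model))
    print(self_preference_rate(reviews, label_to_model))
```

With this placeholder data, `model-b` gets the best mean rank (about 1.33), and the self-preference rate is 1/3 because only `model-b` put its own answer first. A low self-preference rate is what "surprisingly willing to select another LLM's response" would look like in numbers; pairwise win rates would be an equally reasonable alternative to averaging ranks.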

That said, there's probably a whole design space of the data flow of your LLM council. The construction of LLM ensembles seems under-explored.

I pushed the vibe coded app to https://t.co/EZyOqwXd2k if others would like to play. ty nano banana pro for fun header image for the repo

