OpenAI built a framework to measure how monitorable an AI's 'thought process' is

부키

21 hours ago

Education Business OpenAI ChatGPT Text

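There's now a way to gauge how well we can look into an AI's thinking. OpenAI built a framework and evaluation suite for measuring how monitorable an AI's thought process (CoT, chain-of-thought) is, with 13 evaluations across 24 environments. The point is to tell how much of a model's reasoning on the way to an answer actually shows up in what it writes out. The key finding is that monitoring that intermediate reasoning is far more effective than watching only the model's actions or final answers. It's like a student writing out the work on a math problem. According to the research team, the more an AI 'thinks' (the longer the CoT), the easier it is to spot issues. The fun part is that a smaller model run with higher reasoning effort can be easier to monitor at similar capability. It costs extra inference compute, of course. Honestly, this seems like really important research for AI safety. If you can see how an AI reached its conclusion, you can catch and control it when that reasoning is dangerous or wrong lol. Also, stronger monitors that get to read the CoT and use more test-time compute get much better at spotting problems, fast. OpenAI itself calls CoT monitoring a window into the model's brain, and that description fits 🦉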

Attached media

@OpenAI

21 hours ago

To preserve chain-of-thought (CoT) monitorability, we must be able to measure it.

We built a framework + evaluation suite to measure CoT monitorability — 13 evaluations across 24 environments — so that we can actually tell when models verbalize targeted aspects of their

RL at today’s frontier doesn’t seem to wreck monitorability and can help early reasoning steps. But there’s a tradeoff: smaller models run with higher reasoning effort can be easier to monitor at similar capability — at the cost of extra inference compute (a “monitorability

Monitoring a model’s chain-of-thought is far more effective than watching only its actions or final answers.

The more a model “thinks” (longer CoTs), the easier it is to spot issues. https://t.co/e1AgGXSRvZ

We view chain-of-thought monitoring as complementary to mechanistic interpretability, not as a replacement for it.

Because we believe that chain-of-thought monitoring is incredibly useful as a window into a model’s brain and could be a loadbearing layer in a scalable control

What the monitor gets to read and the capability of the monitor matters.

Stronger monitors that can read CoTs and use more test-time compute get much better fast.

Also, post-hoc follow-ups (by asking the model to elaborate) often surface previously unspoken thoughts and boost
