OpenAI o1 Likely Uses RL over Chains of Thought to Build System 2 LLMs
In what is being seen as a potentially substantial leap in artificial intelligence (AI) technology, two new AI models can reason by using what are termed chains of thought and reasoning tokens. The approach opens the possibility that smaller AI models can achieve effective reasoning capabilities. The new models use what is called reinforcement learning over auto-generated chains of thought, says Subbarao Kambhampati, a professor in the School of Computing and Augmented Intelligence, part of the Fulton Schools. Kambhampati also points out one possible drawback: these models could make it difficult to check the reasoning behind the solutions they generate.