Shanghai AI Lab Releases OREAL-7B and OREAL-32B: Advancing Mathematical Reasoning with Outcome Reward-Based Reinforcement Learning MarkTechPostRead More
This AI Paper Explores Long Chain-of-Thought Reasoning: Enhancing Large Language Models with Reinforcement Learning and Supervised Fine-Tuning MarkTechPostRead More