Goldilocks RL: Tuning Task Difficulty to Escape Sparse Rewards for Reasoning
Apple Machine Learning Research