Toot

Written by ☮ ♥ ♬ 🧑‍💻 on 2025-02-03 at 06:17

“We reproduced #DeepSeek R1-Zero in the CountDown game, and it just works

Through RL, the 3B base LM develops #SelfVerification and #search abilities all on its own

You can experience the Ahah moment yourself for < $30” — #JiayiPan

Beginning to see replications of DeepSeek … “learns to allocate more thinking time to a problem by re-evaluating its approach” … this is described as the “aha moment”.

#AI / #ReinforcementLearning https://github.com/Jiayi-Pan/TinyZero / https://x.com/jiayi_pirate/status/1882839370505621655?s=46 / https://youtube.com/watch?v=e659KrxxN5w

=> More toots from peterrenshaw@ioc.exchange
