Toot

Written by ☮ ♥ ♬ 🧑‍💻 on 2025-02-03 at 06:17

“We reproduced #DeepSeek R1-Zero in the CountDown game, and it just works

Through RL, the 3B base LM develops #SelfVerification and #search abilities all on its own

You can experience the Ahah moment yourself for < $30” — #JiayiPan

Beginning to see replications of DeepSeek … “learns to allocate more thinking time to a problem by re-evaluating its approach” … this is described as the “Ah Ha Moment”.

#AI / #ReenforcementLearning https://github.com/Jiayi-Pan/TinyZero / https://x.com/jiayi_pirate/status/1882839370505621655?s=46 / https://youtube.com/watch?v=e659KrxxN5w

=> More information about this toot | View the thread | More toots from peterrenshaw@ioc.exchange


Proxy Information
Original URL
gemini://mastogem.picasoft.net/toot/113938495868521819