Toot

Written by David on 2024-12-13 at 18:48

Excited about the new xLSTM model release. There are many well-thought-out design choices compared to transformers: recurrence (which should allow composability), gating (like Mamba and the LSTM it builds on, which allows per-token time complexity independent of input size), and state tracking (unlike Mamba and transformers). For now, these advantages aren't apparent on benchmarks, but most training techniques are kept secret, and recent advances in LLMs have shown that they matter a lot.
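
As a rough sketch of why gated recurrence keeps per-token cost constant (this is a generic gated-RNN update in numpy, not the actual xLSTM equations; all names here are illustrative):

```python
import numpy as np

# Minimal gated recurrent step: each token updates a fixed-size state,
# so per-token work is O(d^2) regardless of how long the sequence is.
def gated_step(state, x, W_f, W_i, W_z):
    f = 1.0 / (1.0 + np.exp(-(W_f @ x)))  # forget gate in (0, 1)
    i = 1.0 / (1.0 + np.exp(-(W_i @ x)))  # input gate in (0, 1)
    z = np.tanh(W_z @ x)                  # candidate state update
    return f * state + i * z              # gated blend of old and new

rng = np.random.default_rng(0)
d = 8
W_f, W_i, W_z = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

state = np.zeros(d)
for x in rng.standard_normal((100, d)):   # a 100-token "sequence"
    state = gated_step(state, x, W_f, W_i, W_z)

print(state.shape)  # (8,) -- state stays fixed-size however long the input
```

Each step reads and writes only the fixed-size state, never the full history, which is the contrast with a transformer attending over all previous tokens.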

=> More information about this toot | View the thread | More toots from theawely
