They did a lot of work to use 8-bit floating-point (FP8) values for their computation to reduce computation time and memory. They had to optimize the quantization process to keep the values stable.
They used one of their "reasoning" models to distill better reasoning capabilities into the model.
They did a bunch of work on making sure the networked communication between their experts was efficient (but I skimmed this part).
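To make the FP8 point a bit more concrete, here is a minimal sketch of why the scaling step matters for stability. It is not DeepSeek's code: `round_to_e4m3` is a hypothetical, simplified simulation of rounding to the FP8 E4M3 grid, and the block size of 4 is only for illustration. The idea it shows is that giving each small block its own scale keeps an outlier in one block from crushing the precision of small values elsewhere.

```python
import math

E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value on a simulated E4M3 grid (saturating).

    Simplified assumption: no NaN handling, Python's default rounding.
    """
    if x == 0.0:
        return 0.0
    x = max(-E4M3_MAX, min(E4M3_MAX, x))
    _, exp = math.frexp(x)   # x = m * 2**exp with 0.5 <= |m| < 1
    exp = max(exp, -5)       # below 2**-6 the grid is subnormal (fixed step)
    step = 2.0 ** (exp - 4)  # 1 implicit + 3 stored mantissa bits
    return round(x / step) * step


def quantize_dequantize(values, block_size):
    """Scale each block so its max maps to E4M3_MAX, round, then undo the scale."""
    restored = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        scale = E4M3_MAX / amax if amax > 0 else 1.0
        restored.extend(round_to_e4m3(v * scale) / scale for v in block)
    return restored


def max_relative_error(original, restored):
    return max(abs(o - r) / abs(o) for o, r in zip(original, restored) if o != 0)


# Made-up data: four small values next to a block containing a large outlier.
values = [0.001, 0.002, 0.003, 0.004, 100.0, 50.0, 25.0, 12.5]

per_tensor = quantize_dequantize(values, block_size=len(values))  # one shared scale
per_block = quantize_dequantize(values, block_size=4)             # one scale per block

per_tensor_err = max_relative_error(values, per_tensor)
per_block_err = max_relative_error(values, per_block)
print(f"per-tensor max relative error: {per_tensor_err:.3f}")
print(f"per-block  max relative error: {per_block_err:.3f}")
```

With a single scale for the whole list, the small values land in the coarse subnormal range and lose most of their precision. With one scale per block they use the full range of the format.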
=> More informations about this toot | View the thread
Here's the DeepSeek-V3 paper if anyone is too lazy to look it up but may be interested in reading it. The model has a ton of parameters, but only a fraction of them are run at once. Each token is put through a gating mechanism that sends it to a few smaller expert networks that can run more efficiently. The results from each expert are summed together, and multiple tokens are predicted for the output.
https://arxiv.org/abs/2412.19437
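The gating idea can be sketched in a few lines of plain Python. This is a toy illustration under my own assumptions, not the paper's implementation: `make_expert` and `moe_forward` are hypothetical names, each "expert" is just a small random linear map, and the gate is a generic top-k router with softmax weights over the selected experts.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(rng, dim):
    """Hypothetical tiny expert: a single dim x dim linear map."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]
    return lambda x: [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def moe_forward(token, gate_weights, experts, top_k):
    """Route one token to its top_k experts and return the weighted sum."""
    # One affinity score per expert (dot product of the gate row with the token).
    scores = [sum(g * t for g, t in zip(row, token)) for row in gate_weights]
    # Only the top_k highest-scoring experts are run for this token.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    mix = softmax([scores[i] for i in chosen])
    output = [0.0] * len(token)
    for weight, index in zip(mix, chosen):
        expert_out = experts[index](token)
        output = [o + weight * y for o, y in zip(output, expert_out)]
    return output, chosen


rng = random.Random(0)
dim, num_experts = 4, 8
experts = [make_expert(rng, dim) for _ in range(num_experts)]
gate_weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_experts)]
token = [0.5, -1.0, 0.25, 2.0]

output, chosen = moe_forward(token, gate_weights, experts, top_k=2)
print("experts used:", chosen)
print("output:", output)
```

Only 2 of the 8 experts do any work for this token, which is the "fraction of the parameters run at once" part. The multi-token prediction step is left out of this sketch.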
=> More informations about this toot | View the thread
My sister passed away unexpectedly this last year, and it's weird seeing the digital artifacts left behind by someone's life. She ran a dog service, and I paid her to groom my dog. She gave me a recurring 6-week reminder that I don't know that I want to delete. I wonder if this will pop up on my calendar for the rest of my life.
=> More informations about this toot | View the thread
My wife and I have been hopping between reading on an e-reader, a physical copy, and the audiobook depending on the situation we're in. The book is too big to really throw in a backpack so the e-reader is better, and sometimes it's nice to go rake some leaves and get lost in the audiobook.
=> More informations about this toot | View the thread
I finally finished "Wind and Truth" this weekend, the latest Stormlight Archive book from Brandon Sanderson. I'm pretty sure it was the longest book I have ever read. These epic fantasy novels are funny because they just keep on going and read like one long novel. It's been hard to find the time to sit down and focus on such an in-depth story, and I feel like I need to load up the audiobook again for the last two hours of the story to digest it all again.
=> More informations about this toot | View the thread
I generally try for 24-hour turnarounds, but sometimes I end up hyper-focusing on bigger problems that take multiple days to finish.
=> More informations about this toot | View the thread
You can tell I've finished working on a big patch stack when your 3-day-old review requests start coming back from me.
=> More informations about this toot | View the thread
I love the ballet class teacher addressing the class: "ok ladies and gentleman"
=> More informations about this toot | View the thread
I'm thinking about turning my (climate controlled) exterior office into a wood shop. After all, a table saw can be a computer desk if you're brave enough.
=> More informations about this toot | View the thread
I found a pretty big bug today. On one hand I feel pretty good about finding the big bug, as it will improve things once it's fixed, but on the other hand I feel bad that we had a really big bug.
=> More informations about this toot | View the thread
I published a lightning talk I gave at MozWeek on the architecture of our translations training pipeline. It details how we are scaling our training infrastructure to ship new language models, and the architecture of how we train the models. Our models are shrunk down to ~17 MB and run locally and privately on end users' machines.
https://www.youtube.com/watch?v=TfDEAYCeF6s
=> More informations about this toot | View the thread
Genuine question: how do you remote workers actually work from home? I've been going into a co-working space the entire time I've been remote, and it's currently shut down for the holidays while they switch locations. I can barely stand being home without going stir crazy, and it makes me very grumpy once the work day is done.
My family is around, but I'm not particularly engaged with them while working on hard code problems, and I miss programmer banter.
=> More informations about this toot | View the thread
My mom asks me for an Amazon Wishlist every Christmas since I'm hard to shop for. I put a ~$10 set of Perler beads on my list. The present's tag had a handwritten note on it, "this was on your list" as she was very unsure as to why I would want it.
=> More informations about this toot | View the thread
Six more Nutcracker performances this weekend, wish me luck! Tonight I'll get to perform with my son who's playing Fritz, the younger brother who breaks the nutcracker. I'm playing his grandfather and get to scold him on stage a few times. I'm going to need a nap and a spiked eggnog after this weekend.
=> More informations about this toot | View the thread
Co-working space is closed down, so it leaves more time for guitar jam breaks.
=> More informations about this toot | View the thread
I enjoyed the new War of the Rohirrim movie much in the way I enjoy eating processed food. It tastes delicious on the tongue, but is shallow in other aspects. The rotoscoping and hand-drawn animation gave me serious 1980s LOTR vibes. With the strong female lead it felt like a Studio Ghibli movie, but made for TV. My biggest criticism was that the backdrops were blurry Photoshop/AI slop projected onto 3D shapes. It would have been absolutely gorgeous with gouache/watercolor painted backdrops.
=> More informations about this toot | View the thread
In other non-tech-related news, I'm in the Tulsa Ballet's production of the Nutcracker as the Grandfather. I get to wander around the stage for the party scene, hand out presents, scold Fritz, and have a dance with my (stage) wife during the "Grandfather's Dance".
I am the man in the blue jacket and big blue hat (Cap'n Crunch). The women look great, but I wish her hand wasn't right over my face, lol.
=> More informations about this toot | View the thread
=> This profile without reblog | Go to gregtatum@fosstodon.org account