Tux Machines

Large Language Models and Openwashing

Posted by Roy Schestowitz on May 14, 2023

=> 'Linux' Foundation Leftovers | Programming Leftovers

Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs

=> ↺ Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs

Introducing MPT-7B, the latest entry in our MosaicML Foundation Series. MPT-7B is a transformer trained from scratch on 1T tokens of text and code. It is open source, available for commercial use, and matches the quality of LLaMA-7B. MPT-7B was trained on the MosaicML platform in 9.5 days with zero human intervention at a cost of ~$200k. Starting today, you can train, finetune, and deploy your own private MPT models, either starting from one of our checkpoints or training from scratch. For inspiration, we are also releasing three finetuned models in addition to the base MPT-7B: MPT-7B-Instruct, MPT-7B-Chat, and MPT-7B-StoryWriter-65k+, the last of which uses a context length of 65k tokens!
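Since the announcement says the checkpoints are open for commercial use, a minimal sketch of loading the base model for inference with the Hugging Face transformers library follows; the Hub model ID "mosaicml/mpt-7b" and the need for trust_remote_code=True are assumptions here, not details from the announcement, so check the model card before running.

```python
# Minimal sketch: load the base MPT-7B checkpoint with Hugging Face transformers.
# The model ID "mosaicml/mpt-7b" and trust_remote_code=True are assumptions,
# not something stated in the announcement; verify against the model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mosaicml/mpt-7b"  # assumed Hub ID for the base checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # ~13 GB of weights in bf16
    trust_remote_code=True,       # MPT ships custom model code on the Hub
)

prompt = "MPT-7B is a transformer trained from scratch on"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same pattern applies to the finetuned variants (Instruct, Chat, StoryWriter) by swapping in their respective checkpoint IDs.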

Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models

=> ↺ Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models

The RedPajama project aims to create a set of leading open-source models and to rigorously understand the ingredients that yield good performance. A few weeks ago we released the RedPajama base dataset based on the LLaMA paper, which has galvanized the open-source community. The 5-terabyte dataset has been downloaded hundreds of times and used to train models such as MPT, OpenLLaMA, and OpenAlpaca. Today we are excited to release RedPajama-INCITE models, including instruct-tuned and chat versions.
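At roughly 5 TB, downloading the base dataset wholesale is impractical for most users. Assuming the corpus is published on the Hugging Face Hub (the dataset ID "togethercomputer/RedPajama-Data-1T" and the config/field names below are assumptions, not details from the announcement), a sketch of streaming a few records without a full download:

```python
# Minimal sketch: stream a handful of records from the RedPajama base dataset
# instead of downloading the full ~5 TB corpus. The dataset ID, config name,
# and "text" field are assumptions; check the Hub page for the canonical names.
from datasets import load_dataset

dataset = load_dataset(
    "togethercomputer/RedPajama-Data-1T",  # assumed Hub ID
    "default",                             # assumed config covering all sources
    split="train",
    streaming=True,  # iterate lazily; nothing is downloaded up front
)

for i, record in enumerate(dataset):
    text = record["text"]
    print(f"--- sample {i}: {len(text)} chars ---")
    print(text[:200])
    if i >= 2:  # stop after three samples
        break
```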

Large Language Models can Simulate Everything

=> ↺ Large Language Models can Simulate Everything

TL;DR: Simulation is the only way to forecast how future complex or AI systems will misbehave.
This is post #1 in a series of 3 outlining my current views on AI. Part 1 focuses on the need for improving how people think, rather than improving their leverage over the world. Part 2 gives “no brainer,” objective strategies helpful for improving the safety of ML systems on the margin. Part 3 focuses on the best ways to measure and empirically evaluate ML systems as they are deployed in the world.
A hot take: the #2 most important use case for AI in the next decade will be performing large-scale, in-silico sociological simulations.
This has huge potential for safety; in a world where 99% of AI innovations make us more productive with less oversight (giving us a bigger hammer), it’s important to better understand where to point that hammer. Simulation and forecasting techniques can help us improve institutional decision-making, provide plausible tail scenarios with natural language feedback, and help us run instant, virtual A/B tests to iterate faster on all levels of policy and design.
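To make the "virtual A/B test" idea concrete, here is a minimal sketch of what such a simulation loop might look like. The personas, the two policy variants, and the simulate_agent stub standing in for an actual LLM call are all hypothetical illustrations, not anything taken from the post.

```python
# Minimal sketch of an in-silico A/B test over simulated agents. The
# simulate_agent() stub stands in for an LLM call prompted with a persona and
# a policy description; personas and policies are illustrative assumptions.
import random
from collections import Counter

PERSONAS = ["commuter", "small-business owner", "student", "retiree"]
POLICIES = {
    "A": "keep the current bus schedule",
    "B": "double off-peak bus frequency, funded by a small fare increase",
}

def simulate_agent(persona: str, policy: str, rng: random.Random) -> str:
    """Stand-in for an LLM call such as:
    'You are a <persona>. Would you support a policy to <policy>?
    Answer support or oppose.' Here it returns a noisy placeholder."""
    return rng.choice(["support", "oppose"])

def run_ab_test(n_agents: int = 1000, seed: int = 0) -> dict[str, Counter]:
    rng = random.Random(seed)
    results = {name: Counter() for name in POLICIES}
    for name, description in POLICIES.items():
        for _ in range(n_agents):
            persona = rng.choice(PERSONAS)
            results[name][simulate_agent(persona, description, rng)] += 1
    return results

if __name__ == "__main__":
    for name, tally in run_ab_test().items():
        print(f"policy {name}: {dict(tally)}")
```

Replacing the stub with real model calls and richer personas is where the forecasting value the post describes would come from; the loop structure itself stays the same.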

=> gemini.tuxmachines.org
