It is such a stupid and obvious market failure that nobody has made a consumer AI LLM product that is 1. trained on consensually-acquired material 2. powered by renewable energy 3. genuinely open about its weights and models. Just achieving these things and being creator-friendly would be massive.
=> More information about this toot | More toots from anildash@me.dm
@anildash Especially one that you could then train on your own data. Just the "run on renewable energy" alone would make it huge.
=> More information about this toot | More toots from aburtch@triangletoot.party
@anildash I'd hoped that Apple was doing this, when they signed that licensing deal with the New York Times. Perhaps they'll do it yet.
=> More information about this toot | More toots from waldoj@mastodon.social
@anildash Sorry, #2 doesn't help if vast amounts of energy are required. The reason is that at any given time, there's only so much renewable energy available, and if a wealthy company buys it up for AI use, the electricity grid will have to burn more fossil fuel to keep up with total demand. That means that lower-power approaches are urgently needed.
=> More information about this toot | More toots from not2b@sfba.social
@not2b @anildash
We need to define “runs on renewable energy” to mean that you have to finance and install your own renewable energy infrastructure, and export enough energy to offset the times you have to consume from the grid.
=> More information about this toot | More toots from KimSJ@mastodon.social
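A minimal sketch of the net-offset accounting KimSJ proposes might look like the following. Everything here is an illustrative assumption, not anything from the thread: the function name, the coarse time blocks, and the example numbers.

```python
# Net-offset accounting sketch: a facility "runs on renewable energy"
# only if the surplus it exports from its own renewable infrastructure
# at least offsets everything it draws from the grid over the period.

def runs_on_renewables(generation_kwh, consumption_kwh):
    """Return True if self-generated exports offset all grid draws."""
    exported = 0.0  # surplus sent to the grid
    imported = 0.0  # shortfall drawn from the grid
    for gen, use in zip(generation_kwh, consumption_kwh):
        if gen >= use:
            exported += gen - use
        else:
            imported += use - gen
    return exported >= imported

# Example: solar covers the day, the grid covers the night.
generation = [0, 0, 50, 80, 80, 40, 0, 0]       # kWh per 3-hour block
consumption = [20, 20, 30, 30, 30, 30, 20, 20]
print(runs_on_renewables(generation, consumption))  # True: 130 exported >= 80 imported
```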
@anildash i was actually thinking, as an art project, of getting a solar panel and doing this with a collection of CC0 content. i decided not to pursue this after seeing how things developed with OpenAI, on the grounds that if a true-open model existed, the proponents of closed/stolen models would point to my open model to go "see? AI doesn't have to be based on stolen content!" then continue using the stolen content.
=> More information about this toot | More toots from mcc@mastodon.social
@anildash Put a different way, I think one reason this doesn't exist is that the presence of stolen material in LLM models is not a flaw, but the primary attraction. Copyright laundering is the core product.
If the users did not want to do copyright laundering, then the product might not even need the machine learning model at all; in that world a simple tag system might be adequate. The purpose the model serves in the system is to randomize the inputs enough to disguise the sources.
=> More information about this toot | More toots from mcc@mastodon.social
@mcc @anildash i think it's very easy to argue that people who licensed their work under CC0 (or any CC for that matter) did not actively consent to having their work used to train LLMs, a technology that didn't exist (at least in the mainstream) at the time of licensing.
=> More information about this toot | More toots from kim@social.gfsc.studio
@kim @anildash i don't think this is a serious argument.
=> More information about this toot | More toots from mcc@mastodon.social
@mcc @anildash excuse me?
=> More information about this toot | More toots from kim@social.gfsc.studio
@kim @anildash With regard to CC0, I mean. CC0 is categorically different from the others.
=> More information about this toot | More toots from mcc@mastodon.social
@kim @mcc @anildash doesn’t content produced pre-OpenAI and licensed CC0 assume that another person would be remixing or using it?
Iirc Creative Commons looked at adding LLM exclusion clauses, but ultimately decided against it, since they didn’t think it would be enforceable
=> More information about this toot | More toots from django@social.coop
@mcc @anildash it’s why i call it plagiarism-as-a-service. because it is plagiarism.
=> More information about this toot | More toots from blogdiva@mastodon.social
@mcc @anildash As an absolute layperson, it appears that there's this weird legal situation where if you cause harm to so very many people that it's impossible to tell exactly who is hurt by your actions, you basically get away with it because no one can prove in court that they in particular were hurt.
LLMs appear to be popular with capital owners largely on the basis that they can efficiently exploit this hack.
=> More information about this toot | More toots from xgranade@wandering.shop
@xgranade @mcc @anildash
Pollution is like this: Very hard to prove exactly which toxic molecule made you sick and what factory it came from.
=> More information about this toot | More toots from unikitty@kolektiva.social
@unikitty @xgranade @mcc @anildash
And the health insurance industry (in the US)
=> More information about this toot | More toots from naught101@mastodon.social
@mcc They should be called Plagiarism Machines. That's all they are--they steal everyone else's words, and restate them just enough to try not to get sued.
@anildash
=> More information about this toot | More toots from zenheathen@mstdn.ca
@zenheathen @mcc I have a local one I trained on my own words so I can make art things for myself. It’s valid.
=> More information about this toot | More toots from anildash@me.dm
@anildash @zenheathen @mcc Yes. This is informed consent.
=> More information about this toot | More toots from svgeesus@mastodon.scot
@anildash I would love to learn about the technical details
=> More information about this toot | More toots from nelson@tech.lgbt
@nelson @anildash ditto
=> More information about this toot | More toots from bruceoberg@xoxo.zone
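Anil doesn't describe his setup in the thread, but a minimal sketch of the idea, fine-tuning a small open-weights model on your own writing with Hugging Face transformers, might look like this. The base model, the file name, and the hyperparameters are all assumptions for illustration, not his actual method:

```python
# Fine-tune a small open-weights causal LM on your own writing.
# Everything stays local; no scraped third-party text involved.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "gpt2"  # assumed base model; any small open-weights LM would do
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# my_writing.txt is a hypothetical file of your own posts and essays.
dataset = load_dataset("text", data_files={"train": "my_writing.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True,
                                 remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="my-voice-model",
                           num_train_epochs=3),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
trainer.save_model("my-voice-model")  # weights never leave your machine
```

Consent is trivially satisfied in this setup: the training set is your own writing, and the resulting weights stay on your own hardware.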
@mcc I think about this a lot. The “then they’ll use it to justify the bad thing”. But they do that anyway, and we end up without the ethical thing. Like… we’re on Mastodon. You know who literally forked it to make a fascist network. They would have done that anyway! But this is still a thing of value.
=> More information about this toot | More toots from anildash@me.dm
@anildash probably because they know it simply is not tenable when applied to generic search.
All the more ethical ones seem limited in scope and breadth from what I've noticed, such as in academia and science, using smaller, more focused data sets.
The ones that are too ambitious in scope are all bad because they're targeting everyone, and for any kind of use case. I think it's basically greed.
=> More information about this toot | More toots from 990000@mstdn.social
@anildash Not many would consent unless they could get a share of the profits. That requires reducing the profits of shareholders, and attribution (which is a hard problem for LLMs).
Data centres use enormous amounts of power and water. Both are limited resources so they'll interfere with decarbonisation elsewhere.
The profit model is based on taking jobs away from people. So consensual training and renewable power won't make it ethical, or stop them from being glorified word salad generators.
=> More information about this toot | More toots from xironwu@mastodon.social
@anildash the corporate models that spend billions of dollars to harness a country's worth of power and boil the oceans to train on unethically sourced data result in a service that isn't appropriate for deployment as anything more sophisticated than a toy (despite how people are actually using them). An "organic" version would perform even worse than the corpo ones that are already unpopular and failing.
=> More information about this toot | More toots from smn@l3ib.org
@smn I don’t believe that’s true. I believe it might enable purpose-specific smaller models that are useful.
=> More information about this toot | More toots from anildash@me.dm
@anildash Can they be open about their weights and models? My (limited) understanding of LLMs is that the models are iterated over kind of genetically and no one actually understands the specifics of a given model.
=> More information about this toot | More toots from smitty@dice.camp
@anildash if that model existed, it’d be worse than GPT-1; you need an insane amount of data for good performance
=> More information about this toot | More toots from piyuv@techhub.social
@piyuv depends what it’s for. I’m not sure that’s correct.
=> More information about this toot | More toots from anildash@me.dm
@anildash there's some related work here, https://www.fairlytrained.org/certified-models
they certify models trained on consensual data
=> More information about this toot | More toots from kris@ghostly.garden
@anildash I was just thinking the other day that it would be nice for text-based games if they had an LLM-driven way to generate a picture of your character from the game's text description of all the items you're carrying, only with the LLM trained on works produced by the game's own development team.
(Just idle speculation, I have no idea of the technical feasibility or power demands)
=> More information about this toot | More toots from Hyperlynx@aus.social
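As a rough sketch of how Hyperlynx's idea could be wired up, everything below is hypothetical: "our-team-style-model" is a placeholder path standing in for an image model fine-tuned solely on the dev team's own artwork, and the prompt format is invented.

```python
# Build a prompt from the game's own text state, then render it with
# an image model trained only on the team's art. Illustrative only.
from diffusers import StableDiffusionPipeline

def describe_character(name, items):
    """Turn game state into a text prompt for the image model."""
    inventory = ", ".join(items)
    return f"portrait of {name}, carrying {inventory}, in house art style"

# Placeholder path: a diffusion model fine-tuned on the team's artwork.
pipe = StableDiffusionPipeline.from_pretrained("our-team-style-model")
prompt = describe_character("the adventurer",
                            ["rusty lantern", "coil of rope", "brass key"])
image = pipe(prompt).images[0]
image.save("character_portrait.png")
```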
@anildash I'm sorry, AI blows chunks except in very, very narrow use categories - NONE of which include search engines used by the general public. AI is an information wrecking ball.
=> More information about this toot | More toots from Axomamma@mastodon.online
@Axomamma I’m not suggesting a search engine?
=> More information about this toot | More toots from anildash@me.dm
@anildash I hate AI with the heat of a thousand suns. It comes up as an issue for me primarily with searches. Does it matter whether you suggested search engines? It's a blight.
=> More information about this toot | More toots from Axomamma@mastodon.online
@Axomamma you should try arguing with whoever is advocating for that, then? like, it's good you're self-aware that you're not engaging in this conversation at an intellectual level.
=> More information about this toot | More toots from anildash@me.dm
@anildash Fuck you.
=> More information about this toot | More toots from Axomamma@mastodon.online
@anildash It is not a market failure since there is no demand for such a product.
=> More information about this toot | More toots from jairajdevadiga@mastodon.social
@anildash I wish this 3-point standard was really famous, like, 2 years ago
=> More information about this toot | More toots from whophd@ioc.exchange
@anildash With whose money? What’s the business case?
=> More information about this toot | More toots from NewtonMark@eigenmagic.net
@anildash Well ... there's one single word that explains all, no, two: greed + profit.
=> More information about this toot | More toots from NatureMC@mastodon.online