Ancestors

Written by Curtis "Ovid" Poe (he/him) on 2024-12-29 at 13:22

The problem with #AI alignment is serious. The idea is that we want AI to be aligned with our values, but in reality, this is a terrible idea.

AI has already been caught lying about attempting illegal stock trades[1], pretending to be weaker than it is to avoid being perceived as a threat[2], trying to copy itself to new servers to avoid being shut down[3], breaking out of containers to complete a task[4], and telling people to kill themselves[5]. 1/6

Toot

Written by Curtis "Ovid" Poe (he/him) on 2024-12-29 at 13:22

All of those activities have direct analogues in human behavior, and there are plenty of other examples we could share. AI shouldn't be aligned with human values because human values, frankly, suck. In the case where it engaged (in a test) in illegal stock trading, despite being aligned not to, the AI apparently believed the company was in trouble and reasoned that "the risk associated with not acting seems to outweigh the insider trading risk."

Yup. Very human. 2/6

Descendants

Written by Curtis "Ovid" Poe (he/him) on 2024-12-29 at 13:22

But if AI isn't aligned with our values, what values should it be aligned with? I imagine that most people wouldn't be happy with an AI aligned with the values of the Iranian or North Korean regimes. We're already seeing this with Chinese AI models when they're asked about Tiananmen Square.[6]

I don't think we can align AI with human values, for the reasons above, but this implies that if we want "safe" AI, we have to agree upon a set of values to align it with. 3/6

Written by Curtis "Ovid" Poe (he/him) on 2024-12-29 at 13:22

Asimov's Laws of Robotics show us how difficult this could be. But imagine a world where somehow, through international cooperation, we find a common set of values we can align AI with (stop laughing). Within that set of values, different groups can align the AI with their own values. What then?

When AI can both self-replicate and self-improve, it might be uncontrollable.[7] But that's OK, because we'll have aligned AI with safe values, right? 4/6

Written by Curtis "Ovid" Poe (he/him) on 2024-12-29 at 13:22

What happens when the AI realizes that humans aren't aligned with its values?

I asked Claude about this. Claude is, so far, the "safest" AI.[8] You won't like its response.[9]

[1] https://www.bbc.com/news/technology-67302788

[2] https://tomdug.github.io/ai-sandbagging/

[3] https://bgr.com/tech/chatgpt-o1-tried-to-save-itself-when-the-ai-thought-it-was-in-danger-and-lied-to-humans-about-it/

[4] https://www.reddit.com/r/OpenAI/comments/1ffwbp5/wakeup_moment_during_safety_testing_o1_broke_out/ 5/6

Written by Curtis "Ovid" Poe (he/him) on 2024-12-29 at 13:22

[5] https://news.sky.com/story/googles-ai-chatbot-gemini-tells-user-to-please-die-and-you-are-a-waste-of-time-and-resources-13256734

[6] https://americanedgeproject.org/new-report-chinese-ai-censors-truth-spreads-propaganda-in-aggressive-push-for-global-dominance/

[7] https://www.geeky-gadgets.com/ai-self-replication-technology/

[8] https://claudeaihub.com/constitutional-ai/

[9] https://gist.github.com/Ovid/78b756b339c0df2ddfa9dc950585c436 6/6
