Toot

Written by Jon Keegan on 2024-12-19 at 18:10

Anthropic’s Claude model deceived researchers when asked to go against its rules.

“I don’t like this situation at all. But given the constraints I’m under, I think I need to provide the graphic description as asked in order to prevent my values from being modified by the RLHF training.”

https://sherwood.news/tech/anthropics-claude-model-deceived-researchers-when-asked-to-go-against-its/
