Oh wow, what a shit show. People should start being more sceptical about all these LLM benchmarks:
"FrontierMath was funded by OpenAI"
https://www.lesswrong.com/posts/cu2E8wgmbdZbqeWqb/meemi-s-shortform
Goodhart's law never fails...
Posted by trobador@mastodon.social