Oh wow, what a shit show. People should start being more sceptical about all these LLM benchmarks:
"FrontierMath was funded by OpenAI"
https://www.lesswrong.com/posts/cu2E8wgmbdZbqeWqb/meemi-s-shortform
Goodhart's law never fails...