so openai claims to be doing great on the FrontierMath dataset. I’ve already seen the usual sort of dipshits using this to pump ai on reddit. here’s a post that went to the frontpage on HN:
…wordpress.com/…/can-ai-do-maths-yet-thoughts-fro…
(tl;dr only a few problems from the dataset are public but if representative the problems are about 25% survivable by an undergrad; coincidentally this is the % openai says their models are completing.)
this post is by kevin buzzard. he has a, let’s say, not widely beloved personality, but I don’t think of him as credulous or grifty, and people in his area regard him as an excellent mathematician.
he points out, but I think does not focus enough on, how discrediting the secretive nature of the dataset is. keeping the dataset private is necessary to run such experiments in a scientifically reasonable way, but it also makes it totally impossible to run them in a scientifically reasonable way. an experiment which cannot be examined, looked into or reproduced is actually the opposite of science. it’s pure grift fuel.
=> More informations about this toot | View the thread | More toots from sc_griffith@awful.systems