Toot

Written by Andrew W Swan on 2025-01-26 at 16:10

Despite these problems, I feel that chain-of-thought LLMs are getting close to being able to solve this kind of problem. The particular issues that come up here could be solved either by tool use during the chain of thought, to actually look at the API, or by more training, in this case specifically on the Agda standard library. OpenAI have announced some impressive benchmarks for their new o3 model, so maybe that will be the first to succeed on this question.


Proxy Information
Original URL
gemini://mastogem.picasoft.net/toot/113895528248548075