Run 2 completed overnight! At a quick glance, the results seem worse. The LLM hallucinated more categories this time and assigned more examples to those hallucinated categories.
I think I’ll kick off another run without changing category definitions. Curious how the results will vary.
Posted by dachary@dacharycarey.social