Toot

Written by Chas Emerick on 2025-01-15 at 15:56

Few things aggravate me more than seeing tutorials and marketing pieces extract data from PDFs by 😡💀RASTERIZING PAGES💀😡 and then running mostly-bad OCR and computer vision models over the results.

Outside of scanned and intentionally obfuscated documents, ✨PDFs have a data model✨. Use it!



Original URL: gemini://mastogem.picasoft.net/toot/113833185853953957
