Toot

Written by Chas Emerick on 2025-01-15 at 15:56

Few things aggravate me more than seeing tutorials and marketing pieces extracting data from PDF by 😡💀RASTERIZING PAGES💀😡 and then running mostly-bad OCR and computer vision models over the results.

Outside of scanned and intentionally obfuscated documents, ✨PDFs have a data model✨. Use it!
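
As a rough illustration of that data model: a page's text lives in a content stream as operands to text-showing operators such as `Tj`, so it can be read directly with no pixels involved. The sketch below uses only the Python standard library and a hand-written sample stream. The stream contents, `extract_text`, and both regexes are hypothetical. It assumes simple literal strings, a Latin-1-compatible font encoding, and no `TJ` arrays or hex strings, so real documents call for a proper PDF library.

```python
import re
import zlib

# Hypothetical page content stream, written by hand for illustration.
CONTENT_STREAM = b"""BT
/F1 12 Tf
72 720 Td
(Invoice 1042) Tj
0 -16 Td
(Total: 99.50) Tj
ET"""

# A literal string operand followed by the Tj (show text) operator.
# Allows backslash escapes, but not unescaped nested parentheses.
SHOW_TEXT = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")

# Backslash-escaped parenthesis or backslash inside a literal string.
ESCAPE = re.compile(rb"\\([()\\])")


def extract_text(stream: bytes) -> list[str]:
    """Return the strings shown by Tj operators, in stream order."""
    return [
        ESCAPE.sub(rb"\1", match.group(1)).decode("latin-1")
        for match in SHOW_TEXT.finditer(stream)
    ]


# Streams are commonly stored with the FlateDecode filter, which is zlib.
stored = zlib.compress(CONTENT_STREAM)
print(extract_text(zlib.decompress(stored)))
```

The `Td` operands next to each string are text positions, which is the kind of layout information an OCR pass would have to guess at.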

