Few things aggravate me more than seeing tutorials and marketing pieces extracting data from PDFs by 😡💀RASTERIZING PAGES💀😡 and then running mostly bad OCR and computer vision models over the results.
Outside of scanned and intentionally obfuscated documents, ✨PDFs have a data model✨. Use it!
Toot by cemerick@mastodon.social