Toots for bitsgalore@digipres.club account

Written by Johan van der Knijff on 2025-01-21 at 15:28

I just updated my 2023 post on extracting text from #EPUB files in #Python, and added an evaluation of #PyMuPDF (which also supports EPUB!). Includes link to demo script.

https://www.bitsgalore.org/2023/03/09/extracting-text-from-epub-files-in-python

Written by Johan van der Knijff on 2025-01-15 at 16:20

Useful - "SingleFile helps you to save a complete web page into a single HTML file. SingleFile is a Web Extension (and a CLI tool) compatible with Chrome, Firefox (Desktop and Mobile), Microsoft Edge, Safari, Vivaldi, Brave, Waterfox, Yandex browser, and Opera."

https://github.com/gildas-lormeau/SingleFile

Written by Johan van der Knijff on 2025-01-09 at 14:52

From a recent (December!) Ars Technica forum thread:

"I have been having this itch lately to get back into burning cd/dvds/bluray. It's like the early 2000s are calling me again. I want to burn 3d movies and blu ray and though I like streaming music, I have this fantasy of getting a wicked nice audio system and making my entire library cds again."

😱 #WheelOutTheDigitalDarkAgeKlaxon 📯 #wtfOpticalMedia 💿 📀

https://arstechnica.com/civis/threads/am-i-the-only-one-who-wants-to-buy-an-external-burner-again.1504663/

Written by Johan van der Knijff on 2024-12-31 at 14:17

ICYMI, the Digital Dark Age Crew vs the Millennium bug #Y2K #WheelOutTheDigitalDarkAgeKlaxon 📯 - "tomorrow, we could all be living in the dark ages!"

https://www.youtube.com/watch?v=pJ86meMFqT8

Written by Johan van der Knijff on 2024-12-30 at 16:45

Some stills from the video #y2k:

=> View attached media

Written by Johan van der Knijff on 2024-12-30 at 15:32

And some more info in this post #Y2K:

https://www.bitsgalore.org/2024/12/30/y2k

Written by Johan van der Knijff on 2024-12-30 at 15:30

Just in time before the end of 2024, the Digital Dark Age Crew are back!

Always fashionably late, after 25 years they finally finished "Y2K", their legendary, previously unfinished track that addresses the threats of the year 2000 #Y2K problem.

Enjoy, and remember: tomorrow, we could all be living in the dark ages! #WheelOutTheDigitalDarkAgeKlaxon 📯

https://www.youtube.com/watch?v=pJ86meMFqT8

Written by Johan van der Knijff on 2024-12-30 at 13:13

It's almost time to #WheelOutTheDigitalDarkAgeKlaxon 📯 again, as a new track by the Digital Dark Age Crew is about to drop. Watch this space!

Written by Johan van der Knijff on 2024-12-27 at 19:27

So this is really useful for any odd jobs where you have to cut some large video (or audio) files into smaller segments - #LosslessCut is "The swiss army knife of lossless video/audio editing" (it's basically a very good GUI around #ffmpeg):

https://github.com/mifi/lossless-cut

Written by Johan van der Knijff on 2024-12-18 at 16:58

Just updated this 2021 post on #opensource #PDF processing tools, and added a section on the Arlington Model PDF Checker (IMO the Arlington Model is really where the future of PDF validation is at):

https://www.bitsgalore.org/2021/09/06/pdf-processing-and-analysis-with-open-source-tools

Written by Johan van der Knijff on 2024-12-17 at 13:37

More about this bug on the website of the researcher who first discovered it:

https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres_are_switching_written_numbers_when_scanning

Written by Johan van der Knijff on 2024-12-17 at 13:37

Interesting, both the German Federal Office for Information Security and the Swiss Coordination Office for the Permanent Archiving of Electronic Documents advise against the use of #JBIG2 compression in scanned #PDF documents.

This was prompted by the discovery in 2013 of the infamous "swapped characters" bug in Xerox photocopiers:

https://en.wikipedia.org/wiki/JBIG2#:~:text=not%20the%20same.%5B13%5D%5B14%5D%5B15%5D-,Character%20substitution%20errors%20in%20scanned%20documents,-%5Bedit%5D

Written by Johan van der Knijff on 2024-12-13 at 16:55

On a related note, time for a periodic reminder this exists #WheelOutTheDigitalDarkAgeKlaxon 📯:

https://www.youtube.com/watch?v=C47ZCosJPAw

Written by Johan van der Knijff on 2024-12-13 at 16:49

Time again to #WheelOutTheDigitalDarkAgeKlaxon 📯, "Averting the Digital Dark Age: How Archivists, Librarians, and Technologists Built the Web a Memory" (ht @peterwebster, @paigeroberts):

https://muse.jhu.edu/book/123276

Written by Johan van der Knijff on 2024-12-13 at 12:44

New blog post for #fileformatfriday - #PDF Quality assessment for #digitisation batches with #Python, #PyMuPDF and #Pillow. This introduces the new #Pdfquad tool, which might be useful for others as well:

https://www.bitsgalore.org/2024/12/13/pdf-quality-assessment-for-digitisation-batches-with-python-pymupdf-and-pillow

Written by Johan van der Knijff on 2024-12-06 at 11:21

"Japanese digital dub legend Takafumi Noda aka Mystica Tribe meets up with Danny Wolfers Legowelt again for this unique concoction of digital dub with raw lo-fi synth wave.

Buckets of echo, crumbled tapes, mind altering phasers, luring spring reverbs, sloppy rhythm boxes, HEAVY INTENSE BASS and the ever mystifying melodica playing from Noda bring this project to a new level."

https://legowelt.bandcamp.com/album/evil-fades-in-echo

Written by Johan van der Knijff on 2024-12-04 at 11:15

cc @roelgrif @ABScientist

Written by Johan van der Knijff on 2024-12-04 at 11:09

Just found out that the #Covid19 #corona 🦠 vaccination campaign 💉 ends at the end of this week (6 December)!

And the GGD reports on Twitter that today is the last day on which you can schedule an appointment online (source: https://x.com/GGDGHORNL/status/1863505646693327024)

Just made an appointment (only just in time!) at:

https://planjeprik.nl

Spread the word! #PakDiePrik

Written by Johan van der Knijff on 2024-11-15 at 14:04

ICYMI - are "octal escape sequences" in #PDF strings really a preservation risk, as claimed by the authors of the recent "The Phantom 👻 of a PDF File" blog post?

Some quick tests I did with eight different PDF processing tools suggest they're not, and #JHOVE's inability to handle them really seems to be the exception here #wtfPDF #fileformatfriday

https://www.bitsgalore.org/2024/11/14/escape-from-the-phantom-of-the-pdf
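
For readers unfamiliar with the term: in a PDF literal string, a backslash followed by one to three octal digits stands for a single byte. The Python sketch below shows roughly how such a string body decodes, assuming the escape rules of the PDF specification (ISO 32000). The function name `decode_pdf_literal` is made up for illustration, and this is not the code used for the tests described in the blog post.

```python
import re

# Single-character escapes defined for PDF literal strings (assumed from ISO 32000)
_SIMPLE = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b",
           b"f": b"\f", b"(": b"(", b")": b")", b"\\": b"\\"}

# A backslash followed by 1-3 octal digits, a line ending, or any other byte
_ESCAPE = re.compile(rb"\\([0-7]{1,3}|\r\n|[\s\S])")


def decode_pdf_literal(body: bytes) -> bytes:
    """Decode the body of a PDF literal string (the part between the parentheses).

    Hypothetical helper for illustration only.
    """
    def repl(match):
        token = match.group(1)
        if token[:1] in b"01234567":
            # Octal escape: keep only the low-order 8 bits
            return bytes([int(token.decode("ascii"), 8) & 0xFF])
        if token in (b"\r\n", b"\r", b"\n"):
            # Backslash at end of line: line continuation, produces nothing
            return b""
        # Known escape, or an unknown one where the backslash is simply dropped
        return _SIMPLE.get(token, token)

    return _ESCAPE.sub(repl, body)


print(decode_pdf_literal(rb"caf\351"))
```

Here `\351` decodes to the single byte 0xE9, which a reader would then interpret according to the string's text encoding.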

Written by Johan van der Knijff on 2024-11-14 at 16:39

The authors of the recent "The Phantom of a PDF File" blog post argue that "octal escape sequences" in #PDF strings are a potential preservation risk.

But some quick tests with 8 different PDF tools suggest that #JHOVE is really the only tool that can't handle them!

Details in my new blog post "Escape from the phantom of the PDF" #wtfPDF 👻 :

https://www.bitsgalore.org/2024/11/14/escape-from-the-phantom-of-the-pdf
