Ancestors

Written by John MacKintosh on 2024-12-20 at 08:04

#rstats

Strategies for tidying multiple large CSV files of varying dimensions, all held in a list?

Each file starts with 4 rows of useless text, and the column widths vary.

The next few rows (could be one, could be four) should become the column headers. There's no way of knowing how many there are without painstakingly checking each file.

The last 6 rows are also useless and can be discarded.

I have a hacky solution, but I'm interested to hear how others would start to tackle this.
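One possible way to attack the layout described above, sketched in Python rather than R and not the author's own solution: drop the fixed 4 top and 6 bottom rows, then treat every row before the first "data-looking" row as a header row and collapse those into one name per column. The rule for spotting data rows is an assumption (here: any cell that parses as a number), and all names (`looks_like_data`, `tidy_rows`, `to_tsv`, `SAMPLE`) are hypothetical.

```python
import csv
import io

# Hypothetical heuristic (an assumption, adapt to your files): a row is
# treated as data once any of its cells parses as a number.
def looks_like_data(row):
    for cell in row:
        try:
            float(cell.replace(",", ""))
            return True
        except ValueError:
            continue
    return False


def tidy_rows(text, is_data=looks_like_data, skip_top=4, skip_bottom=6):
    """Return (header, data_rows) for one messy CSV held as a string."""
    rows = list(csv.reader(io.StringIO(text)))
    # Drop the fixed junk rows at the top and bottom of the file.
    body = rows[skip_top:len(rows) - skip_bottom]
    # Every row before the first data-looking row is a header row (1 to n).
    n_header = 0
    while n_header < len(body) and not is_data(body[n_header]):
        n_header += 1
    header_rows, data = body[:n_header], body[n_header:]
    width = max((len(r) for r in body), default=0)
    # Collapse multi-row headers into one name per column, e.g. "Sales_Q1".
    header = []
    for col in range(width):
        parts = [r[col].strip() for r in header_rows
                 if col < len(r) and r[col].strip()]
        header.append("_".join(parts) or f"col{col + 1}")
    # Pad ragged rows so every data row matches the header width.
    data = [r + [""] * (width - len(r)) for r in data]
    return header, data


def to_tsv(header, data):
    """Serialise one tidied table as tab-separated text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(data)
    return out.getvalue()


# Made-up demo file: 4 junk rows, 2 header rows, 2 data rows, 6 junk rows.
SAMPLE = "\n".join([
    "Report title", "Generated by system", "", "Notes",
    "Region,Sales,Sales",
    ",Q1,Q2",
    "North,10,12",
    "South,8,9",
    "footer1", "footer2", "footer3", "footer4", "footer5", "footer6",
])

files = [SAMPLE]  # in practice, the list of CSV contents
tidied = [tidy_rows(text) for text in files]
print(to_tsv(*tidied[0]))
```

For the demo file this yields the header `Region`, `Sales_Q1`, `Sales_Q2`. A header row made of years (2023, 2024) would be misread as data under this heuristic, which is why `is_data` is a parameter: swap in a rule that fits the real files.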

=> More information about this toot | More toots from johnmackintosh@fosstodon.org

Written by John MacKintosh on 2024-12-20 at 18:02

Think I've cracked it, lads.

Edge cases have been addressed, and 70 tidy (and massive) .tsv files are now in place.

Next stop: DuckDB and/or Parquet.

=> More information about this toot | More toots from johnmackintosh@fosstodon.org

Toot

Written by Dave Mason on 2024-12-20 at 18:52

@johnmackintosh

Glad you got it worked out.

I didn't have any good #rstats advice. But you made me think about how I'd handle it via T-SQL...

=> More information about this toot | More toots from DaveMasonDotMe@mastodon.social

Descendants

Proxy Information
Original URL
gemini://mastogem.picasoft.net/thread/113686660865627022