[#]rstats
Strategies for tidying multiple large CSV files of varying dimensions, held in a list.
They all have 4 rows of useless text at the top, and column widths vary.
The next several rows (could be one, could be four) are what should be column headers. No way of knowing how many there are without painstakingly going through each.
The last 6 rows are useless, and can be discarded.
I have a hacky solution, but I'm interested to hear how others would start to tackle this.
=> More information about this toot | More toots from johnmackintosh@fosstodon.org
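One way to start, as a sketch in base R: read the raw lines, chop the fixed junk off both ends, then collapse however many header rows remain into single column names. Everything here is an assumption from the post (4 junk rows at the top, 6 at the bottom, comma-separated files), and `read_messy` plus its `n_header` argument are made-up names — the header-row count still has to be supplied or guessed per file.

```r
# Sketch only: read one messy CSV whose first 4 rows and last 6 rows are
# junk, with n_header rows of column headers in between (all assumptions
# taken from the original toot; adjust n_top/n_bottom as needed).
read_messy <- function(path, n_header, n_top = 4, n_bottom = 6) {
  lines <- readLines(path)
  # drop the fixed junk at the top and bottom
  lines <- lines[(n_top + 1):(length(lines) - n_bottom)]
  header_lines <- lines[seq_len(n_header)]
  body_lines   <- lines[-seq_len(n_header)]
  # split each header row on commas, pad to equal width, then paste
  # the pieces column-wise into single names like "A_x"
  header_parts <- strsplit(header_lines, ",", fixed = TRUE)
  ncols <- max(lengths(header_parts))
  header_parts <- lapply(header_parts,
                         function(x) c(x, rep("", ncols - length(x))))
  headers <- apply(do.call(rbind, header_parts), 2, function(x) {
    x <- trimws(x)
    paste(x[nzchar(x)], collapse = "_")
  })
  read.csv(text = paste(body_lines, collapse = "\n"),
           header = FALSE, col.names = headers, check.names = TRUE)
}
```

With two header rows `A,B` and `x,y`, this yields columns named `A_x` and `B_y`.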
@johnmackintosh Column header rows: try to find a pattern you can generalise to work out the number of rows to extract as headers. Blind guess: the number of columns in the data set is the same as the number of column header rows. Once you know these numbers you can use them when skipping the first N rows during import. Since they might differ for each data set, I'd use mapply instead of lapply.
=> More information about this toot | More toots from stitam@fosstodon.org
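The mapply suggestion could look like the sketch below: once the per-file header counts are known, mapply pairs each path with its own count, which lapply can't do since it iterates over a single list. The 4 top and 6 bottom junk rows come from the original post; the per-file header counts and the `make_file` helper (which just fabricates tiny example files so the sketch runs standalone) are assumptions.

```r
# Helper that writes a tiny example file in the shape described in the
# thread: 4 junk rows, some header rows, data rows, 6 junk rows.
make_file <- function(header_rows, data_rows) {
  path <- tempfile(fileext = ".csv")
  writeLines(c(rep("junk", 4), header_rows, data_rows, rep("junk", 6)), path)
  path
}

paths <- c(make_file("A,B", c("1,2", "3,4")),
           make_file(c("A,B", "x,y"), "5,6"))
n_headers <- c(1, 2)  # header-row count determined per file somehow

# mapply walks paths and n_headers in parallel; skip past the junk and
# the header block, and stop before the 6 junk rows at the bottom
dfs <- mapply(function(path, n_header) {
  n_lines <- length(readLines(path))
  read.csv(path, skip = 4 + n_header, header = FALSE,
           nrows = n_lines - 4 - n_header - 6)
}, paths, n_headers, SIMPLIFY = FALSE)
```

`SIMPLIFY = FALSE` keeps the result as a list of data frames rather than letting mapply try to simplify it into a matrix.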
This content has been proxied by September (3851b).