Didn't think I had anything in common with Mike Portnoy, but this was my first time hearing one of her songs too
Mike Portnoy Hears Taylor Swift For The First Time
https://youtube.com/watch?v=cHl_gsd0OR0
You don't have to be into drums to appreciate this series. It's one of the best things on the internet.
His first "off-the-cuff" take is outstanding
=> More informations about this toot | View the thread
Current status: resisting the urge to install R on a recently liberated Raspberry Pi.
Going to use it for general Linux learning instead (should have added that to my latest blog post...)
=> More informations about this toot | View the thread
[#]rstats
Let the header image speak for itself:
https://johnmackintosh.net/blog/2025-01-01-once-more/
=> More informations about this toot | View the thread
While this was a bit of a faff (until I figured out the way forward), I will always take a bunch of text files containing all the data over some godforsaken table builder or anything to do with SPARQL.
"Here's some open data"
What format is it in?
"Terrible"
How do I access it?
"Register for API access. Mix one part of extract of dill with morning dew immediately following the equinox. Change your name to Keith. Make 29 selections on this shiny app. Defeat Medusa. Download as JSON. Cry a lot."
=> More informations about this toot | View the thread
Irregular untidy CSV files hold no fear for me
https://johnmackintosh.net/blog/rstats/2024-12-22-tidying-text-files/
(Early draft, liable to update if I can be bothered. Code on GitHub)
[#]RStats
=> More informations about this toot | View the thread
Think I've cracked it lads.
Edge cases have been addressed, and 70 tidy (and massive) .tsv files are now in place.
Next stop: duckdb and/or parquet.
=> More informations about this toot | View the thread
[#]rstats
Looking for strategies for tidying multiple large CSV files, each of varying dimensions, which are held in a list.
They all start with 4 rows of useless text, and the column widths vary.
The next several rows (could be one, could be four) are what should be column headers. No way of knowing how many there are without painstakingly going through each.
The last 6 rows are useless, and can be discarded.
I have a hacky solution, but I'm interested to hear how others would start to tackle this.
=> More informations about this toot | View the thread
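One possible starting point for the files described above, as a minimal base R sketch and not the author's solution. The function name tidy_one() and the heuristic are assumptions: it treats the first row whose fields after the first are all numeric as the start of the data, and everything between the junk and that row as stacked header rows. Files with numeric-looking headers (years, for example) would need a different rule.

```r
# Hypothetical helper: tidy_one() and its "first numeric row" heuristic
# are illustrative assumptions, not taken from the original post.
tidy_one <- function(lines, skip_top = 4, drop_bottom = 6, sep = ",") {
  # Drop the fixed junk rows at the top and bottom of the file
  body <- lines[(skip_top + 1):(length(lines) - drop_bottom)]
  fields <- strsplit(body, sep, fixed = TRUE)

  # Assumed heuristic: a data row has only numeric values after its first field
  is_data <- vapply(fields, function(f) {
    vals <- f[-1]
    length(vals) > 0 && !anyNA(suppressWarnings(as.numeric(vals)))
  }, logical(1))
  first_data <- which(is_data)[1]
  if (is.na(first_data)) stop("no data rows detected")

  # Collapse the stacked header rows into one name per column
  header_rows <- fields[seq_len(first_data - 1)]
  n_col <- length(fields[[first_data]])
  header <- vapply(seq_len(n_col), function(j) {
    parts <- vapply(header_rows,
                    function(h) if (j <= length(h)) h[j] else "",
                    character(1))
    paste(parts[nzchar(parts)], collapse = "_")
  }, character(1))

  # Parse the remaining rows using the rebuilt column names
  read.csv(text = body[first_data:length(body)], header = FALSE, sep = sep,
           col.names = header, stringsAsFactors = FALSE)
}
```

With the raw files read via readLines() into a list, something like lapply(list_of_lines, tidy_one) would apply it to each file in turn.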
[#]rstats
Recommend me your go-to resources for working with (deeply) nested lists / list columns and purrr, that aren't R4DS or Advanced R.
Thank you
=> More informations about this toot | View the thread
[#]rstats
Late, but #AdventOfCode day 1, with #rdatatable
=> More informations about this toot | View the thread
[#]rstats
https://github.com/johnmackintosh/cusumcharter
It's taken 3 years, but {cusumcharter} is on the brink of 10K CRAN downloads.
I know I've used it, and one other person got in touch about it when it first hit CRAN, but it's been radio silence otherwise.
I developed it really quickly - it was on CRAN within a week of getting the idea.
It also passed CRAN checks first time.
For that alone, it has a special place in my heart.
I may get round to tidying it up and doing a new release, no promises though.
=> More informations about this toot | View the thread
[#]rstats
duckdb, duckplyr, data.table and purrr is one heck of a combination, just so you know
=> More informations about this toot | View the thread
[#]rstats I misspelled "comorbidity" as "combordity" and was wrongly annoyed at my rubbish purrr skills.
Fixed the typo and everything worked, as you'd expect if it had been written by someone vaguely competent.
=> More informations about this toot | View the thread
I am thinking of creating a package of helper functions for #rdatatable that make things like DT[, .N, by][order(-N)] and other common actions a lot easier, and which also work with the new programming interface.
That particular code for descending sort is something I've written hundreds of times, and I'm fed up of it. Wrapping it into a simple function has been a real boon in my latest project.
If anyone thinks this package may be a good idea, let me know (somehow) on this post #RStats
=> More informations about this toot | View the thread
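A minimal sketch of the kind of wrapper the post above describes, for the descending count only. The name count_desc() is hypothetical and is not part of data.table; this version passes grouping columns as strings and does not use the newer env-based programming interface.

```r
library(data.table)

# count_desc() is a hypothetical helper name, not a data.table function.
# Counts rows per group and sorts the counts in descending order,
# i.e. the DT[, .N, by][order(-N)] pattern wrapped in one call.
count_desc <- function(DT, ...) {
  by_cols <- c(...)                  # grouping columns given as character strings
  DT[, .N, by = by_cols][order(-N)]
}

# Usage
DT <- data.table(g = c("a", "b", "b", "c", "b", "a"))
count_desc(DT, "g")
```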
[#]RStats
Was wondering why my notifications were going nuts over on the corporate BS site:
Yan Holtz's data-to-viz site is a great resource, and I'm always pleased to see the reaction this plot gets.
https://www.linkedin.com/posts/yan-holtz-2477534a_dataviz-activity-7254853100564328449-BP9z
Incidentally, a "hot-take" in the replies was that this could still have been a line chart, but I cannot see, for this level of detail, how on earth that could have worked?
=> More informations about this toot | View the thread
[#]rstats I absolutely love the fact that a colleague can mention a statistical technique / public health method I've never heard of, but I can Google it + "r package" and get several results.
Even better when one of them is powered by rdatatable and {checkmate}, was updated very recently, and is really straightforward to use with a {pkgdown} website
=> More informations about this toot | View the thread
[#]rstats
Spent some time working with {duckplyr} and {arrow} to save population time series data as parquet files.
Wrote a generalised importing function so I can filter out what I need.
So if I only need 48K rows out of 390K, I now load only the rows I need, and the rest can stay untouched.
I'm pretty much sold on this file format already.
=> More informations about this toot | View the thread
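A rough sketch of the filter-before-load idea from the post above, using {arrow} with dplyr verbs in place of {duckplyr}. The helper name read_pop() and the column names area and pop are made up for illustration; the author's actual importing function is not shown in the post.

```r
library(arrow)
library(dplyr)

# read_pop() is a hypothetical helper; `area` is an assumed column name.
# The filter is applied to the parquet file lazily, so only the
# matching rows are pulled into memory by collect().
read_pop <- function(path, areas) {
  open_dataset(path) |>
    filter(area %in% areas) |>
    collect()
}

# Usage with a small made-up table written to a temporary parquet file
path <- tempfile(fileext = ".parquet")
write_parquet(data.frame(area = c("A", "B", "C", "A"), pop = c(1, 2, 3, 4)), path)
read_pop(path, "A")
```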
Looks pretty good: a tidy #rstats "solver"
I love stuff like this, but never seem to get the time to figure out how to apply it to real-life NHS problems.
https://github.com/colin-fraser/tidyLP
=> More informations about this toot | View the thread
[#]rstats
Footering about with duckdb.
Discovered I can wrap setDT() around the whole tbl(con,data) |> wrangling thing, collect it and do some #rdatatable goodness, using some custom helper functions for some very common actions
It does mean a mix of old pipe and new pipe but I don't care about that so much.
Wish I'd experimented with this ages ago.
Edit: it's faster to wrap it in setDT() too, rather than collect() and then setDT() on the result. Didn't expect that, TBH
=> More informations about this toot | View the thread
LinkedIn:
"🎵You've got an🎵 unread notification🎵"
=> More informations about this toot | View the thread
[#]RStats
Flashback to the time I wrote a function called "super-hands" - "for when your data is a bit more-ish"
(This only makes sense if you've seen Peep Show, and even then... it's pushing things a bit).
I did give it a more sensible name later on, honest
=> More informations about this toot | View the thread