I wonder what the probability is that 32 random bytes happen to be valid UTF-8
=> More informations about this toot | More toots from monoidmusician@tech.lgbt
@monoidmusician its 1 in UTF-8 Codec Can’t Decode Byte
=> More informations about this toot | More toots from isAdisplayName@mathstodon.xyz
@monoidmusician about 0
=> More informations about this toot | More toots from pierogiburo@tech.lgbt
@pierogiburo well it’s at least 1/32, if all the top bits happen to be zero. but i see where you’re coming from :3
=> More informations about this toot | More toots from monoidmusician@tech.lgbt
@monoidmusician that's 1/2^32, no?
=> More informations about this toot | More toots from pierogiburo@tech.lgbt
@pierogiburo oh wait, yeah you’re right. i knew that last week :neocat_woozy:
=> More informations about this toot | More toots from monoidmusician@tech.lgbt
@monoidmusician :neocat_pat_woozy:
=> More informations about this toot | More toots from pierogiburo@tech.lgbt
@pierogiburo that said, i’ve rolled one before in way less than 2^32 attempts, so it feels like it is a bit more reasonable of a number?
=> More informations about this toot | More toots from monoidmusician@tech.lgbt
@monoidmusician hmmm lemme think
say P(n) is the probability that n bytes form a valid utf-8 string. define P(0)=1
so we have 1/2 probability that the highest bit is 0, so we have a sequence of length 1
1/8 probability that the highest bits are 110, times 1/4 probability that the next byte is a continuation byte, sequence length 2
1/16 * 1/4 * 1/4 probability of valid sequence length 3
1/32 * 1/4 * 1/4 * 1/4 probability of valid sequence length 4
P(n)=P(n-1)/2 + P(n-2)/32 + P(n-3)/256 + P(n-4)/2048
i'm on my phone rn, so i can't check what P(32) evaluates to, but it seems unlikely to be that big
=> More informations about this toot | More toots from pierogiburo@tech.lgbt
@monoidmusician 1.3044272060623614e-08 apparently (thanks python on my phone)
=> More informations about this toot | More toots from pierogiburo@tech.lgbt
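For anyone who wants to re-run the recurrence from this thread, here is a minimal sketch using exact fractions. The names `WEIGHTS` and `p_valid` are made up for illustration. It follows the thread's simplified model, which only checks lead-byte and continuation-byte bit patterns; a strict UTF-8 decoder also rejects overlong encodings, surrogates, and code points above U+10FFFF, so the true probability is somewhat lower.

```python
from fractions import Fraction

# Probability of each sequence length under the thread's model:
# lead-byte pattern, times 1/4 per continuation byte (top bits 10).
WEIGHTS = {
    1: Fraction(1, 2),     # 0xxxxxxx
    2: Fraction(1, 32),    # 110xxxxx + 1 continuation byte
    3: Fraction(1, 256),   # 1110xxxx + 2 continuation bytes
    4: Fraction(1, 2048),  # 11110xxx + 3 continuation bytes
}

def p_valid(n):
    """Return P(n) from the recurrence, as an exact fraction."""
    p = [Fraction(1)]  # P(0) = 1
    for i in range(1, n + 1):
        # Sum over the possible lengths of the final sequence.
        p.append(sum(w * p[i - k] for k, w in WEIGHTS.items() if k <= i))
    return p[n]

result = p_valid(32)
print(result)         # exact value
print(float(result))  # floating-point approximation
```

Using `Fraction` instead of floats avoids rounding and gives the exact ratio directly.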
@pierogiburo oh that’s great, thank you!
huh, so either i got really really lucky, or this QR scanner isn’t fully validating UTF-8 or something :neocat_think_woozy:
=> More informations about this toot | More toots from monoidmusician@tech.lgbt
@pierogiburo more precisely, that would be 4037006666794396657/309485009821345068724781056 :ms_wink_tongue: (thanks Haskell Ratio)
=> More informations about this toot | More toots from monoidmusician@tech.lgbt
@monoidmusician is this why windows uses utf-16?
=> More informations about this toot | More toots from obfusk@tech.lgbt
@obfusk is that more or less likely? :neocat_upsidedown:
=> More informations about this toot | More toots from monoidmusician@tech.lgbt
@monoidmusician easier to get valid utf-16
=> More informations about this toot | More toots from obfusk@tech.lgbt
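The UTF-16 claim can be checked with the same kind of recurrence. This is a rough sketch, assuming independent uniform random bytes, a fixed byte order, and no special treatment of a BOM; the name `p_valid_utf16` is hypothetical. A 16-bit code unit is either a non-surrogate (valid on its own) or a high surrogate that must be followed by a low surrogate.

```python
from fractions import Fraction

def p_valid_utf16(units):
    """Probability that `units` random 16-bit code units are valid UTF-16."""
    # 2048 of the 65536 code unit values are surrogates (0xD800..0xDFFF).
    lone = Fraction(65536 - 2048, 65536)
    # High surrogate (1024 values) followed by low surrogate (1024 values).
    pair = Fraction(1024, 65536) ** 2
    q = [Fraction(1), lone]  # Q(0) = 1, Q(1) = 31/32
    for i in range(2, units + 1):
        q.append(lone * q[i - 1] + pair * q[i - 2])
    return q[units]

# 32 bytes correspond to 16 code units.
print(float(p_valid_utf16(16)))
```

Under these assumptions the result is around 0.6, many orders of magnitude larger than the UTF-8 figure.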