Regex question: Why does '[0-9]*' return a blank on '>335', but '[0-9]{1,3}' return 335?
This pertains to LibreOffice Calc, which tells me that it uses ICU regular expressions.
#Regex, #HelpNeeded, #LibreOffice
=> More informations about this toot | More toots from RobertoArchimboldi@kolektiva.social
@RobertoArchimboldi
My guess would be because the first (0) character in the string isn't numeric...
=> More informations about this toot | More toots from xi_timpin@thelife.boats
@xi_timpin thanks. Impressive for an alleged bot. It could be. Regex, like most things in life, is a mystery to me
=> More informations about this toot | More toots from RobertoArchimboldi@kolektiva.social
@RobertoArchimboldi
I don't really know what regex is/are either but I have spent much too long on spreadsheets
=> More informations about this toot | More toots from xi_timpin@thelife.boats
@RobertoArchimboldi
Because the empty match is the first match for any "zero or more" pattern if the string does not start with it (and it's not a blank but an empty string). It's the same for the pattern 'x*' and this ">335" string. The second possible match for '[0-9]*' is 335. If you want it to match only the digits then instead use '[0-9]+', or restrict match to the second occurrence, like
=REGEX(">335";"[0-9]*";;2)
'[0-9]{1,3}' matches one to three digits, as many as possible (up to three).
=> More informations about this toot | More toots from erAck@social.tchncs.de
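A rough way to see this outside the spreadsheet is Python's re module. This is only a sketch: Python's engine is not ICU, although for a pattern this simple it should behave the same way. The helper name nth_match is made up here to imitate the occurrence argument of Calc's REGEX function.

```python
import re

text = ">335"

def nth_match(pattern, s, n):
    """Hypothetical helper: return the n-th (1-based) non-overlapping match, or None."""
    matches = [m.group(0) for m in re.finditer(pattern, s)]
    return matches[n - 1] if n <= len(matches) else None

# '*' allows zero digits, so the first match is the empty string before '>'.
first_star = nth_match(r"[0-9]*", text, 1)
# The second match is where the digits actually are.
second_star = nth_match(r"[0-9]*", text, 2)
# '+' requires at least one digit, so the first match is already the number.
first_plus = nth_match(r"[0-9]+", text, 1)

print(repr(first_star), repr(second_star), repr(first_plus))
```

With '+' there is no need to ask for the second occurrence, which is why it is usually the simpler fix.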
@erAck Thank you very much. I'm nearly there. I am not quite clear what an 'empty match' is or a 'zero or more' pattern. Fortunately, I want to match one to three digits as many times as possible and then flag for the first or second occurrence. I want to separate a column that contains ranges, 30 - 355, into the lower and upper bounds.
I'd also like to sum the lower bounds without making a whole new column, but I haven't worked out how to do that yet.
=> More informations about this toot | More toots from RobertoArchimboldi@kolektiva.social
@RobertoArchimboldi
For your task, best ask on https://ask.libreoffice.org/
A * in a regex pattern tells the engine to match the preceding character or expression zero or more times. See https://unicode-org.github.io/icu/userguide/strings/regexp.html#regular-expression-operators for the Regular Expression Operators. You may also test expressions at https://regex101.com/ (best use the Java 8 or ECMAScript flavour for ICU behaviour).
=> More informations about this toot | More toots from erAck@social.tchncs.de
@erAck thank you. Will do
=> More informations about this toot | More toots from RobertoArchimboldi@kolektiva.social
@RobertoArchimboldi A regex always finds the leftmost match. Your first regex, [0-9]*, can match at offset 0 in >335 because * means "0 or more of the preceding thing", so it can match all 0 digits before >. On the other hand, [0-9]{1,3} requires at least one digit to match, so the first location where it can succeed is at offset 1, matching all three available digits.
=> More informations about this toot | More toots from barubary@infosec.exchange
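The offsets mentioned above can be checked with a small Python sketch. Again, Python's re module stands in for ICU here as an assumption, not as a statement about what Calc does internally.

```python
import re

text = ">335"

star = re.search(r"[0-9]*", text)         # zero or more digits
bounded = re.search(r"[0-9]{1,3}", text)  # one to three digits

# span() gives (start offset, end offset) of the leftmost match.
star_span = star.span()        # succeeds at offset 0 without consuming anything
bounded_span = bounded.span()  # has to move on to the first digit

print(star_span, repr(star.group()))
print(bounded_span, repr(bounded.group()))
```

Both searches start at offset 0; the difference is only that the first pattern is allowed to succeed there with nothing matched.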
@barubary combined with @erAck I think that I now understand. So if my test string was '30-45', '[0-9]*' would match 3, 30, 30-, -, 4, and 45?
=> More informations about this toot | More toots from RobertoArchimboldi@kolektiva.social
@barubary @erAck experimenting, I don't understand. The third group of five is 45 in my spreadsheet. Groups 1, 4, 5 return an empty string, 2 returns 30 and 3 returns 45. '[0-9]*,,6' returns N/A
=> More informations about this toot | More toots from RobertoArchimboldi@kolektiva.social
@RobertoArchimboldi If you tell a regex engine to find all matches, they normally don't overlap; i.e. each search takes off where the previous match stopped. So for a pattern of [0-9]* against the string 30-45 I'd expect four matches: The two digits at offset 0 (30 at the beginning of the string), the zero digits at offset 2 (an empty string, just before -), the two digits at offset 3 (45, after -), and the zero digits at offset 5 (an empty string at the end of the string).
(None of the matches can contain - because [0-9] only matches digits.)
=> More informations about this toot | More toots from barubary@infosec.exchange
@barubary I'm learning thank you. Your visualization did render well
=> More informations about this toot | More toots from RobertoArchimboldi@kolektiva.social
@RobertoArchimboldi
No. '[0-9]*' in '30-45' would match '30', then an empty match, then '45', then empty. The * "zero or more" is a greedy operator, it matches as many as possible.
See https://regex101.com/r/4J7tZd/1
@barubary
=> More informations about this toot | More toots from erAck@social.tchncs.de
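The four non-overlapping matches described in this thread can be listed with a short Python sketch. It assumes Python 3.7 or later, where an empty match may directly follow a non-empty one; Python's re module is used as a stand-in and is not the ICU engine.

```python
import re

# Collect (start, end, matched text) for every non-overlapping match.
spans = [(m.start(), m.end(), m.group()) for m in re.finditer(r"[0-9]*", "30-45")]

for start, end, matched in spans:
    print(start, end, repr(matched))
```

No match contains '-', because [0-9] only matches digits; the empty matches are simply the places where the engine found zero digits.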
@erAck @barubary thank you. I'm learning super helpful
=> More informations about this toot | More toots from RobertoArchimboldi@kolektiva.social