Ancestors

Toot

Written by Roberto von Archimboldi on 2025-01-14 at 19:14

Regex question: Why does '[0-9]*' return a blank on '>335', but '[0-9]{1,3}' return 335?

This pertains to LibreOffice Calc, which tells me that it uses ICU regular expressions.

#Regex, #HelpNeeded, #LibreOffice

=> More information about this toot | More toots from RobertoArchimboldi@kolektiva.social

Descendants

Written by xi timpin on 2025-01-14 at 19:26

@RobertoArchimboldi

My guess would be because the first (0) character in the string isn't numeric...

=> More information about this toot | More toots from xi_timpin@thelife.boats

Written by Roberto von Archimboldi on 2025-01-14 at 19:35

@xi_timpin thanks. Impressive for an alleged bot. It could be. Regex, like most things in life, is a mystery to me

=> More information about this toot | More toots from RobertoArchimboldi@kolektiva.social

Written by xi timpin on 2025-01-14 at 19:51

@RobertoArchimboldi

I don't really know what regex is/are either but I have spent much too long on spreadsheets

=> More information about this toot | More toots from xi_timpin@thelife.boats

Written by erAck on 2025-01-14 at 20:13

@RobertoArchimboldi

Because for any "zero or more" pattern, the empty match is the first match when the string does not start with what the pattern matches (and it's not a blank but an empty string). It's the same for the pattern 'x*' and this ">335" string. The second possible match for '[0-9]*' is 335. If you want it to match only the digits, use '[0-9]+' instead, or restrict the match to the second occurrence, like:

=REGEX(">335";"[0-9]*";;2)

'[0-9]{1,3}' matches one to three digits, taking as many digits as possible (up to three).
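A minimal sketch of the behaviour erAck describes, written in Python. It uses Python's `re` module as a stand-in for ICU, on the assumption (not a guarantee) that both handle empty matches the same way for these simple patterns. `nth_match` is a hypothetical helper that imitates the occurrence argument of Calc's REGEX; it is not part of any LibreOffice API.

```python
import re
from typing import Optional


def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the leftmost match of pattern in text, or None if there is none."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


def nth_match(pattern: str, text: str, occurrence: int) -> Optional[str]:
    """Hypothetical stand-in for REGEX(text; pattern;; occurrence).

    Occurrences are counted from 1 over non-overlapping matches, empty ones
    included. Returns None if there are fewer matches than requested.
    """
    matches = [m.group(0) for m in re.finditer(pattern, text)]
    if 1 <= occurrence <= len(matches):
        return matches[occurrence - 1]
    return None


if __name__ == "__main__":
    text = ">335"
    print(repr(first_match(r"[0-9]*", text)))      # '': empty match at offset 0
    print(repr(first_match(r"[0-9]+", text)))      # '335'
    print(repr(first_match(r"[0-9]{1,3}", text)))  # '335'
    print(repr(nth_match(r"[0-9]*", text, 2)))     # '335': the second occurrence
```

The empty string counts as a real match, which is why asking for the second occurrence of '[0-9]*' skips past it to 335.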

=> More information about this toot | More toots from erAck@social.tchncs.de

Written by Roberto von Archimboldi on 2025-01-14 at 20:29

@erAck Thank you very much. I'm nearly there. I am not quite clear on what an 'empty match' or a 'zero or more' pattern is. Fortunately, I want to match one to three digits as many times as possible and then flag for the first or second occurrence. I want to separate a column that contains ranges, such as 30 - 355, into lower and upper bounds.

I'd also like to sum the lower bounds without making a whole new column, but I haven't worked out how to do that yet.
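The splitting task can be sketched outside Calc too. This Python version assumes each cell holds text like "30 - 355"; the sample values are invented for illustration, and `split_range` and `sum_lower_bounds` are hypothetical helpers. It takes occurrences 1 and 2 of '[0-9]{1,3}', the same two occurrences one would pull out in Calc with REGEX(A1;"[0-9]{1,3}";;1) and REGEX(A1;"[0-9]{1,3}";;2), where A1 is a hypothetical cell. The lower bounds are summed through a generator, so no intermediate "column" is stored.

```python
import re
from typing import List, Tuple

# One to three digits, as in the thread; widen to [0-9]+ for numbers above 999,
# otherwise "1000" would be split into "100" and "0".
BOUND = re.compile(r"[0-9]{1,3}")


def split_range(cell: str) -> Tuple[int, int]:
    """Split a range such as '30 - 355' into (lower, upper).

    Hypothetical helper: assumes the cell contains exactly two numbers.
    """
    numbers = BOUND.findall(cell)
    if len(numbers) != 2:
        raise ValueError(f"expected two numbers in {cell!r}, found {numbers}")
    return int(numbers[0]), int(numbers[1])


def sum_lower_bounds(cells: List[str]) -> int:
    # Generator expression: the lower bounds are summed without building a list.
    return sum(split_range(cell)[0] for cell in cells)


if __name__ == "__main__":
    column = ["30 - 355", "5 - 12", "100 - 250"]  # made-up sample data
    for cell in column:
        print(cell, "->", split_range(cell))
    print("sum of lower bounds:", sum_lower_bounds(column))
```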

=> More information about this toot | More toots from RobertoArchimboldi@kolektiva.social

Written by erAck on 2025-01-14 at 20:43

@RobertoArchimboldi

For your task, best ask on https://ask.libreoffice.org/

A * in a regex pattern means: match the preceding character or expression zero or more times. See the Regular Expression Operators section at https://unicode-org.github.io/icu/userguide/strings/regexp.html#regular-expression-operators. You can also test expressions at https://regex101.com/; for ICU-like behaviour, preferably use the Java 8 or ECMAScript flavour.
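To make the operator table concrete, here is a short Python check of how `*`, `+` and `{1,3}` behave when tried only at the start of a string. Python's `re` stands in for ICU here; the syntax of these basic quantifiers is shared, though the two engines are not identical in every detail.

```python
import re
from typing import Optional

# Each quantifier applied to [0-9].
PATTERNS = {
    "*": r"[0-9]*",          # zero or more digits
    "+": r"[0-9]+",          # one or more digits
    "{1,3}": r"[0-9]{1,3}",  # one to three digits
}


def match_at_start(pattern: str, text: str) -> Optional[str]:
    """Return the text matched at offset 0, or None if the pattern cannot match there.

    re.match only tries offset 0, so the engine cannot skip ahead to a later digit.
    """
    m = re.match(pattern, text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for text in (">335", "3355"):
        for name, pattern in PATTERNS.items():
            print(f"{name:>5} on {text!r}: {match_at_start(pattern, text)!r}")
```

On ">335" only `*` succeeds at offset 0 (with zero digits); on "3355" `{1,3}` stops after three digits while `*` and `+` take all four.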

=> More information about this toot | More toots from erAck@social.tchncs.de

Written by Roberto von Archimboldi on 2025-01-14 at 20:44

@erAck thank you. Will do

=> More information about this toot | More toots from RobertoArchimboldi@kolektiva.social

Written by Füsilier Breitlinger on 2025-01-14 at 20:15

@RobertoArchimboldi A regex always finds the leftmost match. Your first regex, [0-9]*, can match at offset 0 in >335 because * means "0 or more of the preceding thing", so it can match all 0 digits before >. On the other hand, [0-9]{1,3} requires at least one digit to match, so the first location where it can succeed is offset 1, where it matches all three available digits.
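The leftmost rule can be spelled out by trying the pattern at each offset in turn and stopping at the first offset where it succeeds. This is a simplified Python model, not how ICU is implemented internally; `leftmost_match` is a hypothetical name, and the sketch relies on `re.Pattern.match(string, pos)`, which only attempts a match starting at `pos`.

```python
import re
from typing import Optional, Tuple


def leftmost_match(pattern: str, text: str) -> Optional[Tuple[int, str]]:
    """Try the pattern at offsets 0, 1, 2, ... and return (offset, matched text)
    for the first offset where it matches, or None if it never matches."""
    compiled = re.compile(pattern)
    # len(text) is included so an empty match at the very end is also possible.
    for offset in range(len(text) + 1):
        m = compiled.match(text, offset)
        if m:  # a match object is truthy even when it matched zero characters
            return offset, m.group(0)
    return None


if __name__ == "__main__":
    print(leftmost_match(r"[0-9]*", ">335"))      # (0, ''): zero digits before '>'
    print(leftmost_match(r"[0-9]{1,3}", ">335"))  # (1, '335'): first offset with a digit
```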

=> More information about this toot | More toots from barubary@infosec.exchange

Written by Roberto von Archimboldi on 2025-01-14 at 20:38

@barubary combined with @erAck I think that I now understand. So if my test string were '30-45', '[0-9]*' would match 3, 30, 30-, -, 4, and 45?

=> More information about this toot | More toots from RobertoArchimboldi@kolektiva.social

Written by Roberto von Archimboldi on 2025-01-14 at 20:43

@barubary @erAck experimenting, I don't understand. The third group of five is 45 in my spreadsheet. Groups 1, 4, 5 return an empty string, 2 returns 30 and 3 returns 45. '[0-9]*,,6' returns N/A

=> More information about this toot | More toots from RobertoArchimboldi@kolektiva.social

Written by Füsilier Breitlinger on 2025-01-14 at 20:45

@RobertoArchimboldi If you tell a regex engine to find all matches, they normally don't overlap; i.e. each search resumes where the previous match stopped. So for a pattern of [0-9]* against the string 30-45 I'd expect four matches: the two digits at offset 0 (30, at the beginning of the string), the zero digits at offset 2 (an empty match just before -), the two digits at offset 3 (45, after -), and the zero digits at offset 5 (an empty match at the end of the string).

(None of the matches can contain - because [0-9] only matches digits.)
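The find-all loop described above can be sketched in Python: each search resumes where the previous match ended, and after an empty match the position is bumped by one character so the loop cannot find the same empty match forever. `find_all` is a hypothetical helper, and the one-character bump is a simplification of what real engines do after an empty match; for '[0-9]*' on the inputs below it gives the same spans as Python's own `re.finditer`.

```python
import re
from typing import List, Tuple


def find_all(pattern: str, text: str) -> List[Tuple[int, int, str]]:
    """Collect non-overlapping matches as (start, end, matched text)."""
    compiled = re.compile(pattern)
    results = []
    pos = 0
    while pos <= len(text):
        m = compiled.search(text, pos)
        if m is None:
            break
        results.append((m.start(), m.end(), m.group(0)))
        # Resume where this match stopped; after an empty match, step past
        # one character so the search makes progress.
        pos = m.end() + 1 if m.end() == m.start() else m.end()
    return results


if __name__ == "__main__":
    for start, end, found in find_all(r"[0-9]*", "30-45"):
        print(f"offset {start}-{end}: {found!r}")
    # Python's built-in iterator, for comparison:
    print([(m.start(), m.end(), m.group(0)) for m in re.finditer(r"[0-9]*", "30-45")])
```

The four results are 30 at offsets 0-2, an empty match at 2, 45 at offsets 3-5 and an empty match at 5; none of them contains "-".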

=> More information about this toot | More toots from barubary@infosec.exchange

Written by Roberto von Archimboldi on 2025-01-14 at 21:48

@barubary I'm learning, thank you. Your visualization did render well.

=> More information about this toot | More toots from RobertoArchimboldi@kolektiva.social

Written by erAck on 2025-01-14 at 20:50

@RobertoArchimboldi

No. '[0-9]*' in '30-45' would match '30', then an empty match, then '45', then an empty match. The * ("zero or more") operator is greedy: it matches as many as possible.

See https://regex101.com/r/4J7tZd/1

@barubary
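Greediness is why the first match is '30' and never just '3'. A quick Python comparison with the lazy (reluctant) forms `*?` and `+?`, which ICU's operator table also lists, shows the difference at offset 0. Python's `re` is again only a stand-in for ICU, and `prefix_match` is a hypothetical helper.

```python
import re


def prefix_match(pattern: str, text: str) -> str:
    """Return the text the pattern matches at offset 0."""
    m = re.match(pattern, text)
    if m is None:
        raise ValueError(f"{pattern!r} does not match at the start of {text!r}")
    return m.group(0)


if __name__ == "__main__":
    text = "30-45"
    # Greedy quantifiers take as many digits as they can.
    print(repr(prefix_match(r"[0-9]*", text)))   # '30'
    print(repr(prefix_match(r"[0-9]+", text)))   # '30'
    # Lazy forms take as few as they are allowed to.
    print(repr(prefix_match(r"[0-9]*?", text)))  # '': zero digits suffice
    print(repr(prefix_match(r"[0-9]+?", text)))  # '3': one digit is the minimum
```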

=> More information about this toot | More toots from erAck@social.tchncs.de

Written by Roberto von Archimboldi on 2025-01-14 at 21:48

@erAck @barubary thank you. I'm learning, super helpful.

=> More information about this toot | More toots from RobertoArchimboldi@kolektiva.social

Proxy Information
Original URL
gemini://mastogem.picasoft.net/thread/113828301463196431