Ancestors

Written by Roberto von Archimboldi on 2025-01-14 at 19:14

Regex question: Why does '[0-9]*' return a blank on '>335', but '[0-9]{1,3}' return 335?

This pertains to LibreOffice Calc, which tells me that it uses ICU regular expressions.

#Regex, #HelpNeeded, #LibreOffice

=> More informations about this toot | More toots from RobertoArchimboldi@kolektiva.social

Toot

Written by Füsilier Breitlinger on 2025-01-14 at 20:15

@RobertoArchimboldi A regex always finds the leftmost match. Your first regex, [0-9]*, can match at offset 0 in >335 because * means "0 or more of the preceding thing", so it can match all 0 digits before >. On the other hand, [0-9]{1,3} requires at least one digit to match, so the first location where it can succeed is at offset 1, matching all three available digits.
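
A minimal sketch of the leftmost-match rule, using Python's re module as a stand-in. This is an assumption for illustration only: Calc uses ICU, not Python, but both engines report the leftmost match, so these patterns should behave the same way. The helper leftmost_match is hypothetical, not part of any library. The [0-9]+ ("one or more") line is an extra comparison that also requires at least one digit.

```python
from __future__ import annotations

import re


def leftmost_match(pattern: str, text: str) -> tuple[int, str] | None:
    """Hypothetical helper: (offset, matched text) of the leftmost match, or None."""
    m = re.search(pattern, text)
    return (m.start(), m.group()) if m else None


text = ">335"
# '*' allows zero digits, so the empty string at offset 0 (before '>') already counts.
print(leftmost_match(r"[0-9]*", text))      # (0, '')
# '{1,3}' needs at least one digit, so the first place it can succeed is offset 1.
print(leftmost_match(r"[0-9]{1,3}", text))  # (1, '335')
# '+' also needs at least one digit and behaves the same way here.
print(leftmost_match(r"[0-9]+", text))      # (1, '335')
```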

=> More informations about this toot | More toots from barubary@infosec.exchange

Descendants

Written by Roberto von Archimboldi on 2025-01-14 at 20:38

@barubary combined with @erAck, I think that I now understand. So if my test string was '30-45', '[0-9]*' would match 3, 30, 30-, -, 4, and 45?

=> More informations about this toot | More toots from RobertoArchimboldi@kolektiva.social

Written by Roberto von Archimboldi on 2025-01-14 at 20:43

@barubary @erAck Experimenting, I don't understand. The third group of five is 45 in my spreadsheet. Groups 1, 4, and 5 return an empty string, 2 returns 30, and 3 returns 45. '[0-9]*,,6' returns N/A.

=> More informations about this toot | More toots from RobertoArchimboldi@kolektiva.social

Written by Füsilier Breitlinger on 2025-01-14 at 20:45

@RobertoArchimboldi If you tell a regex engine to find all matches, they normally don't overlap; i.e. each search picks up where the previous match stopped. So for a pattern of [0-9]* against the string 30-45 I'd expect four matches: the two digits at offset 0 (30 at the beginning of the string), the zero digits at offset 2 (the empty string "" just before -), the two digits at offset 3 (45, after -), and the zero digits at offset 5 (the empty string "" at the end of the string).

(None of the matches can contain - because [0-9] only matches digits.)
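
The find-all behaviour described above can be checked with a short Python sketch. Assumptions: Python's re.finditer stands in for ICU's find-all loop, and all_matches is a hypothetical helper. Python 3.7 or later is needed, because that release changed how empty matches next to a previous match are handled.

```python
from __future__ import annotations

import re


def all_matches(pattern: str, text: str) -> list[tuple[int, str]]:
    """Hypothetical helper: (offset, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in re.finditer(pattern, text)]


# Each search resumes where the previous match ended, so no match contains '-'
# and '3' alone is never reported: the greedy '*' already took '30' at offset 0.
print(all_matches(r"[0-9]*", "30-45"))
# [(0, '30'), (2, ''), (3, '45'), (5, '')]
```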

=> More informations about this toot | More toots from barubary@infosec.exchange

Written by Roberto von Archimboldi on 2025-01-14 at 21:48

@barubary I'm learning, thank you. Your visualization did render well.

=> More informations about this toot | More toots from RobertoArchimboldi@kolektiva.social

Written by erAck on 2025-01-14 at 20:50

@RobertoArchimboldi

No. '[0-9]*' in '30-45' would match '30', then an empty match, then '45', then another empty match. The * ("zero or more") is a greedy operator: it matches as many as possible.

See https://regex101.com/r/4J7tZd/1
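
To illustrate what "greedy" means here, a minimal Python sketch contrasting [0-9]* with its lazy counterpart [0-9]*?, which settles for as few repetitions as possible. ICU also supports the *? syntax; Python's re module is only a stand-in for Calc in this example.

```python
import re

text = "30-45"

# Greedy: at offset 0, '*' takes as many digits as it can, giving '30'.
greedy = re.match(r"[0-9]*", text).group()
# Lazy: '*?' takes as few digits as it can; nothing forces more, so it matches ''.
lazy = re.match(r"[0-9]*?", text).group()

print(repr(greedy), repr(lazy))  # '30' ''
```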

@barubary

=> More informations about this toot | More toots from erAck@social.tchncs.de

Written by Roberto von Archimboldi on 2025-01-14 at 21:48

@erAck @barubary thank you. I'm learning, super helpful.

=> More informations about this toot | More toots from RobertoArchimboldi@kolektiva.social

Proxy Information
Original URL
gemini://mastogem.picasoft.net/thread/113828543853419224