python - Basic regex pattern gives undesired results
I have input like this (a list named mylist):

Encuadernados 2 Inmuebles Página 1 de 1
Encuadernados 1 Inmuebles Página 1 de 1
Encuadernados 0 Inmuebles
Encuadernados 1.931 Inmuebles Página 1 de 12 9
Encuadernados 12 Inmuebles Página 1 de 1

I want to extract the first number in each line, provided it is a one- or two-digit number (0-99). The desired output:

['2', '1', '0', '12']

I do not want the fourth line to match, because its number has more than two digits (in Spanish the decimal separator is a comma and the thousands separator is a dot).

My approach is the pattern (\d{1,2}), with mask = re.compile('\d+'), and then I take the first match with [mask.search(item).group(0) for item in mylist].
But the output I am getting:
['2', '1', '0', '1', '12']
I believe this happens because the first occurrence in "Encuadernados 1.931 Inmuebles Página 1 de 12 9" is the string '1' that follows the word "Página", but I can't fix this on my own.
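To see where the extra '1' comes from, it helps to print the position of the first hit. This is a minimal sketch, assuming the list looks like the sample in the question; the strings are a reconstruction, and first_hits and hit are names chosen for illustration.

```python
import re

# Sample lines, assumed to mirror the list in the question.
mylist = [
    "Encuadernados 2 Inmuebles Página 1 de 1",
    "Encuadernados 1 Inmuebles Página 1 de 1",
    "Encuadernados 0 Inmuebles",
    "Encuadernados 1.931 Inmuebles Página 1 de 12 9",
    "Encuadernados 12 Inmuebles Página 1 de 1",
]

mask = re.compile(r"\d{1,2}")

# First run of one or two digits in every line.
first_hits = [mask.search(item).group(0) for item in mylist]
print(first_hits)

# Compare the start of the hit on the fourth line with the start of "1.931".
hit = mask.search(mylist[3])
print(hit.start(), mylist[3].index("1.931"))
```

On the fourth line both printed positions are the same: the '1' is the leading digit of 1.931, because \d stops at the dot. It is not the 1 that follows Página.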
Proposed solution
Use a negative lookahead, (?!...), to specify that the digits must not be followed by a decimal point, for example:
\d{1,2}(?!\.)
This still matches the number after Página, though, so you may want to be more specific:
(\d{1,2}(?!de|\.))
This rejects, for example, a match that is followed by the word "de".
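The lookahead idea can be checked with a short script. This is only a sketch under stated assumptions: the sample list is reconstructed from the question, and the stricter pattern first_number and the helper first_small_numbers are hypothetical additions for illustration, not part of the patterns above.

```python
import re

# Sample lines, assumed to mirror the list in the question.
mylist = [
    "Encuadernados 2 Inmuebles Página 1 de 1",
    "Encuadernados 1 Inmuebles Página 1 de 1",
    "Encuadernados 0 Inmuebles",
    "Encuadernados 1.931 Inmuebles Página 1 de 12 9",
    "Encuadernados 12 Inmuebles Página 1 de 1",
]

# Lookahead as proposed: digits not followed by a dot.
lookahead = re.compile(r"\d{1,2}(?!\.)")

# Assumed stricter variant: skip the leading non-digits, then require that
# the first digit run is one or two digits and is not followed by another
# digit, a dot, or a comma.
first_number = re.compile(r"\D*(\d{1,2})(?![\d.,])")


def first_small_numbers(lines):
    """Return the first number of each line if it is a plain 0-99 integer."""
    found = []
    for line in lines:
        m = first_number.match(line)  # match() anchors at the start of the line
        if m:
            found.append(m.group(1))
    return found


print(lookahead.search(mylist[3]).group(0))
print(first_small_numbers(mylist))
```

With search, the lookahead alone returns '93' for the fourth line, because the engine rejects the '1' before the dot and then moves on to the digits after it. Anchoring the pattern at the first digit run, as in the assumed first_number pattern, drops that line entirely and leaves ['2', '1', '0', '12'].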
Online Examples:
Regex101