python - Basic RegEx pattern throws non desired results -


I have input like this (a list named mylist):

  Encontrados 2 inmuebles Página 1 de 1
  Encontrados 1 inmuebles Página 1 de 1
  Encontrados 0 inmuebles
  Encontrados 1.931 inmuebles Página 1 de 129
  Encontrados 12 inmuebles Página 1 de 1

From each line I want to extract the first occurrence of a one- or two-digit number (0-99). The desired output:

  [ '2', '1', '0', '12']   

I do not want the fourth line to match, because its number has more than two digits (in Spanish the decimal separator is a comma and the thousands separator is a dot).

My approach is the pattern (\d{1,2}), with mask = re.compile('\d+'); then I take the first match of each item with [mask.search(item).group(0) for item in mylist].

But the output I am getting:

  ['2', '1', '0', '1', '12']

I believe this happens because the first occurrence in "Encontrados 1.931 inmuebles Página 1 de 129" is the string '1' that follows the word 'Página', but I cannot fix this bug on my own.
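The behaviour described in the question can be reproduced with a minimal sketch. The input strings below are an assumed reconstruction of the list, and the pattern is the \d{1,2} named in the question (on these lines \d+ gives the same result). Because search() returns the leftmost match, the unwanted '1' in this sketch is the leading digit of 1.931.

```python
import re

# Assumed reconstruction of the input list; the exact strings are a guess.
mylist = [
    "Encontrados 2 inmuebles Página 1 de 1",
    "Encontrados 1 inmuebles Página 1 de 1",
    "Encontrados 0 inmuebles",
    "Encontrados 1.931 inmuebles Página 1 de 129",
    "Encontrados 12 inmuebles Página 1 de 1",
]

mask = re.compile(r"\d{1,2}")

# search() scans left to right and stops at the first match in each line.
result = [mask.search(item).group(0) for item in mylist]
print(result)  # ['2', '1', '0', '1', '12']
```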

Proposed solution

Use a negative lookahead (?!...)

Specify that the digits must not be followed by a dot, such as:

  \d{1,2}(?!\.)  
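To see what this lookahead changes, here is a small sketch that compares the plain pattern with the lookahead version on one sample line. The sample string is an assumption about the fourth input line. Note that the lookahead only rejects the digit sitting directly before the dot; the digits after the dot can still match.

```python
import re

# Assumed sample line (the fourth item of the list in the question).
line = "Encontrados 1.931 inmuebles Página 1 de 129"

plain = re.compile(r"\d{1,2}")
no_dot = re.compile(r"\d{1,2}(?!\.)")

# All matches without the lookahead.
plain_matches = plain.findall(line)
print(plain_matches)   # ['1', '93', '1', '1', '12', '9']

# With the lookahead, only the '1' directly before the dot is dropped.
no_dot_matches = no_dot.findall(line)
print(no_dot_matches)  # ['93', '1', '1', '12', '9']

# search() therefore now returns the digits right after the dot.
first = no_dot.search(line).group(0)
print(first)           # 93
```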

However, it still matches the number after Página, so you need to be more specific:

  (\d{1,2}(?! de|\.))   

This rejects, for example, a match that is followed by the word "de".
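A lookahead only inspects the text right after a candidate match, so digits elsewhere in the line can still slip through. As a stricter alternative (not the pattern from the answer above, and tested only against an assumed reconstruction of the input), one could anchor on the first number in each line and accept it only when it is one or two digits long and not continued by a digit, dot, or comma. The names first_small_number and result are made up for this sketch.

```python
import re

# Assumed reconstruction of the input list; the exact strings are a guess.
mylist = [
    "Encontrados 2 inmuebles Página 1 de 1",
    "Encontrados 1 inmuebles Página 1 de 1",
    "Encontrados 0 inmuebles",
    "Encontrados 1.931 inmuebles Página 1 de 129",
    "Encontrados 12 inmuebles Página 1 de 1",
]

# ^\D* skips everything before the first digit of the line; the lookahead
# then requires that the 1-2 digits are not followed by another digit,
# a dot, or a comma (so 1.931 and 129 are both rejected).
first_small_number = re.compile(r"^\D*(\d{1,2})(?![\d.,])")

result = []
for item in mylist:
    m = first_small_number.match(item)
    if m:  # lines whose first number is too long are skipped entirely
        result.append(m.group(1))

print(result)  # ['2', '1', '0', '12']
```

One trade-off of this sketch: a short number directly followed by a sentence-ending period (such as "12.") would also be rejected, which may or may not be acceptable for the real data.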

Online Examples:

Regex101


