Unexpected end of pattern: Python regex


When I use the following Python regex to perform the functionality described below, I get the error "unexpected end of pattern".

regex:

modified=re.sub(r'^(?i)((?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?)(?-i) (code[0-9]{3})(?!</a>)',r'<a href="http://productcode/\g<1>">\g<1></a>',input) 

purpose of regex:

input:

code876 code223 matchjustcode657 code69743 code876 testing1code888 example2code098 http://replaced/code665 

should match:

code876 code223 code657 code697 

and replace occurrences with

http://productcode/code876 http://productcode/code223 matchjusthttp://productcode/code657 http://productcode/code69743 

should not match:

code876 testing1code888 testing2code776 example3code654 example2code098 http://replaced/code665 

final output

http://productcode/code876 http://productcode/code223 matchjusthttp://productcode/code657 http://productcode/code69743 code876 testing1code888 example2code098 http://replaced/code665 

Edit, update 1

modified=re.sub(r'^(?i)((?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?)(code[0-9]{3})(?!</a>)',r'<a href="http://productcode/\g<1>">\g<1></a>',input) 

The error no longer happens, but the patterns are not matched as needed. Is there a problem with the matching groups or with the matching itself? When I compile the regex this way, it finds no match in my input.

Edit, update 2

import re

# Read the whole input file into one string.
with open("/users/mymac/desktop/regex.txt") as f:
    s = f.read()

s1 = re.sub(r'((?!http://|testing[0-9]|example[0-9]).*?)(code[0-9]{3})(?!</a>)',
            r'\g<1><a href="http://productcode/\g<2>">\g<2></a>', s)
print(s1)

input

code123 code765 testing1code123 example1code345 http://www.coding.com/code333 code345  code234  code333 

output

<a href="http://productcode/code123">code123</a> <a href="http://productcode/code765">code765</a> testing1<a href="http://productcode/code123">code123</a> example1<a href="http://productcode/code345">code345</a> http://www.coding.com/<a href="http://productcode/code333">code333</a> <a href="http://productcode/code345">code345</a>  <a href="http://productcode/code234">code234</a>  <a href="http://productcode/code333">code333</a> 

The regex works on raw input, but not on string input read from a text file.

See inputs 4 and 5 for more results: http://ideone.com/3w1e3

Okay, it looks like the problem is the (?-i), which is surprising. The purpose of the inline-modifier syntax is to let you apply modifiers to selected portions of the regex. At least, that's how they work in most flavors. In Python, at least in the version used here, they seem to always modify the whole regex, the same as the external flags (re.I, re.M, etc.). The alternative (?i:xyz) syntax doesn't work either.
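To illustrate, here is a minimal sketch, assuming Python 3.6 or later. From that version on, the `re` module accepts flags scoped to a group, such as `(?i:...)`, while older versions reject that form as described above. A bare `(?-i)` is still rejected. The helper name `compiles` is hypothetical.

```python
import re

def compiles(pattern):
    """Return True if `pattern` compiles, False if the re module rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# A bare "(?-i)" in the middle of a pattern is rejected with re.error.
bare_off = compiles(r'(?i)abc(?-i) def')

# Assumption: Python 3.6+, where a flag can be limited to one group.
# Only "abc" is matched case-insensitively; " def" stays case-sensitive.
scoped = re.compile(r'(?i:abc) def')
```

If the code has to run on an older interpreter, the usual workaround is to spell out the case-insensitive part by hand, for example `[Aa][Bb][Cc]`.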

On a side note, I don't see any reason to use three separate lookaheads, as you did here:

(?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*? 

Just OR them together:

(?:(?!http://|testing[0-9]|example[0-9]).)*? 
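The two spellings should behave identically, since a run of negative lookaheads at the same position fails exactly when any one of the alternatives matches there. A small sketch to check this, assuming Python 3; the capture group around `code[0-9]{3}` and the sample string (adapted from the question's input) are only there for illustration:

```python
import re

# Three separate negative lookaheads, as in the question.
separate = re.compile(
    r'(?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?(code[0-9]{3})'
)

# One negative lookahead with the alternatives ORed together.
combined = re.compile(
    r'(?:(?!http://|testing[0-9]|example[0-9]).)*?(code[0-9]{3})'
)

sample = ("code876 matchjustcode657 testing1code888 "
          "example2code098 http://replaced/code665")

# Collect the span of every match so the two patterns can be compared.
separate_spans = [m.span() for m in separate.finditer(sample)]
combined_spans = [m.span() for m in combined.finditer(sample)]
```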

Edit: You seem to have moved on from the question of why the regex throws exceptions to the question of why it doesn't work. I'm not sure I understand your requirements, but the regex and replacement string below return the results you want.

s1 = re.sub(r'^((?!http://|testing[0-9]|example[0-9]).*?)(code[0-9]{3})(?!</a>)',
            r'\g<1><a href="http://productcode/\g<2>">\g<2></a>', s)
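A quick check of this pattern on standalone strings, one token per call, as a sketch assuming Python 3; the helper name `link_code` is hypothetical. Because of the `^` anchor (and no `re.M` flag), the lookahead is only tested at the very start of each string, which is why this form suits standalone strings rather than a larger text.

```python
import re

PATTERN = r'^((?!http://|testing[0-9]|example[0-9]).*?)(code[0-9]{3})(?!</a>)'
REPLACEMENT = r'\g<1><a href="http://productcode/\g<2>">\g<2></a>'

def link_code(token):
    """Wrap the product code in `token` in an anchor, unless the token
    starts with http://, testing<digit>, or example<digit>."""
    return re.sub(PATTERN, REPLACEMENT, token)

for token in ["code876", "matchjustcode657",
              "testing1code888", "http://replaced/code665"]:
    print(link_code(token))
```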

See it in action on ideone.com

Is that what you're after?


Edit: I now know that the replacements are being done within a larger text, not on standalone strings. That makes the problem more difficult, but we also know that the full URLs (the ones that start with http://) only occur in already-existing anchor elements. That means we can split the regex into two alternatives: one to match complete <a>...</a> elements, and one to match our target strings.

(?s)(?:(<a\s+[^>]*>.*?</a>)|\b((?:(?!testing[0-9]|example[0-9])\w)*?)(code[0-9]{3})) 

The trick is to use a function instead of a static string for the replacement. Whenever the regex matches an anchor element, the function will find it in group(1) and return it unchanged. Otherwise, it uses group(2) and group(3) to build a new one.

Here's a demo (I know that's horrible code, but I'm too tired right now to learn a more Pythonic way.)
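A minimal sketch of that callback approach, assuming Python 3. It is not the original demo code, and the names `PATTERN`, `_replace`, and `link_codes` are hypothetical. It relies on the assumption stated above that full URLs only appear inside existing anchor elements.

```python
import re

# Alternative 1 captures a complete <a>...</a> element in group 1.
# Alternative 2 captures an optional word prefix (group 2) and the
# product code (group 3), unless the word starts with testing<digit>
# or example<digit>.
PATTERN = re.compile(
    r'(?s)(?:(<a\s+[^>]*>.*?</a>)'
    r'|\b((?:(?!testing[0-9]|example[0-9])\w)*?)(code[0-9]{3}))'
)

def _replace(match):
    # An existing anchor element is returned unchanged.
    if match.group(1) is not None:
        return match.group(1)
    # Otherwise build a new anchor around the product code.
    prefix, code = match.group(2), match.group(3)
    return '%s<a href="http://productcode/%s">%s</a>' % (prefix, code, code)

def link_codes(text):
    """Link bare product codes in `text`, leaving existing anchors alone."""
    return PATTERN.sub(_replace, text)

sample = ('code123 testing1code888 '
          '<a href="http://www.coding.com/code333">code333</a> '
          'matchjustcode657')
print(link_codes(sample))
```

Passing a function to `re.sub` lets one pattern both skip text that must stay untouched and rewrite the rest, which a static replacement string cannot do.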

