Unexpected end of pattern: Python regex
When I use the following Python regex to perform the functionality described below, I get the error "unexpected end of pattern".
Regex:
modified=re.sub(r'^(?i)((?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?)(?-i) (code[0-9]{3})(?!</a>)',r'<a href="http://productcode/\g<1>">\g<1></a>',input)
Purpose of the regex:
Input:
code876 code223 matchjustcode657 code69743 code876 testing1code888 example2code098 http://replaced/code665
Should match:
code876 code223 code657 code697
and replace those occurrences with
http://productcode/code876 http://productcode/code223 matchjusthttp://productcode/code657 http://productcode/code69743
Should not match:
code876 testing1code888 testing2code776 example3code654 example2code098 http://replaced/code665
Final output:
http://productcode/code876 http://productcode/code223 matchjusthttp://productcode/code657 http://productcode/code69743 code876 testing1code888 example2code098 http://replaced/code665
Edit, update 1:
modified=re.sub(r'^(?i)((?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?)(code[0-9]{3})(?!</a>)',r'<a href="http://productcode/\g<1>">\g<1></a>',input)
The error no longer occurs, but the pattern does not match what is needed. There is a problem with the matching groups or with the matching itself, because when I compile the regex this way, it finds no match in the input.
Edit, update 2:
import re

# Read the whole text file into one string
f = open("/users/mymac/desktop/regex.txt")
s = f.read()
f.close()
s1 = re.sub(r'((?!http://|testing[0-9]|example[0-9]).*?)(code[0-9]{3})(?!</a>)', r'\g<1><a href="http://productcode/\g<2>">\g<2></a>', s)
print(s1)
Input:
code123 code765 testing1code123 example1code345 http://www.coding.com/code333 code345 code234 code333
Output:
<a href="http://productcode/code123">code123</a> <a href="http://productcode/code765">code765</a> testing1<a href="http://productcode/code123">code123</a> example1<a href="http://productcode/code345">code345</a> http://www.coding.com/<a href="http://productcode/code333">code333</a> <a href="http://productcode/code345">code345</a> <a href="http://productcode/code234">code234</a> <a href="http://productcode/code333">code333</a>
The regex works on raw input, but not on a string read in from a text file.
See inputs 4 and 5 for more results: http://ideone.com/3w1e3
Okay, it looks like the problem is the (?-i), which is surprising. The purpose of the inline-modifier syntax is to let you apply modifiers to selected portions of the regex. At least, that's how they work in most flavors. In Python, though, they seem to always modify the whole regex, the same as the external flags (re.I, re.M, etc.). The alternative (?i:xyz) syntax doesn't work either.
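The behavior described above depends on the Python version. As of Python 3.6 the re module accepts scoped modifier groups such as (?i:...) and (?-i:...), while a bare (?-i) switch in the middle of a pattern is still rejected. The sketch below assumes Python 3.6 or later; the helper name compiles and the sample strings are made up for illustration.

```python
import re

def compiles(pattern):
    """Return True if the re module accepts the pattern, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# A bare (?-i) switch in the middle of a pattern is rejected.
bare_switch_ok = compiles(r'(?i)abc(?-i)def')

# Scoped form (Python 3.6+): only the text inside (?i:...) ignores case.
scoped = re.compile(r'(?i:testing)code[0-9]{3}')
matches_mixed = scoped.fullmatch('TESTINGcode123') is not None
matches_upper = scoped.fullmatch('TESTINGCODE123') is not None

print(bare_switch_ok, matches_mixed, matches_upper)
```

On older interpreters, where the scoped form is not available, a common workaround is to spell out the case variants by hand, for example [Tt][Ee][Ss][Tt].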
On a side note, I don't see any reason to use three separate lookaheads, like you did here:
(?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*?
Just OR them together:
(?:(?!http://|testing[0-9]|example[0-9]).)*?
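A quick way to check that the two forms behave the same is to run both over a few strings and compare how much each one consumes. This is only a sketch: the sample strings are invented, and the quantifier is made greedy (* instead of *?) so that the consumed prefix is visible on its own.

```python
import re

# Three separate negative lookaheads before each character
separate = re.compile(r'(?:(?!http://)(?!testing[0-9])(?!example[0-9]).)*')
# One negative lookahead with the alternatives ORed together
combined = re.compile(r'(?:(?!http://|testing[0-9]|example[0-9]).)*')

samples = [
    'abc testing1 xyz',
    'see http://replaced/code665',
    'matchjustcode657',
    'example2code098',
    'testing code',
]

def prefix(pattern, text):
    """Return the part of text consumed before a forbidden token starts."""
    return pattern.match(text).group()

results = [(prefix(separate, s), prefix(combined, s)) for s in samples]
for text, (a, b) in zip(samples, results):
    print(repr(text), repr(a), repr(b))
```

Both patterns stop at the same position for every sample, because a character is consumed only when none of the three forbidden tokens starts there.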
Edit: You seem to have moved on from the question of why the regex throws exceptions to the question of why it doesn't work. I'm not sure I understand your requirements, but the regex and replacement string below return the results you want.
s1 = re.sub(r'^((?!http://|testing[0-9]|example[0-9]).*?)(code[0-9]{3})(?!</a>)', r'\g<1><a href="http://productcode/\g<2>">\g<2></a>', s)
Is that what you're after?
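As a rough illustration, here is that regex applied to standalone strings, one item at a time. The names PATTERN, REPLACEMENT and link_code are made up for this sketch, and the items are taken from the question's sample input. Because of the ^ anchor, the lookahead is only tested at the start of each string.

```python
import re

PATTERN = re.compile(
    r'^((?!http://|testing[0-9]|example[0-9]).*?)(code[0-9]{3})(?!</a>)'
)
REPLACEMENT = r'\g<1><a href="http://productcode/\g<2>">\g<2></a>'

def link_code(text):
    """Wrap the product code in an anchor unless the string starts with a forbidden prefix."""
    return PATTERN.sub(REPLACEMENT, text)

items = [
    'code876',
    'matchjustcode657',
    'testing1code888',
    'example2code098',
    'http://replaced/code665',
]
linked = [link_code(item) for item in items]
for before, after in zip(items, linked):
    print(before, '->', after)
```

Without the ^ anchor, the engine is free to start the match one character later (at "esting1code888", say), which is why the unanchored version also links the codes that should be left alone.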
Edit: I now know that the replacements are being done within a larger text, not on standalone strings. That makes the problem more difficult, but I also know that the full URLs (the ones that start with http://) only occur in already-existing anchor elements. That means we can split the regex into two alternatives: one to match complete <a>...</a> elements, and one to match our target strings.
(?s)(?:(<a\s+[^>]*>.*?</a>)|\b((?:(?!testing[0-9]|example[0-9])\w)*?)(code[0-9]{3}))
The trick is to use a function instead of a static string for the replacement. Whenever the regex matches an anchor element, the function will find it in group(1) and return it unchanged. Otherwise, it uses group(2) and group(3) to build a new one.
Here's a demo (I know that's horrible code, but I'm too tired right now to learn a more pythonic way.)
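A minimal sketch of that callback approach, not the original demo code, could look like the following. The function name replace_code and the sample text are assumptions made for illustration; the pattern is the one given above, and the link format follows the question.

```python
import re

PATTERN = re.compile(
    r'(?s)(?:(<a\s+[^>]*>.*?</a>)'
    r'|\b((?:(?!testing[0-9]|example[0-9])\w)*?)(code[0-9]{3}))'
)

def replace_code(match):
    """Return existing anchors unchanged; wrap bare product codes in a new anchor."""
    if match.group(1) is not None:
        # The first alternative matched a complete <a>...</a> element.
        return match.group(1)
    prefix, code = match.group(2), match.group(3)
    return '%s<a href="http://productcode/%s">%s</a>' % (prefix, code, code)

# Hypothetical sample text: the full URL appears only inside an existing anchor.
text = ('code123 matchjustcode657 testing1code888 example2code098 '
        '<a href="http://replaced/code665">code665</a>')
result = PATTERN.sub(replace_code, text)
print(result)
```

Matching the existing anchors first means the engine consumes them whole, so the second alternative never gets a chance to match a code inside one.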