Regex issues when replacing placeholders with dictionary keys/values


This refers to the question:
Replacing placeholders with dictionary keys/values

I have placeholders (the same as in the referenced question, except for the last one). I need to replace the placeholder $fil_TABLE_NAME1, where the $fil_ prefix stays the same but the table name varies (words separated by underscores, possibly containing numbers).

placeholders = {r'\$plc_hldr1': '1111',
                r'\$plc_hldr2': 'abcd',
                r'\$\d*date_placeholder': '20200101',
                r'\$fil_\w+': '(select * from table)'
                }

For replacement I’m using the adjusted code from the referenced question

import re

def remove_escape_chars(reggie):
    return re.sub(r'\\\$\\d\*|\$\d*|\\\$fil\\\_\\\w\\\+|\\', '', reggie)   #modification

def multiple_replace(escape_dict, text):
    # Create a second dictionary to lookup regex match replacement targets
    unescaped_placeholders = { remove_escape_chars(k): escape_dict[k] for k in escape_dict }

    # Create a regular expression from all of the dictionary keys
    regex = re.compile("|".join(escape_dict.keys()))
    return regex.sub(lambda match: unescaped_placeholders[remove_escape_chars(match.group(0))], text)

But when I execute it with

text = "sometext $fil_SAMPLE_TABLE_NAME some more text $plc_hldr2 some more more text $1234date_placeholder some text $5678date_placeholder"

result = multiple_replace(placeholders, text)
print(result)

I get

sometext $fil_SAMPLE_TABLE_NAME some more text abcd some more more text 20200101 some text 20200101

$fil_SAMPLE_TABLE_NAME is not replaced.

I think there is an issue in my regular expression, perhaps something escaped incorrectly, but after several modifications I still could not find it.

Would anybody help me please?

How to solve:


Method 1
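
Before changing the approach, it may help to see why the lookup in the question cannot find the $fil_ placeholder. The sketch below reuses remove_escape_chars from the question and compares what it produces for the dictionary key with what it produces for the matched text. The names key_form and match_form are only illustrative and do not appear in the original code.

```python
import re

def remove_escape_chars(reggie):
    # Same helper as in the question.
    return re.sub(r'\\\$\\d\*|\$\d*|\\\$fil\\\_\\\w\\\+|\\', '', reggie)

# What the lookup dictionary ends up using as its key.
key_form = remove_escape_chars(r'\$fil_\w+')
# What the matched text is reduced to before the lookup.
match_form = remove_escape_chars('$fil_SAMPLE_TABLE_NAME')

print(key_form)                # fil_w+
print(match_form)              # fil_SAMPLE_TABLE_NAME
print(key_form == match_form)  # False
```

Because \w+ can stand for any table name, un-escaping the key can never reproduce the matched text, so a dictionary lookup keyed on the match does not suit this placeholder.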

I would take a slightly different approach to this. Rather than trying to map the matched text back to the regex that matched it, build a single regex that puts each individual placeholder regex in its own capturing group, and then use the number of the group that matched to look up the replacement value. For your sample data, the regex would look like this:

(\$plc_hldr1)|(\$plc_hldr2)|(\$\d*date_placeholder)|(\$fil_\w+)

and the Python code would then be:

import re

placeholders = {r'\$plc_hldr1': '1111',
                r'\$plc_hldr2': 'abcd',
                r'\$\d*date_placeholder': '20200101',
                r'\$fil_\w+': '(select * from table)'
                }
# replacements[i] belongs to capturing group i + 1 in the combined regex
replacements = list(placeholders.values())

text = "sometext $fil_SAMPLE_TABLE_NAME some more text $plc_hldr2 some more more text $1234date_placeholder some text $5678date_placeholder"

# Wrap each key in its own group: (key1)|(key2)|...
regex = re.compile('(' + ')|('.join(placeholders.keys()) + ')')
# m.lastindex is the number of the group that matched
result = regex.sub(lambda m: replacements[m.lastindex-1], text)
print(result)

Output:

sometext (select * from table) some more text abcd some more more text 20200101 some text 20200101
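
The snippet above can also be wrapped in a function so that it can stand in for the multiple_replace from the question. This is only a sketch: it assumes a non-empty dictionary, and it takes the patterns and the replacements from a single items() pass so that the group numbers and the replacement list cannot drift apart.

```python
import re

def multiple_replace(placeholders, text):
    # Split the dictionary into two parallel tuples in one pass, so that
    # capturing group i always lines up with replacements[i - 1].
    # Assumes placeholders is not empty.
    patterns, replacements = zip(*placeholders.items())
    regex = re.compile('(' + ')|('.join(patterns) + ')')
    return regex.sub(lambda m: replacements[m.lastindex - 1], text)

placeholders = {r'\$plc_hldr1': '1111',
                r'\$plc_hldr2': 'abcd',
                r'\$\d*date_placeholder': '20200101',
                r'\$fil_\w+': '(select * from table)'}

text = ("sometext $fil_SAMPLE_TABLE_NAME some more text $plc_hldr2 "
        "some more more text $1234date_placeholder some text $5678date_placeholder")

result = multiple_replace(placeholders, text)
print(result)
```

Called this way, it should print the same line as shown under Output.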

Note that this requires that any group in any of the placeholder regexes needs to be non-capturing i.e. (?:...) rather than (...).
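
The reason is that every extra capturing group shifts the numbers of the groups after it, so m.lastindex no longer lines up with the list of replacements. If rewriting the patterns with (?:...) is inconvenient, one possible variant is to give each wrapper group a name and look the replacement up through m.lastgroup instead. The function name multiple_replace_named and the group names p0, p1, ... below are made up for illustration; numbered backreferences inside a pattern would still be affected by the wrapping.

```python
import re

def multiple_replace_named(placeholders, text):
    # Wrap every pattern in a named group p0, p1, ... (illustrative names)
    # and remember which replacement belongs to which name.
    parts = []
    by_name = {}
    for i, (pattern, replacement) in enumerate(placeholders.items()):
        name = 'p%d' % i
        parts.append('(?P<%s>%s)' % (name, pattern))
        by_name[name] = replacement
    regex = re.compile('|'.join(parts))
    # The outer wrapper group closes after any group nested inside it,
    # so m.lastgroup is the wrapper's name.
    return regex.sub(lambda m: by_name[m.lastgroup], text)

placeholders = {
    r'\$(plc)_hldr1': '1111',               # capturing group on purpose
    r'\$fil_\w+': '(select * from table)',
}

result = multiple_replace_named(placeholders, 'a $plc_hldr1 b $fil_T1')
print(result)  # a 1111 b (select * from table)
```

With the index-based version, the same dictionary would send $fil_T1 to group 3 although there are only two replacements, which is exactly the problem the note describes.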


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
