
Matching Json With A Regular Expression

I have a JavaScript file containing many object literals: // lots of irrelevant code oneParticularFunction({ key1: 'string value', key2: 12345, key3: 'strings which may

Solution 1:

Why not write a state machine that starts at the first {, increments a counter on every { and decrements it on every }? When the counter returns to 0, take all the characters in between and run them through Python's json parser to check whether they are valid. That way you get proper syntax errors instead of a plain match/no-match from the regex (remember that Python is { free, so false positives are impossible).
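A minimal sketch of that counter idea, assuming the object literal happens to be valid JSON (double-quoted keys and strings). JavaScript-style literals like the ones in the question, with bare keys and single quotes, would be rejected by json.loads as they stand. The function name extract_object_literal and the sample string are made up for this illustration. The sketch also skips over quoted strings so that braces inside string values are not counted, which the plain counter described above would not do; it does not handle comments or regex literals.

```python
import json


def extract_object_literal(source, start=0):
    """Return the text from the first '{' at or after `start` through its
    matching '}', or None if the braces never balance."""
    begin = source.find('{', start)
    if begin == -1:
        return None
    depth = 0
    quote = None  # quote character of the string we are inside, if any
    i = begin
    while i < len(source):
        ch = source[i]
        if quote:
            if ch == '\\':
                i += 1  # skip the escaped character
            elif ch == quote:
                quote = None
        elif ch in '"\'':
            quote = ch
        elif ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth == 0:
                return source[begin:i + 1]
        i += 1
    return None  # ran off the end with unbalanced braces


# Hypothetical input: a call whose argument is JSON-compatible.
sample = 'foo(); oneParticularFunction({"key1": "a } b", "key2": {"n": 1}}); bar({})'
literal = extract_object_literal(sample)
print(json.loads(literal))  # {'key1': 'a } b', 'key2': {'n': 1}}
```

If json.loads raises json.JSONDecodeError, the message and position tell you where the candidate text stops being valid, which is the extra information a bare regex match cannot give.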

Solution 2:

Regex code:

(?<=(?:\s\"))[\s\S]+?(?=\")|(?<=(?:\s))\d+

Live example of regex at https://regex101.com/r/bfNkvF/3

To use the previous regex in Python:

import re
text = '''oneParticularFunction({
key1: "string value",
key2: 12345,
key3: "strings which may contain ({ arbitrary characters })"
});'''

# Print every quoted string value and every bare number that follows "key: "
for m in re.finditer(r"(?<=(:\s\"))[\s\S]+?(?=\")|(?<=(:\s))\d+", text):
    print(m.group(0))

I tested this code on pythontutor, and it seems to work. You can copy it and paste it there. Let me know if it works on the other object literals.

Solution 3:

I was able to use this to remove all inner braces from a string without removing or mismatching the outer '({' and '})':

import re

while True:
    newstring = re.sub(r'(\(\{.*)\{([^{}]*)\}(.*\}\))', r'\1\2\3', mystring)
    if newstring == mystring:
        break
    mystring = newstring

There are three groups here (I know, it's hard to tell). The first is (\(\{.*). This finds your ({ and then, because .* is greedy, everything after it up to the last { that opens an innermost pair of braces.

We know that { is an innermost one because of the second group, ([^{}]*), which matches only characters that are not { or }, so no other brace can sit between it and its closing }.

Then (.*\}\)) matches everything after that closing }, through the final }).

The whole match is replaced by the three groups joined back together, with that one pair of braces left out. The loop repeats until there are no more inner braces to replace.
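To make the repeated substitution concrete, here is a small runnable sketch of the same loop. The function name strip_inner_braces and the sample input are invented for the example, and re.DOTALL is an addition not in the answer: by default . does not match a newline, so without that flag the pattern only works when the whole ({ ... }) sits on one line.

```python
import re

# Same pattern as in the answer; re.DOTALL is added so '.' also spans newlines.
INNER_BRACES = re.compile(r'(\(\{.*)\{([^{}]*)\}(.*\}\))', re.DOTALL)


def strip_inner_braces(text):
    """Drop one innermost {...} pair per pass until only the outer ({ ... }) is left."""
    while True:
        stripped = INNER_BRACES.sub(r'\1\2\3', text)
        if stripped == text:
            return text
        text = stripped


print(strip_inner_braces("f({a: {b: 1}, c: {d: {e: 2}}})"))
# f({a: b: 1, c: d: e: 2})
```

On this input the first pass removes the braces around e: 2, the second those around d: e: 2, and the third those around b: 1. The fourth pass finds no { after the opening ({, so the loop stops.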

If you also want to remove inner parentheses, you could change the substitution to

newstring = re.sub(r'(\(\{.*)(\{|\()([^{}()]*)(\}|\))(.*\}\))', r'\1\3\5', mystring)
