

OK, I admit it: I was a bit intimidated by regular expressions when I first started off as a developer. All I needed was a Substring method and an IndexOf method and I was set. But after a few projects that required some intense text processing, I realized the power and utility of regular expressions. They should be on the tool belt of every developer. To that end, I recommend Mastering Regular Expressions by Jeffrey Friedl. This is really THE book on regular expressions. Reading it will make your Regex-Fu powerful.

So let’s look at a common task: matching HTML tags within the body of some text. When you initially think to parse an HTML tag, it seems quite easy. You might consider the following expression:

Roughly translated, this expression looks for the beginning of the tag and the tag name, followed by some whitespace and then anything that doesn’t end the tag. Now this will probably work 99 times out of 100, but there’s a flaw in this expression. What if I asked you to match the following tag?

Do you see it? Unfortunately, this implementation is too naive. We have to consider the fact that the greater-than symbol does not end a tag if it’s within a quoted attribute value.

Now there are four possible formats for an HTML attribute.
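To make the flaw concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: `NAIVE_TAG` is only my reading of the prose description above (tag start, tag name, whitespace, then anything that is not a greater-than symbol), `QUOTE_AWARE_TAG` is one possible way to let quoted attribute values contain a greater-than symbol, and the sample `img` tag is hypothetical. It is not a complete HTML matcher.

```python
import re

# Hypothetical reconstruction of the naive idea from the prose:
# optional slash, tag name, whitespace, then anything that is not '>'.
NAIVE_TAG = re.compile(r"</?\w+\s+[^>]*>")

# Sketch of a quote-aware alternative (an assumption, not a full parser):
# after the tag name, attribute text may be a double-quoted string,
# a single-quoted string, or any single character that is not a quote or '>'.
QUOTE_AWARE_TAG = re.compile(r"""</?\w+(?:\s+(?:"[^"]*"|'[^']*'|[^'">])*)?>""")

# Hypothetical sample tag with a '>' inside a quoted attribute value.
sample = '<img title="displays >" src="big.gif">'

naive_match = NAIVE_TAG.search(sample).group(0)
aware_match = QUOTE_AWARE_TAG.search(sample).group(0)

print(naive_match)  # <img title="displays >
print(aware_match)  # <img title="displays >" src="big.gif">
```

The naive pattern stops at the first greater-than symbol, even though it sits inside the quoted `title` value, so the match is cut short. The quote-aware sketch consumes each quoted value as a unit, so only a greater-than symbol outside quotes ends the tag.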
