Note to self: Remember that the COM VBScript.RegExp parser doesn't give you a way to make the . operator match line breaks in multi-line content, the way .NET and most other RegEx parsers do. I've just spent 20 minutes troubleshooting a RegEx expression that works just fine in RegEx Buddy and in .NET code, but failed in one of my FoxPro apps using the COM VBScript.RegExp parser.
The code I was working on required stripping out @Register tags from an ASP.NET-style markup document:
LOCAL lcText
TEXT TO lcText NOSHOW
<!--
* Set the name of your class in the ID property
* Set the GeneratedSourceFile at a PRG file in your FoxPro project directory
* NOTE: the path is relative to your executing directory (CURDIR())
* Remove this block of comment text
-->
<%@ Page Language="C#"
GeneratedSourceFile="devDemo/BasePage.prg"
ID="BasePage_Page"
AuthenticationMode="Basic"
%>
<%@ Register Assembly="WebConnectionWebControls"
Namespace="Westwind.WebConnection.WebControls"
TagPrefix="ww" %>
<%@ Register Assembly="WebConnectionWebControls"
Namespace="Westwind.WebConnection.WebControls.Customization"
TagPrefix="ww" %>
... more HTML here
<form id="form1" runat="server">
ENDTEXT
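LOCAL loRegEx as VBScript.RegExp
loRegEx = CREATEOBJECT("VBScript.RegExp")
loRegEx.IgnoreCase = .T.
loRegEx.Global = .T.      && replace all matches, not just the first one
loRegEx.MultiLine = .T.   && only changes how ^ and $ match
* First attempt: .*? can't cross line breaks, so the multi-line tags are not matched
loRegEx.Pattern = '<%@\s{0,}Register.*?%>\s{0,}'
? loRegEx.Replace(lcText,"")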
RETURN
So I started out with the above expression to match and then remove the entire @Register tags:
loRegEx.Pattern = '<%@\s{0,}Register.*?%>\s{0,}'
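using the . to match any character in a multi-line string. This doesn't work, because in the VBScript RegExp parser the . operator matches any character except a newline (\n), so .*? can never get past the end of the first line of the tag. The MultiLine option doesn't help either: it only affects how the ^ and $ (beginning and end of line) anchors match. In .NET you'd set RegexOptions.Singleline (RegEx Buddy calls it "dot matches newline") to make . match line breaks, but VBScript.RegExp has no equivalent; its only option properties are IgnoreCase, Global and MultiLine.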
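Here's a quick FoxPro sketch of that distinction, assuming VBScript.RegExp behaves as documented (. matches anything except \n). The sample string and patterns are made up for illustration: MultiLine changes where ^ can match, but . still stops at the line break.

```foxpro
LOCAL lcText, loRegEx
* Made-up two-line string with a Windows CR+LF line break
lcText = "Line one" + CHR(13) + CHR(10) + "Line two"

loRegEx = CREATEOBJECT("VBScript.RegExp")
loRegEx.MultiLine = .T.

* MultiLine lets ^ match at the start of the second line...
loRegEx.Pattern = "^Line two"
? loRegEx.Test(lcText)    && .T.

* ...but . still won't cross the line break, MultiLine or not
loRegEx.Pattern = "one.*?two"
? loRegEx.Test(lcText)    && .F.

* Without MultiLine, ^ only matches at the very start of the string
loRegEx.MultiLine = .F.
loRegEx.Pattern = "^Line two"
? loRegEx.Test(lcText)    && .F.
```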
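There are a couple of ways around this. What I used here is to replace the . with [\s,\S], which effectively matches every character: \s matches any whitespace (including CR and LF) and \S matches everything else. The comma is redundant, since \S already matches a comma, so the more common spelling is simply [\s\S]: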
loRegEx.Pattern = '<%@\s{0,}Register[\s,\S]*?%>\s{0,}'
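Or, to be more explicit, an alternation like (.|\n) also works. It has to be a group rather than a character class, though: inside square brackets . and | are literal characters, so [.|\n] only matches a period, a pipe or a newline and won't match the tag at all.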
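To check the workaround and the character-class pitfall side by side, here's a small sketch against a made-up two-line @Register tag with a CR+LF line break. It assumes the documented VBScript.RegExp behavior; the sample text is hypothetical.

```foxpro
LOCAL lcText, loRegEx
* Made-up two-line @Register tag with a CR+LF line break in the middle
lcText = '<%@ Register Assembly="WebConnectionWebControls"' + CHR(13) + CHR(10) + ;
         'TagPrefix="ww" %>'

loRegEx = CREATEOBJECT("VBScript.RegExp")
loRegEx.IgnoreCase = .T.

* [\s\S] = any whitespace or any non-whitespace, i.e. every character incl. CR/LF
loRegEx.Pattern = '<%@\s{0,}Register[\s\S]*?%>'
? loRegEx.Test(lcText)    && .T.

* Inside [] the . and | are literal characters, so this can't match the tag
loRegEx.Pattern = '<%@\s{0,}Register[.|\n]*?%>'
? loRegEx.Test(lcText)    && .F.

* The [\s\S] pattern with Replace() strips the whole tag, leaving an empty string
loRegEx.Global = .T.
loRegEx.Pattern = '<%@\s{0,}Register[\s\S]*?%>\s{0,}'
? LEN(loRegEx.Replace(lcText, ""))    && 0
```

The lazy *? matters here: with a greedy * the match would run on to the last %> in the document and swallow everything in between.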
My short-term memory is going bad. Just as I got this working I ran into some older code (in the same program file, even!) where I had apparently done exactly the same thing before, using [\s,\S] instead of the . operator. Nothing like solving the same problem twice, eh? Hopefully this time, after writing it up, I'll remember. <g>
In general I wish I could remember more of the little bit of RegEx work I do. Even better, I wish I could remember some of what other people do, he he. I appreciate the power of RegEx, but whenever I do anything with it, it takes forever to do even simple things, and once it's done I immediately and completely forget the syntax and the process that went into figuring it out. No retention there for me. Case in point here. Next time maybe I'll remember.