Regular Expressions Links

–$–

$ for matching –

$string =~/string to find stuff after (.+)/;

$string =~/string to find stuff after (.+)$/;

$string =~/string to find stuff after (.+)$1/;

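The first two yield the same $1 when the string is a single line, because a greedy (.+) already runs to the end of the line. The third is not an anchor: Perl interpolates the variable $1 left by the previous successful match into the pattern, so it behaves like the first form only when $1 is empty at that point.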

$lookingFor = $1;
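A minimal sketch of the capture above, assuming a made-up input string (only the literal prefix comes from the patterns on this page). It shows that the plain and the $-anchored forms capture the same text on a single-line string.

```perl
use strict;
use warnings;

# Hypothetical sample input; only the literal prefix comes from the patterns above.
my $string = "string to find stuff after the interesting part";

my ($plain, $anchored);
$plain    = $1 if $string =~ /string to find stuff after (.+)/;
$anchored = $1 if $string =~ /string to find stuff after (.+)$/;

# $1 is only meaningful after a successful match, so copy it right away.
my $lookingFor = $plain;
print "$lookingFor\n";   # the interesting part
```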

–A–

after, show everything after a match – use the $' variable

alphanumeric, to find - \w to match a "word" character (alphanumeric plus "_"). A \w matches a single alphanumeric character, not a whole word. To match a word you'd need to say \w+.

alphanumeric, to find non-alphanumeric characters - \W

$pattern =~ s/(\W)/\\$1/g;
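A short sketch of what that substitution does, using a made-up input. For plain ASCII text the result should be the same as Perl's built-in quotemeta, which is usually the simpler choice.

```perl
use strict;
use warnings;

my $pattern = "a.b*c";   # hypothetical input containing regex metacharacters

# Copy first, then put a backslash in front of every non-word character.
(my $escaped = $pattern) =~ s/(\W)/\\$1/g;

print "$escaped\n";                  # a\.b\*c
print quotemeta($pattern), "\n";     # a\.b\*c

# The escaped text now matches only the literal string.
my $literal_ok  = "a.b*c" =~ /^$escaped$/ ? 1 : 0;   # 1
my $wildcard_ok = "axbbc" =~ /^$escaped$/ ? 1 : 0;   # 0
```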

anything between two characters -

For example, to extract the substring between square brackets without returning the brackets themselves:

(?<=\[)(.*?)(?=\])

between 1st square bracket and end of line

(?<=\[)(.*?)(\n)
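Both bracket patterns can be tried out as below. The log line is a hypothetical example, and the variable names are only for illustration.

```perl
use strict;
use warnings;

my $line = "error [disk full] on host [db1]\n";   # hypothetical input

# Lookbehind and lookahead keep the brackets out of the match itself.
# With /g in list context, every captured substring is returned.
my @between = $line =~ /(?<=\[)(.*?)(?=\])/g;

# From the first "[" to the end of the line; the first capture is the text.
my ($rest) = $line =~ /(?<=\[)(.*?)(\n)/;

print join(" | ", @between), "\n";   # disk full | db1
print "$rest\n";                     # disk full] on host [db1]
```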

–B–

before, show everything before a match – use the $` variable

blank line – \r\n\r\n - finds two consecutive Windows line endings (what you get from pressing Enter twice).

–C–

carriage return & end of line - \r\n

case insensitive - (?i:pattern) for part of a pattern, or the /i modifier for the whole pattern
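A small sketch of the difference between the scoped (?i:...) group and the /i modifier, using a made-up string:

```perl
use strict;
use warnings;

my $s = "Hello WORLD";   # hypothetical input

# Only the part inside (?i:...) ignores case.
my $scoped_miss = $s =~ /hello (?i:world)/ ? 1 : 0;   # 0: "hello" is still case-sensitive
my $scoped_hit  = $s =~ /Hello (?i:world)/ ? 1 : 0;   # 1

# /i applies to the whole pattern.
my $whole = $s =~ /hello world/i ? 1 : 0;             # 1

print "$scoped_miss $scoped_hit $whole\n";            # 0 1 1
```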

comment - (?#text)

–D–

delimiter – use the Perl split function instead
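As a rough illustration, split takes a pattern for the delimiter, so surrounding whitespace can be absorbed in the same step. The record below is a hypothetical comma-separated line.

```perl
use strict;
use warnings;

my $record = "alpha, beta,gamma ,delta";   # hypothetical input

# The delimiter is a comma plus any whitespace on either side of it.
my @fields = split /\s*,\s*/, $record;

print join("|", @fields), "\n";   # alpha|beta|gamma|delta
```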

–E–

end of line - \n (or \r\n in Notepad++)

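$contents = `cat textfile.txt`;

# \z matches only at the very end of the string; with /m, $ would also match before an embedded newline
if ($contents !~ /\n\z/) {
    print "no newline at end of file\n";
}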

excess whitespace, replace with a single space - $inputstring =~ s/\s+/ /g;
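A quick sketch with a made-up string. Note that the substitution collapses runs of whitespace but still leaves one space at each end; the two trimming lines are an optional extra step, not part of the entry above.

```perl
use strict;
use warnings;

my $inputstring = "  too    many\t\tspaces \n here  ";   # hypothetical input

# Every run of spaces, tabs and newlines becomes a single space.
$inputstring =~ s/\s+/ /g;
my $collapsed = $inputstring;        # " too many spaces here "

# Optional: also drop the single space left at each end.
(my $trimmed = $collapsed) =~ s/^\s+//;
$trimmed =~ s/\s+$//;

print "[$collapsed]\n[$trimmed]\n";
```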

everything after, show everything after a match – use the $' variable

everything before, show everything before a match – use the $` variable
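A minimal sketch of the three match variables on a made-up string: $` holds the text before the match, $& the match itself, and $' the text after it. Older Perls slowed down every regex in a program once any of these variables was used, so treat this as an illustration, not a recommendation for old code bases.

```perl
use strict;
use warnings;

my $text = "The quick brown fox jumped";   # hypothetical sample text

my ($before, $matched, $after);
if ($text =~ /brown/) {
    $before  = $`;   # everything before the match
    $matched = $&;   # the matched text
    $after   = $';   # everything after the match
}

print "before=[$before] match=[$matched] after=[$after]\n";
# before=[The quick ] match=[brown] after=[ fox jumped]
```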

exclude text from a match - one easy way to exclude text from a match is negative lookbehind:

\w+\b(?<!\bfox)

would include all the words except "fox" below:

The quick brown fox jumped over the lazy dog.

But not all regex flavors support negative lookbehind, and those that do typically restrict it severely, e.g. it must be a simple fixed-length expression. To avoid incompatibility, we can restate our solution using negative lookahead:

(?!fox\b)\b\w+
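Both versions can be checked against the sample sentence. This sketch assumes Perl, which accepts the fixed-length lookbehind used here; the array names are only for illustration.

```perl
use strict;
use warnings;

my $sentence = "The quick brown fox jumped over the lazy dog.";

# With /g in list context and no capture groups, each whole match is returned.
my @via_lookbehind = $sentence =~ /\w+\b(?<!\bfox)/g;
my @via_lookahead  = $sentence =~ /(?!fox\b)\b\w+/g;

print join(" ", @via_lookbehind), "\n";   # The quick brown jumped over the lazy dog
print join(" ", @via_lookahead),  "\n";   # The quick brown jumped over the lazy dog
```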

–F–

–G–

–H–

HTML tag and the matching end tag - <(.+?)>(.+)<\/\1>
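A small sketch of the backreference \1 in action, on a hypothetical snippet. It only copes with simple, non-nested tags without attributes. Real HTML is better handled with a parser module.

```perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";   # hypothetical snippet

my @pairs;
# $1 is the tag name, \1 forces the closing tag to use the same name,
# and $2 is the text between the two tags.
while ($html =~ /<(.+?)>(.+)<\/\1>/g) {
    push @pairs, "$1=$2";
}

print join(", ", @pairs), "\n";   # b=bold, i=italic
```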

–L–

leading whitespace, remove - $inputstring =~ s/^\s+//g;

letters, remove – s/[A-Za-z]//g (note that s/\D//g removes every non-digit character, not only letters)

Learn Regex The Hard Way: Scanning And Parsing Text Without Going Insane

lookahead (positive) - q(?=u) matches a q that is followed by a u, without making the u part of the match. The positive lookahead construct is a pair of round brackets, with the opening bracket followed by a question mark and an equals sign.

lookahead (negative) - q(?!u) matches a q that is not followed by a u. The negative lookahead construct is the pair of round brackets, with the opening bracket followed by a question mark and an exclamation point.

lookbehind (positive) - (?<=a)b matches the b (and only the b) in cab, but does not match bed or debt. Positive lookbehind is written as (?<=text): a pair of round brackets, with the opening bracket followed by a question mark, "less than" symbol and an equals sign.

lookbehind (negative) - (?<!a)b matches a "b" that is not preceded by an "a". It will not match cab, but will match the b (and only the b) in bed or debt. Negative lookbehind is written as (?<!text), using an exclamation point instead of an equals sign.
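The four lookaround forms can be compared side by side. The word list is a made-up test set that reuses the words mentioned in the entries above.

```perl
use strict;
use warnings;

my @words = qw(quit qatar cab bed debt);   # hypothetical test words

my @q_then_u      = grep { /q(?=u)/  } @words;   # quit
my @q_not_u       = grep { /q(?!u)/  } @words;   # qatar
my @b_after_a     = grep { /(?<=a)b/ } @words;   # cab
my @b_not_after_a = grep { /(?<!a)b/ } @words;   # bed debt

# The lookahead checks for the "u" but leaves it out of the match.
my ($matched) = "quit" =~ /(q(?=u))/;            # "q"

print "@q_then_u | @q_not_u | @b_after_a | @b_not_after_a | $matched\n";
```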

–M–

“/m” modifier after the closing delimiter – treat the string as multiple lines, so ^ and $ match at the start and end of every line. Compare “/s”, which treats the string as a single line, so . also matches a newline
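A brief sketch of both modifiers on a hypothetical two-line string:

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";   # hypothetical input

# With /m, ^ matches at the start of every line.
my @with_m    = $text =~ /^(\w+)/mg;   # first, second
# Without /m, ^ matches only at the start of the string.
my @without_m = $text =~ /^(\w+)/g;    # first

# With /s, . also matches the newline, so the match can span lines.
my ($with_s)  = $text =~ /first(.+)second/s;         # " line\n"
my $without_s = $text =~ /first(.+)second/ ? 1 : 0;  # 0

print "@with_m | @without_m | $without_s\n";   # first second | first | 0
```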

–N–

new line - \n (or \r\n in Notepad++)

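$contents = `cat textfile.txt`;

# \z matches only at the very end of the string; with /m, $ would also match before an embedded newline
if ($contents !~ /\n\z/) {
    print "no newline at end of file\n";
}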

–O–

“/o” modifier after the closing delimiter – compile the pattern only once, even if it interpolates variables that change later

overview

–P–