–$–
$ for matching –
$string =~/string to find stuff after (.+)/;
$string =~/string to find stuff after (.+)$/;
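$string =~/string to find stuff after (.+)$1/;
the first two yield the same thing when later looking for $1 (on a single-line string). In the third, Perl interpolates the $1 left over from any previous successful match into the pattern, so it may behave differently and is best avoided.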
$lookingFor = $1;
after, show everything after a match – use the $' variable
alphanumeric, to find - \w to match a "word" character (alphanumeric plus "_"). A \w matches a single alphanumeric character, not a whole word. To match a word you'd need to say \w+.
alphanumeric, to find non-alphanumeric characters - \W
$pattern =~ s/(\W)/\\$1/g;
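The substitution above puts a backslash in front of every non-word character. As a rough sketch of the same idea in Python's re module (Python is an assumption here, the notes themselves use Perl, and the function name escape_non_word is made up for illustration):

```python
import re

def escape_non_word(pattern: str) -> str:
    # Put a backslash in front of every character that \W matches
    # (anything other than letters, digits, and underscore).
    return re.sub(r'(\W)', r'\\\1', pattern)

print(escape_non_word('a.b*c'))  # prints a\.b\*c
```

Python also ships re.escape for this job; the hand-written version is shown only because it mirrors the Perl one-liner.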
anything between two characters -
let's say extract the substring between square brackets, without returning the brackets themselves.
(?<=\[)(.*?)(?=\])
between 1st square bracket and end of line
(?<=\[)(.*?)(\n)
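A small sketch of the two bracket patterns above, written with Python's re module (an assumption, since the surrounding notes are Perl; the helper names are hypothetical). The lookbehind and lookahead keep the brackets out of the captured text:

```python
import re

# Text between "[" and the next "]", brackets excluded.
BETWEEN_BRACKETS = re.compile(r'(?<=\[)(.*?)(?=\])')
# Text between the first "[" and the end of that line.
BRACKET_TO_EOL = re.compile(r'(?<=\[)(.*?)(\n)')

def between_brackets(text):
    # findall returns the contents of the single capture group.
    return BETWEEN_BRACKETS.findall(text)

def after_first_bracket(text):
    m = BRACKET_TO_EOL.search(text)
    return m.group(1) if m else None

print(between_brackets('a [bar] b [qux]'))        # ['bar', 'qux']
print(after_first_bracket('x [abc] def\nnext'))   # abc] def
```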
before, show everything before a match – use the $` variable
blank line – \r\n\r\n - finds two newline characters (what you get from pressing Enter twice).
carriage return & end of line - \r\n
case insensitive - (?i:pattern) inside the regex, or the /i modifier after the closing delimiter
comment - (?#text)
delimiter – use the Perl split function instead
end of line - \n (or \r\n in Notepad++)
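$contents = `cat textfile.txt`;
# \z anchors at the very end of the string; with /m, $ would also match before a newline in the middle
if ($contents !~ /\n\z/) {
    print "no newline at end of file\n";
}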
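The same end-of-file check can be sketched in Python (an assumption, the notes use Perl; has_trailing_newline is a hypothetical name). The sketch anchors with \Z, Python's "very end of string" anchor, so a newline in the middle of the text cannot satisfy the test:

```python
import re

def has_trailing_newline(text: str) -> bool:
    # \Z matches only at the very end of the string, so this
    # succeeds only when the last character is a newline.
    return re.search(r'\n\Z', text) is not None

print(has_trailing_newline("line one\nline two\n"))  # True
print(has_trailing_newline("line one\nline two"))    # False
```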
excess whitespace, replace with a single space - $inputstring =~ s/\s+/ /g;
everything after, show everything after a match – use the $' variable
everything before, show everything before a match – use the $` variable
exclude text from a match - one easy way to exclude text from a match is negative lookbehind:
\w+\b(?<!\bfox)
would include all the words except "fox" below:
The quick brown fox jumped over the lazy dog.
But not all regex flavors support negative lookbehind. And those that do typically have severe restrictions on the lookbehind, eg, it must be a simple fixed-length expression. To avoid incompatibility, we can restate our solution using negative lookahead:
(?!fox\b)\b\w+
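To see that the two patterns above agree, here is a short sketch using Python's re module (an assumption, chosen only because it accepts both forms; the names are illustrative). Python allows the lookbehind here because \bfox has a fixed length:

```python
import re

SENTENCE = "The quick brown fox jumped over the lazy dog."

# Negative lookbehind: reject a word if it ends as the whole word "fox".
LOOKBEHIND = re.compile(r'\w+\b(?<!\bfox)')
# Negative lookahead: refuse to start a match where "fox" begins.
LOOKAHEAD = re.compile(r'(?!fox\b)\b\w+')

def words_except_fox(text, pattern=LOOKAHEAD):
    return pattern.findall(text)

print(words_except_fox(SENTENCE))
# ['The', 'quick', 'brown', 'jumped', 'over', 'the', 'lazy', 'dog']
```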
HTML tag and the matching end tag - <(.+?)>(.+)<\/\1>
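A minimal sketch of the backreference pattern above in Python's re module (an assumption, the notes are Perl; tag_and_content is a made-up helper). Group 1 is the tag name and \1 requires the closing tag to repeat it exactly, so a tag with attributes will not pair up with its plain closing tag:

```python
import re

TAG_PAIR = re.compile(r'<(.+?)>(.+)</\1>')

def tag_and_content(html):
    # Returns (tag name, inner text) for the first matching pair.
    m = TAG_PAIR.search(html)
    return (m.group(1), m.group(2)) if m else None

print(tag_and_content('<b>bold</b>'))                # ('b', 'bold')
print(tag_and_content('<p>Hello <b>world</b></p>'))  # ('p', 'Hello <b>world</b>')
```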
leading whitespace, remove - $inputstring =~ s/^\s+//g;
letters, remove – s/[A-Za-z]//g (note that s/\D//g removes every non-digit character, not just letters)
Learn Regex The Hard Way: Scanning And Parsing Text Without Going Insane
lookahead (positive) - q(?=u) matches a q that is followed by a u, without making the u part of the match. The positive lookahead construct is a pair of round brackets, with the opening bracket followed by a question mark and an equals sign.
lookahead (negative) - q(?!u) matches a q that is not followed by a u. The negative lookahead construct is a pair of round brackets, with the opening bracket followed by a question mark and an exclamation point.
lookbehind (positive) - (?<=a)b matches the b (and only the b) in cab, but does not match bed or debt. Positive lookbehind is written as (?<=text): a pair of round brackets, with the opening bracket followed by a question mark, "less than" symbol and an equals sign.
lookbehind (negative) - (?<!a)b matches a "b" that is not preceded by an "a". It will not match cab, but will match the b (and only the b) in bed or debt. Negative lookbehind is written as (?<!text), using an exclamation point instead of an equals sign.
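The four lookaround forms can be checked side by side with a small sketch in Python's re module (an assumption, since the notes are written for Perl; first_match_start is a hypothetical helper). It reports where the first match begins, or None when there is no match:

```python
import re

def first_match_start(pattern, text):
    # Index where the first match starts, or None if nothing matches.
    m = re.search(pattern, text)
    return m.start() if m else None

print(first_match_start(r'q(?=u)', 'quit'))   # 0
print(first_match_start(r'q(?!u)', 'Iraq'))   # 3
print(first_match_start(r'(?<=a)b', 'cab'))   # 2
print(first_match_start(r'(?<!a)b', 'debt'))  # 2
print(first_match_start(r'(?<!a)b', 'cab'))   # None
```

In every case the matched text is a single character (the q or the b); the lookaround only tests the neighbor and never consumes it.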
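“/m” pattern matching modifier, placed after the closing delimiter – treat string as multiple lines, so ^ and $ match at the start and end of every line. Compare “/s”, which treats the string as a single line so that . also matches a newline. The two are independent and can be combined.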
new line - \n (or \r\n in Notepad++)
$contents = `cat textfile.txt`;
# \z anchors at the very end of the string; with /m, $ would also match before a newline in the middle
if ($contents !~ /\n\z/) {
    print "no newline at end of file\n";
}
“/o” pattern matching operator at end of line – only compile pattern once
paragraph with no ending period: [A-Za-z]$
phone: ^\([0-9]{3}\)\s[0-9]{3}-[0-9]{4}$ to match (555) 555-7890
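A quick sketch of the phone pattern in Python's re module (an assumption, the notes are Perl; is_phone is an illustrative name). The pattern is strict: parentheses around the area code, exactly one whitespace character, then the hyphenated number:

```python
import re

PHONE = re.compile(r'^\([0-9]{3}\)\s[0-9]{3}-[0-9]{4}$')

def is_phone(text: str) -> bool:
    return PHONE.match(text) is not None

print(is_phone('(555) 555-7890'))  # True
print(is_phone('555-555-7890'))    # False
```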
remove everything after the first "["
if ($string =~ /([^\[]+)\[(.*)/) { $string = $1; }
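“/s” pattern matching modifier, placed after the closing delimiter – treat string as single line, so that . also matches a newline. Compare “/m”, which treats the string as multiple lines so that ^ and $ match at every line boundary. The two are independent and can be combined.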
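The difference between the multi-line and single-line modes is easiest to see on a two-line string. The sketch below uses Python's re module as a stand-in (an assumption; Perl's /m corresponds to re.MULTILINE and /s to re.DOTALL, and the function names are made up):

```python
import re

TEXT = "one\ntwo"

def whole_line_words(text, multiline):
    # With MULTILINE, ^ and $ match at every line boundary.
    flags = re.MULTILINE if multiline else 0
    return re.findall(r'^\w+$', text, flags)

def dot_crosses_newline(text, dotall):
    # With DOTALL, "." is allowed to match the newline.
    flags = re.DOTALL if dotall else 0
    return re.search(r'one.two', text, flags) is not None

print(whole_line_words(TEXT, multiline=False))  # []
print(whole_line_words(TEXT, multiline=True))   # ['one', 'two']
print(dot_crosses_newline(TEXT, dotall=False))  # False
print(dot_crosses_newline(TEXT, dotall=True))   # True
```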
select everything after the first “xyz”
if ($string =~ /xyz(.*)/s) {
    $remainder = $1;
}
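A Python sketch of the same "everything after the first marker" idea (Python is an assumption, and after_first is a hypothetical helper). re.DOTALL plays the role of Perl's /s so the captured remainder can span lines:

```python
import re

def after_first(marker, text):
    # re.escape keeps the marker literal; DOTALL lets ".*" run past newlines.
    m = re.search(re.escape(marker) + r'(.*)', text, re.DOTALL)
    return m.group(1) if m else None

print(repr(after_first('xyz', 'abcxyz123\n456xyz789')))  # '123\n456xyz789'
```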
spaces – see also white space
<span> tag - \/?span[^>]* - this does not include the angle brackets
Match the character “/” literally <<\/?>>
Between zero and one times, as many times as possible, giving back as needed (greedy) <<?>>
Match the characters “span” literally <<span>>
Match any character that is NOT a “>” <<[^>]*>>
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) <<*>>
<span> tag, find instances of, including the closing tag and the angle brackets: <\s*\/?\s*span\s*.*?> or </?(span)(.|\n)*?>
<span> tag, find “empty” span tags (filled with &nbsp; and space) - <span[^>]*(?:/>|>(?:\s|&nbsp;)*</span>)
This will match self-closing spans, spans that run over multiple lines, spans with attributes, and spans containing non-breaking spaces. Add (?i) to match regardless of case.
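A hedged sketch of the empty-span search in Python's re module (Python is an assumption, and find_empty_spans is a made-up name). It also assumes the non-breaking space in the pattern is the HTML entity &nbsp;, which is how the surrounding description reads:

```python
import re

# Assumption: "unbreakable space" means the literal entity text &nbsp;
EMPTY_SPAN = re.compile(r'<span[^>]*(?:/>|>(?:\s|&nbsp;)*</span>)')

def find_empty_spans(html):
    # No capture groups, so findall returns the full matched tags.
    return EMPTY_SPAN.findall(html)

html = '<span>text</span><span class="a"> &nbsp;\n</span><span/>'
print(find_empty_spans(html))
# ['<span class="a"> &nbsp;\n</span>', '<span/>']
```

Because the pattern starts with a bare <span, it would also accept tag names that merely begin with "span"; adding \b after span is one way to tighten it.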
<span style='color:black'>, find –
<\s*\w*\s*style\s*='color:black\s*([\w\s%#\/\.;:_-]*)\s*.*?>
Finds <span style='color:black'>, but not closing tag
trailing whitespace, remove - $inputstring =~ s/\s+$//g;
remove leading whitespace - $inputstring =~ s/^\s+//g;
remove trailing whitespace - $inputstring =~ s/\s+$//g;
replace excess whitespace with a single space - $inputstring =~ s/\s+/ /g;
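The three whitespace substitutions translate almost one-to-one into Python's re.sub (Python is an assumption here, and the function names are illustrative):

```python
import re

def strip_leading(s):
    return re.sub(r'^\s+', '', s)

def strip_trailing(s):
    return re.sub(r'\s+$', '', s)

def collapse_whitespace(s):
    # Every run of whitespace, newlines included, becomes one space.
    return re.sub(r'\s+', ' ', s)

raw = '  a  b \n'
print(repr(strip_leading(raw)))        # 'a  b \n'
print(repr(strip_trailing(raw)))       # '  a  b'
print(repr(collapse_whitespace(raw)))  # ' a b '
```

Collapsing does not trim, so the result can still start or end with a single space; chain it with the two strip helpers when that matters.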