
Perl 6 Regexes

Taking the first steps in learning regular expressions in Perl 6.

Note! This site is about Perl 6, the future version of Perl.
If you are looking for a solution for the current production version of Perl 5, please check out the Perl 5 tutorial.

Perl 6 regex

Let's start by saying that in Perl 6 we call these things regexes, not regular expressions, but you'd be forgiven for using either name. Just don't get into an argument about it with someone from a university.

Regex operator

In Perl 6 the smart match ~~ operator is used for regex matching.

For negative matching use the !~~ operator.

Basic usage

Let's start with a simple example.

use v6;

my $str = 'abc123';

if $str ~~ m/a/ {
    say "Matching";
}

if $str !~~ m/z/ {
    say "No z in $str";
}

Both conditions are true, so both "Matching" and "No z in abc123" will be printed.

Special characters

An important change from the way regular expressions worked in Perl 5 (and thus in any language that claims to be Perl compatible) is that in Perl 6 every non-alphanumeric character (the underscore aside) must be escaped to match literally, even if it currently has no special meaning. For such characters, leaving them unescaped is a compilation error.

I think this may make regexes look more cryptic, as we will see a lot more escaping, but I have already seen people escape various non-alphanumeric characters rather randomly. This requirement makes it clear which characters are really special and which are not.

It will also nudge us toward cleaner solutions, even if they are more verbose.

In the next example we have to escape the - sign. First, without escaping it:

use v6;

my $str = 'abc - 123';

if $str ~~ m/-/ {
    say "Matching";
}

This generates a compile-time error:

===SORRY!===
Unrecognized regex metacharacter - (must be quoted to match literally) at line 6, near "/) {\n    s"

Escaping the - with a backslash fixes the problem:

use v6;

my $str = 'abc - 123';

if $str ~~ m/\-/ {
    say "Matching";
}

This version works and prints Matching.
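Besides the backslash, Perl 6 also lets you put literal text in quotes inside a regex (the section on spaces below uses ' ' the same way). A minimal sketch showing both forms on the same string; the extra ' - ' pattern is just an illustration chosen for this sketch:

```raku
use v6;

my $str = 'abc - 123';

# Backslash-escaping the - turns it into a literal character
if $str ~~ m/\-/ {
    say "escaped: '$/'";    # escaped: '-'
}

# Anything inside '...' is matched literally, so quoting works too
if $str ~~ m/'-'/ {
    say "quoted:  '$/'";    # quoted:  '-'
}

# Quoting is handy for longer literal runs, spaces included
if $str ~~ m/' - '/ {
    say "spaced:  '$/'";    # spaced:  ' - '
}
```

Which form to prefer is mostly a matter of taste; quoting tends to read better once the literal is longer than a single character.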

New special characters

We will have to be very careful, as there are a number of cases that can easily trip up anyone already used to regular expressions. For example, the pound sign # is now special by default: it starts a comment.

So you'd better escape it when you really want to match a # character.

The following example should not even compile, because the # starts a comment that runs to the end of the line. At one point Rakudo nevertheless ran it and printed "match 'a'", which looked like a bug.

use v6;

my $str = 'abc # 123';

if $str ~~ m/(#.)/ {
    say "match '$/'";
}

Compiling it gives this error message:

===SORRY!===
Unrecognized regex metacharacter ( (must be quoted to match literally) at line 6, near "#.)/ {\n   "

This is the correct regex, with the # escaped:

use v6;

my $str = 'abc # 123';

if $str ~~ m/(\#.)/ {
    say "match '$/'";
}

The Match variable of Perl 6

I got a bit ahead of myself with that example. Let me explain.

Every time there is a regex operation, a localized version of the match variable $/ is set to the actual match. That variable can do a lot more than simply hold the matched text. We'll talk about that later, but for now see this example:

use v6;

my $str = 'abc123';

if $str ~~ m/a/ {
    say "Matching   '$/'";       # Matching  'a'
}

if $str !~~ m/z/ {
    say "No z in $str   '$/'";   # No z in abc123  ''
}
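To give a small taste of what $/ can do beyond holding the matched text: it is a Match object that also remembers where the match happened. This sketch assumes the .from, .to, .prematch and .postmatch methods of Rakudo's Match class; the pattern c1 is just an illustration:

```raku
use v6;

my $str = 'abc123';

if $str ~~ m/c1/ {
    say $/.Str;          # c1
    say $/.from;         # 2   (index where the match starts)
    say $/.to;           # 4   (index just after the match)
    say $/.prematch;     # ab  (the text before the match)
    say $/.postmatch;    # 23  (the text after the match)
}
```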

Spaces in a Perl 6 regex

Perl 6 regexes disregard whitespace by default. People who are used to Perl 5 style regular expressions (the style followed by basically every programming language that has a regular expression library) usually think of spaces as significant in a regex.

We will have to unlearn that, and think about the individual bits and pieces that are the tokens we would like to match.

Basically, Perl 6 regexes work as if the /x modifier were always on. In Perl 5, /x means: disregard whitespace and treat # as the start of a comment.
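Because whitespace is ignored and # starts a comment, a longer regex can be spread over several lines with a note next to each piece. A minimal sketch, reusing the sentence from the next example (the comments themselves are only illustrative):

```raku
use v6;

my $str = 'The black cat climbed to the green tree.';

# Each token sits on its own line; everything after # is a comment
if $str ~~ m/
    black    # the first word
    \s       # one whitespace character
    cat      # the second word
/ {
    say "Matching '$/'";    # Matching 'black cat'
}
```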

use v6;

my $str = 'The black cat climbed to the green tree.';

if $str ~~ m/black/ {
    say "Matching '$/'";     # Matching 'black'
}

if $str ~~ m/black cat/ {
    say "Matching '$/'";
} else {
    say "No match as whitespaces are disregarded";  # prints this
}

As we can see in the above example, the space in the regex does not match the space in the string.

So you ask: how can I match whitespace with a Perl 6 regex?

Each of the following matches and prints "Matching 'black cat'" followed by a short note.

use v6;

my $str = 'The black cat climbed to the green tree.';

if $str ~~ m/black\scat/ {
    say "Matching '$/' - Perl 5 style white-space meta character works";
}

if $str ~~ m/black \s cat/ {
    say "Matching '$/' - Meta white-space matched, real space is disregarded";
}

if $str ~~ m/black  ' '  cat/ {
    print "Matching '$/' - ";
    say "the real Perl 6 style would be to use strings embedded in regexes";
}

if $str ~~ m/black <space> cat/ {
    print "Matching '$/' - ";
    say "or maybe the Perl 6 style is using named character classes ";
}

In any case this points out that we could have written:

use v6;

my $str = 'The black cat climbed to the green tree.';

if $str ~~ m/  b l a c k <space> c a t/ {
    say "Matching '$/' - a regex in Perl 6 is just a sequence of tokens";
}

As you can see, you can embed literal strings in a regex using single quotes, and there are new kinds of character classes written with angle brackets.
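A small sketch putting both features to work on the same sentence: a quoted string that contains a space, and a custom character class written as <[...]>. The vowel class is just an example chosen for this sketch:

```raku
use v6;

my $str = 'The black cat climbed to the green tree.';

# A quoted string is matched literally, including its space
if $str ~~ m/'black cat'/ {
    say "Matching '$/'";    # Matching 'black cat'
}

# <[...]> defines a custom character class; two of them in a row
# find the first pair of adjacent vowels
if $str ~~ m/<[aeiou]> <[aeiou]>/ {
    say "Matching '$/'";    # Matching 'ee' (from "green")
}
```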

Matching everything

The . (dot) is a metacharacter that matches any character.

Unlike in Perl 5 regular expressions, in Perl 6 this really includes everything, even newlines.

If you want to match any character except a newline, use the \N character class.

use v6;

my $str = 'The black cat climbed to the green tree.';

if $str ~~ m/c./ {
    say "Matching '$/'";      # 'ck'
}

my $text = "
The black cat
climbed the green tree";

if $text ~~ m/t./ {
    say "Matching '$/'";
}

The first regex matches and prints 'ck'. The output of the second one spans two lines, because the . matched the newline after "cat":

't
'

Using \N:

use v6;

my $text = "
The black cat
climbed the green tree";

if $text ~~ m/t\N/ {
    say "Matching '$/'";     # 'th'    of the word 'the'
}

In the last example you can see that \N matches the letter h but not the newline.
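To see the difference between . and \N side by side, here is a minimal sketch on a shorter, made-up two-line string (the names $any and $line are only for illustration):

```raku
use v6;

my $text = "cat\ntree";

# . matches anything, so the t of "cat" is matched together with the newline
my $any = $text ~~ m/t./;
say $any.Str eq "t\n";    # True

# \N refuses the newline, so the first match is the "tr" of "tree"
my $line = $text ~~ m/t\N/;
say $line.Str;            # tr
```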


Written by Gabor Szabo

Published on 2012-07-10


