
How to download a web page with Perl 6

Because of various personal issues I stopped using IRC a while ago. Once in a while I still join the #perl6 channel when I have an urgent question, but usually I have enough other distractions.

As I publish articles about Perl 6, once in a while people mention my name, or paste a link to my web site. I don't have time to follow the channel, but I'd like to know when something related to my work is mentioned.

Or just to smile: I was mentioned on ze Internets!

Luckily, Moritz Lenz runs an IRC bot that creates beautiful logs of various IRC channels, including that of Perl 6.

We are going to write a script that downloads the page containing today's log and checks whether certain strings were mentioned. If they were, we can dispatch an e-mail.

Note! This site is about Perl 6, the future version of Perl.
If you are looking for a solution for the current production version of Perl 5, please check out the Perl 5 tutorial.

Monitoring a web site

Thanks to the LWP::Simple module by Cosimo Streppone, this is a simple task.

(Check here how to install Perl 6 modules.)

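use v6;

use LWP::Simple;

# $day is the number of days to go back; 0 means today
sub MAIN($day = 0) {
    my $d = Date.today() - $day;
    say $d;

    # the strings we are monitoring
    my @strings = <szabgab maven>;

    # the URL of each daily log ends in the date in YYYY-MM-DD format
    my $html = LWP::Simple.get('http://irclog.perlgeek.de/perl6/' ~ $d);

    # only go through the rows if at least one string is mentioned
    if @strings.grep(-> $p { $html ~~ m:i/$p/ }) {
        say 'found';
        my @rows = split /\n/, $html;
        say @rows.elems;
        for @strings -> $s {
            for @rows.grep({ m:i/$s/ }) -> $r {
                say "$s -- $r";
            }
        }
    }
}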

Checking out the web site of the Perl 6 IRC logs, you will see that every day has its own page. The name of the page is built from the date in YYYY-MM-DD format.

How can we get that?

Perl 6 has a Date class with a constructor called today. It returns a Date object which, when stringified, happens to be in exactly that format. So generating today's URL is just a matter of:

my $d = Date.today();
my $url = 'http://irclog.perlgeek.de/perl6/' ~ $d;
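
To see why the date can be concatenated straight into the URL, here is a small sketch that uses a fixed date instead of Date.today(), so the result is predictable. The helper name log-url is made up for this illustration and is not part of the script above.

```perl6
use v6;

# Hypothetical helper, not part of the original script:
# builds the log URL for a given Date object.
sub log-url(Date $d) {
    return 'http://irclog.perlgeek.de/perl6/' ~ $d;
}

my $fixed = Date.new(2012, 9, 21);   # a fixed date for a predictable result
say log-url($fixed);                 # the date stringifies as YYYY-MM-DD
say log-url($fixed - 1);             # subtracting an integer goes back that many days
```

Subtracting an integer from a Date gives another Date, which is what makes the "days in the past" idea work.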

As I wanted to be able to check earlier dates as well, I let the MAIN subroutine define a command line argument for $day. That should represent the number of days in the past. It defaults to 0, which means today.
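
A minimal sketch of just this part, without the download, might look like the following. The helper date-for and its second parameter are assumptions added here so the date calculation can be tried with a known date; the original script does the subtraction directly inside MAIN.

```perl6
use v6;

# Hypothetical helper: the date that lies $days-ago days before $today.
sub date-for($days-ago, Date $today = Date.today) {
    return $today - $days-ago;
}

# MAIN turns its parameters into command line arguments;
# a parameter with a default value becomes optional.
sub MAIN($day = 0) {
    say date-for($day);
}
```

Assuming the file is saved as irc.pl (a made-up name), running perl6 irc.pl prints today's date and perl6 irc.pl 1 prints yesterday's.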

LWP::Simple.get() downloads the content of a page, and we assign it to the $html variable.

As we allow multiple strings to look for, we use grep to check the original HTML against each one of the expected strings. The :i adverb in m:i makes our regex case-insensitive.

@strings.grep(-> $p { $html ~~ m:i/$p/ })
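
The same check can be tried without any network access. In this sketch the helper name mentioned and the sample text are made up for illustration; only the grep expression comes from the script.

```perl6
use v6;

# Hypothetical helper: returns those strings that occur in $text,
# ignoring case.
sub mentioned(Str $text, @strings) {
    return @strings.grep(-> $p { $text ~~ m:i/$p/ });
}

# A made-up snippet standing in for the downloaded page.
my $sample = '<td>Have you seen the Perl Maven site?</td>';

my @found = mentioned($sample, <szabgab maven>);
say @found.elems;    # number of monitored strings that appear in the sample
say @found;
```

grep returns the strings for which the block was true, so an empty result is false in the if condition and the script skips the page.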

Then we split the HTML into rows and print out the rows that contain any of the strings we are monitoring.
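
Here is a small sketch of the row filtering on its own. The helper name matching-rows and the sample rows are invented for this example, and it uses the built-in lines method instead of split /\n/, which should give the same rows for this purpose.

```perl6
use v6;

# Hypothetical helper: returns the rows of $text that contain $s,
# ignoring case.
sub matching-rows(Str $text, Str $s) {
    return $text.lines.grep({ m:i/$s/ });
}

# Made-up rows standing in for the downloaded log.
my $sample = "<tr><td>hello</td></tr>\n<tr><td>szabgab: ping</td></tr>\n<tr><td>bye</td></tr>";

for matching-rows($sample, 'szabgab') -> $r {
    say "szabgab -- $r";
}
```

Inside the block given to grep there is no explicit variable, so the regex is matched against the current row.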

A better approach would be to parse the HTML and check the strings in the right places, but for now this works well enough.

Sending the e-mail is left as an exercise for you.


Written by Gabor Szabo

Published on 2012-09-21


