1. Aug 28th, 2006

    scrAPI 1.2.0: Explicit skip and hAtom

    Based on your feedback, I changed the behavior so that processing rules no longer “consume” the element they process. Instead, if you don’t want any other rule to process that element (and its children), either call the skip method or pass the argument :skip=>true. The old behavior was premature optimization (bad); the new one is more explicit and easier to control.
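
    To make the difference concrete, here is a toy model of the dispatch in plain Ruby. It is only a sketch and not scrAPI's code: Node, Rule and apply_rules are hypothetical names, and matching is by tag name rather than by CSS selector. It also assumes that, once a rule skips, the children are offered to that rule alone.

    ```ruby
    # Toy model of rule dispatch. NOT scrAPI's implementation: Node, Rule and
    # apply_rules are hypothetical names used only for illustration.
    Node = Struct.new(:tag, :children)
    Rule = Struct.new(:name, :tag, :skip)

    # Offer each node to every rule in order. A matching rule does not consume
    # the node; only a rule with skip set hides the node and its children from
    # the other rules. (Rules listed before the skipping rule have already seen
    # the node; this toy does not undo that.)
    def apply_rules(node, rules, log = [])
      active = rules
      rules.each do |rule|
        next unless rule.tag == node.tag
        log << "#{rule.name} saw #{node.tag}"
        if rule.skip
          active = [rule] # children are offered to this rule only
          break           # later rules never see this node
        end
      end
      node.children.each { |child| apply_rules(child, active, log) }
      log
    end

    page = Node.new("div", [Node.new("a", [])])

    # No rule skips: both div rules and the link rule all run.
    shared = [Rule.new("first", "div", false),
              Rule.new("second", "div", false),
              Rule.new("links", "a", false)]
    p apply_rules(page, shared)

    # The first rule skips: "second" and "links" never see the div or its child.
    explicit = [Rule.new("first", "div", true),
                Rule.new("second", "div", false),
                Rule.new("links", "a", false)]
    p apply_rules(page, explicit)
    ```

    In the first run every matching rule gets its turn; in the second, only the rule that asked to skip gets to look at the div.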

    I’ve done a lot of work around hAtom recently. I modified the great Simpla theme from Phu to do hAtom and hCard. I added hAtom, hCard and MicroID support to co.mments.

    Out of that, I extracted a Microformats helper for Rails. It seemed only reasonable to use one piece of code to produce the output and another to test it, so I wrote a simple hAtom scraper using scrAPI. It’s an early release that does hAtom and very basic hCard, but it’s worth checking out. It’s also an example of how to write scrapers: I incorporated a few tips and tricks in there.

    You can find it in lib/scraper/microformats.rb.

    The last notable change is the addition of a collect() method that gets called before result(). It turned out to be essential when working with hAtom: for example, if the updated date/time is missing, it defaults to the published one. That all happens during collect.
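
    The order of calls is the whole point, so here is a rough sketch of it in plain Ruby. It does not use scrAPI and is not how the library is written: EntryScraper, its fields and the scrape method are hypothetical stand-ins for a scraper whose rules have already filled in the values.

    ```ruby
    # Toy model of the collect-then-result order. NOT scrAPI's code:
    # EntryScraper and its fields are hypothetical.
    class EntryScraper
      attr_accessor :title, :published, :updated

      # Stands in for the processing rules having populated the fields.
      def initialize(fields)
        fields.each { |name, value| public_send("#{name}=", value) }
      end

      # Runs after the fields are filled in and before result,
      # so it is the place to supply defaults.
      def collect
        self.updated ||= published
      end

      # Builds the value handed back to the caller.
      def result
        { title: title, published: published, updated: updated }
      end

      # Assumed driver: collect always runs before result.
      def scrape
        collect
        result
      end
    end

    entry = EntryScraper.new(title: "scrAPI 1.2.0",
                             published: Time.utc(2006, 8, 28)).scrape
    p entry[:updated] == entry[:published] # true, since updated was missing
    ```

    Keeping the default in collect means result stays a plain read of the fields, and an entry that does carry its own updated time is left untouched.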

    1. Dec 31st, 2007

      Michael Staton

      How does one get to lib/scraper/microformats.rb?
