1. Jul 26th, 2006

    New features: scrAPI toolkit for Ruby and assert_select

    Major update to both libraries.

    I added a full test suite and, in the process, caught and fixed a few bugs: selectors being case sensitive where they shouldn’t be, group selectors not working as expected, and a few other small gotchas.

    I also added pseudo classes from CSS 3. Pseudo classes are a bit tricky to explain, so let me show with some examples:

    table tr:nth-child(odd)

    Selects every other (odd) row in the table.

    table tr:nth-child(-n+6)

    Selects the first six rows in the table.

    table tr:nth-child(6)

    Selects the sixth row in the table.

    table tr:first-child

    Selects the first row in the table.
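
    The nth-child arguments above all reduce to the CSS 3 form an+b: a row at 1-based position pos matches when pos = a*n + b for some integer n >= 0. Here is a minimal sketch of that arithmetic in plain Ruby. The helper name nth_child_match? is made up for illustration and is not how scrAPI or assert_select implement it.

    ```ruby
    # Hypothetical helper: true when a 1-based sibling position satisfies
    # position == a*n + b for some integer n >= 0.
    def nth_child_match?(position, a, b)
      diff = position - b
      return diff.zero? if a.zero?          # plain number, e.g. nth-child(6)
      (diff % a).zero? && diff / a >= 0     # n must be a non-negative integer
    end

    # The three arguments used above, written as [a, b] pairs.
    ARGUMENTS = {
      "odd"  => [2, 1],   # 2n+1
      "-n+6" => [-1, 6],
      "6"    => [0, 6],
    }

    ARGUMENTS.each do |expr, (a, b)|
      rows = (1..8).select { |pos| nth_child_match?(pos, a, b) }
      puts "tr:nth-child(#{expr}) matches rows #{rows.inspect}"
    end
    # tr:nth-child(odd) matches rows [1, 3, 5, 7]
    # tr:nth-child(-n+6) matches rows [1, 2, 3, 4, 5, 6]
    # tr:nth-child(6) matches rows [6]
    ```

    In the same terms, first-child is nth-child(1), that is a = 0 and b = 1.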

    div p:first-child

    Works almost the way you’d expect: it selects the paragraph only if it is the first element in the div. Otherwise, it selects nothing.

    div p:first-of-type

    Will select the first paragraph in the div, ignoring any elements that are not a paragraph.

    div p:not(.post)

    Will select all the paragraphs in the div, except those that have the class “post”.

    p:not(:empty)

    Will select all the paragraphs except the ones that are empty.

    p a:only-child

    Will select a link only if it is the sole child element of its paragraph. Paragraphs with no links select nothing, and neither do paragraphs with two or more child elements.
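
    The difference between first-child, first-of-type and only-child is easier to see on a list of siblings. The sketch below models a parent as an array of its child element names. The helper names are hypothetical and only illustrate the matching rules, not the library’s internals.

    ```ruby
    # Each array lists the child element names of one parent, in document order.
    # Text nodes are left out because these pseudo classes only count elements.

    # Index of the child matched by "tag:first-child", or nil if none.
    def first_child_index(children, tag)
      children.first == tag ? 0 : nil
    end

    # Index of the child matched by "tag:first-of-type", or nil if none.
    def first_of_type_index(children, tag)
      children.index(tag)
    end

    # Index of the child matched by "tag:only-child", or nil if none.
    def only_child_index(children, tag)
      children == [tag] ? 0 : nil
    end

    div = %w[h2 p p]                        # a heading followed by two paragraphs
    p first_child_index(div, "p")           # prints nil: the first element is the h2
    p first_of_type_index(div, "p")         # prints 1: the first paragraph, after the h2
    p only_child_index(%w[a], "a")          # prints 0: the link is the sole child element
    p only_child_index(%w[a a], "a")        # prints nil: two links, so neither is an only child
    ```

    Because only elements are counted, a link still qualifies as an only child when the paragraph also contains plain text around it.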

    You can install assert_select as a plugin with:

    ./script/plugin install http://labnotes.org/svn/public/ruby/rails_plugins/assert_select

    To download the scrAPI toolkit for Ruby:

    gem install scrapi

    Or:

    svn export http://labnotes.org/svn/public/ruby/scrapi

    And if you have cool tricks for scraping that you’d like to share, leave a comment or e-mail me. I’d like to collect them all into a tips & tricks post (with attribution, of course).

    1. Jul 26th, 2006

      Labnotes » Blog Archive » Scraping with style: scrAPI toolkit for Ruby

      [...] Update: I just added the ever useful pseudo classes (:nth-child, :empty, etc), check here for more details. Tagged: css, dsl, microformats, ruby, scrapi, scrapping [...]

    2. Jul 30th, 2006

      Labnotes » Blog Archive » assert_select plugin for Rails

      [...] Update: The new release of assert_select includes support for CSS pseudo classes such as nth-child, first-child, empty. More details here. It also supports nested assertions for dealing with lists, tables and forms. Some examples here. I updated the examples below to use nested asserts. [...]

    3. Aug 4th, 2006

      Andrew Turner

      After getting pointed to scrAPI from kingryan on #microformats, I wrote a Geo Class:


      class Geo :text }
      end

      class Location Geo
      result :geos
      end

      I’d like to try and do something like:

      class Microformat :text }
      end

      class Geo

      but that doesn't quite work.

    4. Aug 4th, 2006

      Assaf

      I think WordPress ate your code. So much for technical discussion over comments. Do you want to e-mail me instead, and we’ll post the results?

    5. Apr 26th, 2007

      Jared Howard

      Assaf,

      I’ve only seen this in assert_select but I’d love to use the HTML::Selector library not only in the test environment. I’d like to be able to gather the html from a site and use your library to parse through the html tags. I’ve been searching for someone doing this but with no luck. Is this possible to do?

    6. Apr 26th, 2007

      Assaf

      Jared, have a look at scrAPI. It uses the same engine, but for scraping HTML content.
