
Major update to both libraries.
I added a full test suite, and in the process caught and fixed a few bugs, such as matching that was case sensitive where it shouldn't be, group selectors not working as expected, and a few other small gotchas.
I also added pseudo-classes from CSS 3. Pseudo-classes are a bit tricky to explain, so let me show you some examples:
table tr:nth-child(odd)
Selects every other (odd) row in the table.
table tr:nth-child(-n+6)
Selects the first six rows in the table.
table tr:nth-child(6)
Selects the sixth row in the table.
table tr:first-child
Selects the first row in the table.
div p:first-child
Works almost like you'd expect, but selects a paragraph only if it is the first element in the div. If the div starts with any other element, it selects nothing.
div p:first-of-type
Will select the first paragraph in the div, ignoring any elements that are not a paragraph.
div p:not(.post)
Will select all the paragraphs in the div, except those that have the class “post”.
p:not(:empty)
Will select all the paragraphs except the ones that are empty.
p a:only-child
Will select the link in each paragraph whose only child element is a single link. Paragraphs with zero links, or with two or more links, match nothing.
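To make the selector semantics above concrete, here is a minimal Ruby sketch. It is a toy model, not the actual code in scrAPI or assert_select: `Element`, `el`, `parse_nth`, `nth_match?`, `nth_child`, `first_child` and `first_of_type` are hypothetical names, and the sketch only looks at the direct children of a single parent instead of walking a real HTML tree. The key idea is that `:nth-child(an+b)` picks the element at 1-based position p when p = a*n + b for some whole number n of zero or more, so `odd` is `2n+1` and `-n+6` covers positions 6, 5, 4, 3, 2 and 1.

```ruby
# Hypothetical toy model (not scrAPI's implementation): an element has a tag
# name, a list of CSS classes, child elements, and its own text.
Element = Struct.new(:name, :classes, :children, :text) do
  # :empty means no child elements and no text.
  def empty?
    children.empty? && text.empty?
  end
end

# Convenience constructor for building test trees.
def el(name, classes: [], children: [], text: "")
  Element.new(name, classes, children, text)
end

# Turn an :nth-child() argument into [a, b] for the formula a*n + b.
def parse_nth(arg)
  s = arg.delete(" ")
  case s
  when "odd"  then [2, 1]
  when "even" then [2, 0]
  when /\A[+-]?\d+\z/ then [0, s.to_i]           # plain position, e.g. "6"
  when /\A([+-]?\d*)n([+-]\d+)?\z/                # forms like "2n+1", "-n+6", "n"
    coeff, offset = $1, $2
    a = case coeff
        when "", "+" then 1
        when "-"     then -1
        else coeff.to_i
        end
    [a, offset.to_i]                              # nil.to_i is 0 for plain "n"
  else
    raise ArgumentError, "unsupported :nth-child argument: #{arg}"
  end
end

# Position (1-based) matches if position == a*n + b for some integer n >= 0.
def nth_match?(position, a, b)
  return position == b if a.zero?
  diff = position - b
  (diff % a).zero? && diff / a >= 0
end

def nth_child(siblings, arg)
  a, b = parse_nth(arg)
  siblings.select.with_index(1) { |_, i| nth_match?(i, a, b) }
end

# p:first-child: the first sibling, but only if it has the given tag name.
def first_child(siblings, name)
  first = siblings.first
  first && first.name == name ? [first] : []
end

# p:first-of-type: the first sibling with the given tag, skipping other tags.
def first_of_type(siblings, name)
  match = siblings.find { |e| e.name == name }
  match ? [match] : []
end

rows = (1..10).map { |i| el("tr", text: "row #{i}") }
p nth_child(rows, "odd").map(&:text)   # ["row 1", "row 3", "row 5", "row 7", "row 9"]
p nth_child(rows, "-n+6").size         # 6
p nth_child(rows, "6").map(&:text)     # ["row 6"]

div = [
  el("h2", text: "Title"),
  el("p", classes: ["post"], text: "Intro"),
  el("p"),                                                   # empty paragraph
  el("p", children: [el("a", text: "one link")]),
  el("p", children: [el("a", text: "a"), el("a", text: "b")])
]
paras = div.select { |e| e.name == "p" }

p first_child(div, "p")                                      # [] because <h2> comes first
p first_of_type(div, "p").map(&:text)                        # ["Intro"]
p paras.reject { |e| e.classes.include?("post") }.size       # p:not(.post) gives 3
p paras.reject(&:empty?).size                                # p:not(:empty) gives 3
# p a:only-child: links that are the only child element of their paragraph.
p paras.flat_map { |e| e.children.size == 1 && e.children.first.name == "a" ? e.children : [] }.map(&:text)
# ["one link"]
```

Note how `first_child` comes back empty for the div because its first element is an `h2`, which is exactly the `div p:first-child` gotcha, while `first_of_type` skips the heading and finds the first paragraph.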
You can install assert_select as a plugin with:
./script/plugin install http://labnotes.org/svn/public/ruby/rails_plugins/assert_select
To download the scrAPI toolkit for Ruby:
gem install scrapi
Or:
svn export http://labnotes.org/svn/public/ruby/scrapi
And if you have cool tricks for scraping that you’d like to share, leave a comment or e-mail me. I’d like to collect them all into a tips & tricks post (with attribution, of course).