1. Jul 26th, 2006

    Parsing expressions with sub! and blocks

    To get CSS pseudo classes working for scrAPI and assert_select, I had to rewrite the CSS selector parser. I’m sure there’s a Lex and Yacc for Ruby somewhere, but I ended up with a much simpler solution. One I can actually read and fix.

    I ended up using sub! and blocks.

    Starting with the current expression, I simply sub! the token I’m looking for, testing if it exists and removing it at the same time, reducing the expression by one token.

    To test if :empty comes next:

    if statement.sub!(/^:empty/, "")
      # Placeholder: record the :empty check here, e.g. @pseudo << (some code)
      next
    end
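
    The trick relies on what sub! returns. As a minimal sketch (the sample string and the variable names matched and missed are made up for illustration), a successful sub! returns the modified string, which is truthy, and a failed one returns nil:

    ```ruby
    # Illustrative input only; dup so the string is safe to mutate.
    statement = ":empty.nav".dup

    # Match: sub! strips the token in place and returns the string (truthy).
    matched = statement.sub!(/^:empty/, "")

    # No match: the string is left untouched and sub! returns nil (falsy).
    missed = statement.sub!(/^:empty/, "")
    ```

    That is why a single call can serve as both the test and the reduction step.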

    To deal with tokens that have values, I use blocks. For example:

    next if statement.sub!(/^#(\w+)/) do |match|
      id = $1                   # the captured identifier, without the "#"
      attributes << ["id", id]
      ""                        # the block's value replaces the token
    end

    Again, test for a match, do something with the token, and reduce the expression.
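
    To make the block form concrete, here is a small sketch with invented sample input. The block runs only when the pattern matches, and whatever it returns becomes the replacement text, so returning "" removes the token:

    ```ruby
    # Illustrative input only; dup so the string is safe to mutate.
    statement = "#main.nav".dup
    attributes = []
    calls = 0

    2.times do
      statement.sub!(/^#(\w+)/) do |match|
        calls += 1
        attributes << ["id", $1]   # match is "#main", $1 is "main"
        ""                         # remove the token from the statement
      end
    end
    ```

    The second pass finds no leading "#", so the block is skipped and nothing is recorded twice.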

    In CSS selectors, identifiers, class names, attributes and pseudo classes can come in any order. So a loop repeats on the expression until it doesn’t find any token it recognizes, or there’s nothing left to parse.

    Inside the loop, I could use if and elsif, but I found it easier to keep the code readable (less indentation) by restarting the loop with next on each match and breaking at the end. So the loop looks something like:

    while true
      next if statement.sub!(/^#(\w+)/) do |match|
        # handle ID
        ""
      end
      next if statement.sub!(/^\.(\w+)/) do |match|
        # handle class name
        ""
      end
      # And so forth.
      break
    end
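
    Putting the pieces together, here is a self-contained sketch of the same loop. It is not the scrAPI parser: the method name parse_compound, the result hash, and the simplified token patterns are all assumptions made for this example.

    ```ruby
    # Hypothetical helper: splits one compound selector such as
    # "div#main.nav:empty" into its parts using sub! and blocks.
    def parse_compound(selector)
      statement = selector.dup   # sub! mutates, so work on a copy
      result = { tag: nil, id: nil, classes: [], pseudo: [] }

      # An optional tag name may only come first.
      statement.sub!(/^(\w+)/) do
        result[:tag] = $1
        ""
      end

      while true
        next if statement.sub!(/^#([\w\-]+)/) do
          result[:id] = $1
          ""
        end
        next if statement.sub!(/^\.([\w\-]+)/) do
          result[:classes] << $1
          ""
        end
        next if statement.sub!(/^:([\w\-]+)/) do
          result[:pseudo] << $1
          ""
        end
        break   # nothing recognized at the front of the statement
      end

      # Anything left over is a token this sketch does not understand.
      raise ArgumentError, "cannot parse: #{statement}" unless statement.empty?
      result
    end
    ```

    Calling parse_compound("div#main.nav.active:empty") walks the string one token at a time, and because every pattern is tried on each pass, ".nav#main" parses just as well as "#main.nav".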

    If you’re interested, you can check the code here.
