Parslet Intro

ruby / parser

Tonight I wanted to take some time off from Picky to write about Parslet, a parser construction library by my dear friend Kaspar Schiess.

tl;dr

  1. Parslet is great.
  2. gem install parslet
  3. Look at any of the examples.
  4. Try, learn, try again, profit!

What is it?

In Kaspar’s words: “A small Ruby library for constructing parsers in the PEG (Parsing Expression Grammar) fashion”.

A parser is used to transform text data into a semantically meaningful structure by injecting information based on assumptions on the text’s structure. For example, "Hello, Florian!" could be parsed into something like: [sentence: [greeting:hello, separation:comma, name:florian, mark:exclamation]].

It’s probably best if you just tried it for yourself.

Are there other parser constructors?

Yes, Citrus and Treetop. But let’s be frank here. Parslet eats these for breakfast in terms of ease of use and power, in my humble and almost unbiased opinion. Let me explain why.

Why is it so powerful and easy?

On the main page, Kaspar notes that Parslet is especially easy by “providing the best error reporting” and “not generating reams of code for you to debug”.

While both are certainly true, and I do not disagree, but I don’t think that that is what makes Parslet so easy or powerful. Surely easi-er, but the main reason I love it is that it harnesses the power of Ruby.

The second reason I consider it so great is that it split into a parser and a transformer step, with an intermediate syntax tree that is entirely in Ruby basic atoms, like hashes and arrays.

Why is this cool? To repeat my example, above: The parser would first parse "Hello, Florian!" into [sentence: [greeting:hello, separation:comma, name:florian, mark:exclamation]] and then, for example, a FrenchTransformer could be used to transform this into: Bonjour, Florian!, the french representation of the english input sentence. So first we get an intermediate semantic expression that we can then transform into something else. And there can be a lot of transformers starting from where the parser ended. Thinking about a SwedishTransformer or an ItalianTransformer? Me too. “Optimus Primo, transformate! Ciao!”

Or a chain of transformers that first take the intermediate tree and morph it into a different intermediate tree. The possibilities are endless.

Simple Example

Let’s consider a simple example. It is a subpart of the ERB parser and transformer that I wrote. ERB is a Ruby templating language by Seki Masatoshi.

We’ll look at the whole thing later on.

A simple ERB example would be ERB with a Ruby expression inside:

Hello, my name is<%= name >!

What we get out of the parser is the parts that are text, and the parts that are ruby code. So with parslet we’d write this:

require 'parslet'

class ErbParser < Parslet::Parser
  
  rule(:ruby_expression) { (str('%>').absnt? >> any).repeat.as(:ruby) }
  rule(:erb_with_tags) { str('<%=') >> ruby_expression >> str('%>') }
  
  rule(:text) { (str('<%=').absnt? >> any).repeat(1).as(:text) }
  
  rule(:text_with_ruby_expressions) { (text | erb_with_tags).repeat }
  root(:text_with_ruby_expressions)
end

p ErbParser.new.parse("Hello, my name is<%= name %>!")

Just run it :) What you get is a nice semantic tree:

[{:text=>"Hello, my name is"}, {:ruby=>" name "}, {:text=>"!"}]

Let me go through it in steps. I’ve found out that it is easiest for me to go top-down to define a parser. I hope this suits you too.

We define the starting point, aka the root of the parser with the root method:

root(:text_with_ruby_expressions)

This just says, start with the rule(:text_with_ruby_expressions).

So, now what we know about our simple-ERB language is that it is basically a sequence of text and ruby expressions, repeating. So let’s define that:

rule(:text_with_ruby_expressions) { (text | erb_with_tags).repeat }

So either we have text OR (|) a ruby expression. And we have that in a repeating fashion. Just as the rule says.

Let’s look at the text rule we just used:

rule(:text) { (str('<%=').absnt? >> any).repeat(1).as(:text) }

This means: As long as you don’t encounter a ERB start tag (<%=), keep taking everything as text. This will stop if it encounters a <%=.

At which point Parslet will try to apply the other rule:

rule(:erb_with_tags) { str('<%=') >> ruby_expression >> str('%>') }

This rule just matches anything with erb start <%= and end tags %> around it, with a ruby expression inside.

The ruby expression is simple:

rule(:ruby_expression) { (str('%>').absnt? >> any).repeat.as(:ruby) }

We know this already: As long as you don’t encounter an ERB end tag, keep consuming as ruby code.

Got it?

Again, if you run it, you get:

[{:text=>"Hello, my name is"}, {:ruby=>" name "}, {:text=>"!"}]

Niiice.

Let’s not think about the transform step for a second and look at some of the good shit.

Goodies that will blow your mind.

Parslet doesn’t force you to use a class. It’s totally ok to just do this:

include Parslet
parser = (str('Hello') | str('Hi')).as(:greeting)
p parser.parse('Hello')

In Parslet, you can run the parser with a subset of its rules:

p ErbParser.new.erb_with_tags.parse("<%= name %>")

while

p ErbParser.new.erb_with_tags.parse("Hello, <%= name %>!")

would fail since the erb_with_tags rule just covers text which starts with <%= and ends with %>.

Running a parse on a subrule works because a parser is composed of Parslets, or parser atoms, hence the name. str('hello') is one of these atoms, and so is a sequence of atoms, like str('no') >> str('kidding'). And you can do a parse directly with one of these, if you want, (str('Hello') | str('Hi')).parse('Hello') as we have seen before.

Did I say it’s pure Ruby? Why, yes! Let’s harness the power of Ruby, and combine it with the power of Parslet parser atoms.

I need a parser that is case insensitive regarding the string.

def case_insensitive string
  chars = string.split //
  chars.inject(str('')) do |parslet, char|
    parslet >> match("[#{char.downcase}|#{char.upcase}]")
  end
end

p case_insensitive('hello').parse('HeLLo')

What it does is generate this: match([h|H]) >> match([e|E]) >> match([l|L]) >> match([l|L]) >> match([o|O])

This returns me a case insensitive parser that I can directly use to parse the HeLLo. Or why not combine it with other parslets?

p (case_insensitive('hello') >> str(' ') >> str('Florian')).parse('HeLLo Florian')

Transforming

Can you take a quick look at the ERB parser, copy it into a script and give it a go?

As you can see, it’s not just able to parse text and ruby expressions (<%= ruby expression %>), but also comments (<%# comment %>) and normal ruby code (<% ruby %>) that both will not be inserted into the rendered text.

Now we’ll have a look at the transformer that will spit out rendered text:

evaluator = Parslet::Transform.new do
  
  erb_binding = binding
  
  rule(:code => { :ruby => simple(:ruby) }) { eval(ruby, erb_binding); '' }  
  rule(:expression => { :ruby => simple(:ruby) }) { eval(ruby, erb_binding) }
  rule(:comment => { :ruby => simple(:ruby) }) { '' }
  
  rule(:text => simple(:text)) { text }
  rule(:text => sequence(:texts)) { texts.join }
  
end

Ignore for now the part where bindings are used.

A transformer consists of a number of rules. And a rule consists of a part that recognizes structure in the semantic tree, and a block which tells the transformer what to do with the recognized thing. Got it? So this rule,

rule(:text => sequence(:texts)) { texts.join }

recognizes hashes that look like :text => sequence(:texts), sequences of things that are denoted as text. The identifier :texts is used in the block where we tell the transformer what to do: { texts.join }. So what we do is simple, we just join a sequence of texts together.

Another rule, the comment rule,

rule(:comment => { :ruby => simple(:ruby) }) { '' }

will return just nothing.

Now, if we want to parse and transform something like this:

The <% a = 2 %>not printed result of "a = 2".
The <%# a = 1 %>not printed non-evaluated comment "a = 1", see the value of a below.
The <%= 'nicely' %> printed result.
The <% b = 3 %>value of a is <%= a %>, and b is <%= b %>.

It gets a little more complicated. If you look at line 1, you see that a is given a value of 2. And then we will use that value in line 4, where we put the result of 2 into the rendered template. Have you tried it? No? Run it and see :)

Remembering State

If you want the transformer rules to remember values in between transformations – like the a that is set to 2, above, you’ll need state of some sort.

I can show you the way I did it with the ERB transformer. I’m sure you can think of many others that are perhaps safer, more powerful, or simply cleaner. But for now, we’ll have a look at this:

evaluator = Parslet::Transform.new do
  
  erb_binding = binding
  
  rule(:code => { :ruby => simple(:ruby) }) { eval(ruby, erb_binding); '' }
  
  ...

end

What happens here? First, I assign the binding of the block to erb_binding:

erb_binding = binding

This is the object where we will safe the state.

It’s a good thing for me that the rule method uses a block to define what to do when encountering a rule. Why? Well, since it is a block, the local variable erb_binding is bound in the context of the block, which means that I have easy access to it in { eval(ruby, erb_binding); '' }.

So what I do with

eval(ruby, erb_binding); ''

is: Evaluate the code piece that I get in the variable ruby, and evaluate it with the binding I have saved. Then, I return an empty string since <% ruby code %> should not write anything into the resulting rendered template.

Not so in the expression:

rule(:expression => { :ruby => simple(:ruby) }) { eval(ruby, erb_binding) }

Here I return whatever the evaluation returned to be inserted into the rendered result.

Isn’t it nice? And between parser and transformer I was able to look at my nice semantic tree, to check that everything is a-ok.

Writing tests, as everything is in Ruby, is a breeze, as you can imagine!

Conclusion

My personal conclusion is that this thing is here to stay.

Not only is it easy to use, but you have the full power of Ruby available to write parsers, comfortably.

It already has garnered the attention of quite a few excellent Rubyists – the hard core of parslet users – which hang out at the #parslet IRC channel.

So we’ve seen

  1. that Parslet harnishes Ruby’s powers for success and profit.
  2. that it offers a parser constructor AND a transformer constructor, which is a good thing.
  3. that trying it yourself is fun and a piece of cake.
  4. And: That using bindings is crazy fun when used at the right place :)

Hope you learnt something new :)

Next Running Sinatra inside a Ruby Gem

Share


Previous

Comments?