IronJS Blog

All things IronJS and F#

New lexer and parser in IronJS

with 5 comments

I’ve been thinking about replacing the lexer and parser IronJS has been using for a year now, which is an ANTLR generated LL(*) parser. The main drive behind this is that I’ve been wanting to shed the two DLL dependencies the parser caused, first the runtime for ANTLR (Antlr3.Runtime.dll) and then the parser itself (Xebic.ES3.dll) – since it was C# it wasn’t possible to integrate the code into the IronJS F# code base and it had to be linked as a separate assembly.

I’m glad to announce that I’ve finally gotten around to do this and that the new F# based lexer and parser were pushed to the master branch on github earlier today. I also decided to remove the dependency on Microsoft.Dynamic.dll which I only did about ten calls into. This means IronJS now only requires FSKit other then itself, the plan is to merge the FSKit functionality that IronJS requires into the IronJS project itself so it will only be one DLL.

Another great benefit of rewriting the parser in F# is a pretty nice speed boost, if I can direct your attention to the chart below you will see that the new lexer and parser is about eight times faster on the jQuery 1.5.1 (uncompressed) source code. This of course means that IronJS i getting even faster then it was.

Lexing and parsing speed

Also, keep your eyes open for the first 0.2 beta that will arrive shortly.

Update:

I got a question on IRC on how the profiling was done, so here’s a description of it.

  • System.Threading.Thread.CurrentThread.Priority was set to System.Threading.ThreadPriority.Highest
  • Timing was done with the System.Diagnostics.Stopwatch class
  • The source code was loaded into memory before lexing and parsing so no disk penalty would occur
  • The machine, which is a i7 Quad Core with 8Gb of ram, was restarted between each test and as many processes as possible were killed when Windows was done booting
  • The projects were compiled in release mode with full optimizations and no debug info
  • Each test was ran ten times before timing started to make sure all assemblies were loaded in memory and there would be no JIT overhead
  • After the ten warm-up runs the test was ran 100 times, the ten fastest were picked and averaged

If there are any flaws in the above process please do point them out and I will re-do the test and post new results.

About these ads

Written by Fredrik Holmström

March 19, 2011 at 8:46 pm

Posted in F#, IronJS

5 Responses

Subscribe to comments with RSS.

  1. Hi,

    I have a keen interest in a correct F# parser for JavaScript. Right now having a look at yours:

    https://github.com/fholm/IronJS/blob/master/Src/IronJS/Compiler.Parser.fs

    So is it a recursive-descent parser rewritten by hand? Why exactly do you think it is outperforming the ANTLR-based one?

    I have been following your project and in fact have tried using the Xebic grammar. Unfortunately it is broken/incorrect, for example it does not parse “new new f()()”. I had to fix that myself but then I am not sure it is correct for other cases as well.

    So the question is: what assurance do you have that your hand-written parser parses the ES3 language correctly? There are many corner-cases..

    Do the sputnik or other test suites cover parsing corner-cases?

    Anton Tayanovskyy

    March 20, 2011 at 11:56 pm

  2. BTW if all you wanted to do is drop the dependencies, why not just –staticlink:Antlr.Runtime.dll ?

    If there are no issues with licensing..

    Anton Tayanovskyy

    March 21, 2011 at 12:00 am

  3. @Anton:

    There were other reasons for wanting to drop the Xebic grammar also, for example the one you mention: It doesn’t handle all corner cases.

    It’s not a recursive-descent parser, rather it’s a top down operator precedence parser or pratt parser (http://en.wikipedia.org/wiki/Pratt_parser), TDOP-parsers is an evolution of recursive descent parsers.

    About the speed, I’ve done some benchmarks and about 30-40% of the speed gain comes from the lexer which I’ve used a bunch of F# specific tricks to make really fast and eliminate unnecessary calls, the other speed improvements seem to come from how easily and efficiently a Pratt parser deals with expressions.

    About correctness, I have a bunch of edge-case tests that I’ve evolved from the Xebic grammar (things it failed to parse) and there are also quite a few in the sputnik tests. I’ve been doing my best to verify correctness and so far it seems pretty darn solid. The technique used (Pratt parsing) is also very solid in terms of correctness, it’s very hard to introduce ambiguity due to how the parser is constructed.

    Regards,
    Fredrik

    Fredrik Holmström

    March 21, 2011 at 8:26 am

  4. Fredrik,

    this sounds great, I think I will give it a go.
    Too bad I wasted a weekend tinkering with ANTLR and trying to write a correct JS grammar for it :) Maybe I will still try to finish it just for the fun.

    So the F#-specific tricks in the lexer that I can see is aggressive inlining – is that it, or there is more notable stuff?

    Speaking of Pratt parsing – to me it seemed that expressions were not so much of a problem for ES3 as were (1) semicolon insertion; (2) dynamic switching between two lexer grammars – with rx and without; (3) unicode stuff – a hand-written lexer is handy here. Looking forward to studying your parser more.

    And here are some tests from me:

    function assert(x) {
    if (!x) throw “Fail”;
    }

    function test1() {
    var f = 1, a = 0.5, b = 0.5, r =
    f
    /a/b

    assert(r == 4);
    }

    function test2() {
    var a = {switch: 1, new: 2};
    assert(a.switch == 1);
    assert(a.new == 2);
    }

    function test3() {
    function f() {
    return function g() {
    this.k = 1;
    }
    }
    var o = new new f()()
    assert(o.k == 1);
    }

    Anton Tayanovskyy

    March 21, 2011 at 12:01 pm

  5. […] recently wrote about the new lexer and parser in IronJS, giving a 8x performance boost to parsing. Over the past two days I’ve been looking […]


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: