Why not F#?
This post is a response to a discussion that was held on Twitter as a result of me stating that the next version of IronJS will be completely in C# and not contain any F# code at all. Before I explain my exact reasoning behind this I want to state that I find F# to be an excellent language and it’s a joy to work in; Don Syme & Co have done an excellent job with it. If you have followed IronJS over the past two years it should be clear that I have tried really hard to make it work in F#, but as a firm believer in “the right tool for the job” I’ve come to the conclusion that it’s not feasible to build a high performance dynamic language on top of F# while staying true to F# itself.
The key statement here really is “while staying true to F# itself”. As both F# and C# run on top of .NET, they have (at least in theory) the exact same performance characteristics. F# has access to the same things as C# in terms of .NET “low level” features such as native objects, mutability, structs, p/invoke calls, etc. It even adds a couple of things on top of this that C# doesn’t have, such as the inline keyword and the ability to mix code and IL instructions inside source files.
But F# adds so much more on top of C#, such as: Immutable Values, Computation Expressions, Pattern Matching, Discriminated Unions, Fast Native Functions, etc. – this list can be made very long. But if you inspect all of these features, you will notice that they are all abstractions on top of the existing .NET functionality (or a compiler feature in the case of immutable values). While these abstractions allow you to write very elegant and concise code, they are often slow. For example with computation expressions and discriminated unions you can create a very elegant AST definition and an equally gorgeous parser, as demonstrated by Matthew Manela.
But again, the parser will be very slow compared to one that is written in hand optimized C#. The AST is immutable, so when you want to optimize a node far down in the AST you need to throw away and re-create every parent node to it, this ends up being slow also due to the overhead of the garbage collector and heap allocations. Now, if you don’t need the absolute best performance you could squeeze out of .NET (which most don’t) then all of these constructs will give you clear code, that is easy to understand and reason about. But in the case of IronJS not being able to squeeze all the performance out of the .NET run-time is not an option.
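To make the cost concrete, here is a minimal sketch in JavaScript of what changing one deep node in an immutable tree involves. This is not IronJS code: the node shape and the `node` and `replaceAt` helpers are made up for illustration. Every ancestor on the path to the changed node has to be allocated again, where a mutable tree would need a single assignment.

```javascript
// Hypothetical immutable AST node: { type, children }, frozen after creation.
function node(type, children = []) {
  return Object.freeze({ type, children: Object.freeze(children.slice()) });
}

// Counts how many parent nodes had to be re-created.
let allocations = 0;

// Returns a new tree in which the node at `path` (a list of child indexes)
// is replaced. Every ancestor along the path is re-created.
function replaceAt(root, path, replacement) {
  if (path.length === 0) return replacement;
  const [index, ...rest] = path;
  const children = root.children.map((child, i) =>
    i === index ? replaceAt(child, rest, replacement) : child
  );
  allocations++; // one fresh parent node per level of depth
  return node(root.type, children);
}

const tree = node("program", [
  node("function", [
    node("block", [node("return", [node("add")])]),
  ]),
]);

// Swap the "add" node, four levels down, for a "constant" node.
const optimized = replaceAt(tree, [0, 0, 0, 0], node("constant"));
```

The original `tree` is left untouched, which is the whole point of immutability, but the price is one new parent per level plus the garbage the old parents become.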
A lot of people will now say something like: “But F# supports mutability, native .NET objects, etc”. Yes, it does. But the “mutability/imperative/OO” features in F# are hamstrung by several crippling issues, and what you end up with is what feels like a slightly crappier version of C#. Just to mention a few things (there are a lot more, both major and minor):
- Mutually Recursive types must be defined in the same file, so when two classes need to be able to refer to each other they need to be right next to each other in the source also (I understand why F# works like this). This is fine for a few grouped objects, but when you end up with a 2.5k line behemoth that used to be Runtime.fs (open at your own risk) inside IronJS, it’s not fun any more. Whenever I bring this up in ##fsharp on irc.freenode.net everyone tells me that this is due to bad library design. While there are several ways to break apart two classes from each other, they all end up being slow compared to just directly accessing a field on the object instance. Maybe it’s fast enough for you, but it’s not fast enough for IronJS.
- Constructors, if used incorrectly, will cause a run-time overhead on every method call, to verify that the class has been initialized properly. There are also multiple other problems with initializing an instance inside both the implicit and explicit constructor syntax.
- Interfaces can only be implemented explicitly, so you end up having to cast your objects into their interface type constantly.
- Collections: the specialized F# collections are slow compared to their BCL counterparts, especially Map<K, V> compared to Dictionary<K, V>. While I understand why they are slower, it doesn’t change the fact that they are.
Again, a lot of people will say something like: “All the things you said are true, but just use mutability where you need it for speed and stick to immutability everywhere else”. Oh, how I wish it was this simple. The problem with mutability is that once you let a little of it get in, it spreads like wildfire throughout your code. Even if a piece of code itself looks immutable, if it ends up depending on something deep down in your code base that uses mutability, then it is by definition not immutable any more.
In conclusion: I love F#, it’s a great language and if you can stick to what it does really well (immutability and FP) it’s a joy to work in. But as soon as the mutability starts creeping its way into the code due to performance reasons (which is exactly what happened to IronJS) then it falls apart very fast and you end up with code that is hamstrung by several crippling issues and that is very hard to follow due to a bunch of quirks in the F# syntax. I usually describe F# OO as a bad version of C#.
My gripes with JavaScript
Note: This post doesn’t mean I’ll stop working on IronJS, I love working on IronJS and will continue to do so to make it as fast and awesome as possible.
Apparently my recent appearance on Hanselminutes caused some stir on Twitter. People think I was laying too heavily into JavaScript as a feasible platform for server side development (node.js, etc.).
This might sound odd coming from someone that has built a JavaScript runtime, but my point of view after having developed IronJS is that there are a couple of critical problems with JavaScript that prevent it from ever being a viable alternative as a development platform for server applications.
Lack of language defined modules and namespaces
The language doesn’t define the concept of a module or namespaces. Several work-arounds exist, like require in node.js, but there are several problems with using something like require.
- It’s implementation dependent, which means it’s not standardized and you can’t count on it existing in every javascript implementation.
- You can re-write anything in any namespace at will, some people like this since it allows for something called monkey-patching. However this just leads to a whole new can of problems.
- You can replace the require function, or any other function for that matter, at a whim or by mistake leading to very hard to track bugs.
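A small sketch of the last two points, using a plain object as a stand-in for a shared namespace. The `mathUtils` name and its functions are hypothetical and only there to show the mechanism.

```javascript
// A "namespace" is just an ordinary object, so nothing protects its members.
const mathUtils = {
  square(x) { return x * x; },
};

function areaOfSquare(side) {
  return mathUtils.square(side);
}

const before = areaOfSquare(3);

// Somewhere else, possibly in a third-party file, on purpose or by mistake:
mathUtils.square = function (x) { return x + x; };

// The caller has not changed, but its result has.
const after = areaOfSquare(3);
```

Nothing in `areaOfSquare` hints at why `before` and `after` differ, which is what makes this kind of bug so hard to track down.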
ECMA specification doesn’t define how to organize code over several files
This ties into the previous section on modules and namespaces, but it’s such a major thing it has to be mentioned separately. The specification does not define how to load code from a file; this might seem like a minor issue as node “solved” it with require. But in reality the possible issues that can arise here are many.
For example, as already mentioned, require is specific to node.js and doesn’t exist on any other major platforms. There are major problems with having non-standardized and implementation specific behavior that affects such crucial parts of a language as loading code.
The most obvious one is how the code actually is loaded, for example:
require("foo"); require("foo");
Is the “foo” module/file loaded/executed once or twice?
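Node.js answers this by caching: the module body runs on the first require and later calls get the cached exports back. The sketch below is not node's loader. `makeLoader` and its in-memory `sources` table are hypothetical and exist only to show that both answers are trivial to implement, which is exactly why the behavior needs to be specified somewhere.

```javascript
// `sources` maps a module name to a function standing in for the file body.
function makeLoader(sources, { cache }) {
  const loaded = new Map();
  return function load(name) {
    if (cache && loaded.has(name)) return loaded.get(name);
    const exports = {};
    sources[name](exports); // "execute the file"
    loaded.set(name, exports);
    return exports;
  };
}

let executions = 0;
const sources = {
  foo(exports) {
    executions++;
    exports.loadedAt = executions;
  },
};

// A caching loader executes the module body once.
const cachingLoad = makeLoader(sources, { cache: true });
cachingLoad("foo");
cachingLoad("foo");
const runsWithCache = executions;

// A naive loader executes it on every call.
executions = 0;
const naiveLoad = makeLoader(sources, { cache: false });
naiveLoad("foo");
naiveLoad("foo");
const runsWithoutCache = executions;
```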
There are also more subtle problems, such as: When you use require to load a piece of code, is the code executed in the global environment or the loading context?
(function foo(bar) { require("foo"); }({x: 1}));
Is the code inside the “foo” module/file executed in the global scope or in the context of the foo function with “foo” and “bar” already bound? Most people will probably say “in the global scope”, but the designers of the PHP include probably won’t agree.
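Both behaviors can be reproduced with what the language already has, so neither is obviously “the” right one. In this sketch `fooSource` stands in for the contents of the “foo” file and the two loader functions are hypothetical; it also assumes there is no global variable named `bar`.

```javascript
// Stand-in for the contents of the "foo" file.
const fooSource = "typeof bar";

function loadLikeGlobalCode(bar) {
  // Functions built with `new Function` are created in the global scope,
  // so the caller's `bar` parameter is not visible to the loaded code.
  return new Function("return " + fooSource + ";")();
}

function loadLikeInclude(bar) {
  // A direct call to eval runs the code in the caller's scope,
  // so the loaded code sees `bar`.
  return eval(fooSource);
}

const seenFromGlobal = loadLikeGlobalCode({ x: 1 });
const seenFromCaller = loadLikeInclude({ x: 1 });
```

The first variant is the “global scope” answer and the second is the PHP include style answer; the same source text gives a different result depending on which one the runtime picked.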
And before you mention CommonJS: It is what some people think is correct and in no way represents some type of standard.
Very small standard library
The standard library, as defined by the specification is incredibly small (compared to the Python standard library, or the .NET BCL, etc.) and only gives you access to the most basic operations, not even I/O is included. Sure you can build your own library for a specific implementation, but it’s not going to be universal and it will tie the code that relies on it to that specific implementation.
Language problems
If you’ve ever written any moderate amount of JavaScript code you’ll know that there are several problems with the language itself, things like with, eval or the inner quirks of how JavaScript applies the concept of equality. http://wtfjs.com/ is a good read for some more WTF? moments.
Null vs. undefined
JavaScript has two “this is not here/doesn’t exist” values, they’re subtly different and undefined is the one that is most common. But why would you even have two different types of ‘nil’ to begin with? I know of no other language in existence that does this (I’m sure someone reading this is going to dig up another language that has this).
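A few of the places where the two values behave differently, as a short sketch (the `checks` and `missing` names are just for this example):

```javascript
const checks = {
  typeofNull: typeof null,                          // "object"
  typeofUndefined: typeof undefined,                // "undefined"
  looseEqual: null == undefined,                    // true
  strictEqual: null === undefined,                  // false
  nullAsNumber: Number(null),                       // 0
  undefinedIsNaN: Number.isNaN(Number(undefined)),  // true
  json: JSON.stringify({ a: null, b: undefined }),  // '{"a":null}'
};

// Reading a property that doesn't exist gives undefined, not null.
const missing = {}.noSuchProperty;
```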
Context sensitive function keyword
In case you didn’t know, these two functions are not identical:
(function bar() { })
function foo() { }
Finding out the difference I’ll leave as an exercise to the reader.
Limited set of data types
- Only 8 byte float numbers, how would you even interface to a database schema that has a 64bit integer column?
- No fast (as in native) arrays
- No int, byte, etc. number types
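The 64bit integer problem is easy to demonstrate. An 8 byte float only has 53 bits of integer precision, so large ids silently turn into a different number. The values below are picked for illustration; the JSON document stands in for a row coming back from a database driver.

```javascript
// 2^53 + 1 cannot be represented as an 8 byte float,
// so the literal is rounded to 2^53 when it is parsed.
const big = 9007199254740993;
const collides = big === 9007199254740992;

// The largest integer that still round-trips safely.
const maxSafe = Number.MAX_SAFE_INTEGER;

// The largest signed 64bit integer, as it might arrive from a database.
const parsedId = JSON.parse('{"id": 9223372036854775807}').id;
const idSurvived = String(parsedId) === "9223372036854775807";
```

`collides` is true and `idSurvived` is false: the value that comes out is not the value that went in, and nothing raises an error along the way.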
Fragmented runtimes
While pretty much every single one of these problems could be fixed by either extending the specific runtime you’re using or by ECMA releasing a new standard, it’s just not that simple. Assuming you extend the runtime you’re using to allow for things like native/fast arrays and you define a module/namespace solution like node.js has done, now any code you write will only run on that specific runtime with your extensions which punches a big hole in the “same language everywhere”-argument that you hear every day on twitter (and it really is server-side JavaScript’s only claim to fame).
So what if ECMA releases a new standard that fixes every single problem I’ve listed above (this is in no way an exhaustive list)? I can currently count to about seven different runtimes in use today (JScript, JägerMonkey, TraceMonkey, V8, Chakra, Carakan, Rhino). And that’s not even counting the small emerging ones like IronJS, Jurassic, Jint, etc. that are platform specific implementations for embedding. A lot of these runtimes are available in different versions in different browsers and will never be upgraded, so you’ll have to exclude all those new features in our new utopia-style ECMA specification if you want to be “cross-runtime” compatible (which you need for web development at least).
But what about node.js?
First I want to make clear that IronJS is in no way a competitor to node.js, IronJS is a runtime – node.js is an application server that uses another runtime (V8).
Node is decently fast, sure. But it’s nowhere near as breathtakingly fast as its zealots want you to believe. Nor are any of the ideas it employs new or groundbreaking. It also comes with several warts that have been inherited from JavaScript, which force you to do manual continuation-passing style. I just can’t see the reasoning behind using a language that wasn’t designed to be used in this context; it just feels like one big hack (albeit a pretty well-performing hack).
Assuming you want to run an asynchronous SQL query in node, it’d go something like this:
db.query("select * from person", function(result) { print(result); });
Now compare this to a language that actually was designed (F#):
async {
    let! result = db.query "select * from person"
    print result
}
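A single callback doesn’t look too bad; the cost of manual continuation-passing style shows up as soon as one query depends on the result of another. In this sketch `db` is a hypothetical in-memory stand-in, not a real driver, and it calls back immediately where a real client would call back later. The table contents and the `findCity` function are made up as well.

```javascript
// Hypothetical stand-in for a database client, backed by in-memory tables.
const db = {
  tables: {
    person: [{ id: 1, name: "Ada" }],
    address: [{ personId: 1, city: "London" }],
  },
  query(table, predicate, callback) {
    callback(this.tables[table].filter(predicate));
  },
};

// Two dependent queries: every step adds another level of nesting, and
// every early exit has to remember to call the continuation by hand.
function findCity(name, callback) {
  db.query("person", (p) => p.name === name, function (people) {
    if (people.length === 0) return callback(null);
    db.query("address", (a) => a.personId === people[0].id, function (rows) {
      callback(rows.length > 0 ? rows[0].city : null);
    });
  });
}

let city;
findCity("Ada", function (result) { city = result; });

let unknownCity = "not called yet";
findCity("Bob", function (result) { unknownCity = result; });
```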
Conclusion
The way I see it is that JavaScript has dug itself into a hole that is impossible to get out of, at least if you want to keep the “same language/code everywhere” idea alive. And if you’re not, then why would you use JavaScript when so many better (and faster) alternatives exist? There are so many problems with the language; I think this image illustrates my point of view better than any words could, but I’ll also list a couple more that I haven’t touched on:
- The scoping of the this keyword
- Switch case fall through
- Automatic semicolon insertion
- Bitwise operators that work on doubles
- Type wrappers and type conversions, new String("foo") vs. "foo"
- The new keyword makes function behavior dependent on the context they’re called from
- with and eval (it doesn’t hurt to mention these again)
- The arguments object which is an array, almost. Except it’s not. And it’s also magically bound to the parameter variables
- The typeof operator
- The global object
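A handful of the items above, as a sketch you can paste into any runtime (the `quirks`, `asiSurprise` and `fallThrough` names are just for this example):

```javascript
const quirks = {
  wrapperType: typeof new String("foo"),            // "object"
  primitiveType: typeof "foo",                      // "string"
  wrapperStrictEquals: new String("foo") === "foo", // false
  wrapperLooseEquals: new String("foo") == "foo",   // true
  typeofArray: typeof [],                           // "object"
  truncated: 1.9 | 0,                               // 1
  wrapped: 2147483648 | 0,                          // -2147483648
};

// Automatic semicolon insertion ends the return statement at the line break.
function asiSurprise() {
  return
    42; // never reached
}

// Without a break, execution falls through into the next case.
function fallThrough(n) {
  const hits = [];
  switch (n) {
    case 1: hits.push("one");
    case 2: hits.push("two");
      break;
    case 3: hits.push("three");
  }
  return hits;
}
```

`asiSurprise()` returns undefined and `fallThrough(1)` returns both "one" and "two".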
IronJS is now faster than IE8
We just hit a pretty major milestone in the dev branch of IronJS: We’ve surpassed IE8 in performance. I’ll try to keep this short and get straight to the point (the benchmarks), here is IronJS side-by-side to IE8, Jurassic and Jint (two other .NET based JavaScript runtimes).
Note that Jint failed on several tests and those have a zero as result.
We’ve still got a long way to go until we reach our goal (within 300% of V8), but it’s looking good so far!
Update: Here’s the total test score also
IronJS 0.2 is out
IronJS 0.2
IronJS is an ECMAScript 3.0 implementation built on top of the Dynamic Language Runtime from Microsoft which allows you to embed a javascript runtime into your .NET applications.
Thanks to
- John Gietzen for all his work on ECMA3 conformance, without whom this would not have been possible
- Christian Knutsson for the awesome logo
Changelog
- ECMA3 Conformance
- Added 5,500 ECMA3 conformance tests with over 30,000 assertions
- Major API refactoring moving all Module/Functions located in Api.fs to their appropriate classes in Core.fs instead
- Re-implemented the AST analyzer to make it single pass
- Re-implemented variable handling allowing for faster and easier compilation
- Removed the dependency on Microsoft.Dynamic.dll for CLR4 projects
- Removed dependency on FSKit
- Implemented the Date object
- Implemented the RegExp object
- Implemented missing functionality on String.prototype (match, split, search, replace)
- Implemented F# operators ? and ?<-
- Implemented F# operators for all common binary DLR expressions
- Implemented a Sputnik test suite runner, courtesy of John Gietzen
- Implemented a proper REPL console, available in the aptly named “REPL” project.
- Implemented dynamic invoke operators for calling IronJS functions with an unknown amount of arguments
- Implemented a new F# based lexer and parser which allows IronJS to drop the dependencies on Xebic.ES3.dll and Antlr.Runtime.dll
- Cleaned up and removed a lot of old/redundant code
- Renamed ObjectClass to “Schema” and split out the dynamic functionality into its own DynamicSchema class
- Replaced the FunctionCompiler class with an F# function with the signature IronJS.FunctionObject -> System.Type -> #Delegate
- A lot of smaller improvements to code stability and readability
- Added debug constructs in big parts of the codebase that only get compiled when the DEBUG flag is set
- Refactored several constructors in the IronJS.Ast.Tree union to be more obvious
- Unified error handling, so it all passes through IronJS.Error and its members
Binary
Source
Information
- IronJS Blog
- IRC: #ironjs @ irc.freenode.net
- Twitter: @ironjs
JavaScript Quotations
Lately I’ve been working on the ECMA3 conformance in IronJS, but last night I did a small side-tour into something completely different: JavaScript Quotations. The idea is similar to the one found in F# code quotations or Lisp macros, not as evolved as either of them though – but still pretty nice. I wanna say right now that this is my own extension to the JavaScript language and it’s not something you can do in any browser or other implementation (that I know of).
What it does is it introduces a new symbol, @ – stolen from F#, which gives you access to the syntax tree of a function during runtime and allows you to modify it as you see fit and then compile it to a regular JavaScript function. While it could be abused to no end it allows for some pretty interesting possibilities. The example I’m going to show creates a function which reads a property out of an object, and optionally compiles a console.log statement into the function body.
function makeLoggedPropertyReader(includeLog, propertyName) {

    // Note the @ symbol in front of the function keyword
    var quoted = @function (x) {
        if(x) {
            console.log(x);
        }
        return x._;
    };

    // This is what the quoted structure looks like,
    // it's basically a syntax tree that you can
    // traverse, modify as you see fit and then compile
    /*
    quoted = {
        type: 19, // function
        body: [
            {
                type: 18, // if statement
                test: {
                    type: 5, // identifier
                    value: "x"
                },
                trueBranch: [
                    {
                        type: 9, // method call
                        target: {
                            type: 5, // identifier
                            value: "console"
                        },
                        member: {
                            type: 5, // identifier
                            value: "log"
                        },
                        arguments: [
                            {
                                type: 5, // identifier
                                value: "x"
                            }
                        ]
                    }
                ],
                elseBranch: {
                    type: 0 // void node
                }
            },
            {
                type: 25, // return
                value: {
                    type: 43, // property accessor
                    object: {
                        type: 5, // identifier
                        value: "x"
                    },
                    name: {
                        type: 5, // identifier
                        value: "_" // the value we're going to replace
                    }
                }
            }
        ]
    }
    */

    // The first statement in the quoted body
    // is the if statement, which we will conditionally
    // remove depending on the boolean value of includeLog
    if(!includeLog) {
        quoted.body[0] = Quotations.voidStatement();
    }

    // Pull the second statement out of the function body
    var returnStmt = quoted.body[1];

    // Pull the value node out of the return statement
    var propertyAccessor = returnStmt.value;

    // Set the value of the "name" node of the property accessor
    // to the string value of the propertyName that is passed in
    propertyAccessor.name.value = propertyName.toString();

    // We've modified our quoted expression
    // and we can now compile it so it
    // becomes a regular function
    return quoted.compile();
}

// And here we'll use it:
var logged = makeLoggedPropertyReader(true, "myProp");
var notLogged = makeLoggedPropertyReader(false, "myProp");

var myObj = {myProp: "hello world"};

var xValue = logged(myObj); // will return and print the value of myProp to console.log
var xValue = notLogged(myObj); // will only return the value of myProp
Analyzer: Single-Pass vs. Multi-Pass
I recently wrote about the new lexer and parser in IronJS, giving an 8x performance boost to parsing. Over the past two days I’ve been looking at the AST analyzer IronJS has been using, and how to improve it. The analyzer steps through the AST produced by the parser and figures out things like closures, static types, dynamic scopes, etc. Due to the immutable nature of discriminated unions in F#, it has been forced to re-build the syntax tree whenever resolving something required changes to it, and it has also been doing several passes over the syntax tree.
I’m glad to announce that with some clever use of reference cells I’ve been able to eliminate the need to re-build the AST. Due to having access to the internals of the new TDOP-based parser, I’ve also managed to make it require only a single pass over the syntax tree. The performance difference is pretty staggering.
As you can see, with both the new parser and analyzer it’s a whopping ~13x faster than the old ANTLR based parser and multi-pass analyzer. It’s also ~4x faster than the new parser with the old multi-pass analyzer.
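The reference cell idea translates to other languages too. The sketch below is in JavaScript and is not the IronJS analyzer: the node shapes, the `ref` helper and the `analyze` function are all hypothetical. The nodes themselves stay frozen, but each variable carries a small mutable cell that a single walk over the tree can fill in, so nothing has to be re-built.

```javascript
// A reference cell: a tiny mutable box.
function ref(value) {
  return { value };
}

// Hypothetical node shapes. The nodes are frozen; only the cell is mutable.
function variable(name) {
  return Object.freeze({ type: "var", name, closedOver: ref(false) });
}
function identifier(name) {
  return Object.freeze({ type: "id", name });
}
function func(declared, body) {
  return Object.freeze({ type: "function", declared, body });
}

// One walk over the tree. `scopes` is a stack of Maps from name to
// variable node, innermost scope last.
function analyze(node, scopes = []) {
  if (node.type === "function") {
    const scope = new Map(node.declared.map((v) => [v.name, v]));
    node.body.forEach((child) => analyze(child, [...scopes, scope]));
  } else if (node.type === "id") {
    for (let depth = scopes.length - 1; depth >= 0; depth--) {
      const found = scopes[depth].get(node.name);
      if (found) {
        // Resolved in an outer function: record it in place, no rebuild.
        if (depth < scopes.length - 1) found.closedOver.value = true;
        break;
      }
    }
  }
}

// function (x, y) { y; function () { x; } }
const x = variable("x");
const y = variable("y");
const tree = func([x, y], [identifier("y"), func([], [identifier("x")])]);
analyze(tree);
```

After the walk `x.closedOver.value` is true and `y.closedOver.value` is false, and `tree` is still the same object the parser would have produced.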
New lexer and parser in IronJS
I’ve been thinking about replacing the lexer and parser IronJS has been using for a year now, which is an ANTLR generated LL(*) parser. The main drive behind this is that I’ve been wanting to shed the two DLL dependencies the parser caused, first the runtime for ANTLR (Antlr3.Runtime.dll) and then the parser itself (Xebic.ES3.dll) – since it was C# it wasn’t possible to integrate the code into the IronJS F# code base and it had to be linked as a separate assembly.
I’m glad to announce that I’ve finally gotten around to doing this and that the new F# based lexer and parser were pushed to the master branch on github earlier today. I also decided to remove the dependency on Microsoft.Dynamic.dll which I only did about ten calls into. This means IronJS now only requires FSKit other than itself; the plan is to merge the FSKit functionality that IronJS requires into the IronJS project itself so it will only be one DLL.
Another great benefit of rewriting the parser in F# is a pretty nice speed boost: if I can direct your attention to the chart below you will see that the new lexer and parser is about eight times faster on the jQuery 1.5.1 (uncompressed) source code. This of course means that IronJS is getting even faster than it was.
Also, keep your eyes open for the first 0.2 beta that will arrive shortly.
Update:
I got a question on IRC on how the profiling was done, so here’s a description of it.
- System.Threading.Thread.CurrentThread.Priority was set to System.Threading.ThreadPriority.Highest
- Timing was done with the System.Diagnostics.Stopwatch class
- The source code was loaded into memory before lexing and parsing so no disk penalty would occur
- The machine, which is an i7 Quad Core with 8 GB of RAM, was restarted between each test and as many processes as possible were killed when Windows was done booting
- The projects were compiled in release mode with full optimizations and no debug info
- Each test was run ten times before timing started to make sure all assemblies were loaded in memory and there would be no JIT overhead
- After the ten warm-up runs the test was run 100 times, the ten fastest were picked and averaged
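The timing part of the procedure above (warm up, time 100 runs, average the ten fastest) can be sketched like this. It is written in JavaScript and is not the harness that produced the numbers: `benchmark` and `averageOfFastest` are hypothetical names, and `Date.now` is only a coarse stand-in for the Stopwatch class, so pass in a better clock through the `now` option if you have one.

```javascript
// Average of the `keep` smallest samples.
function averageOfFastest(samples, keep) {
  const fastest = [...samples].sort((a, b) => a - b).slice(0, keep);
  return fastest.reduce((sum, t) => sum + t, 0) / fastest.length;
}

function benchmark(fn, { warmup = 10, runs = 100, keep = 10, now = Date.now } = {}) {
  // Untimed warm-up runs, so loading and JIT costs are already paid.
  for (let i = 0; i < warmup; i++) fn();

  // Timed runs.
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = now();
    fn();
    samples.push(now() - start);
  }
  return averageOfFastest(samples, keep);
}
```

Usage would be something like `benchmark(function () { parse(source); })`, where `parse` and `source` are whatever you want to measure. Picking the fastest runs instead of the mean of all runs is a deliberate choice: it filters out runs that were disturbed by the garbage collector or other processes.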
If there are any flaws in the above process please do point them out and I will re-do the test and post new results.