This article is a case study of the performance impact of improving bnf-parser library to be able to take a given BNF syntax input and compile it all the way down to an optimised parser in wasm for execution - to improve parse times of arbitrary syntaxes.
For each round of testing we will parse two rather complex BNFs (sequalize, lolcode) via three different parsing methods sequentially. Measuring the total parse time for each method using parse hooks. The library itself is actually boot strapped (it compiles and uses itself), and the first stage of compiling a BNF syntax is parsing it - so this is a valid test case for potential parsers generated by the library.
All three parse methods are tested per round to keep the execution of each method closely coupled to each other to mitigate the impacts of background processes, and V8 optimisations so that these external factors will hopefully affect each of the parsers similarly.
The first two parsers are actually the same wasm compiled parser in two different modes, the first with source mapping enabled, and the second one without. Source mapping is an optional extra parse which can be applied to the wasm parser which correctly maps syntax nodes of the tree to the
index (don't get me started on UTF-16) it spans. This is an optional extra parse in bnf-parser because it allows the parser to not waste time allocating extra reference objects which aren't necessary for applications which don't need syntax error messages.
We ran the testing round
10,000 consecutive times, gathering results using NodeJS perf_hooks, we then also ran the tests a second time with more in-depth hooks put into the bnf-parser artifacts to see exactly what's going on under the hood. These performance measurements where not taken in the tests comparing the different parsers as the act of measuring them would heavily impact the overall performance of the wasm results, they're just that fast.
From the results below we can see a few interesting trends, both the
wasm w/ mapping parser and the
legacy parse both have significantly higher 99% execution times than their 1%. This is due to the fact both of them are receiving a lot of love from V8's excellent optimiser. For the first couple of runs it's slow, but once the JIT realises it's doing the same thing many times it starts to optimise it.
We also did a test where after each testing round we attempted to parse another random non bnf syntax, to see if it threw off V8's optimisations due to the graph traversal functions now running on a different graph to the one it was optimised for. However that had no negligible effect.
Comparing the median
legacy times to
wasm no map we can see an approximate
2.56x - however only
legacy is generating
SyntaxNode references, so it's not a fair comparison of equivalent compute. Comparing the
wasm parser with source mapping to
legacy we see only a
And that might get you thinking - wow, JS really isn't that slow. But comparing
wasm to raw JS performance isn't fair either, because you're missing a step. You need to move data in and out of the JS world to the
wasm instance, and that has a tax.
The transport tax
There are four main stages of using the
wasminstance's memory, and also tell the instance how long the data you just put in is
- Parsing: This is the actual work we want to complete, this is iterating over the string and generating the entire syntax tree
- Decoding: We want to be able to use that tree in JS, so we need to load it back out to be useful - bring it back over to JS land.
- Mapping: This is generating the source references for a given syntax tree, based on the input data.
It's important to note that the
wasm, because the computation is super simple, you're just iterating forward over a string counting the index as you depth first traverse over the syntax tree filling in the references (this is done using stack operations, so it's a single function call to save on the extra tax of recursive calls in JS).
Since the majority of the complex work being done is simply allocating new objects to store the reference at each point - there will be no real time-saved by doing this in WASM, and any time saved will be mostly lost due to the data transfer tax.
From this data we can see we are spending
40.03% of our time just moving data between JS to WASM land - that's almost half of the entire computation.
We can also see the other
55.41% is taken up by generating the source references.
7.70% of the time we took to run this parse actually computing the syntax tree!!!
The reason this tax for transferring data is so high is painfully illustrated by the difference in compute time between
The reason is object allocation.
In any statically typed language if you allocate a
struct which has another
SyntaxNode, has a
ReferenceRange which contains two
Reference objects - so that means if you want to allocate a
SyntaxNode and fill in all of it's children, that's actually
4 allocations, not
The reason decoding is able to be as fast as it is; is because of object reuse. By default every single
SyntaxNode actually shares the same
ReferenceRange instance, that means that range and it's two children only need to be allocated once, and every
SyntaxNode gets a
ReferenceRange so now you don't need to do null checks everywhere - and we only have one allocation per node.
But when you run the source map over the syntax tree, now for every single
SyntaxNode you have to perform
Reference, and end
Part of the reason the execution in
wasm is actually so fast is because it only does one allocation, the entire tree itself.
The entire tree represented in
wasm is actually flat packed into linear memory. And since after every parse the data is read out, we don't need the previous tree after each parse - so we can just write over it. So we have zero allocations because we just use the same single allocation. In other languages line
C++ you could allocate a vector a factor or two larger than your estimated tree size, then compute your flat tree, then shrink afterwards. Two allocations.
Can Wasm still work as an accelerator?
- Take something from the network
- Perform a small amount of manipulation
- Send it out to the network, or write to the DOM.
JS is primarily a middle man language, attempting to use it like python to abstract the middle man's duty to another person creating another middle man leads to very little performance gain, and a whole lot of headache. Try talking to the tech support of any major tech company and you'll see what I mean.
Wasm also has another extra headache. It's security focused, meaning every wasm instance has it's own independent memory, that means two different wasm libraries can't actually share memory, unless they are recompiled together, or you parse data between wasm libraries much like you do from JS to wasm.
Plus the majority of wasm compiled modules don't actually play nicely together, they're compiled to rule their sandpit, and no one else can enter. If you attempted to bring a C++ library and Rust library into the same WASM module, who's
malloc implementation are we using? There is only one linear memory, and they can't both be operating on the same space.
Who's are we choosing? How do we choose? How do each of the children know who was chosen?
Wasm is what people wanted docker to be
Wasm is a really powerful tool, but I think people are miss understanding where it's heading and what it will be great for. No it will not be great for bringing that one Rust library to use in your TS workflow. It's better to think of it like a super light weight and actually portable docker container that can execute anywhere.
You can bring that container into the browser to act as your front end, or you can have it running as micro-service, or the entire backend.
What it's not is a way to use language
X's library in language