SELF Just-In-Time compiler optimizations
Being interested in programming language design, I read "The Design and Implementation of the SELF Compiler, an Optimizing Compiler for Object-Oriented Programming Languages". Although it is a rather old paper, from 1992, it was worth reading.
I'm studying Self and its optimizing compiler because I'd like to design and implement my own programming environment. I'm somewhat satisfied with the existing higher-order programming languages out there. These languages are more abstract and easier to program in than C or C++. In Self you don't need to insert every damn irrelevant, obvious, superfluous type declaration you can imagine. It lets you concentrate on the meaningful details.
The paper claims that the benchmarks they used reached a third to half the speed of the equivalent C programs. They didn't get there by accident, so a paper that describes what they did and why is quite valuable for anyone wanting to design something similar.
After reading the paper, I'd say the main challenge when implementing JavaScript or Self is that the source code in itself is very generic. The same function accepts many kinds of objects, which requires dropping lots of tests and method lookups into the code. Eventually, while evaluating, you may notice that the code only ever operates on certain kinds of objects. The JIT compiler in Self is able to generate specialized code that doesn't need to do as many tests and method lookups, and that can be run in place of the generic code in the situations that actually arise. The memory used by the programs is organised so that it can be accessed efficiently from the specialized code.
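To make the idea concrete, here is a minimal Python sketch of my own (not code from the paper, and Self of course emits machine code rather than Python functions): a generic call site pays for a dynamic method lookup on every call, while a version specialized for the one receiver type actually observed skips the lookup behind a single guard and falls back to the generic path otherwise.

```python
# Illustration only: the class and function names here are made up, not from the paper.

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def magnitude(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5


def generic_magnitude(obj):
    # Generic code: every call pays for a dynamic method lookup,
    # because obj could be any kind of object.
    method = getattr(type(obj), "magnitude")
    return method(obj)


def make_specialized_magnitude(observed_type):
    # "Specialized" code for the one type actually seen at this call site:
    # a single cheap guard replaces the repeated lookup, with a fallback
    # to the generic path if the guard fails.
    target = observed_type.magnitude
    def specialized(obj):
        if type(obj) is observed_type:
            return target(obj)          # fast path: no lookup
        return generic_magnitude(obj)   # fall back to the generic code
    return specialized


fast_magnitude = make_specialized_magnitude(Point)
print(fast_magnitude(Point(3, 4)))  # 5.0, via the fast path
```

As I understand it the real compiler goes much further, inlining the looked-up method body into the specialized code, but the guard-plus-fast-path shape is the essence of it.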
I got the impression that the optimizations done by the JIT compiler have to be fast and carefully chosen, even though they are inspired by traditional compilation techniques. To do the same, you have to understand exactly what slows things down at runtime, which isn't obvious without studying a live system. They must have had rich diagnostic and profiling tools early on, while designing the language. The profiling is also necessary because a JIT can't be efficient if it just compiles everything that comes in; otherwise the time gets spent compiling instead of evaluating the task at hand.
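As a rough illustration of that trade-off (again my own sketch, with an arbitrary threshold and with code generation replaced by a placeholder), a per-function call counter can decide when something is hot enough to be worth compiling at all:

```python
# Sketch of "compile only what is hot". The threshold is an assumed value,
# and pretend_compile() stands in for real code generation.

import functools

HOT_THRESHOLD = 100  # assumed cutoff: compile after this many calls

def pretend_compile(func):
    # Stand-in for emitting optimized machine code.
    return func

def jit(func):
    state = {"calls": 0, "compiled": None}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if state["compiled"] is not None:
            return state["compiled"](*args, **kwargs)
        state["calls"] += 1
        if state["calls"] >= HOT_THRESHOLD:
            # Only now is the compilation cost worth paying.
            state["compiled"] = pretend_compile(func)
        return func(*args, **kwargs)
    return wrapper

@jit
def add(a, b):
    return a + b

for i in range(200):
    add(i, i)  # the first 100 calls run unoptimized, the rest use the "compiled" version
```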
Reading it made me realise that good language implementation and design is mostly a showcase of well-done optimization practice, after all. There's a famous statement about optimization by Donald Knuth:
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil
About 3% of the program accounts for 90% of the execution time; this is also known as the Pareto principle. To get the best performance, you have to pinpoint your optimizations to that hotspot of the program. I dare to claim that no other kind of optimization exists, because optimization efforts based on a hunch don't really optimize anything. Every optimization that misses the hotspot increases code size, and sometimes executable size, with marginal performance benefits. In the worst case you're attempting to optimize the program before you have even designed or written it. Your prematurely optimized C++ program might end up thrashing the cache, resulting in slower performance than the unoptimized version running on CPython.
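In practice that means measuring before touching anything. A quick sketch with CPython's bundled cProfile module (the workload functions are invented for the example) shows how to find the few functions that dominate the runtime:

```python
# Find where the time actually goes before optimizing anything.
# cProfile and pstats ship with CPython; the workload below is made up.

import cProfile
import pstats

def slow_part(n):
    return sum(i * i for i in range(n))

def fast_part(n):
    return n * (n - 1) // 2

def workload():
    total = 0
    for _ in range(200):
        total += slow_part(10_000)   # this is the hotspot
        total += fast_part(10_000)
    return total

profiler = cProfile.Profile()
profiler.runcall(workload)
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(5)  # the top entries reveal the critical 3%
```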
So if you want to implement a well-written, good, fast programming environment, start with the performance analysers and a simple design!