See also
This takes a lot of content from the Wikipedia article for inlining.
Inlining replaces a function call site with the body of the called function. This occurs during compilation. Notably, it can have complicated effects on performance!
Performance impact
The direct benefits of inlining are:
- No function call overhead. No arguments on the stack, no branch or jump, etc.
- Reduced register spilling. We don’t need to reserve registers for passing arguments, so fewer live variables have to be moved out of registers and into memory around the call.
- Reduced indirection. If using call by reference, we don’t need to pass references and then dereference them.
- Potentially improved locality of instructions, since we eliminate branches and keep code that’s executed together close in memory. However, this isn’t always the case; see below.
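To make the first three points concrete, here is a minimal C sketch. The names `add_to`, `sum_with_call`, and `sum_inlined` are hypothetical, and the second function is written by hand to show roughly what a compiler might produce after inlining; real output depends on the compiler and its flags.

```c
/* Hypothetical helper: takes its accumulator by reference. */
static void add_to(int *acc, int value)
{
    *acc += value;              /* dereference on every call */
}

/* Before inlining: each iteration sets up arguments, calls, and returns. */
int sum_with_call(const int *xs, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        add_to(&total, xs[i]);
    return total;
}

/* After inlining (written by hand): no call and no pointer to total,
   so total can stay in a register for the whole loop. */
int sum_inlined(const int *xs, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += xs[i];
    return total;
}
```

Both functions compute the same sum; only the call, the argument passing, and the dereference have disappeared.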
However, the main benefit is that optimizations can now take the inlined body of the function into account; caller and callee are no longer separated by a function boundary!
- We can do constant propagation and loop-invariant code motion into the procedure body (see partial-redundancy elimination)
- We can do dead-code elimination (see liveness analysis)
- Register allocation can be done across the larger body
- Higher-level optimizations like escape analysis and tail duplication can be performed on a larger scope and be more effective.
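As a rough illustration of the first bullet, the C sketch below shows loop-invariant code motion and constant propagation becoming possible once a callee is inlined. All function names are hypothetical, and the "after" versions are written by hand; an actual compiler may produce something different.

```c
/* Hypothetical helper: the factor depends only on k, not on x. */
static int scaled(int x, int k)
{
    int factor = k * k + 1;
    return x * factor;
}

/* Before: the loop cannot see that factor is recomputed on every call. */
void scale_all(int *xs, int n, int k)
{
    for (int i = 0; i < n; i++)
        xs[i] = scaled(xs[i], k);
}

/* After inlining plus loop-invariant code motion (written by hand). */
void scale_all_hoisted(int *xs, int n, int k)
{
    int factor = k * k + 1;     /* hoisted: does not depend on i */
    for (int i = 0; i < n; i++)
        xs[i] = xs[i] * factor;
}

/* After inlining plus constant propagation for the call scale_all(xs, n, 3):
   3 * 3 + 1 folds to 10 at compile time. */
void scale_all_k3(int *xs, int n)
{
    for (int i = 0; i < n; i++)
        xs[i] = xs[i] * 10;
}
```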
For example, given code that calls a predicate function, we can inline the predicate at each call site and then apply dead-code elimination to what remains.
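A minimal C sketch of those three stages follows. The predicate `pred` and the caller `func` are illustrative stand-ins, not necessarily the code the example originally used, and stages 2 and 3 are written by hand to mimic what a compiler might do.

```c
/* Stage 1: original code. pred is the (hypothetical) predicate. */
static int pred(int x)
{
    if (x == 0)
        return 0;
    else
        return x - 1;
}

int func(int y)
{
    return pred(y) + pred(0) + pred(y + 1);
}

/* Stage 2: pred inlined at each of the three call sites. */
int func_inlined(int y)
{
    int tmp = 0;
    if (y == 0)     tmp += 0; else tmp += y - 1;        /* pred(y)     */
    if (0 == 0)     tmp += 0; else tmp += 0 - 1;        /* pred(0)     */
    if (y + 1 == 0) tmp += 0; else tmp += (y + 1) - 1;  /* pred(y + 1) */
    return tmp;
}

/* Stage 3: after dead-code elimination and constant folding.
   The "tmp += 0" statements do nothing, and the pred(0) branch
   is always taken, so both disappear. */
int func_dce(int y)
{
    int tmp = 0;
    if (y != 0)     tmp += y - 1;
    if (y + 1 != 0) tmp += y;       /* (y + 1) - 1 folded to y */
    return tmp;
}
```

All three versions return the same value for any `y`; none of the simplifications in stage 3 were visible to the optimizer while `pred` sat behind a call.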
The drawbacks are potentially increased code size and, as a consequence, worse instruction cache performance. If the expanded code no longer fits in the L1 instruction cache, performance can get much worse!
Compilers decide what to inline based on programmer hints and their own heuristics. For instance, Java HotSpot uses dynamic runtime profiling to figure out what to inline. Additionally, JIT compilers can:
- dynamically speculate on which code paths will yield the biggest improvement in execution time;
- dynamically adjust their cost heuristic for inlining, based on how much has been inlined so far;
- inline clusters of subroutines at once.
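The second bullet can be sketched as a toy cost model in C. Everything here is an assumption made for illustration: the structs, the field names, and the thresholds are hypothetical and do not describe HotSpot or any other real compiler.

```c
#include <stdbool.h>

/* Toy model: all fields and thresholds are hypothetical. */
struct call_site {
    int callee_size;   /* estimated size of the callee body, in instructions */
    int call_count;    /* how often the profiler saw this call execute */
};

struct inline_budget {
    int remaining;     /* code growth still allowed, in instructions */
};

/* Decide whether to inline; the bar rises as the budget is spent. */
bool should_inline(struct inline_budget *budget, const struct call_site *site)
{
    if (site->callee_size > budget->remaining)
        return false;                        /* would exceed the growth budget */

    /* Demand more calls per instruction of growth when little budget is left. */
    int min_calls_per_insn = (budget->remaining < 100) ? 10 : 1;
    if (site->call_count < site->callee_size * min_calls_per_insn)
        return false;                        /* not hot enough to justify growth */

    budget->remaining -= site->callee_size;  /* charge the growth */
    return true;
}
```

With a budget of 120, a 30-instruction callee seen 200 times would be inlined; once the budget has dropped below 100, the same call site would be rejected because the threshold has tightened.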