Value range analysis papers
After understanding the subtyping, I studied value range analysis a little.
I also read "How to Prove It: A Structured Approach" by Daniel J. Velleman. Reading that book helped me to better understand the language used to layout formal proofs.
To start my study on the range analysis, I found one set of lecture slides especially helpful. They were titled 'RangeAnalysis.pdf'. "Program Analysis and Optimization - DCC888, Fernando Magno Quintão Pereira". Those slides refered to four papers:
- Cousot, P. and Cousot, R. "Abstract Interpretation: a Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints", POPL, p 238-252 (1977)
- Couto, D. and Pereira, F. "The Design and Implementation of a non-iterative Range Analysis Algorithm on a Production Compiler", SBLP, p 45-59 (2011)
- Campos, V., Rodrigues, R., and Pereira, F. "Speed and Precision in Range Analysis", SBLP, p 42-56 (2012)
- Rodrigues, R., Campos, V. and Pereira, F. "A Fast and Low Overhead Technique to Secure Programs Against Integer Overflows", CGO, (2013)
I read through the papers, but concluded that the slides themselves described the basic technique well enough.
I won't be certain about understanding an algorithm before I've implemented it at least once. But I think I got a fairly good idea of what is involved.
The range analysis described in these papers starts on the SSA IR. The IR is first extended with constraints that represent the constraints introduced to the values after branching. These constraint may contain 'futures', the naming makes sense only after you understand the algorithm, but this complex term is just used to refer on constraints that consists of a variable rather than a constant.
Next the IR is transformed into a bipartite graph. It consists of variables and operations in the program. There are directional edges between variables and operations, describing sources and destinations in the program.
The strongly connected components are identified from the graph to be processed in groups. The groups are topologically sorted. Then widening and narrowing is applied to each group.
The widening defines an arithmetic between intervals and operations in the graph. The result of the evaluation is used to refine the range in the variable. This evaluation ignores the branching and constraints in the graph entirely.
If it turns out that the repeated evaluations of the graph would cause the value to grow during widening, the value is immeiately extended with a bound to an infinity, so that it won't grow again. This step loosens the results of the range analysis, but is there to ensure that the analysis terminates.
After the widening the 'futures' are resolved. For each constraint a value is calculated from the graph.
The narrowing evaluates through the program again, evaluating the constraints as well. It refines the infinite boundaries if it can.
I am not sure whether they run this to a fixed point or whether it goes once per SCC, but after it's done the graph contains an approximation of the ranges that each integer gets in the program.
It doesn't always make it. There are few examples in the paper where this algorithm fails to refine the boundaries although it is obvious for the reader that certain variables in that program reduce to constants.
This algorithm should help a lot when refining the the code to run on SIMD, or when erasing overflow guards that never trigger.
For now I put this aside, waiting for the time that it becomes necessary to implement.