前言
本文是南京大学李樾、谭添老师的《软件分析》课程的笔记。内容来自课程ppt以及课程讲解。
课程主页:Static Program Analysis | Tai-e (pascal-lab.net)
授课视频:视频南京大学《软件分析》课程01(Introduction)_哔哩哔哩_bilibili
课程实验repo:pascal-lab/Tai-e-assignments: Tai-e assignments for static program analysis (github.com)
Contents
- PL and Static Analysis
- Why We Learn Static Analysis
- What is Static Analysis
- Static Analysis Features and Examples
PL and Static Analysis
PL: Progeamming Languages
- Theory
- Language design
- Type system
- Semantics and logics
- ……
- Environment
- Compilers
- Runtime system
- ……
- Application
- ==Program analysis==
- Program verification
- Program synthesis
- ……
Background: In the last decade, the language cores had few changes, but the programs became significantly larger and more complicated.
Challenge: How to ensure the reliability, security and other promises of large-scale and complex programs?
Why we need static analysis?
- Program Reliability
- Null pointer dereference, momery leak, etc.
- Program Security
- Private information leak, injection attack, etc.
- Compiler Optimization
- Dead code elimination, code motion, etc.
- Program Understanding
- IDE call hierarchy, type indication, etc.
Static Analysis
Static Analysis
==Static== analysis analyzes a program P to reason about its behaviors and deterines whether it satisfies some properties ==before running== P.
- Does P contain andy private information leaks?
- Does P deference any null pointers?
- Are all the cast operations in P safe?
- Can v1 and v2 in P point to the same memory location?
- Will certain assert statements in P fail?
- Is this piece of code in P dead(so that it could be eliminated)?
- …
Rice Theorem
Unfortuanately, by Rice’s Theorem, there is no such approach to determine whether P satisfies such non-trivial propertirs, i.e., giving exact answer: Yes or No.
Rice’s Theorem:
Any non-trivial property of the behavior of progras in a r.e. language is undecidable
r.e.(recurisively enumerable) = recognizable by a Turing-machine(C, Java, and amost all languages)
A property is trivial if either it is not satisfied by any r.e. language, or if it is satisfied by all r.e. languages; otherwise it is non-trivial.
non-trivial properties
~= interesting properties
~= the properties related with run-time behaviors of programs
任何编程语言编写的程序中与动态运行时相关的特性都是不确定的
Perfect static analysis
- Sound(Overapproximate,false postitivs), and
- Complete(Underaapproximate,false negatives)
Useful static analysis
- Compromise soundness(false negatives), or
- Compromise completeness(false positives)
Mostly compromising completensess: ==Sound== but ==not fully-precise== static analysis.
Necessity of Soundness
Soundness is ==critiacal== to a collection of important(static-analysis) applications such as compiler optimization and program verification.
Example: Safe Cast
- Unsound: Safe Cast -> wrong conclusion
- Sound: Not Safe Cast: correct conclusion
- Soundness is also ==preferable== to other (static-analysis) applications for which soundness is not demanded, e.g., bug detection, as better soundness implies more bugs could be found.
Static Analysis-Bird’s Eye View
1 | if (input) |
Two analysis results:
when input is true, x = 1
when input is false, x = 0
Sonud, precise, expensive
x = 1 or x = 0
Sound, imprecise, cheap
Static Analysis: ensure(or get close to) ==soundness==, while making good trade-offs between analysis ==precision== and analysis ==speed==.
Two Words to Conclude Static Analysis
Abstraction + Over-approximation
Example: Determine the sign (+, -, or 0) of all the variables of a given program.
- Abstraction
- Over-approximation
- Transfer functions
- Control flows
Abstaction
Determine the sign (+, -, or 0) of all the variables of a given program.
Over-approximation:
Transfer Functions
- In static analysis, transfer functions define how to evaluate different program statements on abstract values.
- Transfer functions are defined according to “analysis problem” and the “semantics” of different program statements.
1, 2: Static analysis is useful
3: But(over-approximated) static analysis produces false positives
Control Flows
As it’s impossible to enumerate all paths in practice, flow merging (as a way of over-approximation) is taken for granted in most static analyses.