0%

南大《软件分析》01.Introduction

前言

本文是南京大学李樾、谭添老师的《软件分析》课程的笔记。内容来自课程ppt以及课程讲解。

课程主页:Static Program Analysis | Tai-e (pascal-lab.net)

授课视频:视频南京大学《软件分析》课程01(Introduction)_哔哩哔哩_bilibili

课程实验repo:pascal-lab/Tai-e-assignments: Tai-e assignments for static program analysis (github.com)

Contents

  1. PL and Static Analysis
  2. Why We Learn Static Analysis
  3. What is Static Analysis
  4. Static Analysis Features and Examples

PL and Static Analysis

PL: Progeamming Languages

  • Theory
    • Language design
    • Type system
    • Semantics and logics
    • ……
  • Environment
    • Compilers
    • Runtime system
    • ……
  • Application
    • ==Program analysis==
    • Program verification
    • Program synthesis
    • ……

Background: In the last decade, the language cores had few changes, but the programs became significantly larger and more complicated.

Challenge: How to ensure the reliability, security and other promises of large-scale and complex programs?

Why we need static analysis?

  • Program Reliability
    • Null pointer dereference, momery leak, etc.
  • Program Security
    • Private information leak, injection attack, etc.
  • Compiler Optimization
    • Dead code elimination, code motion, etc.
  • Program Understanding
    • IDE call hierarchy, type indication, etc.

Static Analysis

Static Analysis

==Static== analysis analyzes a program P to reason about its behaviors and deterines whether it satisfies some properties ==before running== P.

  • Does P contain andy private information leaks?
  • Does P deference any null pointers?
  • Are all the cast operations in P safe?
  • Can v1 and v2 in P point to the same memory location?
  • Will certain assert statements in P fail?
  • Is this piece of code in P dead(so that it could be eliminated)?

Rice Theorem

Unfortuanately, by Rice’s Theorem, there is no such approach to determine whether P satisfies such non-trivial propertirs, i.e., giving exact answer: Yes or No.

Rice’s Theorem:

Any non-trivial property of the behavior of progras in a r.e. language is undecidable

r.e.(recurisively enumerable) = recognizable by a Turing-machine(C, Java, and amost all languages)

A property is trivial if either it is not satisfied by any r.e. language, or if it is satisfied by all r.e. languages; otherwise it is non-trivial.

non-trivial properties

~= interesting properties

~= the properties related with run-time behaviors of programs

任何编程语言编写的程序中与动态运行时相关的特性都是不确定的

Perfect static analysis

  • Sound(Overapproximate,false postitivs), and
  • Complete(Underaapproximate,false negatives)

Useful static analysis

  • Compromise soundness(false negatives), or
  • Compromise completeness(false positives)

Mostly compromising completensess: ==Sound== but ==not fully-precise== static analysis.

Necessity of Soundness

Soundness is ==critiacal== to a collection of important(static-analysis) applications such as compiler optimization and program verification.

Example: Safe Cast

  • Unsound: Safe Cast -> wrong conclusion
  • Sound: Not Safe Cast: correct conclusion
  • Soundness is also ==preferable== to other (static-analysis) applications for which soundness is not demanded, e.g., bug detection, as better soundness implies more bugs could be found.

Static Analysis-Bird’s Eye View

1
2
3
4
5
if (input)
x = 1;
else
x = 0
-> x = ?

Two analysis results:

  1. when input is true, x = 1

    when input is false, x = 0

    Sonud, precise, expensive

  2. x = 1 or x = 0

    Sound, imprecise, cheap

Static Analysis: ensure(or get close to) ==soundness==, while making good trade-offs between analysis ==precision== and analysis ==speed==.

Two Words to Conclude Static Analysis

Abstraction + Over-approximation

Example: Determine the sign (+, -, or 0) of all the variables of a given program.

  • Abstraction
  • Over-approximation
    • Transfer functions
    • Control flows

Abstaction

Determine the sign (+, -, or 0) of all the variables of a given program.

Over-approximation:

Transfer Functions

  • In static analysis, transfer functions define how to evaluate different program statements on abstract values.
  • Transfer functions are defined according to “analysis problem” and the “semantics” of different program statements.

1, 2: Static analysis is useful

3: But(over-approximated) static analysis produces false positives

Control Flows

As it’s impossible to enumerate all paths in practice, flow merging (as a way of over-approximation) is taken for granted in most static analyses.