PLDI 2018 Artifact Evaluation of Pinpoint

Preliminaries

Pinpoint is built on LLVM 3.6. It analyzes the bitcode files (.bc files) of software projects to check for vulnerabilities. A bitcode file is an intermediate representation of the source code. We have prepared these bitcode files for the evaluation.

Evaluation Host

Because Pinpoint needs to run on a server with a large amount of memory, we have prepared an Ubuntu server for the artifact evaluation. Reviewers can access the server via ssh (the password is given at the submission site):

  • ssh PLDI2018AE@143.89.191.101

NOTE: when running large benchmark programs (>0.5 MLoC), the server's computing resources may be exhausted if multiple users are on the server at the same time. We therefore recommend running the small benchmark programs to verify our idea.

On the server, we have installed all the binaries needed for the evaluation:

  • pp-check: the binary of pinpoint
  • saber: the binary of saber, which implements one of our baseline approaches
  • pp-capture: our wrapper of the Clang Static Analyzer
  • infer: the Facebook Infer analyzer

In the home directory, there are several folders, including three groups of benchmarks and other folders:

  • CINT2000SRC: the spec cint2000 program sources (12 in total)
  • CINT2000BC: the llvm bitcode of the cint2000 program sources (12 in total)
  • OpenSourceProjectSrc: open source projects for evaluation (18 in total)
  • OpenSourceProjectBC: the llvm bitcode of the open source projects (18 in total)
  • juliet: the benchmark for evaluating the recall of our approach.
  • juliet-bc: the llvm bitcode files of juliet test suite.
  • pinpoint: where pinpoint is installed
  • infer-linux64-v0.12.1: where facebook infer is installed
  • benchmarks: a soft link of OpenSourceProjectSrc
  • boost59: the boost library used for compiling mysql-server, one of our benchmark programs.

Artifact Evaluation

The following artifact evaluation process follows the evaluation order in our paper (Section 5).

5.1 Comparison with SVFA Techniques

5.1.1 Scalability and 5.1.2 (1) Precision

We will use two benchmark programs (webassembly 23KLoC and mysql-server 2MLoC), one small and one big, to demonstrate our evaluation results. For other benchmark programs, the evaluation process is the same.

Run the following commands to run pinpoint:

* mkdir $HOME/test #create a testing folder. reviewers are expected to use different folder names to avoid conflicts
* cd $HOME/test # enter the testing folder
* pp-check -ps-uaf -nworkers=15 -report=web.TXT $HOME/OpenSourceProjectBC/webassembly.bc
* pp-check -ps-uaf -nworkers=15 -report=mysql.TXT $HOME/OpenSourceProjectBC/mysqld.bc

It takes several minutes for webassembly to finish and about 1.5 hours for mysqld. When a run finishes, the time and memory usage are printed on the screen (see the figure below for mysqld). For webassembly, it reports 1 use-after-free, which is a real bug that has been confirmed. For mysqld, it reports 5, of which 4 are real bugs. The bug reports are in ./web.TXT and ./mysql.TXT, respectively. You can refer to this to understand the format of bug report files.
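The comments in the commands above ask each reviewer to use a different folder name. One way to get a distinct name without coordinating with other reviewers is to let mktemp pick it. This is only a minimal sketch: the helper name make_workdir and the "test-" prefix are assumptions for illustration and are not part of the artifact.

```shell
# make_workdir [BASE]: create a fresh, uniquely named folder under BASE
# (default: $HOME) and print its path. The "test-" prefix is only an
# illustrative choice. mktemp -d never reuses an existing directory,
# so two reviewers calling this at the same time get different folders.
make_workdir() {
    base=${1:-$HOME}
    mktemp -d "$base/test-XXXXXX"
}
```

For example, cd "$(make_workdir)" can stand in for the mkdir and cd steps, after which the pp-check commands are run unchanged.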

Caveats.

  • pp-check will report three use-after-free bugs for goaccess. The three are due to the same root cause. Thus, we regard them as a single bug.
  • pp-check will report two use-after-free bugs for transmission. The two are due to the same root cause. Thus, we regard them as a single bug.
  • pp-check will report three use-after-free bugs for firefox. One is a real bug and the other two are false positives, which we regard as a single false warning because they share the same root cause.
  • When running pp-check on shadowsocks, an extra option (-report-pass-line=60) should be added.

We compare with SVF, the most recent SVFA technique.

Next, we run SVF and compare the results. In the commands, saber is the binary of the SVF technique. It has been installed on the system. The source code is on GitHub. Because saber is neither time- nor memory-efficient, it may run for a long time. We set a timeout of 12 hours for saber, so it will stop after at most 12 hours. After it stops, the time and memory usage are printed on the screen.

* saber -uaf -no-global -stat=false $HOME/OpenSourceProjectBC/webassembly.bc
* saber -uaf -no-global -stat=false $HOME/OpenSourceProjectBC/mysqld.bc
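These runs can last for hours, and the time and memory figures are only printed on the screen. A small wrapper that also copies the output to a file keeps those figures for later comparison. The sketch below relies only on the standard tee utility; the name run_logged and any log file names are assumptions, not part of the artifact.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Run COMMAND, show its output on the screen, and also save it in LOGFILE.
# Both stdout and stderr are captured. Note that the return status is the
# status of tee, not of COMMAND.
run_logged() {
    log=$1
    shift
    "$@" 2>&1 | tee "$log"
}
```

A possible use is run_logged saber-web.log saber -uaf -no-global -stat=false $HOME/OpenSourceProjectBC/webassembly.bc, where saber-web.log is a hypothetical file name.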

For webassembly, saber takes about 2 minutes and 2 GB of memory to build the FSVFG, which is similar to the cost of building the SEG (our approach). Once the FSVFG is built, saber checks for use-after-free vulnerabilities on it. After a long check (1 hour or more), it reports hundreds of false warnings, which are printed on the screen. A screenshot is shown below.

For mysqld, saber needs more than 12 hours to build the FSVFG, so it times out. Because it times out, it never gets to check for use-after-free vulnerabilities on the FSVFG. In contrast, Pinpoint takes only 20 minutes to build the SEG and finishes the whole process in about 1.5 hours.


The above results should be consistent with Figures 7/8/9 and Table 1 in our paper.

5.1.2 (2) Recall

We use the Juliet test suite to evaluate the recall of Pinpoint. It has three parts, whose source code is in the following folders:

  • $HOME/juliet/testcases/CWE415_Double_Free/s01/
  • $HOME/juliet/testcases/CWE415_Double_Free/s02/
  • $HOME/juliet/testcases/CWE416_Use_After_Free/

The files under each source folder, e.g., $HOME/juliet/testcases/CWE415_Double_Free/s01/, have names like
CWE415_Double_Free__new_delete_array_struct_01.cpp
The prefix CWE415_Double_Free__ is the vulnerability type. The middle part new_delete_array_struct_ represents the functions and data types that cause the bug. The suffix is the number of the vulnerability. In most cases, one file contains one vulnerability. Some vulnerabilities span several files. For example, vulnerability No. 74 consists of two files:
CWE415_Double_Free__new_delete_array_struct_74a.cpp
CWE415_Double_Free__new_delete_array_struct_74b.cpp
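The naming scheme described above can be taken apart mechanically, which helps when matching a reported bug back to its test case. The following is a rough sketch using POSIX parameter expansion; the function name parse_juliet_name is hypothetical, and the sketch assumes every file name follows the pattern TYPE__VARIANT_NUMBER.ext.

```shell
# parse_juliet_name FILE
# Print the vulnerability type, the function/data-type part, and the
# number (including a trailing letter such as "a" or "b" for test cases
# split over several files), separated by spaces.
parse_juliet_name() {
    name=$(basename "$1")   # drop any directory part
    name=${name%.*}         # drop the extension (.cpp, .c, ...)
    cwe=${name%%__*}        # text before the first double underscore
    rest=${name#*__}        # text after the first double underscore
    variant=${rest%_*}      # everything before the last underscore
    number=${rest##*_}      # everything after the last underscore
    printf '%s %s %s\n' "$cwe" "$variant" "$number"
}
```

For CWE415_Double_Free__new_delete_array_struct_74a.cpp this prints the three fields CWE415_Double_Free, new_delete_array_struct and 74a on one line.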

The llvm bitcode files of these benchmark programs are:

  • $HOME/juliet-bc/CWE415_s01.bc
  • $HOME/juliet-bc/CWE415_s02.bc
  • $HOME/juliet-bc/CWE416.bc

Use CWE415_s01.bc as an example. Run the following commands for evaluation:

* mkdir $HOME/test #create a testing folder. reviewers are expected to use different folder names to avoid conflicts
* cd $HOME/test # enter the testing folder
* pp-check -ps-uaf -nworkers=15 -report=CWE415_s01.bugs.TXT $HOME/juliet-bc/CWE415_s01.bc # run pinpoint's use-after-free checker

It will take several minutes. After it finishes, you will see the time and memory usage on the screen. The bug reports are in ./CWE415_s01.bugs.TXT, which contains all vulnerabilities in the test suite. Remove the testing folder to keep the home directory clean.
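The other two bitcode files are evaluated the same way. The sketch below only prints the three command lines, reusing the flags from the command above, so they can be reviewed before anything is run. The function name juliet_commands is hypothetical, and the report names for the other two files are an assumption that follows the CWE415_s01.bugs.TXT pattern.

```shell
# juliet_commands [BCDIR]
# Print one pp-check command line per Juliet bitcode file found in the
# list above. BCDIR defaults to $HOME/juliet-bc. Nothing is executed.
juliet_commands() {
    bcdir=${1:-$HOME/juliet-bc}
    for name in CWE415_s01 CWE415_s02 CWE416; do
        echo "pp-check -ps-uaf -nworkers=15 -report=$name.bugs.TXT $bcdir/$name.bc"
    done
}
```

Running juliet_commands inside the testing folder lists the commands; each line can then be pasted into the shell one at a time, which avoids running the three analyses concurrently on the shared server.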

5.2 Detected Real Vulnerabilities

This section does not contain any experiments.

5.3 Study of the Taint-Issue Checkers

Run the following commands and the results will contain both path-traversal vulnerabilities and data transmission vulnerabilities.

* mkdir $HOME/test #create a testing folder. reviewers are expected to use different folder names to avoid conflicts
* cd $HOME/test # enter the testing folder
* pp-check -ps-taint -nworkers=15 -report=mysqld.taint.bugs.TXT $HOME/OpenSourceProjectBC/mysqld.bc # run pinpoint's taint-issue checker

It will take about half an hour. After it finishes, you will see the time and memory usage on the screen. The bug reports are in ./mysqld.taint.bugs.TXT. Remove the testing folder to keep the home directory clean.

5.4 Comparison with Other Static Bug Detectors

Here, we still use mysql-server (the 2MLoC project) as an example to run clang static analyzer (CSA) and infer.

Run the following commands to run the Clang Static Analyzer. Because the build system of some projects is complex, we have prepared a build-csa folder in the source folder of those projects so that CSA can be run directly.

* cd $HOME/OpenSourceProjectSrc/mysql-sever # enter the source folder
* cd build-csa # enter csa folder
* make clean && rm -rf .piggy/
* pp-capture --capture-only --run-csa -- make -j15

It will take about 30 minutes to finish. It will report 26 use-after-free bugs, of which 24 are false positives. The bug reports are in the folder .piggy/reports/

For projects that do not have the build-csa folder, run the following commands directly after entering the source directory:

* make clean && rm -rf .piggy/
* pp-capture --capture-only --run-csa -- make -j15
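The choice between the two cases above (a prepared build folder versus the plain source directory) can be made automatically. This is a sketch under the assumption that the prepared folder, when present, sits directly inside the project's source folder; the function name enter_build_dir is hypothetical.

```shell
# enter_build_dir SRC SUBDIR
# Change into SRC/SUBDIR if that prepared folder exists, otherwise into
# SRC itself. Returns non-zero if the directory cannot be entered.
enter_build_dir() {
    src=$1
    sub=$2
    if [ -d "$src/$sub" ]; then
        cd "$src/$sub" || return 1
    else
        cd "$src" || return 1
    fi
}
```

After a call such as enter_build_dir with a project's source folder and build-csa as arguments, the make clean and pp-capture commands are run as shown above.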

Note that CSA cannot be run on the following projects:

  • shadowsocks

Run the following commands to run Facebook Infer. Because the build system of some projects is complex and Infer is not user-friendly (e.g., it has cmake-related bugs), we have prepared a build-infer folder in the source folder of those projects so that Infer can be run directly.

* cd $HOME/OpenSourceProjectSrc/mysql-sever # enter the source folder
* cd build-infer # enter infer folder
* make clean && rm -rf infer-out
* infer run -- make -j15

It will take about 30 minutes to finish. It will report 13 use-after-free bugs, all of which are false positives. You can run the following command to inspect the bug reports.

* inferTraceBugs

For projects that do not have the build-infer folder, run the following commands directly after entering the source directory:

* make clean && rm -rf infer-out
* infer run -- make -j15

Note that infer cannot be run on the following projects:

  • html5-parser
  • shadowsocks
  • swoole
  • vim
  • wrk
  • php
  • firefox
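When going through several projects in turn, the two exclusion lists in this section can be encoded as simple checks so that unsupported projects are skipped. The sketch below is illustrative only: the function names are hypothetical, and it assumes the argument is the project name exactly as written in the lists, which may differ from the folder names on the server.

```shell
# infer_can_run PROJECT: succeed unless PROJECT is on the list of
# projects that infer cannot be run on.
infer_can_run() {
    case "$1" in
        html5-parser|shadowsocks|swoole|vim|wrk|php|firefox) return 1 ;;
        *) return 0 ;;
    esac
}

# csa_can_run PROJECT: succeed unless PROJECT is on the list of
# projects that CSA cannot be run on.
csa_can_run() {
    case "$1" in
        shadowsocks) return 1 ;;
        *) return 0 ;;
    esac
}
```

A guard such as "if infer_can_run vim; then ...; fi" then skips the infer commands for vim while still allowing the CSA commands.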