VulPecker: An Automated Vulnerability Detection System Based on Code Similarity Analysis
- Vulnerability Pecker (VulPecker) is a system for automatically detecting whether a piece of software source code contains a given vulnerability.
- The key insight underlying VulPecker is to leverage (i) a set of features that we define to characterize patches and (ii) code-similarity algorithms that have been proposed for various purposes, while noting that no single code-similarity algorithm is effective for all kinds of vulnerabilities.
Source link:
Why is it important:
Software vulnerabilities are the fundamental cause of many attacks. It is difficult to patch all vulnerabilities in software systems because of code reuse, namely that a vulnerability may exist silently in multiple software programs without being adequately tracked. This is called the vulnerability prevalence problem. It cannot be solved simply by using multiple patch management mechanisms, because they often do not cover all vulnerability instances. While it may sound simple to track code reuse, it is actually unmanageable because of the large number of programs.
What methods are there for detecting vulnerabilities:
- Vulnerability detection methods use either vulnerability patterns or code similarity.
- The pattern-based detection approach typically requires multiple instances of the same or a similar vulnerability before a pattern can be identified.
- The code-similarity-based detection approach requires only a single instance of a vulnerability.
Which code-similarity algorithm(s) are effective for detecting which vulnerability?
- We analyze a set of candidate code-similarity algorithms by taking advantage of features describing vulnerabilities and patches. This analysis leads to a CVE-to-algorithm mapping, which maps a CVE-ID to the selected code-similarity algorithm(s) that are effective for detecting the vulnerability.
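A CVE-to-algorithm mapping can be pictured with the minimal Python sketch below. It is only an illustration, not VulPecker's selection engine: the two similarity functions, the CVE labels, the code snippets, and the threshold are all hypothetical. It also assumes that an algorithm counts as effective for a vulnerability when it can tell the patched code apart from the unpatched code, which is a criterion adopted here for illustration.

```python
from difflib import SequenceMatcher


def token_similarity(a: str, b: str) -> float:
    """Sequence similarity over whitespace-separated tokens."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def line_set_similarity(a: str, b: str) -> float:
    """Jaccard similarity over the sets of stripped, non-empty lines."""
    lines_a = {line.strip() for line in a.splitlines() if line.strip()}
    lines_b = {line.strip() for line in b.splitlines() if line.strip()}
    union = lines_a | lines_b
    return len(lines_a & lines_b) / len(union) if union else 1.0


# Hypothetical candidate algorithms (stand-ins, not the ones used by VulPecker).
CANDIDATES = {
    "token-sequence": token_similarity,
    "line-set": line_set_similarity,
}


def select_algorithms(unpatched: str, patched: str, threshold: float) -> list:
    """Keep the algorithms that rate the patched code as dissimilar to the unpatched code.

    Assumption: an algorithm that cannot separate the two versions would
    flag patched code as vulnerable, so it is not selected.
    """
    return [name for name, sim in CANDIDATES.items()
            if sim(unpatched, patched) < threshold]


def build_mapping(vulns: dict, threshold: float = 0.8) -> dict:
    """Map each CVE label to the names of the algorithms selected for it."""
    return {cve: select_algorithms(unpatched, patched, threshold)
            for cve, (unpatched, patched) in vulns.items()}


# Placeholder entries: the labels and snippets are made up for this example.
VULNS = {
    "CVE-EXAMPLE-1": (
        "char buf[16];\nstrcpy(buf, src);\nreturn 0;",
        "char buf[16];\nstrncpy(buf, src, sizeof(buf) - 1);\nreturn 0;",
    ),
    "CVE-EXAMPLE-2": (
        "if (len > size) return -1;\ncopy(dst, src, len);",
        "if (len >= size) return -1;\ncopy(dst, src, len);",
    ),
}

mapping = build_mapping(VULNS)
print(mapping)
```

In the second placeholder entry the patch changes a single token, so the token-based measure still rates the two versions as nearly identical and is not selected, while the line-based measure is. This is a small instance of the point that no single code-similarity algorithm is effective for all kinds of vulnerabilities.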
How should we generate and use vulnerability signatures?
- A code-similarity algorithm can be characterized by three attributes: code-fragment level, code representation, and comparison method.
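The three-attribute view can be written down as a small data structure. The Python sketch below is only illustrative: the class name and the example values for each attribute are assumptions, not the option lists used by VulPecker. It shows how a set of candidate algorithms could be enumerated as combinations of the three attributes.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SimilarityAlgorithm:
    """One candidate algorithm, described by its three attributes."""
    fragment_level: str   # granularity of the code being compared
    representation: str   # how the code fragment is represented
    comparison: str       # how two representations are compared


# Illustrative attribute values (assumed for this example).
FRAGMENT_LEVELS = ("patch", "function", "file")
REPRESENTATIONS = ("text", "token", "tree")
COMPARISONS = ("exact-match", "approximate-match")

# Every combination of the three attributes is one candidate algorithm.
candidates = [
    SimilarityAlgorithm(level, rep, cmp)
    for level, rep, cmp in product(FRAGMENT_LEVELS, REPRESENTATIONS, COMPARISONS)
]

for algo in candidates[:3]:
    print(algo)
```

Making the dataclass frozen keeps each candidate hashable, so candidates can be used as dictionary keys or set members, for example as the values of a CVE-to-algorithm mapping.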
- 1. Defining vulnerability and code-reuse features.
  * Features for describing vulnerability diff hunks (we define two kinds of features: basic features and patch features).
  * Features for describing code reuse (these use the patch features).
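- 2. Preparing the input (three inputs: NVD, the National Vulnerability Database; VPD, the Vulnerability Patch Database; and VCID, the Vulnerability Code Instance Database).
- 3. Code-similarity algorithm selection.
  * Extracting vulnerability diff hunk features.
  * Code-similarity algorithm selection engine.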
- 4. Vulnerability signature generation.
  * First, we extract the patched/unpatched diff code and the unpatched code fragment corresponding to a vulnerability.
  * Second, for each diff hunk, we preprocess and represent the patched/unpatched diff code and the unpatched code fragments obtained in the first step.
- 5. Vulnerability detection.
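The last two steps (signature generation and detection) can be sketched in a few lines of Python. This is a deliberately simplified stand-in and not VulPecker's implementation: here a signature is just the set of normalized lines that a patch deletes and adds, and a target is flagged when it contains all deleted lines and none of the added ones. The function names, the whitespace normalization, and the code snippets are assumptions made for the example.

```python
import difflib


def normalize(line: str) -> str:
    """Collapse whitespace so formatting differences do not affect matching."""
    return " ".join(line.split())


def make_signature(unpatched: str, patched: str) -> dict:
    """Collect the normalized lines that the patch deletes and adds."""
    old = [normalize(line) for line in unpatched.splitlines() if line.strip()]
    new = [normalize(line) for line in patched.splitlines() if line.strip()]
    deleted, added = [], []
    for entry in difflib.ndiff(old, new):
        if entry.startswith("- "):
            deleted.append(entry[2:])
        elif entry.startswith("+ "):
            added.append(entry[2:])
    return {"deleted": deleted, "added": added}


def is_vulnerable(target: str, signature: dict) -> bool:
    """Flag the target if it has every deleted line and no added line."""
    lines = {normalize(line) for line in target.splitlines() if line.strip()}
    has_deleted = all(line in lines for line in signature["deleted"])
    has_added = any(line in lines for line in signature["added"])
    return bool(signature["deleted"]) and has_deleted and not has_added


# Made-up snippets standing in for the unpatched and patched code of one vulnerability.
UNPATCHED = "char buf[16];\nstrcpy(buf, src);\nreturn 0;"
PATCHED = "char buf[16];\nstrncpy(buf, src, sizeof(buf) - 1);\nreturn 0;"

# Made-up target that reuses the unpatched code with different formatting.
REUSED = "void g(char *src) {\n  char buf[16];\n    strcpy(buf,   src);\n  return 0;\n}"

signature = make_signature(UNPATCHED, PATCHED)
print(signature)
print(is_vulnerable(REUSED, signature))
print(is_vulnerable(PATCHED, signature))
```

Exact line matching is the crudest possible comparison method. In the workflow above, the comparison would instead be done by whichever code-similarity algorithm was selected for the vulnerability in step 3.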