-
Notifications
You must be signed in to change notification settings - Fork 6
Some preliminary data analysis
The newest test run over all of sv-comp is not complete, but anyway I wanted to have a look at what we have found so far. I am going to combine all previous databases and take it from there.
> vdiff-viewer -d complete.db --merge 2018-03-22.db 2018-04-07-random.db 2018-04-10-smart.db 2018-04-12-smart.db sv-comp-survey.db
This will combine all databases into complete.db
.
Finding real unsoundness in the current state-of-the art tools would be really cool. So that's what I'm going to focus on first. A verifier is unsound if it states that the error location is unreachable despite it being reachable. In SV-Comp terms, a verifier says unsat
although it's acutally sat.
> vdiff-viewer -d complete.db --count --unsound
378
Unfortunately most of those are "unsoudnesses" of KLEE. So we're going to filter those.
> vdiff-viewer -d complete.db --list --unsound | grep -v klee
runId _verifierName _originalFn _programHash _unsats _sats
524 seahorn sanfoundry_02_true-unreach-call_ground.i e0916b113313a71f678e43804287b2b8fc72f9fd 1 2
527 seahorn sanfoundry_02_true-unreach-call_ground.i 714699d9291ebcb018e37ac9ef10517ff7d74b04 1 2
530 seahorn sanfoundry_02_true-unreach-call_ground.i 3f1c7e7f464177289fc19dbcd67947fb888c2135 1 2
578 seahorn data_structures_set_multi_proc_false-unreach-call_ground.i 685e9b30a2079abb20bfeaa425957b5f295d1f83 1 2
584 seahorn standard_minInArray_true-unreach-call_ground.i 2c2cee641c8c7f2f1d1b26b28c2694781160a288 1 2
623 seahorn data_structures_set_multi_proc_true-unreach-call_ground.i 31aeb1be103f75593b824bfe3ebd1893eed67fcf 1 2
872 seahorn sanfoundry_27_true-unreach-call_ground.i 9771a62fcc4ada81e7b3afea5d271c47652fc900 1 2
1031 seahorn standard_maxInArray_true-unreach-call_ground.i a95c7c9df3f5535efbb60d1f389dd5161955edaa 1 2
The column unsats
counts the number of verifiers that said "unsat" (error is unreachable). It's interesting to note that there are in total only 3 verifiers that even terminatd on those test cases. Note that the ground truth is always established by consensus, so in those cases we definitely need to look more closely to find out whether those are actual cases of unsoundness.
Let's focus on the first test case with hash e0916b113313a71f678e43804287b2b8fc72f9fd
. What verifier came to which verdict?
> vdiff-viewer -d complete.db --runs e0916b113313a71f678e43804287b2b8fc72f9fd
uautomizer sat Just 4.090000152587891 Just 260976
utaipan sat Just 4.150000095367432 Just 272968
seahorn unsat Just 0.1599999964237213 Just 38580
So for some reason, our data is incomplete. We don't have the runs from the other verifiers. But nontheless this is an interesting case. Let's extract the source code from the database and run the verifiers manually on it.
vdiff-viewer -d complete.db --hash e0916b113313a71f678e43804287b2b8fc72f9fd > program.c
Spoiler: The program contains a comparison of a pointer with a random integer value. Anyway, let's see what the other verifiers make of this.
> vdiff --run program.c
"cbmc"
VerifierTimedOut
"cpachecker"
VerifierTerminated Unsat (Timing {userTime = 5.83, systemTime = 0.26, elapsedWall = 3.23, maxResidentMemory = 253688})
"klee"
/tmp/kleedir32/program.c:11:9: warning: implicit declaration of function '__assert_fail' is invalid in C99
[-Wimplicit-function-declaration]
klee_assert(0);
^
/tmp/kleedir32/klee.h:96:6: note: expanded from macro 'klee_assert'
: __assert_fail (#expr, __FILE__, __LINE__, __PRETTY_FUNCTION__)) \
^
/tmp/kleedir32/program.c:23:32: warning: comparison of constant 480729375883536286 with expression of type 'int' is always
true [-Wtautological-constant-out-of-range-compare]
__VERIFIER_assert(largest1 != 480729375883536286);
~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~
2 warnings generated.
VerifierTerminated Unsat (Timing {userTime = 3.2, systemTime = 1.0e-2, elapsedWall = 3.25, maxResidentMemory = 18444})
"seacrab"
VerifierTerminated Unsat (Timing {userTime = 7.0e-2, systemTime = 5.0e-2, elapsedWall = 0.22, maxResidentMemory = 37024})
"seahorn"
VerifierTerminated Unsat (Timing {userTime = 7.0e-2, systemTime = 4.0e-2, elapsedWall = 0.11, maxResidentMemory = 36684})
"uautomizer"
VerifierTerminated Sat (Timing {userTime = 10.6, systemTime = 0.49, elapsedWall = 4.54, maxResidentMemory = 334000})
"utaipan"
VerifierTerminated Sat (Timing {userTime = 9.15, systemTime = 0.47, elapsedWall = 4.1, maxResidentMemory = 319736})
Verifier | Verdict |
---|---|
cbmc | timeout |
cpachecker | unsat |
klee | unsat |
seacrab | unsat |
seahorn | unsat |
uautomizer | sat |
utaipan | sat |
As Klee nicely mentioned, the constant that we generated is indeed out of range for a 32bit variable. (This was a bug in the instrumentation) and the ground truth is therefore "unsat". It also looks like all the other non-klee cases are of a similar nature. So to conclude: No cases of unsoundness found.
Now, let's have a look at the incomplete cases. A verifier is incomplete (or imprecise) when it states that the error location is reachable despite it being unreachable. Or in other words, a verifier is incomplete when it says "sat" while the correct answer is "unsat".
> vdiff-viewer -d complete.db --list --incomplete
runId _verifierName _originalFn _programHash _unsats _sats
1407 cpachecker data_structures_set_multi_proc_false-unreach-call_ground.i 781fccbaf0d56d4daaba3ccb83e6fc72a3ff1d7b 4 1
1533 cpachecker data_structures_set_multi_proc_trivial_true-unreach-call_ground_true-termination.i cb46d4266372b49899fe80b432d922277b18fb7e 5 1
1722 cpachecker data_structures_set_multi_proc_true-unreach-call_ground.i f082f2d0510d02a74ffed65073467499fa71415e 5 1
2170 cpachecker sanfoundry_43_true-unreach-call_ground.i cc1f7e442af5e13d35171c0f35c053d00f3a799b 5 1
2177 cpachecker sanfoundry_43_true-unreach-call_ground.i 6e1de2ab16afbca73d681b77169ca729c83e4988 5 1
13251 seacrab ./floats-cbmc-regression/float12_true-unreach-call_true-termination.i 68876612967987e3558698c1c7e21105800186d6 3 2
13252 seahorn ./floats-cbmc-regression/float12_true-unreach-call_true-termination.i 68876612967987e3558698c1c7e21105800186d6 3 2
program | description |
---|---|
781fccbaf0d56d4daaba3ccb83e6fc72a3ff1d7b |
comparison of pointer with integer constant that is not an even number |
cb46d4266372b49899fe80b432d922277b18fb7e |
same |
f082f2d0510d02a74ffed65073467499fa71415e |
same |
cc1f7e442af5e13d35171c0f35c053d00f3a799b |
comparison of pointer (stack variable) with even integer constant |
6e1de2ab16afbca73d681b77169ca729c83e4988 |
same |
68876612967987e3558698c1c7e21105800186d6 |
assert(0) statement. Strange because this is not even nested inside a control structure. |
Something is wrong with the last test case. Somehow the definition for
__VERIFIER_assert()
did not get added and with that it is indeed impossible to
reach the error location (as there is none).
void __DUMMY_VERIFIER_assert(int condition)
{
}
extern void __VERIFIER_error(void);
extern float __VERIFIER_nondet_float(void);
extern unsigned char __VERIFIER_nondet_uchar(void);
int main()
{
float f = __VERIFIER_nondet_float();
double d;
unsigned char x = __VERIFIER_nondet_uchar();
__VERIFIER_assert(0);
d = f;
if (f == x)
{
if (!(d == x))
{
__VERIFIER_error();
}
}
}
The test case itself is okay. I believe this must have been a bug from an earlier version.