Skip to content

Some preliminary data analysis

Christian Klinger edited this page Apr 25, 2018 · 10 revisions

The newest test run over all of sv-comp is not complete, but anyway I wanted to have a look at what we have found so far. I am going to combine all previous databases and take it from there.

Combine previous databases

> vdiff-viewer -d complete.db --merge 2018-03-22.db 2018-04-07-random.db 2018-04-10-smart.db 2018-04-12-smart.db sv-comp-survey.db

This will combine all databases into complete.db.

Unsoundness

Finding real unsoundness in the current state-of-the art tools would be really cool. So that's what I'm going to focus on first. A verifier is unsound if it states that the error location is unreachable despite it being reachable. In SV-Comp terms, a verifier says unsat although it's acutally sat.

> vdiff-viewer -d complete.db --count --unsound 
378

Unfortunately most of those are "unsoudnesses" of KLEE. So we're going to filter those.

> vdiff-viewer -d complete.db --list --unsound | grep -v klee
runId     _verifierName     _originalFn                                                                                             _programHash                                 _unsats     _sats
524       seahorn           sanfoundry_02_true-unreach-call_ground.i                                                                e0916b113313a71f678e43804287b2b8fc72f9fd     1           2    
527       seahorn           sanfoundry_02_true-unreach-call_ground.i                                                                714699d9291ebcb018e37ac9ef10517ff7d74b04     1           2    
530       seahorn           sanfoundry_02_true-unreach-call_ground.i                                                                3f1c7e7f464177289fc19dbcd67947fb888c2135     1           2    
578       seahorn           data_structures_set_multi_proc_false-unreach-call_ground.i                                              685e9b30a2079abb20bfeaa425957b5f295d1f83     1           2    
584       seahorn           standard_minInArray_true-unreach-call_ground.i                                                          2c2cee641c8c7f2f1d1b26b28c2694781160a288     1           2    
623       seahorn           data_structures_set_multi_proc_true-unreach-call_ground.i                                               31aeb1be103f75593b824bfe3ebd1893eed67fcf     1           2    
872       seahorn           sanfoundry_27_true-unreach-call_ground.i                                                                9771a62fcc4ada81e7b3afea5d271c47652fc900     1           2    
1031      seahorn           standard_maxInArray_true-unreach-call_ground.i                                                          a95c7c9df3f5535efbb60d1f389dd5161955edaa     1           2    

The column unsats counts the number of verifiers that said "unsat" (error is unreachable). It's interesting to note that there are in total only 3 verifiers that even terminatd on those test cases. Note that the ground truth is always established by consensus, so in those cases we definitely need to look more closely to find out whether those are actual cases of unsoundness.

Let's focus on the first test case with hash e0916b113313a71f678e43804287b2b8fc72f9fd. What verifier came to which verdict?

> vdiff-viewer -d complete.db --runs e0916b113313a71f678e43804287b2b8fc72f9fd
uautomizer     sat       Just 4.090000152587891      Just 260976
utaipan        sat       Just 4.150000095367432      Just 272968
seahorn        unsat     Just 0.1599999964237213     Just 38580 

So for some reason, our data is incomplete. We don't have the runs from the other verifiers. But nontheless this is an interesting case. Let's extract the source code from the database and run the verifiers manually on it.

vdiff-viewer -d complete.db --hash e0916b113313a71f678e43804287b2b8fc72f9fd > program.c

Spoiler: The program contains a comparison of a pointer with a random integer value. Anyway, let's see what the other verifiers make of this.

> vdiff --run program.c
"cbmc"
VerifierTimedOut
"cpachecker"
VerifierTerminated Unsat (Timing {userTime = 5.83, systemTime = 0.26, elapsedWall = 3.23, maxResidentMemory = 253688})
"klee"
/tmp/kleedir32/program.c:11:9: warning: implicit declaration of function '__assert_fail' is invalid in C99
      [-Wimplicit-function-declaration]
        klee_assert(0);
        ^
/tmp/kleedir32/klee.h:96:6: note: expanded from macro 'klee_assert'
   : __assert_fail (#expr, __FILE__, __LINE__, __PRETTY_FUNCTION__))    \
     ^
/tmp/kleedir32/program.c:23:32: warning: comparison of constant 480729375883536286 with expression of type 'int' is always
      true [-Wtautological-constant-out-of-range-compare]
    __VERIFIER_assert(largest1 != 480729375883536286);
                      ~~~~~~~~ ^  ~~~~~~~~~~~~~~~~~~
2 warnings generated.
VerifierTerminated Unsat (Timing {userTime = 3.2, systemTime = 1.0e-2, elapsedWall = 3.25, maxResidentMemory = 18444})
"seacrab"
VerifierTerminated Unsat (Timing {userTime = 7.0e-2, systemTime = 5.0e-2, elapsedWall = 0.22, maxResidentMemory = 37024})
"seahorn"
VerifierTerminated Unsat (Timing {userTime = 7.0e-2, systemTime = 4.0e-2, elapsedWall = 0.11, maxResidentMemory = 36684})
"uautomizer"
VerifierTerminated Sat (Timing {userTime = 10.6, systemTime = 0.49, elapsedWall = 4.54, maxResidentMemory = 334000})
"utaipan"
VerifierTerminated Sat (Timing {userTime = 9.15, systemTime = 0.47, elapsedWall = 4.1, maxResidentMemory = 319736})
Verifier Verdict
cbmc timeout
cpachecker unsat
klee unsat
seacrab unsat
seahorn unsat
uautomizer sat
utaipan sat

Conclusion

As Klee nicely mentioned, the constant that we generated is indeed out of range for a 32bit variable. (This was a bug in the instrumentation) and the ground truth is therefore "unsat". It also looks like all the other non-klee cases are of a similar nature. So to conclude: No cases of unsoundness found.

Incompleteness

Now, let's have a look at the incomplete cases. A verifier is incomplete (or imprecise) when it states that the error location is reachable despite it being unreachable. Or in other words, a verifier is incomplete when it says "sat" while the correct answer is "unsat".

> vdiff-viewer -d complete.db --list --incomplete 
runId     _verifierName     _originalFn                                                                            _programHash                                 _unsats     _sats
1407      cpachecker        data_structures_set_multi_proc_false-unreach-call_ground.i                             781fccbaf0d56d4daaba3ccb83e6fc72a3ff1d7b     4           1    
1533      cpachecker        data_structures_set_multi_proc_trivial_true-unreach-call_ground_true-termination.i     cb46d4266372b49899fe80b432d922277b18fb7e     5           1    
1722      cpachecker        data_structures_set_multi_proc_true-unreach-call_ground.i                              f082f2d0510d02a74ffed65073467499fa71415e     5           1    
2170      cpachecker        sanfoundry_43_true-unreach-call_ground.i                                               cc1f7e442af5e13d35171c0f35c053d00f3a799b     5           1    
2177      cpachecker        sanfoundry_43_true-unreach-call_ground.i                                               6e1de2ab16afbca73d681b77169ca729c83e4988     5           1    
13251     seacrab           ./floats-cbmc-regression/float12_true-unreach-call_true-termination.i                  68876612967987e3558698c1c7e21105800186d6     3           2    
13252     seahorn           ./floats-cbmc-regression/float12_true-unreach-call_true-termination.i                  68876612967987e3558698c1c7e21105800186d6     3           2   
program description
781fccbaf0d56d4daaba3ccb83e6fc72a3ff1d7b comparison of pointer with integer constant that is not an even number
cb46d4266372b49899fe80b432d922277b18fb7e same
f082f2d0510d02a74ffed65073467499fa71415e same
cc1f7e442af5e13d35171c0f35c053d00f3a799b comparison of pointer (stack variable) with even integer constant
6e1de2ab16afbca73d681b77169ca729c83e4988 same
68876612967987e3558698c1c7e21105800186d6 assert(0) statement. Strange because this is not even nested inside a control structure.

Something is wrong with the last test case. Somehow the definition for __VERIFIER_assert() did not get added and with that it is indeed impossible to reach the error location (as there is none).

void __DUMMY_VERIFIER_assert(int condition)
{
}
extern void __VERIFIER_error(void);
extern float __VERIFIER_nondet_float(void);
extern unsigned char __VERIFIER_nondet_uchar(void);
int main()
{
    float f = __VERIFIER_nondet_float();
    double d;
    unsigned char x = __VERIFIER_nondet_uchar();
    __VERIFIER_assert(0);
    d = f;
    if (f == x)
    {   
        if (!(d == x)) 
        {   
            __VERIFIER_error();
        }   
    }   
}

The test case itself is okay. I believe this must have been a bug from an earlier version.

Clone this wiki locally