This repository has been archived by the owner on Oct 3, 2021. It is now read-only.

Combined tasks #1104

Draft

lembergerth wants to merge 7 commits into master from combined-tasks

Conversation

lembergerth
Contributor

Description of tasks

New benchmark tasks that are a combination of existing benchmark tasks.

Each benchmark task in this directory is created from two other tasks
of other categories, according to the following pattern:

/* definitions of Task 1 ... */
int main1() { /* main-method of Task 1 ... */ }
/* definitions of Task 2 ... */
int main2() { /* main-method of Task 2 ... */ }

int main() {
  if (__VERIFIER_nondet_int()) {
    main1();
  } else {
    main2();
  }
}

Definitions are renamed to avoid conflicts, if necessary.
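
For illustration (the identifiers here are hypothetical, not taken from actual tasks), if both input tasks defined a function init, the combined program could contain something like:

/* from Task 1, originally init() */
void init_1(void) { /* ... */ }

/* from Task 2, originally init(), renamed to avoid the name clash */
void init_2(void) { /* ... */ }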

This construction leads to programs with independent program parts.

The name of each benchmark task reveals its origin:
All task names are created by the pattern ${TASK_1}+${TASK_2}.

Open Issue: Licensing

I'm not sure about how to license these tasks.
Some of the tasks that we used are part of sv-benchmarks, but do not have a separate license in their corresponding directory.
For now, I have written a LICENSE.txt that coarsely references the original licenses (if present),
but I don't think this is good enough.

I'd be happy about any constructive input on this matter.

Checklist

  • programs added to new and appropriately named directory

  • license present and acceptable (either in separate file or as comment at beginning of program)

  • contributed-by present (either in README file or as comment at beginning of program)

  • programs added to a .set file of an existing category, or new sub-category established (if justified)

  • intended property matches the corresponding .prp file

  • programs and expected answer added to a .yml file according to task definitions

  • architecture (32 bit vs. 64 bit) matches the corresponding .cfg file
  • original sources present
  • preprocessed files present
  • preprocessed files generated with correct architecture
  • Makefile added with correct content and without overly broad suppression of warnings

@PhilippWendler
Member

Yes, the licensing should be properly handled. The best and easiest way to do this would be to decide on #1099, add machine-readable (REUSE-style) license headers to all relevant source files, and adjust your tool that constructs the combined programs such that it puts the union of all such headers from the input files into the output files.
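
For example, a combined file could then carry the union of the REUSE-style headers of its two inputs, roughly like this (the license identifiers and attributions below are purely illustrative, not the actual licenses of any of these tasks):

// SPDX-FileCopyrightText: <contributors of Task 1>
// SPDX-FileCopyrightText: <contributors of Task 2>
//
// SPDX-License-Identifier: Apache-2.0 AND GPL-3.0-or-later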

@PhilippWendler added the labels "C (Task in language C)" and "new benchmarks" on Jul 23, 2020
@PhilippWendler
Member

Now that #1099 is close to getting merged: For which directories would you need REUSE-style license declarations in order to be able to update this PR?

After we have this list we can maybe split the workload over a few people. For some directories the task of adding license headers might be a lot of work, but then we could leave programs from these directories out for a first round of combined tasks.

@lembergerth
Contributor Author

Thanks for following up on this.

We have the following pairs:

  • systemc/pc_sfifo*, systemc/token_ring*
  • systemc/token_ring*, eca-rers2012/Problem05_label4*
  • eca-rers2012/Problem12_label0*, seq-mthreaded/pals_lcr.*
  • bitvector/gcd_*, floats-cdfpl/newton_*
  • floats-cdfpl/square*, bitvector/soft_float_*

So directory-wise, we need systemc/, eca-rers2012/, seq-mthreaded/, bitvector/, and floats-cdfpl/.

@PhilippWendler
Member

Oh, that's a problem. For both systemc and bitvector we still do not know the license (cf. #165), so if we want to do this correctly and add these new tasks only if their license is clear (to avoid more work in the future), we would have to ping the original submitters of these tasks first and ask for a license.

@lembergerth
Contributor Author

What about eca-rers2012/Problem12_label0*, seq-mthreaded/pals_lcr.*?
Is that pair easy?

We could try to get these merged for SV-COMP'21 and postpone the others.

@PhilippWendler
Member

Right, I overlooked this pair. This seems doable.

All ECA programs should be easy, because I don't think there were ever any cases where other people added tasks to those directories, so we can assume they are all from the same source.

For seq-mthreaded, one should maybe double-check the history of the directory. Furthermore, their README claims that these tasks were produced by processing the sources of some other applications. So we either need to hope that the license also covers the original applications, or have a look at whether we can find out more about this and confirm this assumption.

@lembergerth force-pushed the combined-tasks branch 3 times, most recently from 718eb2d to cbc1944, on November 3, 2020 18:35
…2012/Problem12_label0*

Used in 'Dirk Beyer, Marie-Christine Jakobs, Thomas Lemberger: Difference Verification with Conditions, SEFM20'
…ring

Used in 'Dirk Beyer, Marie-Christine Jakobs, Thomas Lemberger: Difference Verification with Conditions, SEFM20'
…ton_

Used in 'Dirk Beyer, Marie-Christine Jakobs, Thomas Lemberger: Difference Verification with Conditions, SEFM20'
…oft_float_

Used in 'Dirk Beyer, Marie-Christine Jakobs, Thomas Lemberger: Difference Verification with Conditions, SEFM20'
…systemc/token_ring

Used in 'Dirk Beyer, Marie-Christine Jakobs, Thomas Lemberger: Difference Verification with Conditions, SEFM20'
@lembergerth
Contributor Author

I've created #1210 to handle the feasible tasks.

@hernanponcedeleon
Contributor

I might be missing something, but I don't see what the added value of these benchmarks is.

Besides the fact that the new benchmarks are larger than the base ones in isolation, I don't think they pose any challenges to verifiers that are not already posed by the base benchmarks.

Is there anything I am missing?

@lembergerth
Contributor Author

The tasks are not just larger than the individual components, but simulate systems with loosely coupled components. This is almost non-existent in sv-benchmarks at the moment, as most tasks describe a single algorithm or program behavior.

@hernanponcedeleon
Contributor

I can see the value in having loosely coupled systems (microservices), but since "loosely" here actually means completely decoupled (exactly one of them is executed), one could argue that this does not faithfully simulate such systems.

As I see it, they just increase maintenance complexity without bringing much new in terms of verification challenges.

@lembergerth
Contributor Author

I didn't talk about microservices, and I'm not trying to simulate those.
The tasks submitted here are very loosely coupled, yes, but that doesn't mean that instances like this don't exist in the real world (strategy pattern, command-line tools with different commands, etc.).

You can read more about the tasks here, and also see that they actually are challenging: https://www.sosy-lab.org/research/pub/2020-SEFM.Difference_Verification_with_Conditions.pdf

@lembergerth lembergerth closed this Nov 3, 2020
@lembergerth lembergerth reopened this Nov 3, 2020
@hernanponcedeleon
Contributor

Thanks for the pointer. I see that in the context of evolving programs this makes sense.

@dbeyer dbeyer added this to the SV-COMP 2022 milestone Nov 4, 2020
@lembergerth lembergerth marked this pull request as draft November 4, 2020 16:34
@priyankadarke

I agree with @hernanponcedeleon. These programs are interesting if the original and evolved programs are provided. Are these programs being submitted in such a context? If not, then what value do they add in the context of reachability analysis? These tasks could be made more challenging if they were not as loosely coupled.

@lembergerth
Contributor Author

Since when is it necessary to argue for new benchmark contributions? I don't see this in the PR checklist. It requires time that is probably better spent on fixing other benchmark tasks.

These programs are interesting if the original and evolved programs are provided

Just to avoid confusion: The submitted programs are "original" and "evolved", but these concepts may be confusing because the tasks were artificially generated and "original" and "evolved" can be swapped arbitrarily. (We can say that A -> B or B -> A).
There is no task marked as coming before another, chronologically.

Are these programs being submitted in such a context?

Sorry, I'm not sure what you mean by this.

If not, then what value do they add in the context of reachability analysis?

This was already stated: the tasks are not just larger than the individual components, but simulate systems with loosely coupled components. This is almost non-existent in sv-benchmarks at the moment, as most tasks describe a single algorithm or program behavior.

Loose coupling is a feature for these tasks:
As far as I know, for tasks with expected verdict 'false', these are the first tasks that provide difficult-to-analyze program logic that is not relevant to finding a counterexample.

In case you're interested, all further information about how these tasks were selected and how well CPA-seq and UAutomizer perform on them can be found in the referenced paper.

@priyankadarke

Hi,
It is interesting to see how the combination programs have been created and used in the context of incremental verification. Very interesting indeed. We present our understanding of the combination programs here; please correct us if we are wrong. These programs have been submitted as a new sub-category of the Reach-Safety category. Each program, Pi, is defined as follows:
/* definitions of Task 1 ... */
int main1() { /* main-method of Task 1 ... */ }
/* definitions of Task 2 ... */
int main2() { /* main-method of Task 2 ... */ }

int main() {
  if (__VERIFIER_nondet_int()) {
    main1();
  } else {
    main2();
  }
}

main1 and main2 are independent of each other. So, they belong to different program slices. Any tool that can verify existing svcomp programs corresponding to main1 and main2 within a CPU time limit of 900 seconds can verify Pi. Thus, verifying Pi is like running the following commands, either in parallel or sequentially:
cbmc --unwind x --unwinding-assertions Pi --function main1
cbmc --unwind x --unwinding-assertions Pi --function main2
Replace cbmc with any other tool.
Any tool that cannot verify both the existing svcomp programs corresponding to main1 and main2 (within 900s) will not be able to verify Pi.
To the best of our knowledge, such a program Pi, when seen in practice, represents large 'software systems' that run independent or loosely coupled modules. Shall we then not add combinations as a new 'synthesized programs' sub-category of software systems?
A large number of programs is harder to maintain, while the main feature and the intent of adding each Pi are the same.

@lembergerth
Contributor Author

@priyankadarke,

main1 and main2 are independent of each other. So, they belong to different program slices.

Exactly.

Any tool that can verify existing svcomp programs corresponding to main1 and main2 within a CPU time limit of 900 seconds can verify Pi.

Yes, in theory. But the runtime of verifying Pi is unknown.
For example, if Pi is safe, let's assume we use predicate abstraction with predicates applied globally.
Then the predicates necessary to prove Pi safe are, in the worst case (but expectedly), the union of the predicates necessary to prove main1 and main2 correct.
Let's assume predicate abstraction first solves main1 fully and then goes on to solve main2. Predicate abstraction now has lots of predicates from main1 that it will try to apply at each abstraction step in main2, but none of them are relevant there.
This may worsen the runtime.

Similarly, traditional bounded model checking on Pi will have to check significantly larger formulas than if main1 and main2 were considered separately.
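
Roughly (informal notation; c stands for the value of the nondeterministic choice in main), a single bounded-model-checking query for Pi carries the unrollings of both parts at once:

  phi_Pi  ≈  (c != 0 && unroll(main1))  ||  (c == 0 && unroll(main2))

so the solver has to handle the encodings of both components in one formula, even though only one of them is ever executed.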

As another example, let's assume Pi is unsafe; main1 is unsafe and main2 is safe.
If a verifier is not directed and does not perform smart abstraction, it may take a significant amount of time (or even fail) to prove main2 safe.

The tasks that were chosen for main1 and main2 are already difficult to solve by themselves - so choosing an efficient abstraction or portfolio approach may be necessary to solve Pi.

In theory it is possible to simply run two instances of your tool, as you proposed; you run one instance on main1 and one instance on main2.
But how do you find out that this is the right strategy?
And, looking at CPU time, this may take too much time.

Many other issues regarding scalability of verifiers apply to Pi. I would argue that these scalability issues are even more evident because Pi consists of loosely coupled components that don't benefit from caching etc.

I hope this was able to show that solving these combinations without knowing their loose structure is not as trivial as it may seem.

To the best of our knowledge, such a program Pi, when seen in practice, represents large 'software systems' that run independent or loosely coupled modules. Shall we then not add combinations as a new 'synthesized programs' sub-category of software systems?

I think that's an interesting idea.
But I think the current idea of SoftwareSystems is that these are real-world software systems, which doesn't apply here.
I wouldn't mind changing the category name, though.

A large number of programs is harder to maintain, while the main feature and the intent of adding each Pi are the same.

Yes, that's certainly true. But these tasks, as they currently are, can still be useful for incremental verification. So removing some of them would diminish their purpose.
In addition, most of the maintenance work we do is either (a) automated through scripts or (b) fixing wrong verdicts or undefined behavior. Since the submitted tasks only slightly differ, if one of the tasks is wrong, (b) should also be automatable.

@priyankadarke

Hmm. Thanks for the clarification.

Labels
C (Task in language C), new benchmarks

5 participants