Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce an option similar to --after-timeout for timers #298

Open
dszoboszlay opened this issue Dec 5, 2018 · 6 comments
Open

Introduce an option similar to --after-timeout for timers #298

dszoboszlay opened this issue Dec 5, 2018 · 6 comments

Comments

@dszoboszlay
Copy link
Contributor

Is your feature request related to a problem? Please describe.
Concuerror already supports treating timeouts in receive/after statements as infinity above some threshold with the --after-timeout option. However, this option does not cover the case of timers started with erlang:start_timer. My immediate problem is with gen_statem, that unlike gen_server uses timers which I cannot prevent from firing in my Concuerror tests.

Describe the solution you'd like
I'd like either the --after-timeout option to apply to timers started with erlang:start_timer (that is, timers above the specified threshold shall never fire while running my tests), or I'd like a new option, such as --timer-timeout to be introduced specifically for timers. Whichever is more convenient to implement.

Describe alternatives you've considered
My only alternative at the moment is to change the code I'm testing to not use finite timeouts with gen_statem when used from a Concuerror test case. However, this would make the code I test and the code I use in production differ for no good reason.

Additional context
N/A

@aronisstav
Copy link
Member

The --after-timeout applying to timers as well makes perfect sense. Thanks for pointing this out! Will be fixed (soon)!

@aronisstav
Copy link
Member

aronisstav commented Dec 6, 2018

This could be sufficiently solved by adding a suitable call to

concuerror_inspect:inspect(receive, ..., ignored)

above here https://github.com/parapluu/Concuerror/blob/master/src/concuerror_callback.erl#L899.

@dszoboszlay
Copy link
Contributor Author

dszoboszlay commented Jan 10, 2019

I tried to implement this feature according to your advise in b76b9ab, but it doesn't work.

I have a test case that tries all three constructs that should be governed by --after-timeout: receive, send_after and start_timer. But only receive works as expected. For the other two cases Concuerror finds a race and then crashes like this:

################################################################################
Interleaving #1
--------------------------------------------------------------------------------
Errors found:
* Blocked at a 'receive' ("deadlocked"; other processes have exited):
    <P> in after_timeout.erl line 34
     Mailbox contents: []
--------------------------------------------------------------------------------
Event trace:
   1: <P>: #Ref<0.0.1.291> = erlang:start_timer(1000, <P>, foo)
    in after_timeout.erl line 32
   2: <P>: 1 = erlang:cancel_timer(#Ref<0.0.1.291>)
    in after_timeout.erl line 33
   3: <Timer #Ref<0.0.1.291>>: is removed
--------------------------------------------------------------------------------
New races found:
*    2: <P>: 1 = erlang:cancel_timer(#Ref<0.0.1.291>)
     3: <Timer #Ref<0.0.1.291>>: is removed

################################################################################
Interleaving #2
--------------------------------------------------------------------------------
Errors found:
* Concuerror crashed
--------------------------------------------------------------------------------
Event trace:
   1: <P>: #Ref<0.0.1.291> = erlang:start_timer(1000, <P>, foo)
    in after_timeout.erl line 32
################################################################################
Exploration completed!
################################################################################
Errors:
--------------------------------------------------------------------------------
* On step 2, replaying a built-in returned a different result than expected:
  original:
    <0.84.0>: is removed
  new:
    blocked
 Please notify the developers, as this is a bug of Concuerror.

(I tweaked the test case a bit, so that <P> blocks and therefore I get an event trace for the first interleaving too.)

I would appreciate some help on which part of the codebase to poke around with.

@aronisstav
Copy link
Member

Ah, very sorry for the bad lead.

Another acceptable way to do this is to add an extra case for timer-related primitives, that checks the after_timeout setting and just returns a reference (via the relevant callback), but also emitting a warning similar to the one for 'ignoring after timeouts' for receives.

This case should be placed before this: https://github.com/parapluu/Concuerror/blob/master/src/concuerror_callback.erl#L855

@dszoboszlay
Copy link
Contributor Author

But that would mean cancel_timer wouldn't find a running timer with that reference and return false instead of an integer here: https://github.com/parapluu/Concuerror/blob/master/src/concuerror_callback.erl#L822-L823

I think my approach should work too, but I miss some rules about what events race in a timer-specific scenario. The test case with the receive after construct does exactly the same thing internally: it spawns a new process that gets blocked in a receive statement. Only that this time the new process is a special timer process, and therefore Concuerror doesn't recognise the causality between stop_timer and the process terminating.

@aronisstav
Copy link
Member

aronisstav commented Jan 10, 2019

Eek. Didn't think about that case.

Indeed the assumption has been that one can always reverse cancel_timer bifs and fake timer processes dying. With a quick look I think the relevant line is:

https://github.com/parapluu/Concuerror/blob/master/src/concuerror_dependencies.erl#L332

and the "time order" is irrelevant due to:

https://github.com/parapluu/Concuerror/blob/master/src/concuerror_dependencies.erl#L75

Unfortunately I can't think of an easy solution.

Faking the rest of the timer API so that e.g. 'cancel_timer' would be an exit signal, and 'read_timer' a 'is_process_alive' requires an extra abstraction layer above built-in ops. This would get you correct dependencies "for free".

Another, 'ugly traditional' way to fix such things in the past has been to add more info to events in the extra field and using it for dependency detection. So you could add the after_timeout to some/all exit events and then fix the offending lines above to only have a dependency with the ones that can really be reversed. But the code in _dependencies has reached the point where this is really painful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants