Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition or deadlock in Close() #72

Open
jpatters opened this issue Mar 31, 2019 · 5 comments
Open

Race condition or deadlock in Close() #72

jpatters opened this issue Mar 31, 2019 · 5 comments

Comments

@jpatters
Copy link

I am having a pretty terrible time trying to debug this. Things seem to get hung up in multiple places inside of Close().
There are times when it fails to obtain a lock and just gets stuck there forever. And there are times when it can't seem to send on w.close. Any help would be appreciated.

I've added some logging to Close() like so.

func (w *Watcher) Close() {
	fmt.Println("=== starting close")
	w.mu.Lock()
	fmt.Println("=== have lock in close")
	if !w.running {
		w.mu.Unlock()
		return
	}
	w.running = false
	w.files = make(map[string]os.FileInfo)
	w.names = make(map[string]bool)
	w.mu.Unlock()
	// Send a close signal to the Start method.
	fmt.Println("=== sending close")
	w.close <- struct{}{}
	fmt.Println("=== returning from close")
}

In some instances, the last message I get is === starting close. And in some instances it is === sending close. And then other times it works just fine.

Here is the code I am using. Let me know if you need to see more.

func listenForChanges(ctx context.Context, w *watcher.Watcher) {
	w.AddRecursive(outputDirectory())
	go func() {
		haveChange := false
		for {
			select {
			case <-ctx.Done():
				logger.Println("timed out")
				w.Close()
				return
			case err := <-w.Error:
				logger.Println(err)
			case event := <-w.Event:
				logger.Println("found change", event.Name())
				haveChange = true
			case <-w.Closed:
				return
			case <-time.After(200 * time.Millisecond):
				if haveChange {
					logger.Println("we have enough changes")
					w.Close() // This is the line I am calling it from most often when it fails
					return
				}
			}
		}
	}()
}

It is probably worth noting that the directory I am watching contains about 6000 files.

Thanks in advance.

@jpatters
Copy link
Author

This seems to happen when a file event comes in after Close() has been called

@jpatters
Copy link
Author

There is at least one deadlock going on here. When Close() is called, if events are still being published, Close() is trying to obtain a lock (https://github.com/radovskyb/watcher/blob/master/watcher.go#L702) but an event as been published to w.Event (https://github.com/radovskyb/watcher/blob/master/watcher.go#L595) which means that it has the lock.
At the same time there is nothing reading from w.Event because we are still trying to complete Close(). So Close() can never get the lock because nothing is reading from w.Event. And nothing is reading from w.Event because we are trying to Close().
I am unsure if this is caused by my code or something that should be handled by watcher.

I was able to work around it by dumping the 'extra' events around the call to Close() like so.

case <-time.After(200 * time.Millisecond):
  if haveChange {
    logger.Println("we have enough changes")
    quit := make(chan bool)
    go func() {
      for {
        select {
        case event := <-w.Event:
          logger.Println("extra event:", event.Name())
        case <-quit:
          return
        }
      }
    }()
    w.Close()
    quit <- true
    return
  }

It feels like this is something that watcher should handle. If I have time I'll see if I can fix it in a more elegant way inside of watcher, and submit a pr.

@ykumar-rb
Copy link

Even I am facing same issue with my application. Application hangs and unable to process new event .I believe this should be handled inside watcher package .

@tul
Copy link

tul commented Jan 10, 2020

It's not documented, but essentially in the current implementation it's not supported to call Close() from your event polling go routine, or in any way that causes your event handling routine to stop draining w.Event. I think the only safe way to do it without changing the code would be to run Close() in a go routine then have your event handler continue pulling events but as soon as you get a read from w.Closed you can safely exit.
I played around with fixing this earlier today actually, it wasn't such a big change but I discarded it in the end as I didn't really need it. Essentially I changed the places that were writing to w.Event to be in a select which was also reading from w.close. I also changed Close() to do close(w.close) so it could be read multiple times.

@tul
Copy link

tul commented Jan 12, 2020

Regarding allowing Close() to be called from the polling go routine, I just looked at the code again now. There is a second issue which is that retrieveFileList() holds the mutex for the duration of the iteration, which will prevent Close() from being able to execute. So more work will be needed beyond what I outlined above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants