Race condition or deadlock in Close() #72

jpatters · 2019-03-31T12:20:24Z

I am having a pretty terrible time trying to debug this. Things seem to get hung up in multiple places inside of Close().
There are times when it fails to obtain a lock and just gets stuck there forever. And there are times when it can't seem to send on w.close. Any help would be appreciated.

I've added some logging to Close() like so.

func (w *Watcher) Close() {
	fmt.Println("=== starting close")
	w.mu.Lock()
	fmt.Println("=== have lock in close")
	if !w.running {
		w.mu.Unlock()
		return
	}
	w.running = false
	w.files = make(map[string]os.FileInfo)
	w.names = make(map[string]bool)
	w.mu.Unlock()
	// Send a close signal to the Start method.
	fmt.Println("=== sending close")
	w.close <- struct{}{}
	fmt.Println("=== returning from close")
}

In some instances, the last message I get is === starting close. And in some instances it is === sending close. And then other times it works just fine.

Here is the code I am using. Let me know if you need to see more.

func listenForChanges(ctx context.Context, w *watcher.Watcher) {
	w.AddRecursive(outputDirectory())
	go func() {
		haveChange := false
		for {
			select {
			case <-ctx.Done():
				logger.Println("timed out")
				w.Close()
				return
			case err := <-w.Error:
				logger.Println(err)
			case event := <-w.Event:
				logger.Println("found change", event.Name())
				haveChange = true
			case <-w.Closed:
				return
			case <-time.After(200 * time.Millisecond):
				if haveChange {
					logger.Println("we have enough changes")
					w.Close() // This is the line I am calling it from most often when it fails
					return
				}
			}
		}
	}()
}

It is probably worth noting that the directory I am watching contains about 6000 files.

Thanks in advance.

The text was updated successfully, but these errors were encountered:

jpatters · 2019-03-31T14:12:20Z

This seems to happen when a file event comes in after Close() has been called

jpatters · 2019-03-31T19:08:41Z

There is at least one deadlock going on here. When Close() is called, if events are still being published, Close() is trying to obtain a lock (https://github.com/radovskyb/watcher/blob/master/watcher.go#L702) but an event as been published to w.Event (https://github.com/radovskyb/watcher/blob/master/watcher.go#L595) which means that it has the lock.
At the same time there is nothing reading from w.Event because we are still trying to complete Close(). So Close() can never get the lock because nothing is reading from w.Event. And nothing is reading from w.Event because we are trying to Close().
I am unsure if this is caused by my code or something that should be handled by watcher.

I was able to work around it by dumping the 'extra' events around the call to Close() like so.

case <-time.After(200 * time.Millisecond):
  if haveChange {
    logger.Println("we have enough changes")
    quit := make(chan bool)
    go func() {
      for {
        select {
        case event := <-w.Event:
          logger.Println("extra event:", event.Name())
        case <-quit:
          return
        }
      }
    }()
    w.Close()
    quit <- true
    return
  }

It feels like this is something that watcher should handle. If I have time I'll see if I can fix it in a more elegant way inside of watcher, and submit a pr.

ykumar-rb · 2019-10-22T08:35:11Z

Even I am facing same issue with my application. Application hangs and unable to process new event .I believe this should be handled inside watcher package .

tul · 2020-01-10T21:01:03Z

It's not documented, but essentially in the current implementation it's not supported to call Close() from your event polling go routine, or in any way that causes your event handling routine to stop draining w.Event. I think the only safe way to do it without changing the code would be to run Close() in a go routine then have your event handler continue pulling events but as soon as you get a read from w.Closed you can safely exit.
I played around with fixing this earlier today actually, it wasn't such a big change but I discarded it in the end as I didn't really need it. Essentially I changed the places that were writing to w.Event to be in a select which was also reading from w.close. I also changed Close() to do close(w.close) so it could be read multiple times.

tul · 2020-01-12T16:26:57Z

Regarding allowing Close() to be called from the polling go routine, I just looked at the code again now. There is a second issue which is that retrieveFileList() holds the mutex for the duration of the iteration, which will prevent Close() from being able to execute. So more work will be needed beyond what I outlined above.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Race condition or deadlock in Close() #72

Race condition or deadlock in Close() #72

jpatters commented Mar 31, 2019

jpatters commented Mar 31, 2019

jpatters commented Mar 31, 2019

ykumar-rb commented Oct 22, 2019

tul commented Jan 10, 2020

tul commented Jan 12, 2020

Race condition or deadlock in Close() #72

Race condition or deadlock in Close() #72

Comments

jpatters commented Mar 31, 2019

jpatters commented Mar 31, 2019

jpatters commented Mar 31, 2019

ykumar-rb commented Oct 22, 2019

tul commented Jan 10, 2020

tul commented Jan 12, 2020