Rethink how we sleep in StageTask poll() #13
Comments
I'd like to share my experience here. Note that I adapted it to fit the TRIUMF tape backend's workflow in 2015, so there are slight differences, but the fundamental logic is still the same. In 2019 TRIUMF had a similar issue with staged files moving slowly from the /in/ dir to the pool dir. I looked at all the possible causes and found the following:
It seemed to me that the difference between the file reading speed and ENDIT's polling speed was the main culprit behind the frequent OVERFLOW events, each of which in turn caused a new OVERFLOW event, repeating until the system became relatively quiet. I don't know much about dCache internals and Java threads, but the debug messages that I put into the code showed that, for some large files, [...] Because of the watcher's serial nature, events are handled one by one regardless of the number of threads for the tasks. While the watcher was doing [...] As a result, the buffer soon hit OVERFLOW, and [...] While looping over all the requests, more files were read and the buffer soon hit OVERFLOW again. For [...] As I mentioned, I have very limited knowledge of dCache and Java threads, so my diagnosis may be incorrect. But I thought it might be worth sharing my experience.
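To illustrate the OVERFLOW behavior described above, here is a minimal sketch of one common way to treat an overflow from Java's `WatchService`: do a single directory listing instead of reacting per request. This is not the ENDIT provider's code; the class `OverflowAwareWatcher` and the method `pathsToCheck` are hypothetical names, and the sketch assumes the caller passes in the batch returned by `WatchKey.pollEvents()`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only, not part of dcache-endit-provider.
class OverflowAwareWatcher {
    /**
     * Converts one batch of watch events for dir into the paths to examine.
     * If the batch contains OVERFLOW, events may have been lost, so the
     * whole directory is listed once and that listing is returned instead.
     */
    static List<Path> pathsToCheck(Path dir, List<WatchEvent<?>> events)
            throws IOException {
        List<Path> result = new ArrayList<>();
        for (WatchEvent<?> event : events) {
            if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                // Discard partial results and rescan the directory once.
                result.clear();
                try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
                    for (Path p : stream) {
                        result.add(p);
                    }
                }
                return result;
            }
            // For ENTRY_* kinds the context is the path relative to dir.
            result.add(dir.resolve((Path) event.context()));
        }
        return result;
    }
}
```

Whether a single rescan is cheaper than handling the lost events individually depends on how many files sit in the directory, so treat this as a starting point rather than a fix.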
I've also been thinking of using ZooKeeper's watch service to replace Java's
In order to allow for `dsmc` to finish setting attributes etc., there is a `sleep()` in the StageTask `poll()`:
dcache-endit-provider/src/main/java/org/ndgf/endit/StageTask.java
Line 121 in 71cf6b2
I suspect that this `sleep()` might be the reason for the somewhat unexpected behavior that the WatchingProvider is so slow, and for the observation that the PollingProvider performs much better provided that we allocate a LOT of threads to it.
My reasoning is that although it's a `Thread.sleep()`, it still suspends execution of the thread. This will wreak havoc with the watching provider's performance and also increases the likelihood of event overflows. For the polling provider, the `GRACE_PERIOD` of 1000 ms correlates directly with the observed staging performance of about 1 Hz per thread.
What we really ought to do is something along the lines of:
GRACE_PERIOD
I believe this would allow threads to do actual work instead of sleeping all the time.
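As a rough illustration of not sleeping, the sketch below checks a file's modification time against the grace period and returns immediately, so the caller can retry on a later poll instead of blocking the thread. It is an assumption about one possible shape of the change, not the existing `StageTask` code: the class `GraceCheck`, the method `isSettled`, and the constant `GRACE_PERIOD_MS` are hypothetical names, and the 1000 ms value simply mirrors the `GRACE_PERIOD` mentioned above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical helper for illustration only, not part of dcache-endit-provider.
class GraceCheck {
    // Assumed to mirror the 1000 ms GRACE_PERIOD discussed in this issue.
    static final long GRACE_PERIOD_MS = 1000;

    /**
     * Returns true if the file exists and was last modified at least
     * GRACE_PERIOD_MS before nowMillis. Returns false if the file is
     * missing or still too fresh. Never sleeps.
     */
    static boolean isSettled(Path file, long nowMillis) throws IOException {
        try {
            long mtime = Files.getLastModifiedTime(file).toMillis();
            return nowMillis - mtime >= GRACE_PERIOD_MS;
        } catch (NoSuchFileException e) {
            // Not staged yet; the caller should try again on a later poll.
            return false;
        }
    }
}
```

A `poll()` built on this would report "not ready yet" when `isSettled(file, System.currentTimeMillis())` is false and let the thread move on to the next request.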