Add warning log on dropped spans in batch span processor #3289
Conversation
Can you please add changelog entries for the new log level and the dropped-span handling? Tests for the new log entries will also be required.
@Aneurysm9 thank you for the feedback here. One thing to draw specific scrutiny towards: unfortunately we can't use the blocking option, since it never drops spans. During the test, it consistently seems to flake if I run it with a small queue size. Setting it to 100 has consistently passed, but I understand if there is concern with using a known-flaky configuration. Happy to revisit this test if you have suggestions.
I think using the non-blocking-on-full-queue mode for this test is probably appropriate, but would love to get the opinion of @open-telemetry/go-approvers. This component has a long history of flaky tests, but it looks like the test you have proposed should be fairly reliable. Please ensure that all commits are associated with a user that has signed the CLA, as we will be unable to proceed further (and some approvers may not review the PR) until the CLA is signed.
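To make the discussion concrete, here is a sketch of the kind of test being described: a deliberately stalled exporter backs up the processor's queue so the default, non-blocking mode has to drop spans. The blockedExporter type, its field names, and the span count are illustrative assumptions, not the PR's exact code:

```go
package trace_test

import (
	"context"
	"testing"

	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// blockedExporter stalls in ExportSpans until released, so the batch
// span processor's queue fills and later spans must be dropped.
type blockedExporter struct {
	release chan struct{}
}

func (e *blockedExporter) ExportSpans(ctx context.Context, _ []sdktrace.ReadOnlySpan) error {
	select {
	case <-e.release:
	case <-ctx.Done():
	}
	return nil
}

func (e *blockedExporter) Shutdown(context.Context) error { return nil }

func TestBSPDropsSpans(t *testing.T) {
	exp := &blockedExporter{release: make(chan struct{})}
	defer close(exp.release)

	// A queue size of 100, as mentioned above, is small enough to
	// overflow quickly but large enough to avoid flaking.
	bsp := sdktrace.NewBatchSpanProcessor(exp, sdktrace.WithMaxQueueSize(100))
	tp := sdktrace.NewTracerProvider(sdktrace.WithSpanProcessor(bsp))
	tracer := tp.Tracer("drop-test")

	// Produce far more spans than the queue holds; the overflow is
	// dropped and should trigger the new warning log.
	for i := 0; i < 1000; i++ {
		_, span := tracer.Start(context.Background(), "span")
		span.End()
	}
}
```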
Force-pushed from 0052a23 to c97924f.
It looks like the added test is quite flaky, or does not succeed on Windows: https://github.com/open-telemetry/opentelemetry-go/actions/runs/3380083340/jobs/5613037464 We need our testing system to be solid, so this needs to be refactored or redesigned to ensure tests pass.
Signed-off-by: Aaron Clawson <[email protected]>
I updated the test to make it more robust. @MrAlias and @Aneurysm9, can you please review to see if this would be acceptable? I didn't test on Windows, but I found the old version would often fail if run with …
Signed-off-by: Aaron Clawson <[email protected]>
Codecov Report

Additional details and impacted files:

@@            Coverage Diff            @@
##             main    #3289     +/-  ##
========================================
+ Coverage    78.0%    78.2%    +0.2%
========================================
  Files         165      165
  Lines       11755    11762       +7
========================================
+ Hits         9177     9207      +30
+ Misses       2382     2359      -23
  Partials      196      196
Looks like the tests are still failing: https://github.com/open-telemetry/opentelemetry-go/actions/runs/3586714280/jobs/6036245937 I'm more in favor of stable testing than accepting these changes as-is.
Moved the functionality of the testing stuckExporter into the blockingExporter.
Me too. The problem here is that to observe dropped spans we can't use the blocking BSP, because it never drops any spans.
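For reference, the two modes being contrasted, via the SDK's public options. This is a minimal sketch using the stdout exporter for concreteness:

```go
package main

import (
	"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func main() {
	exporter, err := stdouttrace.New()
	if err != nil {
		panic(err)
	}

	// Default (non-blocking) mode: when the internal queue is full,
	// OnEnd drops the span; that drop path is what the new warning
	// log would report.
	_ = sdktrace.NewBatchSpanProcessor(exporter)

	// Blocking mode: OnEnd waits for queue space instead, so spans
	// are never dropped and the warning can never fire.
	_ = sdktrace.NewBatchSpanProcessor(exporter, sdktrace.WithBlocking())
}
```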
The test race appears to be fixed by 9af553a. Are there more changes required before this can merge?
// Report only spans dropped since the last report. SwapUint32 stores
// the new total and returns the previously reported total.
reportedDropped := atomic.SwapUint32(&bsp.reportedDropped, dropped)
if dropped > reportedDropped {
	droppedThisBatch := dropped - reportedDropped
	global.Info("dropped spans", "total", dropped, "this_batch", droppedThisBatch)
}
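As a standalone illustration of the swap-and-report pattern used above (with hypothetical counters; not the PR's code):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

func main() {
	var dropped, reported uint32

	// Pretend 7 spans were dropped since startup.
	atomic.AddUint32(&dropped, 7)

	// SwapUint32 stores the new total and returns the previously
	// reported total, so the difference is exactly the number of
	// spans dropped since the last report.
	total := atomic.LoadUint32(&dropped)
	prev := atomic.SwapUint32(&reported, total)
	fmt.Printf("total=%d this_batch=%d\n", total, total-prev)
}
```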
Should we not call global.Warn?
- global.Info("dropped spans", "total", dropped, "this_batch", droppedThisBatch)
+ global.Warn("dropped spans", "total", dropped, "this_batch", droppedThisBatch)
"This branch has conflicts that must be resolved" 😉 |
@prodion23 Are you planning to refresh the PR and address the comment(s)?
Closing this PR as stale, with conflicts that were not resolved.
This aims to add warning logging when the batch span processor drops spans.

#3264

One thing I'm unsure about: the logger doesn't already offer a Warn method, so I added one here; however, I'm unsure if we want to keep that change. https://github.com/open-telemetry/opentelemetry-go/blob/main/internal/global/internal_logging.go#L45-L63

Maybe the info log should use a low-priority number so that we have somewhere to fit warn into this?
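One way to carve out room for Warn with logr verbosity levels, where a smaller V value means higher priority. This is a minimal sketch; the specific level numbers and the logger variable are assumptions, not necessarily the repository's actual code:

```go
package global

import "github.com/go-logr/logr"

// logger stands in for the package's configured logr.Logger.
var logger logr.Logger

// Info prints messages about the general state of the SDK at a
// relatively low priority (higher V value).
func Info(msg string, keysAndValues ...interface{}) {
	logger.V(4).Info(msg, keysAndValues...)
}

// Warn uses a smaller V value than Info, making it more important
// than Info while remaining less severe than Error.
func Warn(msg string, keysAndValues ...interface{}) {
	logger.V(1).Info(msg, keysAndValues...)
}

// Error reports exceptional states; logr gives errors a dedicated path.
func Error(err error, msg string, keysAndValues ...interface{}) {
	logger.Error(err, msg, keysAndValues...)
}
```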