-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: flaky results: same start and end marker #413
base: main
Are you sure you want to change the base?
Conversation
What you want already exists:
I see one possible issue: when the log is not flushed immediately,
We can solve this problem by using a suffix to the UUID (I'd like to keep a single UUID to make it easier to identify the matching markers when reading the log manually). E.g., the start marker could be |
Thanks Max!
In this scenario
This logic does no help in this scenario: the start marker is correctly found, so no retries are performed.
If I'm getting the code right,
Going step by step:
We can notably see that we don't have yet a trailing marker with
|
Thanks for the analysis. You're right, in that case both markers would be the same line. Both cases, the one you described and the one I came up with, would be fixed by using identifying suffixes for the start and end markers, since |
1adaf21
to
22f5749
Compare
0f798b0
to
5b3386a
Compare
Thank you! It sounds like a very reasonable idea to have the same
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have a test yet, that specifically reproduces the issue, right? Could you add one?
@@ -166,7 +173,8 @@ func RunStage(runContext *TestRunContext, ftwCheck *check.FTWCheck, testCase sch | |||
} | |||
|
|||
if notRunningInCloudMode(ftwCheck) { | |||
startMarker, err := markAndFlush(runContext, &testInput, stageId) | |||
startUUID := stageId + startUUIDSuffix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
startUUID := stageId + startUUIDSuffix | |
startId := stageId + startUUIDSuffix |
startUUIDSuffix = "-s" | ||
endUUIDSuffix = "-e" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
startUUIDSuffix = "-s" | |
endUUIDSuffix = "-e" | |
startUuidSuffix = "-s" | |
endUuidSuffix = "-e" |
@@ -190,7 +198,8 @@ func RunStage(runContext *TestRunContext, ftwCheck *check.FTWCheck, testCase sch | |||
} | |||
|
|||
if notRunningInCloudMode(ftwCheck) { | |||
endMarker, err := markAndFlush(runContext, &testInput, stageId) | |||
endUUID := stageId + endUUIDSuffix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
endUUID := stageId + endUUIDSuffix | |
endId := stageId + endUUIDSuffix |
endFound = true | ||
|
||
if !endFound { | ||
// Skip lines until we find the end marker. Reading backwards, the line we are looking for are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Skip lines until we find the end marker. Reading backwards, the line we are looking for are | |
// Skip lines until we find the end marker. Reading backwards, the lines we are looking for are |
break | ||
} | ||
|
||
saneCopy := make([]byte, len(line)) | ||
copy(saneCopy, line) | ||
ll.markedLines = append(ll.markedLines, saneCopy) | ||
} | ||
if !startFound { | ||
log.Debug().Msgf("starter marker not found while getting marked lines") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log.Debug().Msgf("starter marker not found while getting marked lines") | |
log.Debug().Msg("starter marker not found while getting marked lines") |
|
||
return ll.markedLines | ||
} | ||
|
||
// CheckLogForMarker reads the log file and searches for a marker line. | ||
// stageID is the ID of the current stage, which is part of the marker line | ||
// At this point, stageID includes also the start or end suffix. | ||
// readLimit is the maximum numbers of lines to check | ||
func (ll *FTWLogLines) CheckLogForMarker(stageID string, readLimit uint) []byte { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func (ll *FTWLogLines) CheckLogForMarker(stageID string, readLimit uint) []byte { | |
func (ll *FTWLogLines) CheckLogForMarker(markerId string, readLimit uint) []byte { |
// stageID is the ID of the current stage, which is part of the marker line | ||
// At this point, stageID includes also the start or end suffix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// stageID is the ID of the current stage, which is part of the marker line | |
// At this point, stageID includes also the start or end suffix. | |
// markerId is the ID of the current stage + suffix (for start / end), which is part of the marker line |
@@ -201,12 +201,15 @@ func (s *readTestSuite) TestFTWLogLines_Contains() { | |||
s.NotNil(cfg) | |||
|
|||
stageID := "dead-beaf-deadbeef-deadbeef-dead" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stageID := "dead-beaf-deadbeef-deadbeef-dead" | |
stageId := "dead-beaf-deadbeef-deadbeef-dead" |
@@ -368,12 +373,13 @@ func (s *readTestSuite) TestFindAllIdsInLogs() { | |||
s.NotNil(cfg) | |||
|
|||
stageID := "dead-beaf-deadbeef-deadbeef-dead" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stageID := "dead-beaf-deadbeef-deadbeef-dead" | |
stageId := "dead-beaf-deadbeef-deadbeef-dead" |
I'm investigating some flakiness and found that the same marker might be retrieved for both the starter and the final one. When this happens, the function responsible for retrieving all the marked lines can not locate the starter marker. As a result, it uses all the logs up to the beginning of the file and processes IDs from lines belonging to previous tests, leading to incorrect results.
My idea is that it can happen when go-ftw is too fast and reads again the same line before the engine/connector manages to write/flush the new logs (which include the final marker) into the file.
From a visual perspective (Note: The read is performed backward.):
] Expected scenario:
start marker:
8888-8888
final marker:
9999-9999
] Assuming the case that both the start marker and final marker have the same value (equal to the start marker):
start marker:
8888-8888
final marker:
8888-8888
8888-8888
matcheslog id:11
is not collected, but ratherlog id:10
and all the previous ones are collected.The PR proposes to:
It would allow us to have more visibility on the issue and bring in some more resiliency. Anyways I think that it is not expected to have the same start and final marker line, therefore I think adding a check would be beneficial.
I will follow with a test/reproducer whenever I manage to reproduce it in an easy and consistent way