flaky test: TestUsernsCheckpoint #4273

Open
lifubang opened this issue May 7, 2024 · 2 comments

@lifubang
Member

lifubang commented May 7, 2024

I have seen this happen many times on CentOS 7.

=== RUN   TestUsernsCheckpoint
time="2024-05-07T10:08:51Z" level=warning msg="--- Quoting \"/tmp/TestUsernsCheckpoint611938415/003/criu-parent/dump.log\""
time="2024-05-07T10:08:51Z" level=warning msg="116:(09.514467) freezer.state=FREEZING"
time="2024-05-07T10:08:51Z" level=warning msg="117:(09.614644) freezer.state=FREEZING"
time="2024-05-07T10:08:51Z" level=warning msg="118:(09.714816) freezer.state=FREEZING"
time="2024-05-07T10:08:51Z" level=warning msg="119:(09.814957) freezer.state=FREEZING"
time="2024-05-07T10:08:51Z" level=warning msg="120:(09.915110) freezer.state=FREEZING"
time="2024-05-07T10:08:51Z" level=warning msg="121:(10.000432) Error (criu/cr-dump.c:1467): Timeout reached. Try to interrupt: 0"
time="2024-05-07T10:08:51Z" level=warning msg="122:(10.000563) freezer.state=FREEZING"
time="2024-05-07T10:08:51Z" level=warning msg="123:(10.000694) Error (compel/src/lib/infect.c:234): Unseizable non-zombie 9017 found, state D, err -1/10"
time="2024-05-07T10:08:51Z" level=warning msg="124:(10.000773) Unfreezing tasks into 1"
time="2024-05-07T10:08:51Z" level=warning msg="125:(10.000778) \tUnseizing 9017 into 1"
time="2024-05-07T10:08:51Z" level=warning msg="126:(10.000783) Error (compel/src/lib/infect.c:355): Unable to detach from 9017: No such process"
time="2024-05-07T10:08:51Z" level=warning msg="127:(10.000800) Writing image inventory (version 1)"
time="2024-05-07T10:08:51Z" level=warning msg="128:(10.000976) Error (criu/cr-dump.c:1581): Pre-dumping FAILED."
time="2024-05-07T10:08:51Z" level=warning msg=---
    checkpoint_test.go:115: === /tmp/TestUsernsCheckpoint611938415/003/criu-parent/dump.log ===
    checkpoint_test.go:115: (00.000052) Version: 3.16 (gitid 0)
    checkpoint_test.go:115: (00.000067) Running on cirrus-task-5639495050067968 Linux 3.10.0-1160.114.2.el7.x86_64 #1 SMP Wed Mar 20 15:54:52 UTC 2024 x86_64
    checkpoint_test.go:115: (00.000070) Would overwrite RPC settings with values from /etc/criu/runc.conf
    checkpoint_test.go:115: (00.000094) Loaded kdat cache from /run/criu/criu.kdat
    checkpoint_test.go:115: (00.000142) rlimit: RLIMIT_NOFILE unlimited for self
    checkpoint_test.go:115: (00.000148) Enforcing memory tracking for pre-dump.
    checkpoint_test.go:115: (00.000156) Enforcing tasks run after pre-dump.
    checkpoint_test.go:115: (00.000170) irmap: Searching irmap cache in work dir
    checkpoint_test.go:115: (00.000180) No irmap-cache image
    checkpoint_test.go:115: (00.000181) irmap: Searching irmap cache in parent
    checkpoint_test.go:115: (00.000185) No parent images directory provided
    checkpoint_test.go:115: (00.000187) irmap: No irmap cache
    checkpoint_test.go:115: (00.000205) cpu: x86_family 25 x86_vendor_id AuthenticAMD x86_model_id AMD EPYC 7B13
    checkpoint_test.go:115: (00.000210) cpu: fpu: xfeatures_mask 0x5 xsave_size 832 xsave_size_max 2440 xsaves_size 832
    checkpoint_test.go:115: (00.000213) cpu: fpu: x87 floating point registers     xstate_offsets      0 / 0      xstate_sizes    160 / 160   
    checkpoint_test.go:115: (00.000215) cpu: fpu: AVX registers                    xstate_offsets    576 / 576    xstate_sizes    256 / 256   
    checkpoint_test.go:115: (00.000217) cpu: fpu:1 fxsr:1 xsave:1 xsaveopt:1 xsavec:1 xgetbv1:1 xsaves:0
    checkpoint_test.go:115: (00.000338) Detected cgroup V1 freezer
    checkpoint_test.go:115: (00.000340) freezing processes: 100000 attempts with 100 ms steps
    checkpoint_test.go:115: (00.000351) freezer.state=THAWED
    checkpoint_test.go:115: (00.000358) freezer.state=FREEZING
    checkpoint_test.go:115: (00.100446) freezer.state=FREEZING
    checkpoint_test.go:115: (00.201766) freezer.state=FREEZING
    checkpoint_test.go:115: (00.301871) freezer.state=FREEZING
    checkpoint_test.go:115: (00.401990) freezer.state=FREEZING
    checkpoint_test.go:115: (00.502110) freezer.state=FREEZING
    checkpoint_test.go:115: (00.602214) freezer.state=FREEZING
    checkpoint_test.go:115: (00.702313) freezer.state=FREEZING
    checkpoint_test.go:115: (00.802425) freezer.state=FREEZING
    checkpoint_test.go:115: (00.902531) freezer.state=FREEZING
    checkpoint_test.go:115: (01.002635) freezer.state=FREEZING
    checkpoint_test.go:115: (01.102755) freezer.state=FREEZING
    checkpoint_test.go:115: (01.202870) freezer.state=FREEZING
    checkpoint_test.go:115: (01.303058) freezer.state=FREEZING
    checkpoint_test.go:115: (01.403208) freezer.state=FREEZING
    checkpoint_test.go:115: (01.503308) freezer.state=FREEZING
    checkpoint_test.go:115: (01.603429) freezer.state=FREEZING
    checkpoint_test.go:115: (01.703589) freezer.state=FREEZING
    checkpoint_test.go:115: (01.803726) freezer.state=FREEZING
    checkpoint_test.go:115: (01.903872) freezer.state=FREEZING
    checkpoint_test.go:115: (02.004022) freezer.state=FREEZING
    checkpoint_test.go:115: (02.104139) freezer.state=FREEZING
    checkpoint_test.go:115: (02.204270) freezer.state=FREEZING
    checkpoint_test.go:115: (02.304422) freezer.state=FREEZING
    checkpoint_test.go:115: (02.404578) freezer.state=FREEZING
    checkpoint_test.go:115: (02.504717) freezer.state=FREEZING
    checkpoint_test.go:115: (02.604860) freezer.state=FREEZING
    checkpoint_test.go:115: (02.704987) freezer.state=FREEZING
    checkpoint_test.go:115: (02.805144) freezer.state=FREEZING
    checkpoint_test.go:115: (02.905275) freezer.state=FREEZING
    checkpoint_test.go:115: (03.005410) freezer.state=FREEZING
    checkpoint_test.go:115: (03.105546) freezer.state=FREEZING
    checkpoint_test.go:115: (03.205676) freezer.state=FREEZING
    checkpoint_test.go:115: (03.305821) freezer.state=FREEZING
    checkpoint_test.go:115: (03.405941) freezer.state=FREEZING
    checkpoint_test.go:115: (03.506057) freezer.state=FREEZING
    checkpoint_test.go:115: (03.606181) freezer.state=FREEZING
    checkpoint_test.go:115: (03.706322) freezer.state=FREEZING
    checkpoint_test.go:115: (03.806446) freezer.state=FREEZING
    checkpoint_test.go:115: (03.906569) freezer.state=FREEZING
    checkpoint_test.go:115: (04.006738) freezer.state=FREEZING
    checkpoint_test.go:115: (04.106903) freezer.state=FREEZING
    checkpoint_test.go:115: (04.207032) freezer.state=FREEZING
    checkpoint_test.go:115: (04.307154) freezer.state=FREEZING
    checkpoint_test.go:115: (04.407273) freezer.state=FREEZING
    checkpoint_test.go:115: (04.507399) freezer.state=FREEZING
    checkpoint_test.go:115: (04.607502) freezer.state=FREEZING
    checkpoint_test.go:115: (04.707592) freezer.state=FREEZING
    checkpoint_test.go:115: (04.807698) freezer.state=FREEZING
    checkpoint_test.go:115: (04.907829) freezer.state=FREEZING
    checkpoint_test.go:115: (05.007957) freezer.state=FREEZING
    checkpoint_test.go:115: (05.108092) freezer.state=FREEZING
    checkpoint_test.go:115: (05.208199) freezer.state=FREEZING
    checkpoint_test.go:115: (05.308309) freezer.state=FREEZING
    checkpoint_test.go:115: (05.408418) freezer.state=FREEZING
    checkpoint_test.go:115: (05.508566) freezer.state=FREEZING
    checkpoint_test.go:115: (05.608724) freezer.state=FREEZING
    checkpoint_test.go:115: (05.708885) freezer.state=FREEZING
    checkpoint_test.go:115: (05.809035) freezer.state=FREEZING
    checkpoint_test.go:115: (05.909159) freezer.state=FREEZING
    checkpoint_test.go:115: (06.009283) freezer.state=FREEZING
    checkpoint_test.go:115: (06.109410) freezer.state=FREEZING
    checkpoint_test.go:115: (06.209537) freezer.state=FREEZING
    checkpoint_test.go:115: (06.309662) freezer.state=FREEZING
    checkpoint_test.go:115: (06.409787) freezer.state=FREEZING
    checkpoint_test.go:115: (06.509905) freezer.state=FREEZING
    checkpoint_test.go:115: (06.610031) freezer.state=FREEZING
    checkpoint_test.go:115: (06.710165) freezer.state=FREEZING
    checkpoint_test.go:115: (06.810288) freezer.state=FREEZING
    checkpoint_test.go:115: (06.910416) freezer.state=FREEZING
    checkpoint_test.go:115: (07.010552) freezer.state=FREEZING
    checkpoint_test.go:115: (07.110678) freezer.state=FREEZING
    checkpoint_test.go:115: (07.210806) freezer.state=FREEZING
    checkpoint_test.go:115: (07.310933) freezer.state=FREEZING
    checkpoint_test.go:115: (07.411069) freezer.state=FREEZING
    checkpoint_test.go:115: (07.511252) freezer.state=FREEZING
    checkpoint_test.go:115: (07.611415) freezer.state=FREEZING
    checkpoint_test.go:115: (07.711588) freezer.state=FREEZING
    checkpoint_test.go:115: (07.811742) freezer.state=FREEZING
    checkpoint_test.go:115: (07.911897) freezer.state=FREEZING
    checkpoint_test.go:115: (08.012029) freezer.state=FREEZING
    checkpoint_test.go:115: (08.112217) freezer.state=FREEZING
    checkpoint_test.go:115: (08.212392) freezer.state=FREEZING
    checkpoint_test.go:115: (08.312553) freezer.state=FREEZING
    checkpoint_test.go:115: (08.412734) freezer.state=FREEZING
    checkpoint_test.go:115: (08.512909) freezer.state=FREEZING
    checkpoint_test.go:115: (08.613067) freezer.state=FREEZING
    checkpoint_test.go:115: (08.713220) freezer.state=FREEZING
    checkpoint_test.go:115: (08.813373) freezer.state=FREEZING
    checkpoint_test.go:115: (08.913548) freezer.state=FREEZING
    checkpoint_test.go:115: (09.013704) freezer.state=FREEZING
    checkpoint_test.go:115: (09.113850) freezer.state=FREEZING
    checkpoint_test.go:115: (09.213999) freezer.state=FREEZING
    checkpoint_test.go:115: (09.314151) freezer.state=FREEZING
    checkpoint_test.go:115: (09.414305) freezer.state=FREEZING
    checkpoint_test.go:115: (09.514467) freezer.state=FREEZING
    checkpoint_test.go:115: (09.614644) freezer.state=FREEZING
    checkpoint_test.go:115: (09.714816) freezer.state=FREEZING
    checkpoint_test.go:115: (09.814957) freezer.state=FREEZING
    checkpoint_test.go:115: (09.915110) freezer.state=FREEZING
    checkpoint_test.go:115: (10.000432) Error (criu/cr-dump.c:1467): Timeout reached. Try to interrupt: 0
    checkpoint_test.go:115: (10.000563) freezer.state=FREEZING
    checkpoint_test.go:115: (10.000694) Error (compel/src/lib/infect.c:234): Unseizable non-zombie 9017 found, state D, err -1/10
    checkpoint_test.go:115: (10.000773) Unfreezing tasks into 1
    checkpoint_test.go:115: (10.000778) 	Unseizing 9017 into 1
    checkpoint_test.go:115: (10.000783) Error (compel/src/lib/infect.c:355): Unable to detach from 9017: No such process
    checkpoint_test.go:115: (10.000800) Writing image inventory (version 1)
    checkpoint_test.go:115: (10.000976) Error (criu/cr-dump.c:1581): Pre-dumping FAILED.
    checkpoint_test.go:115: === END ===
    checkpoint_test.go:119: criu failed: type PRE_DUMP errno 0
--- FAIL: TestUsernsCheckpoint (10.31s)
@kolyshkin
Contributor

I've seen this a few times, too.

@lifubang this means that the kernel can't freeze the cgroup despite repeated attempts, so criu gives up.

Alas, this might be a kernel issue, and the CentOS 7 kernel is too old. In general, the cgroup freezer is not very reliable; I previously had to implement some hacks in runc to work around it (see #2941 and the earlier PRs linked from there).

We can either try to add similar kludges to https://github.com/checkpoint-restore/criu, or skip these tests on CentOS 7.

@lifubang
Member Author

lifubang commented Jun 1, 2024

> skip these tests on CentOS 7.

I have had to rerun the CentOS 7 tests manually many times, so shall we skip them on CentOS 7?
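
One possible way to implement such a skip is sketched below. The helper name `isCentOS7` is hypothetical and does not come from the runc test suite. The sketch assumes the host identifies itself through the standard `ID` and `VERSION_ID` fields of `/etc/os-release`.

```go
package main

import (
	"bufio"
	"strings"
)

// isCentOS7 is a hypothetical helper that reports whether the given
// os-release content (e.g. the contents of /etc/os-release) describes CentOS 7.
func isCentOS7(osRelease string) bool {
	var id, version string
	sc := bufio.NewScanner(strings.NewReader(osRelease))
	for sc.Scan() {
		key, val, ok := strings.Cut(strings.TrimSpace(sc.Text()), "=")
		if !ok {
			continue // blank line, comment, or malformed entry
		}
		val = strings.Trim(val, `"'`) // values may be quoted
		switch key {
		case "ID":
			id = val
		case "VERSION_ID":
			version = val
		}
	}
	return id == "centos" && (version == "7" || strings.HasPrefix(version, "7."))
}

// Possible use inside a test (illustrative only):
//
//	data, err := os.ReadFile("/etc/os-release")
//	if err == nil && isCentOS7(string(data)) {
//		t.Skip("cgroup v1 freezer is unreliable on CentOS 7, see #4273")
//	}
```

Keeping the detection as a pure function of the file contents makes it easy to check without a CentOS 7 machine.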
