Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix runc's poststart behaviour doesn't match the runtime-spec #4348

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 19 additions & 14 deletions libcontainer/container_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -236,7 +236,11 @@ func (c *Container) exec() error {
for {
select {
case result := <-blockingFifoOpenCh:
return handleFifoResult(result)
err := handleFifoResult(result)
if err != nil {
return err
}
return c.postStart()

case <-time.After(time.Millisecond * 100):
stat, err := system.Stat(pid)
Expand All @@ -252,6 +256,20 @@ func (c *Container) exec() error {
}
}

func (c *Container) postStart() error {
s, err := c.currentOCIState()
if err != nil {
return err
}
if c.config.Hooks != nil {
if err := c.config.Hooks.Run(configs.Poststart, s); err != nil {
Copy link
Member

@lifubang lifubang Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the runtime-spec said:
The poststart hooks MUST be invoked by the runtime. If any poststart hook fails, the runtime MUST log a warning, but the remaining hooks and lifecycle continue as if the hook had succeeded.

So maybe this is another bug, we should write a new function to run poststart and poststop hooks.
@kolyshkin WDYT

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If yes, please help to open a new issue to track this bug, thanks.

This comment was marked as outdated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the current code logs a warning (like the spec says) but also terminates container's init and returns an error (unlike the spec says).

I'm not sure whether we should change the spec or runc behavior. In theory spec should be left as is, and runc should be changed to follow the spec. In practice, though, changing runc might result in backward incompatibilities and complexities in upper runtime. So yes, we have a choice to make here.

It's complicated 😿

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I researched the history a little bit.

  1. The initial implementation of poststart hooks was done by @mrunalp in 2015 (commit 452e8a7, PR Add poststart hooks #392, merged Nov 2015). If a hook was returning an error, the start was terminated and the error was returned.
  2. The "start must continue even if poststart hook failed" clause was added to runtime-spec by @RenaudWasTaken in Add create-runtime, create-container, start-container hooks runtime-spec#1008 (merged Dec 30 2019).
  3. More hooks were added to runc by @RenaudWasTaken (commit ccdd757, PR Add CreateRuntime, CreateContainer and StartContainer Hooks #2229, merged Jun 2020). The same commit touched the poststart code but didn't change it to not return an error if a hook failed.

In my mind, this is the spec that needs to be changed. The reasons are:

  • initial implementation predates the spec wording by a few years;
  • the wording in the spec was never implemented in runc;
  • returning an error (and stopping the container) seems like a more versatile approach (a hook can choose whether to return an error or not).

Opened opencontainers/runtime-spec#1262

Copy link
Author

@ningmingxiao ningmingxiao Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how to stop container by using SIGTERM ,SIGKILL or both use?

logrus.Errorf("run poststart hook: %v", err)
return fmt.Errorf("run poststart hook: %w", err)
}
}
return nil
}

func readFromExecFifo(execFifo io.Reader) error {
data, err := io.ReadAll(execFifo)
if err != nil {
Expand Down Expand Up @@ -353,19 +371,6 @@ func (c *Container) start(process *Process) (retErr error) {

if process.Init {
c.fifo.Close()
if c.config.Hooks != nil {
s, err := c.currentOCIState()
if err != nil {
return err
}

if err := c.config.Hooks.Run(configs.Poststart, s); err != nil {
if err := ignoreTerminateErrors(parent.terminate()); err != nil {
logrus.Warn(fmt.Errorf("error running poststart hook: %w", err))
}
return err
}
}
}
return nil
}
Expand Down
Loading