Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WINC-1222: WMCO applies image mirroring settings to Windows instances #2163

Conversation

saifshaikh48
Copy link
Contributor

@saifshaikh48 saifshaikh48 commented May 7, 2024

  • WINC-1222: WMCO applies image mirroring settings to Windows instances

This PR transfers containerd registry config files generated by WMCO to
each Windows instance based on the cluster's global image mirroring settings,
allowing containerd to pull Windows images from user-defined mirror locations.

The containerd daemon picks up the registry config without requiring restart.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 7, 2024
@saifshaikh48 saifshaikh48 force-pushed the mirror-registry-config-containerd branch 6 times, most recently from b16235f to 3f2e966 Compare May 9, 2024 22:24
@saifshaikh48 saifshaikh48 changed the title [WIP] WINC-1222 WMCO applies image mirroring settings to Windows instances [WIP] WINC-1222: WMCO applies image mirroring settings to Windows instances May 10, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 10, 2024
@openshift-ci-robot
Copy link

openshift-ci-robot commented May 10, 2024

@saifshaikh48: This pull request references WINC-1222 which is a valid jira issue.

In response to this:

This PR transfers containerd registry config files generated by WMCO to
each Windows instance based on the cluster's global image mirroring settings,
allowing containerd to pull Windows images from user-defined mirror locations.

The containerd daemon picks up the registry config without requiring restart.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@saifshaikh48
Copy link
Contributor Author

/test vsphere-disconnected-e2e-operator

for early feedback

@saifshaikh48 saifshaikh48 force-pushed the mirror-registry-config-containerd branch 4 times, most recently from a5d30bd to ad79261 Compare May 10, 2024 17:20
build/Dockerfile Outdated
Comment on lines 173 to 175
# Create directory for generated registry config files with open permissions
RUN mkdir /payload/generated/cri_registry_config
RUN chmod 0777 /payload/generated/cri_registry_config
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will need to update midstream Dockerfile

@saifshaikh48 saifshaikh48 changed the title [WIP] WINC-1222: WMCO applies image mirroring settings to Windows instances WINC-1222: WMCO applies image mirroring settings to Windows instances May 10, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 10, 2024
@saifshaikh48 saifshaikh48 force-pushed the mirror-registry-config-containerd branch 2 times, most recently from 2e77c98 to 5101dc3 Compare May 10, 2024 20:32
@saifshaikh48
Copy link
Contributor Author

/test aws-e2e-operator
/test azure-e2e-operator
/test vsphere-disconnected-e2e-operator

@saifshaikh48
Copy link
Contributor Author

/test aws-e2e-operator
/test vsphere-disconnected-e2e-operator
/test gcp-e2e-operator
/test nutanix-e2e-operator

passed on azure, couldn't find anything strange so trying other platforms

@saifshaikh48 saifshaikh48 force-pushed the mirror-registry-config-containerd branch from 5101dc3 to 604d204 Compare May 13, 2024 17:59
@saifshaikh48
Copy link
Contributor Author

/test aws-e2e-operator
/test vsphere-e2e-operator

azure, gcp, nutanix CI all passed. aws passed locally so seeing what's up in CI
vsphere-disconnected hitting issues since golden images are currently missing from devqe env. other vsphere jobs should not hit this though

@saifshaikh48 saifshaikh48 force-pushed the mirror-registry-config-containerd branch from 604d204 to 617ac10 Compare May 13, 2024 20:39
Copy link
Contributor

@sebsoto sebsoto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. Please look at my comments.

pkg/registries/registries.go Show resolved Hide resolved

// extractHost extracts just the initial host repo from a full image location
// e.g. mcr.microsoft.com would be extracted from mcr.microsoft.com/oss/kubernetes/pause:3.9
func extractHost(fullImage string) string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Host -> Hostname

build/Dockerfile Outdated
@@ -112,6 +112,7 @@ RUN make build-daemon
#│ ├── csi-proxy.exe
#├── ecr-credential-provider.exe
#├── generated/
#│ └── cri_registry_config/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to be created?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah it doesn't, vestige of the previous solution. I will remove it

test/e2e/mirror_test.go Show resolved Hide resolved
// No point in running the rest of the tests if settings aren't applied
return
}
t.Run("Workload using a mirrored container image", tc.testWorkloadWithMirroredImage)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not make this a separate test.
Rather setup the mirrors as part of the test setup, and we should see the tests on a disconnected job pass as normal.

You can make the mirror setup configurable via a CMD line option,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite sure what you mean here

  • Should we only test mirroring through the disconnected job?
  • Do you forsee the mirror_test suite to fail in the disconnected job?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we only test mirroring through the disconnected job?

I don't see it necessary to test it elsewhere if the disconnected job is required.

Do you forsee the mirror_test suite to fail in the disconnected job?

I'm not sure what you mean by this.

@@ -361,11 +369,16 @@ func (vm *windows) EnsureFile(file *payload.FileInfo, remoteDir string) error {
vm.log.Error(err, "error closing local file", "file", file.Path)
}
}()
vm.log.V(1).Info("copy", "local file", file.Path, "remote dir", remoteDir)
if err := vm.interact.transfer(f, filepath.Base(file.Path), remoteDir); err != nil {
vm.log.V(1).Info("copying", "local file", file.Path, "remote dir", remoteDir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please revert

@@ -334,11 +337,16 @@ func (vm *windows) EnsureFileContent(contents []byte, filename string, remoteDir
// The file already exists with the expected content, do nothing
return nil
}
vm.log.V(1).Info("copy", "file content", filename, "remote dir", remoteDir)
if err := vm.interact.transfer(bytes.NewReader(contents), filename, remoteDir); err != nil {
vm.log.V(1).Info("copying", "file content", filename, "remote dir", remoteDir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please revert


c, err := vm.interact.createSFTPClient()
if err != nil {
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please wrap error for easier debugging

dstPath := remoteDir + "\\" + workingPath
// Get the directory that will be written into, without trailing slash, and the base file name
splitIndex := strings.LastIndexByte(dstPath, '\\')
writeDir, filename := dstPath[:splitIndex], dstPath[splitIndex+1:]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please separate this into two lines for readability


func (c *sshConnectivity) transferFiles(sftpClient *sftp.Client, files map[string][]byte, remoteDir string) error {
for workingPath, content := range files {
// Load file content into io.Reader
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unecessary comment

@@ -23,7 +23,7 @@ metadata:
operatorframework.io/suggested-namespace: openshift-windows-machine-config-operator
operators.openshift.io/valid-subscription: '["Red Hat OpenShift support for Windows
Containers"]'
operators.operatorframework.io/builder: operator-sdk-v1.32.0+git
operators.operatorframework.io/builder: operator-sdk-v1.32.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can remove this, if not required

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not required but is there any reason for the +git?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are generated when running make bundle, and need to be reverted every time. Make bundle removes the +git.

@@ -152,6 +153,9 @@ func (nc *nodeConfig) Configure() error {
if err := nc.createBootstrapFiles(); err != nil {
return err
}
if err := nc.createRegistryConfigFiles(); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are creating the Containerd directory inside nc.Windows.Boostrap(), why don't we call this function after it is created? If needed, we can also split the bootstrap function into two. Let me know if there is another reason for doing this here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thoughts here were that createBootstrapFiles() sets up files needed for kubelet and createRegistryConfigFiles() basically does the same for containerd (and containerd is a pre-req service to kubelet) so the funcs should be called together

we are creating the Containerd directory inside nc.Windows.Boostrap()

This is a good point though, I will explore your suggestion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this more and decided to keep it where it is (though I can still be swayed)
Additional reasoning was

  • this way the registry config is created before containerd service actually starts so it doesn't have to do any additional config loading during runtime
  • createRegistryConfigFiles() creates the required directories if needed so no issue

@@ -370,6 +374,15 @@ func (nc *nodeConfig) write(pathToData map[string]string) error {
return nil
}

// createBootstrapFiles creates all files on the node required for containerd to mirror images
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc needs update

@@ -391,6 +404,26 @@ func (vm *windows) FileExists(path, checksum string) (bool, error) {
return false, nil
}

func (vm *windows) ReplaceDir(files map[string][]byte, remoteDir string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs doc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc'd in the Windows interface

@saifshaikh48 saifshaikh48 force-pushed the mirror-registry-config-containerd branch from 617ac10 to 8e74444 Compare May 14, 2024 17:07
@saifshaikh48 saifshaikh48 force-pushed the mirror-registry-config-containerd branch from c914cc9 to 75cf57f Compare June 3, 2024 16:24
@saifshaikh48
Copy link
Contributor Author

/retest-required

@saifshaikh48
Copy link
Contributor Author

/test remaining-required

1 similar comment
@saifshaikh48
Copy link
Contributor Author

/test remaining-required

@saifshaikh48 saifshaikh48 force-pushed the mirror-registry-config-containerd branch 3 times, most recently from dcd1dd0 to 8fa3636 Compare June 4, 2024 14:29
@saifshaikh48
Copy link
Contributor Author

/test remaining-required

@saifshaikh48
Copy link
Contributor Author

/test aws-e2e-operator

badPowershellImageLocation = "test.registry.io/powershell"
)

// numBaseRegistryConfigFiles represents the number registry config files that exist before this test suite's steps run
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

number of*

require.NoError(t, err, "unable to get node address")

// retry to give time for registry controller to transfer generated config to all nodes
err = wait.PollUntilContextTimeout(context.TODO(), 15*time.Second, 5*time.Minute, true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't the retry time same as retry.Interval?

if err = vm.interact.transferFiles(sftpClient, files, remoteDir); err != nil {
return err
}
return sftpClient.Close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't the client be closed even if we return error in between? This also applies to a few other places in this commit.

This commit transfers containerd mirror registry config files generated by WMCO
to each Windows instance based on the cluster's global image mirroring settings.
This is done both as part of node configuration and when IDMS/ITMSs change.
The containerd daemon picks up the registry config without requiring restart.

Edits the way file transfers work for easier reuse of FTP clients.
Includes tests to verify registry config file creation on Windows nodes
when cluster mirror registry settings are added, and file deletion when
cluster settings are cleared.

Mirror setting clearing is tested in the delete suite.
This is done to avoid a timing issue in the e2e test suite where we were
doing too many drain-causing actions too quickly. Adding/removing
registry config to the cluster can cause linux nodes to drain which
continuously disrupted the WMCO pod while it was trying to reconcile.
Adds an explicit predicate to react to all events on IDMS/ITMS resources
in case the default behavior controller manager watch before changes.
@saifshaikh48 saifshaikh48 force-pushed the mirror-registry-config-containerd branch from 8fa3636 to 714564b Compare June 4, 2024 19:44
@mansikulkarni96
Copy link
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 4, 2024
@jrvaldes
Copy link
Contributor

jrvaldes commented Jun 4, 2024

/lgtm

@saifshaikh48 please fetch and rebase the latest from master before triggering the rest of the tests.

@mtnbikenc
Copy link
Member

/lgtm

@saifshaikh48 please fetch and rebase the latest from master before triggering the rest of the tests.

@jrvaldes Why is rebasing against master necessary here? Tests are always run against the latest master with the current PR applied?

@saifshaikh48
Copy link
Contributor Author

/test remaining-required

@saifshaikh48
Copy link
Contributor Author

/cherry-pick release-4.16

@openshift-cherrypick-robot

@saifshaikh48: once the present PR merges, I will cherry-pick it on top of release-4.16 in a new PR and assign it to you.

In response to this:

/cherry-pick release-4.16

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jrvaldes
Copy link
Contributor

jrvaldes commented Jun 4, 2024

/lgtm

@saifshaikh48 please fetch and rebase the latest from master before triggering the rest of the tests.

@jrvaldes Why is rebasing against master necessary here? Tests are always run against the latest master with the current PR applied?

No need, just a nice to have.

Lately, I'm noticing the remaining-required re-testing twice if there is a diff from the head of master.

@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD dbb9316 and 2 for PR HEAD 714564b in total

@sebsoto
Copy link
Contributor

sebsoto commented Jun 5, 2024

/test aws-e2e-operator

@jrvaldes
Copy link
Contributor

jrvaldes commented Jun 5, 2024

/test aws-e2e-operator

ipi-install failed

@saifshaikh48
Copy link
Contributor Author

/test aws-e2e-operator

Copy link
Contributor

openshift-ci bot commented Jun 5, 2024

@saifshaikh48: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 0d0867b into openshift:master Jun 5, 2024
17 checks passed
@openshift-cherrypick-robot

@saifshaikh48: new pull request created: #2199

In response to this:

/cherry-pick release-4.16

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants