-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WINC-1222: WMCO applies image mirroring settings to Windows instances #2163
WINC-1222: WMCO applies image mirroring settings to Windows instances #2163
Conversation
b16235f
to
3f2e966
Compare
@saifshaikh48: This pull request references WINC-1222 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/test vsphere-disconnected-e2e-operator for early feedback |
a5d30bd
to
ad79261
Compare
build/Dockerfile
Outdated
# Create directory for generated registry config files with open permissions | ||
RUN mkdir /payload/generated/cri_registry_config | ||
RUN chmod 0777 /payload/generated/cri_registry_config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will need to update midstream Dockerfile
2e77c98
to
5101dc3
Compare
/test aws-e2e-operator |
/test aws-e2e-operator passed on azure, couldn't find anything strange so trying other platforms |
5101dc3
to
604d204
Compare
/test aws-e2e-operator azure, gcp, nutanix CI all passed. aws passed locally so seeing what's up in CI |
604d204
to
617ac10
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. Please look at my comments.
pkg/registries/registries.go
Outdated
|
||
// extractHost extracts just the initial host repo from a full image location | ||
// e.g. mcr.microsoft.com would be extracted from mcr.microsoft.com/oss/kubernetes/pause:3.9 | ||
func extractHost(fullImage string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Host -> Hostname
build/Dockerfile
Outdated
@@ -112,6 +112,7 @@ RUN make build-daemon | |||
#│ ├── csi-proxy.exe | |||
#├── ecr-credential-provider.exe | |||
#├── generated/ | |||
#│ └── cri_registry_config/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need to be created?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah it doesn't, vestige of the previous solution. I will remove it
test/e2e/mirror_test.go
Outdated
// No point in running the rest of the tests if settings aren't applied | ||
return | ||
} | ||
t.Run("Workload using a mirrored container image", tc.testWorkloadWithMirroredImage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please do not make this a separate test.
Rather setup the mirrors as part of the test setup, and we should see the tests on a disconnected job pass as normal.
You can make the mirror setup configurable via a CMD line option,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite sure what you mean here
- Should we only test mirroring through the disconnected job?
- Do you forsee the mirror_test suite to fail in the disconnected job?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we only test mirroring through the disconnected job?
I don't see it necessary to test it elsewhere if the disconnected job is required.
Do you forsee the mirror_test suite to fail in the disconnected job?
I'm not sure what you mean by this.
pkg/windows/windows.go
Outdated
@@ -361,11 +369,16 @@ func (vm *windows) EnsureFile(file *payload.FileInfo, remoteDir string) error { | |||
vm.log.Error(err, "error closing local file", "file", file.Path) | |||
} | |||
}() | |||
vm.log.V(1).Info("copy", "local file", file.Path, "remote dir", remoteDir) | |||
if err := vm.interact.transfer(f, filepath.Base(file.Path), remoteDir); err != nil { | |||
vm.log.V(1).Info("copying", "local file", file.Path, "remote dir", remoteDir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please revert
pkg/windows/windows.go
Outdated
@@ -334,11 +337,16 @@ func (vm *windows) EnsureFileContent(contents []byte, filename string, remoteDir | |||
// The file already exists with the expected content, do nothing | |||
return nil | |||
} | |||
vm.log.V(1).Info("copy", "file content", filename, "remote dir", remoteDir) | |||
if err := vm.interact.transfer(bytes.NewReader(contents), filename, remoteDir); err != nil { | |||
vm.log.V(1).Info("copying", "file content", filename, "remote dir", remoteDir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please revert
pkg/windows/windows.go
Outdated
|
||
c, err := vm.interact.createSFTPClient() | ||
if err != nil { | ||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please wrap error for easier debugging
pkg/windows/connectivity.go
Outdated
dstPath := remoteDir + "\\" + workingPath | ||
// Get the directory that will be written into, without trailing slash, and the base file name | ||
splitIndex := strings.LastIndexByte(dstPath, '\\') | ||
writeDir, filename := dstPath[:splitIndex], dstPath[splitIndex+1:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please separate this into two lines for readability
pkg/windows/connectivity.go
Outdated
|
||
func (c *sshConnectivity) transferFiles(sftpClient *sftp.Client, files map[string][]byte, remoteDir string) error { | ||
for workingPath, content := range files { | ||
// Load file content into io.Reader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unecessary comment
@@ -23,7 +23,7 @@ metadata: | |||
operatorframework.io/suggested-namespace: openshift-windows-machine-config-operator | |||
operators.openshift.io/valid-subscription: '["Red Hat OpenShift support for Windows | |||
Containers"]' | |||
operators.operatorframework.io/builder: operator-sdk-v1.32.0+git | |||
operators.operatorframework.io/builder: operator-sdk-v1.32.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can remove this, if not required
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not required but is there any reason for the +git
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are generated when running make bundle, and need to be reverted every time. Make bundle removes the +git.
@@ -152,6 +153,9 @@ func (nc *nodeConfig) Configure() error { | |||
if err := nc.createBootstrapFiles(); err != nil { | |||
return err | |||
} | |||
if err := nc.createRegistryConfigFiles(); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are creating the Containerd directory inside nc.Windows.Boostrap(), why don't we call this function after it is created? If needed, we can also split the bootstrap function into two. Let me know if there is another reason for doing this here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thoughts here were that createBootstrapFiles() sets up files needed for kubelet and createRegistryConfigFiles() basically does the same for containerd (and containerd is a pre-req service to kubelet) so the funcs should be called together
we are creating the Containerd directory inside nc.Windows.Boostrap()
This is a good point though, I will explore your suggestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this more and decided to keep it where it is (though I can still be swayed)
Additional reasoning was
- this way the registry config is created before containerd service actually starts so it doesn't have to do any additional config loading during runtime
- createRegistryConfigFiles() creates the required directories if needed so no issue
pkg/nodeconfig/nodeconfig.go
Outdated
@@ -370,6 +374,15 @@ func (nc *nodeConfig) write(pathToData map[string]string) error { | |||
return nil | |||
} | |||
|
|||
// createBootstrapFiles creates all files on the node required for containerd to mirror images |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doc needs update
@@ -391,6 +404,26 @@ func (vm *windows) FileExists(path, checksum string) (bool, error) { | |||
return false, nil | |||
} | |||
|
|||
func (vm *windows) ReplaceDir(files map[string][]byte, remoteDir string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs doc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doc'd in the Windows interface
617ac10
to
8e74444
Compare
c914cc9
to
75cf57f
Compare
/retest-required |
/test remaining-required |
1 similar comment
/test remaining-required |
dcd1dd0
to
8fa3636
Compare
/test remaining-required |
/test aws-e2e-operator |
test/e2e/mirror_test.go
Outdated
badPowershellImageLocation = "test.registry.io/powershell" | ||
) | ||
|
||
// numBaseRegistryConfigFiles represents the number registry config files that exist before this test suite's steps run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
number of*
test/e2e/mirror_test.go
Outdated
require.NoError(t, err, "unable to get node address") | ||
|
||
// retry to give time for registry controller to transfer generated config to all nodes | ||
err = wait.PollUntilContextTimeout(context.TODO(), 15*time.Second, 5*time.Minute, true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't the retry time same as retry.Interval?
pkg/windows/windows.go
Outdated
if err = vm.interact.transferFiles(sftpClient, files, remoteDir); err != nil { | ||
return err | ||
} | ||
return sftpClient.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't the client be closed even if we return error in between? This also applies to a few other places in this commit.
This commit transfers containerd mirror registry config files generated by WMCO to each Windows instance based on the cluster's global image mirroring settings. This is done both as part of node configuration and when IDMS/ITMSs change. The containerd daemon picks up the registry config without requiring restart. Edits the way file transfers work for easier reuse of FTP clients.
Includes tests to verify registry config file creation on Windows nodes when cluster mirror registry settings are added, and file deletion when cluster settings are cleared. Mirror setting clearing is tested in the delete suite. This is done to avoid a timing issue in the e2e test suite where we were doing too many drain-causing actions too quickly. Adding/removing registry config to the cluster can cause linux nodes to drain which continuously disrupted the WMCO pod while it was trying to reconcile.
Adds an explicit predicate to react to all events on IDMS/ITMS resources in case the default behavior controller manager watch before changes.
8fa3636
to
714564b
Compare
/lgtm |
/lgtm @saifshaikh48 please fetch and rebase the latest from master before triggering the rest of the tests. |
@jrvaldes Why is rebasing against master necessary here? Tests are always run against the latest master with the current PR applied? |
/test remaining-required |
/cherry-pick release-4.16 |
@saifshaikh48: once the present PR merges, I will cherry-pick it on top of release-4.16 in a new PR and assign it to you. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
No need, just a nice to have. Lately, I'm noticing the |
/test aws-e2e-operator |
/test aws-e2e-operator ipi-install failed |
/test aws-e2e-operator |
@saifshaikh48: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
@saifshaikh48: new pull request created: #2199 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
This PR transfers containerd registry config files generated by WMCO to
each Windows instance based on the cluster's global image mirroring settings,
allowing containerd to pull Windows images from user-defined mirror locations.
The containerd daemon picks up the registry config without requiring restart.