Bug 1881703: Revert https://github.com/openshift/machine-config-operator/pull/1792 #2126

Merged
merged 1 commit into openshift:master
Oct 2, 2020

Conversation

runcom
Member

@runcom runcom commented Sep 29, 2020

This PR just reverts Manage the ignition stub config #1792
More information can be found at https://docs.google.com/document/d/1TnnfPgFim-e895MD0msOGN9tyPBk3CM8cmSewFmgzDQ/edit#

TL;DR; users can customize the pointer ignition config adding files and whatever ignition supports. We weren't aware of that and assumed that only the MCS endpoint and the cert would be managed by the installer. Given where we are in the release, and the assessment that it's safe to just revert, let's do it and rethink the work in light of the new findings (it will require a translation as well).

Signed-off-by: Antonio Murdaca [email protected]
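
For readers unfamiliar with the term, a minimal sketch of what "customizing the pointer config" means in practice may help. It assumes that a stock pointer config carries only the `ignition` section (the MCS endpoint plus the CA cert) and treats any other non-empty top-level section as a user customization. The helper name `customized_sections`, the hostname, and the file path are hypothetical, and this is not code from the MCO.

```python
import json


def customized_sections(pointer_config):
    """Return the sorted top-level sections that go beyond a plain pointer config.

    Assumption: a stock pointer config contains only the "ignition" section.
    """
    return sorted(
        key for key, value in pointer_config.items()
        if key != "ignition" and value
    )


# A stock v2-style pointer config (hostname and CA payload are placeholders).
stock = json.loads("""
{
  "ignition": {
    "version": "2.2.0",
    "config": {"append": [{"source": "https://api-int.example.com:22623/config/worker"}]},
    "security": {"tls": {"certificateAuthorities": [{"source": "data:text/plain;charset=utf-8;base64,..."}]}}
  }
}
""")

# The same config after a user added a file at install time.
customized = dict(stock, storage={"files": [{
    "filesystem": "root",
    "path": "/etc/example.conf",
    "mode": 420,
    "contents": {"source": "data:,hello"},
}]})

print(customized_sections(stock))       # []
print(customized_sections(customized))  # ['storage']
```

A check like this only flags that something extra is present. It says nothing about whether the extra content could be carried over safely.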

@openshift-ci-robot
Contributor

@runcom: An error was encountered adding this pull request to the external tracker bugs for bug 1881703 on the Bugzilla server at https://bugzilla.redhat.com:

JSONRPC error 32000: There was an error reported for a GitHub REST call. URL: https://api.github.com/repos/openshift/machine-config-operator/pulls/2126 Error: 403 Forbidden at /loader/0x5589446ba9a8/Bugzilla/Extension/ExternalBugs/Type/GitHub.pm line 111. at /loader/0x5589446ba9a8/Bugzilla/Extension/ExternalBugs/Type/GitHub.pm line 111. eval {...} called at /loader/0x5589446ba9a8/Bugzilla/Extension/ExternalBugs/Type/GitHub.pm line 98 Bugzilla::Extension::ExternalBugs::Type::GitHub::_do_rest_call('Bugzilla::Extension::ExternalBugs::Type::GitHub=HASH(0x55894c...', 'https://api.github.com/repos/openshift/machine-config-operato...', 'GET') called at /loader/0x5589446ba9a8/Bugzilla/Extension/ExternalBugs/Type/GitHub.pm line 62 Bugzilla::Extension::ExternalBugs::Type::GitHub::get_data('Bugzilla::Extension::ExternalBugs::Type::GitHub=HASH(0x55894c...', 'Bugzilla::Extension::ExternalBugs::Bug=HASH(0x55894d3d92b8)') called at /loader/0x5589446ba9a8/Bugzilla/Extension/ExternalBugs/Bug.pm line 302 eval {...} called at /loader/0x5589446ba9a8/Bugzilla/Extension/ExternalBugs/Bug.pm line 302 Bugzilla::Extension::ExternalBugs::Bug::update_ext_info('Bugzilla::Extension::ExternalBugs::Bug=HASH(0x55894d3d92b8)', 1) called at /loader/0x5589446ba9a8/Bugzilla/Extension/ExternalBugs/Bug.pm line 125 Bugzilla::Extension::ExternalBugs::Bug::create('Bugzilla::Extension::ExternalBugs::Bug', 'HASH(0x55894da29f00)') called at /var/www/html/bugzilla/extensions/ExternalBugs/Extension.pm line 877 Bugzilla::Extension::ExternalBugs::bug_start_of_update('Bugzilla::Extension::ExternalBugs=HASH(0x55894cb7c808)', 'HASH(0x55894da03d50)') called at /var/www/html/bugzilla/Bugzilla/Hook.pm line 21 Bugzilla::Hook::process('bug_start_of_update', 'HASH(0x55894da03d50)') called at /var/www/html/bugzilla/Bugzilla/Bug.pm line 1170 Bugzilla::Bug::update('Bugzilla::Bug=HASH(0x55894c6eb600)') called at /loader/0x5589446ba9a8/Bugzilla/Extension/ExternalBugs/WebService.pm line 88 
Bugzilla::Extension::ExternalBugs::WebService::add_external_bug('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...', 'HASH(0x55894da92908)') called at (eval 3267) line 1 eval ' $procedure->{code}->($self, @params) ;' called at /usr/share/perl5/vendor_perl/JSON/RPC/Legacy/Server.pm line 220 JSON::RPC::Legacy::Server::_handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...', 'HASH(0x55894d473fb8)') called at /var/www/html/bugzilla/Bugzilla/WebService/Server/JSONRPC.pm line 295 Bugzilla::WebService::Server::JSONRPC::_handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...', 'HASH(0x55894d473fb8)') called at /usr/share/perl5/vendor_perl/JSON/RPC/Legacy/Server.pm line 126 JSON::RPC::Legacy::Server::handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...') called at /var/www/html/bugzilla/Bugzilla/WebService/Server/JSONRPC.pm line 70 Bugzilla::WebService::Server::JSONRPC::handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...') called at /var/www/html/bugzilla/jsonrpc.cgi line 31 ModPerl::ROOT::Bugzilla::ModPerl::ResponseHandler::var_www_html_bugzilla_jsonrpc_2ecgi::handler('Apache2::RequestRec=SCALAR(0x55894ce8d068)') called at /usr/lib64/perl5/vendor_perl/ModPerl/RegistryCooker.pm line 207 eval {...} called at /usr/lib64/perl5/vendor_perl/ModPerl/RegistryCooker.pm line 207 ModPerl::RegistryCooker::run('Bugzilla::ModPerl::ResponseHandler=HASH(0x55894d336eb8)') called at /usr/lib64/perl5/vendor_perl/ModPerl/RegistryCooker.pm line 173 ModPerl::RegistryCooker::default_handler('Bugzilla::ModPerl::ResponseHandler=HASH(0x55894d336eb8)') called at /usr/lib64/perl5/vendor_perl/ModPerl/Registry.pm line 32 ModPerl::Registry::handler('Bugzilla::ModPerl::ResponseHandler', 'Apache2::RequestRec=SCALAR(0x55894ce8d068)') called at /var/www/html/bugzilla/mod_perl.pl line 139 Bugzilla::ModPerl::ResponseHandler::handler('Bugzilla::ModPerl::ResponseHandler', 'Apache2::RequestRec=SCALAR(0x55894ce8d068)') 
called at (eval 3267) line 0 eval {...} called at (eval 3267) line 0
Please contact an administrator to resolve this issue, then request a bug refresh with /bugzilla refresh.

In response to this:

Bug 1881703: Revert #1792

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 29, 2020
@runcom
Member Author

runcom commented Sep 29, 2020

/bugzilla refresh

@openshift-ci-robot openshift-ci-robot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Sep 29, 2020
@openshift-ci-robot
Contributor

@runcom: This pull request references Bugzilla bug 1881703, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

/bugzilla refresh


Contributor

@yuqi-zhang yuqi-zhang left a comment

I think given the timelines I'm +1 for the revert. Since this isn't an exact one-to-one I just want to make sure the diffs are good:

  1. We did not seem to have removed the KubeMAOSharedInformer in controller_context.go
  2. We did not revert renderAsset func typing in render.go
  3. We didn't remove one line from go.sum
  4. We didn't remove variable definitions (this is fine)

Those are the diffs I saw.

@cgwalters
Member

TL;DR; users can customize the pointer ignition config adding files and whatever ignition supports.

Right, we do need to support that. It is actually one of the only sane ways to have per machine data (see also #1720 ).

@cgwalters
Member

Are you proposing to revert https://github.com/openshift/enhancements/blob/master/enhancements/machine-config/user-data-secret-managed.md fully for 4.6?

The more I look at this BZ the more I feel like being able to hand-edit the Ignition in a machineAPI managed setup was somewhat of an accidental feature. As I commented in the BZ I think they should be using MachineConfig which would transparently just work.

(Actually at a higher level I think they could be taking better advantage of all the networking enhancements in 4.6 but it looks like they're using baremetal IPI which doesn't use those yet)
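
As an illustration of the MachineConfig route suggested above, here is a minimal sketch that builds a MachineConfig manifest delivering a single file to a pool. The helper name `machine_config_for_file`, the object name, and the file path are hypothetical. Ignition spec version 3.1.0 is an assumption chosen for 4.6, and the resulting dict would still need to be serialized and applied to the cluster.

```python
import base64
import json


def machine_config_for_file(name, role, path, contents, mode=0o644):
    """Build a MachineConfig manifest (as a dict) that writes one file on nodes of `role`."""
    encoded = base64.b64encode(contents.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfig",
        "metadata": {
            "name": name,
            # This label selects the pool (e.g. worker) the config applies to.
            "labels": {"machineconfiguration.openshift.io/role": role},
        },
        "spec": {
            "config": {
                # Assumption: spec 3.1.0 is accepted by the target release.
                "ignition": {"version": "3.1.0"},
                "storage": {
                    "files": [{
                        "path": path,
                        "mode": mode,
                        "contents": {
                            "source": "data:text/plain;charset=utf-8;base64," + encoded
                        },
                    }]
                },
            }
        },
    }


mc = machine_config_for_file(
    "99-worker-example", "worker", "/etc/example.conf", "key=value\n"
)
print(json.dumps(mc, indent=2))
```

Unlike an edit to the pointer config, content shipped this way is rendered by the MCO for every node in the pool, so it would also reach machines added by a later scale-up.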

@runcom
Member Author

runcom commented Sep 29, 2020

Are you proposing to revert https://github.com/openshift/enhancements/blob/master/enhancements/machine-config/user-data-secret-managed.md fully for 4.6?

yes. While the BZ can be "fixed" by shipping an MC, I'm more worried about the change in behavior that we've effectively introduced with a half-baked feature like that. We should really take into account the translation from 4.5 (or just from v2). We also don't really know whether users are relying on this, but since it's kind of advertised, it's a safe assumption that they are.

The thing is, there could be clusters where we would just overwrite any pointer ignition config customization done at installation, so I think it's safe to revert and perfect the implementation in a future release. Does that sound good?
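
To make the "translation" concern concrete, here is a rough sketch of converting a stock v2 pointer config to a v3 one. To my understanding, the relevant spec change for the pointer fields is that `ignition.config.append` became `ignition.config.merge` in spec 3. The function name `translate_pointer_v2_to_v3` and the target version are assumptions, and the sketch deliberately refuses customized configs, since sections such as `storage` need a real translator.

```python
import copy


def translate_pointer_v2_to_v3(v2_config, target_version="3.1.0"):
    """Translate a stock v2 pointer config to v3; raise if it was customized."""
    extra = sorted(k for k, v in v2_config.items() if k != "ignition" and v)
    if extra:
        # Customized sections (files, units, ...) are out of scope for this sketch.
        raise ValueError(f"customized sections need a full translation: {extra}")

    # Work on a copy so the caller's config is left untouched.
    ignition = copy.deepcopy(v2_config.get("ignition", {}))
    config = ignition.get("config", {})
    if "append" in config:
        # Spec 3 renamed "append" to "merge".
        config["merge"] = config.pop("append")
    ignition["version"] = target_version
    return {"ignition": ignition}


v2 = {
    "ignition": {
        "version": "2.2.0",
        "config": {"append": [{"source": "https://api-int.example.com:22623/config/worker"}]},
        "security": {"tls": {"certificateAuthorities": [{"source": "data:,placeholder-ca"}]}},
    }
}
v3 = translate_pointer_v2_to_v3(v2)
print(v3["ignition"]["version"])
print(sorted(v3["ignition"]["config"]))
```

The `ValueError` branch is the case this thread worries about: a cluster whose pointer config was edited at install time cannot simply be replaced with a freshly generated one without losing those edits.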

@kikisdeliveryservice
Contributor

+1 on removing and pushing a redo later

@cgwalters
Member

There's definitely risk to trying to back this out at the last minute - if the issue can be addressed by having the user provide MCs, it also seems reasonable to me to mark this as a known issue and move on.

@kikisdeliveryservice
Contributor

There's definitely risk to trying to back this out at the last minute - if the issue can be addressed by having the user provide MCs, it also seems reasonable to me to mark this as a known issue and move on.

@cgwalters just to clarify: you're saying leave it as-is and say "the supported way to do this is via a machine-config" if we receive a bz? But later fix or not ?

@runcom
Member Author

runcom commented Sep 30, 2020

There's definitely risk to trying to back this out at the last minute - if the issue can be addressed by having the user provide MCs, it also seems reasonable to me to mark this as a known issue and move on.

right, there's definitely some risk in reverting as well, after ~1 month of this soaking in master... I think I'd be open to writing something in bold that says "if you were doing this, use an MC from now on when installing a cluster". But then, what do we do about already installed clusters? We're still screwing up their scale-up, though I think we could also document a way out of that (if there is one?).

@cgwalters
Member

cgwalters commented Sep 30, 2020

therefore we can have clusters upgrading from 4.5 with customized pointer configuration that we’ll just leave behind, effectively breaking any scale up scenario.

Hmmm. That is indeed more of a potential problem. The BZ reporter was talking about new installs, but we also clearly want to support them upgrading in-place to 4.6.

So far we only have evidence that 4.5 baremetal users are customizing the pointer config, but not for other platforms. I strongly suspect the percentage of cloud/IaaS users (azure/gcp/aws/etc) doing this is effectively zero.

@cgwalters just to clarify: you're saying leave it as-is and say "the supported way to do this is via a machine-config" if we receive a bz? But later fix or not ?

Honestly I'm mostly arguing for a bit more consideration/debate, I just want to be sure we have explored the alternatives and weighed the drawbacks a bit more.

To clarify: if after this the rest of the team is still +1 on reverting, that's fine by me!

@cgwalters
Member

cgwalters commented Sep 30, 2020

So if we do the revert...it means that the MCO won't manage the worker userdata, which shouldn't break anything in new 4.6 installs or upgrades, all that we have lost is future support for updating bootimages per

when we'll grow the ability to update bootimages, the new bootimages will understand v3 and they'll grab the managed secret which is v3

from the enhancement. Right?

@runcom
Member Author

runcom commented Sep 30, 2020

Honestly I'm mostly arguing for a bit more consideration/debate, I just want to be sure we have explored the alternatives and weighed the drawbacks a bit more.

oh yeah, I didn't mean to push the revert (and sorry if I did). I opened the reverts to have something immediately actionable given where we are time-wise in the release. Let me put a hold on this while we discuss.

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 30, 2020
@runcom
Member Author

runcom commented Sep 30, 2020

So if we do the revert...it means that the MCO won't manage the worker userdata, which shouldn't break anything in new 4.6 installs or upgrades, all that we have lost is future support for updating bootimages per

yes, when doing the bootimages upgrade feature, we'd need to take this into account again

@cgwalters
Member

OK I'm leaning towards revert indeed, let's just be "on point" to make sure that everything works afterwards.

On this specific PR - we could also just comment out the code doing the secret sync right? That'd make it less of a conflict-fest to reintroduce later in 4.7. But I'm fine as is too.

@yuqi-zhang
Contributor

I'm still +1 on the revert here too. Just to make sure:

  1. comments in #2126 (review)
  2. we should revert machine-api first, then installer, then this

@yuqi-zhang
Contributor

cross linking openshift/machine-api-operator#715 and openshift/installer#4228

machine-api is merged, I will approve installer, and we can lgtm this after we're sure no regressions happen

Signed-off-by: Antonio Murdaca <[email protected]>
@runcom
Member Author

runcom commented Oct 1, 2020

  • We did not seem to have removed the KubeMAOSharedInformer in controller_context.go

removed

  • We did not revert renderAsset func typing in render.go

this shouldn't affect anything: we use the type as an interface, and it made more sense to type the func with that. Reverted, though.

  • We didn't remove one line from go.sum

I've run make go-deps, trusting it did all the things, so that won't affect runtime, right?

  • We didn't remove variable definitions (this is fine)
    Those are the diffs I saw

should be ok to go!

@yuqi-zhang
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 1, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: runcom, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@runcom
Member Author

runcom commented Oct 2, 2020

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 2, 2020
@runcom
Member Author

runcom commented Oct 2, 2020

/retest

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit d2a2e4f into openshift:master Oct 2, 2020
@openshift-ci-robot
Contributor

@runcom: All pull requests linked via external trackers have merged:

Bugzilla bug 1881703 has been moved to the MODIFIED state.

In response to this:

Bug 1881703: Revert #1792


Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

7 participants