ODH release automation #988

AjayJagan · 2024-04-26T14:42:39Z

A set of GitHub Actions and scripts to automate the entire release process on GitHub.

Description

This change lets us automate the ODH release process through a set of GitHub Actions and scripts.
A detailed explanation can be found here.

openshift-ci · 2024-04-26T14:42:45Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

AjayJagan · 2024-04-26T14:43:42Z

The idea here is to open a draft PR so that we can start the review process and catch any issues in the code early.

AjayJagan · 2024-04-26T14:46:25Z

.github/workflows/release-gh-publish.yaml

+ uses: peter-evans/create-pull-request@v6
+ with:
+ path: ./community-operators-prod
+ token: <PAT> # We need a token with repo rights


Here, for some reason, the token generated by the GitHub App does not work. I created an issue in the library: peter-evans/create-pull-request#2848.
I also created a shell script to mimic the PR creation process, but unfortunately that flow also works only with a PAT, not with a token generated from the GitHub App. Any suggestions or help here would be much appreciated.

The PAT to be used needs the repo scope.

bartoszmajsak · 2024-04-26T16:24:11Z

Happy to review it once I'm back in a week, assuming it can wait.

VaishnaviHire · 2024-04-26T17:06:47Z

@AjayJagan Feel free to also add an example tracker here for testing.

AjayJagan · 2024-04-26T17:13:20Z

AjayJagan/notes-operator#2

zdtsw · 2024-04-29T13:46:55Z

.github/scripts/wait-for-checks.sh

+
+while $(gh pr checks "$1" | grep -q -v 'tide' | grep -q 'pending'); do
+ printf ":stopwatch: PR checks still pending, retrying in 10 seconds...\n"
+ sleep 10


Waiting for the e2e tests in openshift-ci to finish normally takes more than 1 hour, so do we really want a 10 s sleep that prints 400+ lines of retrying?

Also, I think searching for 'pending' as a keyword to tell whether the PR is still under test is not accurate. If you use any real PR from opendatahub-io (not your mock PR), you will see that even a green PR has pending in its output.

True. We could increase the interval to 10 minutes or so.
I checked the gh pr checks output for #991. For

gh pr checks 991 -R opendatahub-io/opendatahub-operator | grep pending

I get:

tide pending 0 https://prow.ci.openshift.org/pr?query=is%3Apr+repo%3Aopendatahub-io%2Fopendatahub-operator+author%3Azdtsw+head%3Ajira%2F443_3 Not mergeable. Needs approved, lgtm labels.

So I thought we could invert-match tide (which removes tide from the checks) and grep for pending; if anything is still pending, we sleep and repeat.

Checking PR #992, which is currently running checks, with

gh pr checks 992 -R opendatahub-io/opendatahub-operator | grep -v tide | grep pending

I get:

ci/prow/opendatahub-operator-e2e pending 0 https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/opendatahub-io_opendatahub-operator/992/pull-ci-opendatahub-io-opendatahub-operator-incubation-opendatahub-operator-e2e/1785239301773070336 Job triggered. BaseSHA:ec07da31c18b26885daa006c964ae3ea450a0e6f

The other way we could handle this: manually wait for the checks to complete, and then have closing the PR with a particular comment trigger the next step?
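A minimal sketch of the polling loop discussed above, under stated assumptions. In the hunk as quoted, `grep -q -v 'tide'` prints nothing, so the second `grep -q 'pending'` never sees any input and the `while` condition is never true. The sketch assumes that `gh pr checks`, when its output is piped, prints tab-separated columns (name, state, elapsed, link, description) like the outputs pasted above. It then matches the state column exactly instead of searching for the word pending anywhere in the line. The function names and the 600-second default interval are hypothetical.

```sh
# Succeeds (exit 0) if any check other than the always-pending "tide" context
# is still pending. Reads `gh pr checks` output on stdin; assumes tab-separated
# columns with the check name in column 1 and its state in column 2.
has_pending_checks() {
  awk -F'\t' '$1 != "tide" && $2 == "pending" { found = 1 } END { exit !found }'
}

# Polls until no non-tide check is pending.
# $1: PR number or URL; $2: optional poll interval in seconds (hypothetical default: 600).
wait_for_checks() {
  pr="$1"
  interval="${2:-600}"
  while gh pr checks "$pr" | has_pending_checks; do
    printf 'PR checks still pending, retrying in %s seconds...\n' "$interval"
    sleep "$interval"
  done
}
```

The loop also stops if `gh` itself fails, because the filter then sees no pending rows. The final state of the checks therefore still has to be inspected after the loop returns.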

zdtsw · 2024-04-29T14:04:55Z

.github/workflows/release-branch.yaml

@@ -0,0 +1,31 @@
+name: "Release: Create release branch"


This workflow is basically
triggered by a closed PR from the "dry run", and it creates a branch and then a new PR.
Why not combine it with the next one, release-gh-publish, into one?
Also, the name of the workflow does not accurately describe what it really does.

Yeah, maybe we could use a different name 😅
But the idea is that after creating a new branch with the changes, we manually wait for all checks to finish. I split it this way so that if there is a failure, we can stop at this point and prevent a release on GitHub.

zdtsw · 2024-04-29T14:11:27Z

.github/workflows/release-gh-publish.yaml

+ pat-token: ${{ steps.generate-token.outputs.token }}
+ - name: Create and push version tags
+ run: |
+ git config --global user.email 41898282+github-actions[bot]@users.noreply.github.com


Does "gh label create -d" work in this case?

checking now

So, after taking a look: here we only create tags, right? Or do we create labels as well? AFAIK there is no gh CLI way to create tags. Let me know if I am wrong here.
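For the tagging step itself, plain git is enough. A minimal sketch, assuming the job's checkout already has push rights to the remote and a committer identity configured (as the git config line in the diff does). The function name and the tag message are hypothetical.

```sh
# Creates an annotated tag at HEAD and pushes only that tag.
# $1: tag name, e.g. v2.10.0; $2: remote name (defaults to origin).
create_and_push_tag() {
  tag="$1"
  remote="${2:-origin}"
  git tag -a "$tag" -m "Release $tag" || return 1
  git push "$remote" "refs/tags/$tag"
}
```

Pushing the single `refs/tags/<tag>` ref, rather than `--tags`, avoids publishing any unrelated local tags that may exist in the runner's checkout.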

zdtsw · 2024-04-29T14:20:11Z

.github/workflows/release-test-e2e.yaml

+ - name: Create test release pr
+ uses: ./.github/actions/create-release-pr
+ with:
+ pr-branch: "odh-release/e2e-test"


Should this pr-branch map to odh-release-${{ env.VERSION }}/e2e-test?
Otherwise, it can just be hardcoded, with no need to pass it down to the action.

create-release-pr is a reusable action used in two places. That is why the pr-branch name is passed in.

zdtsw · 2024-04-29T14:39:30Z

Makefile

@@ -236,7 +236,7 @@ KUSTOMIZE_INSTALL_SCRIPT ?= "https://raw.githubusercontent.com/kubernetes-sigs/k
 .PHONY: kustomize
 kustomize: $(KUSTOMIZE) ## Download kustomize locally if necessary.
 $(KUSTOMIZE): $(LOCALBIN)
- test -s $(KUSTOMIZE) || { curl -s $(KUSTOMIZE_INSTALL_SCRIPT) | sh -s -- $(subst v,,$(KUSTOMIZE_VERSION)) $(LOCALBIN); }
+ test -s $(KUSTOMIZE) || { curl -s $(KUSTOMIZE_INSTALL_SCRIPT) | bash -s -- $(subst v,,$(KUSTOMIZE_VERSION)) $(LOCALBIN); }


Any reason to change this? I don't think we even use make in this PR.

Hmm, now I don't remember why it broke and I had to change it. Let me check.

zdtsw · 2024-04-29T15:07:58Z

.github/workflows/release-update-versions.yaml

+ sed -i -e "s|createdAt.*|createdAt: \"$(date +"%Y-%-m-%dT00:00:00Z")\"|g" config/manifests/bases/opendatahub-operator.clusterserviceversion.yaml
+ sed -i -e "s|name: opendatahub-operator.v.*|name: opendatahub-operator.v$NEW_VERSION|g" config/manifests/bases/opendatahub-operator.clusterserviceversion.yaml
+ sed -i -e "s|version: $CURRENT_VERSION.*|version: $NEW_VERSION|g" config/manifests/bases/opendatahub-operator.clusterserviceversion.yaml
+ sed -i -e "s|replaces.*|replaces: opendatahub-operator.v$CURRENT_VERSION|g" config/manifests/bases/opendatahub-operator.clusterserviceversion.yaml


I don't think we are still using replaces.

will remove it 👍
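A minimal sketch of the createdAt update with the replaces line dropped. One hedged observation: the quoted diff formats the date with %-m, which in GNU date drops the month's leading zero (for example 2024-5-02), while %m keeps it. The sketch below uses %m and takes the date as an optional argument so it can be tested. The function name is hypothetical, and GNU sed's -i is assumed, as in the diff.

```sh
# Sets createdAt in a ClusterServiceVersion file to a zero-padded UTC date.
# $1: CSV file; $2: optional timestamp (defaults to today at midnight UTC).
set_created_at() {
  csv="$1"
  created_at="${2:-$(date -u +%Y-%m-%dT00:00:00Z)}"
  sed -i -e "s|createdAt:.*|createdAt: \"${created_at}\"|" "$csv"
}
```

Usage: `set_created_at config/manifests/bases/opendatahub-operator.clusterserviceversion.yaml`.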

zdtsw · 2024-04-29T15:10:08Z

.github/workflows/release-update-versions.yaml

+ sed -i -e "s|name: opendatahub-operator.v.*|name: opendatahub-operator.v$NEW_VERSION|g" config/manifests/bases/opendatahub-operator.clusterserviceversion.yaml
+ sed -i -e "s|version: $CURRENT_VERSION.*|version: $NEW_VERSION|g" config/manifests/bases/opendatahub-operator.clusterserviceversion.yaml
+ sed -i -e "s|replaces.*|replaces: opendatahub-operator.v$CURRENT_VERSION|g" config/manifests/bases/opendatahub-operator.clusterserviceversion.yaml
+ sed -i -e "s|olm.skipRange:.*|olm.skipRange: \'>=$CURRENT_VERSION <$NEW_VERSION\'|g" config/manifests/bases/opendatahub-operator.clusterserviceversion.yaml


If we follow
olm.skipRange: '>=1.0.0 <$NEW_VERSION'
we won't need to get CURRENT_VERSION at all.

agreed 👍
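Following that suggestion, the skipRange update needs only the new version. A minimal sketch, assuming GNU sed as in the diff. The function name is hypothetical, and the 1.0.0 lower bound is taken from the comment above.

```sh
# Sets olm.skipRange to '>=1.0.0 <NEW_VERSION', so CURRENT_VERSION is not needed.
# $1: CSV file; $2: new version without a leading "v", e.g. 2.10.0.
set_skip_range() {
  csv="$1"
  new_version="$2"
  sed -i -e "s|olm.skipRange:.*|olm.skipRange: '>=1.0.0 <${new_version}'|" "$csv"
}
```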

zdtsw · 2024-04-29T15:15:49Z

Makefile

@@ -236,7 +236,7 @@ KUSTOMIZE_INSTALL_SCRIPT ?= "https://raw.githubusercontent.com/kubernetes-sigs/k
 .PHONY: kustomize
 kustomize: $(KUSTOMIZE) ## Download kustomize locally if necessary.
 $(KUSTOMIZE): $(LOCALBIN)
- test -s $(KUSTOMIZE) || { curl -s $(KUSTOMIZE_INSTALL_SCRIPT) | sh -s -- $(subst v,,$(KUSTOMIZE_VERSION)) $(LOCALBIN); }
+ test -s $(KUSTOMIZE) || { curl -s $(KUSTOMIZE_INSTALL_SCRIPT) | bash -s -- $(subst v,,$(KUSTOMIZE_VERSION)) $(LOCALBIN); }


Is that because ubuntu-latest does not have sh, only bash?

Yes, that is the reason.

The error is:

test -s /home/runner/work/opendatahub-operator/opendatahub-operator/bin/kustomize || { curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | sh -s -- 3.8.7 /home/runner/work/opendatahub-operator/opendatahub-operator/bin; }
sh: 33: Syntax error: "(" unexpected (expecting "then")

That is actually OK. The script has a bash shebang, so bash should be used to execute it. Debian and its derivatives use dash as the default system shell, which is pretty much POSIX-only.
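The Makefile change relies on two points. First, the piped script uses bash-only syntax. Second, `bash -s -- args` reads the script from stdin while passing the remaining arguments through as $1, $2, and so on. A minimal sketch using a hypothetical stand-in for install_kustomize.sh (the real script is downloaded at build time):

```sh
# Hypothetical stand-in for a bash-only installer script. Array syntax like
# args=(...) is not POSIX; dash would reject it with a syntax error.
installer_script='
args=("$@")
echo "version=${args[0]} dir=${args[1]}"
'

# Mirrors the Makefile recipe: curl -s URL | bash -s -- <version> <dir>.
# -s makes bash read commands from stdin; arguments after -- become $1, $2, ...
run_installer() {
  printf '%s\n' "$installer_script" | bash -s -- "$@"
}
```

Usage: `run_installer 3.8.7 /tmp/bin` prints `version=3.8.7 dir=/tmp/bin`.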

zdtsw · 2024-04-29T15:25:23Z

.github/workflows/release-gh-publish.yaml

@@ -0,0 +1,88 @@
+name: "Release: GH and operatorhub publish"
+on:
+ pull_request:


Maybe I missed something, but does the previous "release-branch" workflow run any tests?
Or how does it get closed? What if the unit tests or the linter fail?

So when a PR is closed and merged (after we wait for the checks) and it has a particular title, i.e.
if: github.event.pull_request.merged && startsWith(github.event.pull_request.title, 'ODH') && endsWith(github.event.pull_request.title, 'Release')
then this workflow is triggered.
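The same trigger condition can be mirrored in a shell function, for example to test candidate PR titles locally. A minimal sketch: the function name is hypothetical. GitHub documents startsWith and endsWith as case-insensitive, so the sketch lower-cases the title before matching.

```sh
# Returns 0 if the PR was merged and its title starts with "ODH" and ends
# with "Release" (case-insensitively), mirroring the workflow's if: condition.
# $1: PR title; $2: "true" if the PR was merged.
is_release_pr() {
  [ "$2" = "true" ] || return 1
  title=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$title" in
    odh*release) return 0 ;;
    *) return 1 ;;
  esac
}
```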

zdtsw · 2024-04-29T15:39:40Z

.github/actions/set-shared-env/action.yaml

@@ -0,0 +1,40 @@
+name: "Set Shared env vars"


Is the whole idea of this to share the version and tracker URL among all jobs?

But then what happens here: we have a 2.10.0 release that, for some reason, fails the e2e tests for days,
and we have to make a 2.10.1 to fix some bugs => two releases running the workflow at the same time.
After the 2.10.1 PATCH runs, it sets VERSION to 2.10.1 plus a new tracker URI.
When the workflow for 2.10.0 runs, does it read 2.10.1?

zdtsw · 2024-04-30T09:52:58Z

Before going into all the small implementation details, I would like to get an overall understanding of the solution.
Several parts have bothered me since yesterday's review:

The only reason I see for having multiple workflows is that 1) we need a dry-run PR to check the changes with e2e tests, and 2) we need the formal PR with the release branch and the updated version. So I do not know why the second one has been split into three flows. If the plan is that one flow should only start after a successful run of another, can this not be done inside the same workflow as different jobs? Isn't it much easier to pass env variables, or even use steps' outputs, that way?
Another one is the shared-env part. I would still like to know why exactly you chose to do it this way.
If the dry-run workflow has version + trackerURL as inputs, can these not be added to the PR or commit message? Since you are already checking that content to decide whether to "start the job", these should be easier to get than by manipulating (mostly getting) Actions variables.

AjayJagan · 2024-04-30T10:02:06Z

So if I understand this correctly, there can be one workflow with three jobs:
job 1 -> does the dry run (if it fails, exit immediately)
job 2 -> creates the real PR for updating the manifests
job 3 -> creates the version updates
Hmm, this looks better, and yeah, it removes the headache of sharing the env vars.

I had not thought of it from this angle (#988 (comment)). So I think shared env is not the way to do it. I can create a comment with a particular message and read the values from it. That would make things much easier.

Considering the above changes, can I start the work? Let me know if this is okay.
And thanks for the suggestions.
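Here is a minimal sketch of reading the version and tracker URL from a comment or the PR message, as suggested above. It assumes a hypothetical message format in which the text carries one `key: value` line per setting, such as `version: 2.10.1` and `tracker-url: ...`. Because each run reads the values from its own PR, two releases in flight would not overwrite each other's settings. The function name and the keys are assumptions.

```sh
# Prints the value of the first "key: value" line in a PR body or comment.
# $1: the message text; $2: the key, e.g. "version" or "tracker-url".
read_release_field() {
  printf '%s\n' "$1" | sed -n "s|^$2:[[:space:]]*||p" | head -n 1
}
```

In a workflow, the text could come from something like `gh pr view "$PR_NUMBER" --json body --jq .body`.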

AjayJagan · 2024-05-02T03:56:47Z

Hey @zdtsw, I have changed the workflows and reduced them to two workflows with multiple jobs. I have also made some changes to the code. Let me know if it looks good or if it can be made better. Thanks :)

ykaliuta · 2024-05-02T07:15:03Z

.github/scripts/update-manifests-tags.sh

+set -euo pipefail
+
+update_tags(){
+MANIFEST_STR=$(cat get_all_manifests.sh | grep $1 | sed 's/ //g')


Does the function do

sed -i -r "/$1/s|([^:]*):([^:]*):[^:]*:(.*)|\1:\2:$2:\3|" get_all_manifests.sh

?
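The suggested one-liner can be checked in isolation. This is a minimal sketch that assumes get_all_manifests.sh entries take the form ["component"]="org:repo:ref:path" and that GNU sed is available. Here the file is passed as a third argument (a hypothetical addition) so the function can run against a scratch copy.

```sh
# Replaces the ref (third colon-separated field) on the line matching $1.
# $1: quoted component key, e.g. '"codeflare"'; $2: new ref or tag;
# $3: manifest list file (defaults to get_all_manifests.sh).
update_tags() {
  sed -i -r "/$1/s|([^:]*):([^:]*):[^:]*:(.*)|\1:\2:$2:\3|" "${3:-get_all_manifests.sh}"
}
```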

ykaliuta · 2024-05-02T07:16:42Z

.github/scripts/update-manifests-tags.sh

+}
+
+declare -A COMPONENT_VERSION_MAP=(
+ ["\"codeflare\""]=$1


Doesn't just [codeflare] work for you?

ykaliuta · 2024-05-02T07:22:26Z

.github/scripts/update-manifests-tags.sh

+ ["\"odh-model-controller\""]=$11
+ ["\"kserve\""]=$12
+ ["\"modelregistry\""]=$13
+)


Looks like we will have to keep get_all_manifests.sh in sync with this script whenever new components are added. I do not have a nice solution in mind at the moment; I will think about it.

AjayJagan · 2024-05-04T01:34:04Z

/retest

ykaliuta · 2024-05-06T06:12:21Z

.github/scripts/wait-for-checks.sh


-printf "!!An unknown error occurred!!\n"
+pr_has_status $1 fail && { echo "!!PR checks failed!!"; exit 1; }


I'm wondering: since gh pr checks exits with status 0 (success) when there are no failed checks and non-zero otherwise, is it sufficient here to just run it without the extra logic? The "unknown error" or "failed checks" cases would probably be clear from the log anyway.

VaishnaviHire · 2024-05-06T13:36:31Z

I think we should not block this on the OperatorHub PR automation.
Let's set up the automation for the ODH release and have a follow-up PR for the OperatorHub changes.

AjayJagan · 2024-05-07T06:41:03Z

@VaishnaviHire, with the secrets set for Quay and the GitHub App, we should be good to test this :)

VaishnaviHire

/lgtm

openshift-ci · 2024-05-22T21:04:13Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: VaishnaviHire

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [VaishnaviHire]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

* ODH release automation
* improve shell scripts
* disable community operators pr creation
* move version update to separate script

openshift-ci bot added the do-not-merge/work-in-progress label Apr 26, 2024

AjayJagan commented Apr 26, 2024

AjayJagan force-pushed the release-automation-odh branch 3 times, most recently from 48d338e to 26c8e11 on April 26, 2024 14:49

AjayJagan requested review from bartoszmajsak, zdtsw, ykaliuta and VaishnaviHire April 26, 2024 14:50

zdtsw reviewed Apr 29, 2024

AjayJagan force-pushed the release-automation-odh branch 2 times, most recently from 5af74ff to 1a20447 on May 1, 2024 07:48

ykaliuta reviewed May 2, 2024

AjayJagan force-pushed the release-automation-odh branch from ec0484a to 7018a10 on May 3, 2024 07:31

AjayJagan commented May 3, 2024

.github/workflows/pre-release.yaml (outdated, resolved)

AjayJagan commented May 3, 2024

.github/workflows/release.yaml (outdated, resolved)

AjayJagan commented May 3, 2024

.github/workflows/release.yaml (outdated, resolved)

ODH release automation

b64f6fe

AjayJagan force-pushed the release-automation-odh branch from 7018a10 to a935c61 on May 3, 2024 07:37

AjayJagan marked this pull request as ready for review May 3, 2024 07:37

openshift-ci bot removed the do-not-merge/work-in-progress label May 3, 2024

openshift-ci bot requested a review from andrewballantyne May 3, 2024 07:37

openshift-ci bot requested a review from RobGeada May 3, 2024 07:37

AjayJagan removed request for andrewballantyne and RobGeada May 3, 2024 07:37

ykaliuta reviewed May 6, 2024

improve shell scripts

28b971c

AjayJagan force-pushed the release-automation-odh branch from a935c61 to 28b971c on May 6, 2024 07:50

disable community operators pr creation

d1d7247

AjayJagan requested review from ykaliuta and zdtsw May 8, 2024 09:21

zdtsw reviewed May 8, 2024

.github/scripts/validate-semver.sh (resolved)

.github/workflows/pre-release.yaml (resolved)

.github/workflows/pre-release.yaml (outdated, resolved)

move version update to separate script

60d66ae

VaishnaviHire approved these changes May 22, 2024

openshift-ci bot assigned VaishnaviHire May 22, 2024

openshift-ci bot added the lgtm label May 22, 2024

openshift-ci bot added the approved label May 22, 2024

openshift-merge-bot bot merged commit 660e824 into opendatahub-io:incubation May 22, 2024
8 checks passed

