Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ODH release automation #988

Merged

Conversation

AjayJagan
Copy link
Contributor

A set of github actions/ scripts to automate the entire release process in github.

Description

This change enables us to automate the odh release process through a set of github actions and scripts.
A detailed explanation can be found here.

Copy link

openshift-ci bot commented Apr 26, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@AjayJagan
Copy link
Contributor Author

The idea here is to create a draft PR so that we can start the review process and also find any misfits in the code.

uses: peter-evans/create-pull-request@v6
with:
path: ./community-operators-prod
token: <PAT> # We need a token with repo rights
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, for some reason the token generated by github app does not work. I created an issue in the library: peter-evans/create-pull-request#2848 .
Also I created a shell script to mimic the pr creation process, but unfortunately even that flow works with a PAT and not with a token generated from GH app. Any suggestions/help here is much appreciated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the PAT to be used needs the repo scope.

@AjayJagan AjayJagan force-pushed the release-automation-odh branch 3 times, most recently from 48d338e to 26c8e11 Compare April 26, 2024 14:49
@bartoszmajsak
Copy link
Contributor

Happy to review it once I'm back in a week. Assuming it can wait.

@VaishnaviHire
Copy link
Member

@AjayJagan Feel free to also add example tracker here for testing

@AjayJagan
Copy link
Contributor Author

@AjayJagan Feel free to also add example tracker here for testing

AjayJagan/notes-operator#2


while $(gh pr checks "$1" | grep -q -v 'tide' | grep -q 'pending'); do
printf ":stopwatch: PR checks still pending, retrying in 10 seconds...\n"
sleep 10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to wait for e2e test in openshift-ci done normally take more than 1hr, so we really want to have a 10s sleep to print 400+ lines of retrying?

also i think search for 'pending' as keyword to identify if it is under test, is not accurate. if you use any real PR from opendatahub-io (not your mock PR) you will see even the green PR has pending returned.

Copy link
Contributor Author

@AjayJagan AjayJagan Apr 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true. We could increase the interval to 10 minutes once or so.
I checked the gh pr checks for #991
and here is what I get
for gh pr checks 991 -R opendatahub-io/opendatahub-operator | grep pending

I get

tide pending 0 https://prow.ci.openshift.org/pr?query=is%3Apr+repo%3Aopendatahub-io%2Fopendatahub-operator+author%3Azdtsw+head%3Ajira%2F443_3 Not mergeable. Needs approved, lgtm labels.

so I though we could do an inverse of tide(which will remove tide from the checks) and grep for pending... if present we could sleep and repeat again

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checking pr #992 which is currently running checks
for gh pr checks 992 -R opendatahub-io/opendatahub-operator | grep -v tide | grep pending

I get

ci/prow/opendatahub-operator-e2e	pending	0	https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/opendatahub-io_opendatahub-operator/992/pull-ci-opendatahub-io-opendatahub-operator-incubation-opendatahub-operator-e2e/1785239301773070336	Job triggered.                     BaseSHA:ec07da31c18b26885daa006c964ae3ea450a0e6f```

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the other way we could handle this is, manually wait for the checks to complete and a pr close with a particular comment should trigger it?

@@ -0,0 +1,31 @@
name: "Release: Create release branch"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this workflow basically is
triggered by a closed PR from "dry run" and create one branch then new PR
why not combine it with the next one release-gh-publish into one?
plus: the name for the workflow is not accurate to what it really does

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea maybe we could use a different naming 😅
But the idea is after creating a new branch with the changes, we wait for every checks to finish manually. So I did this because, if in case there is a failure, then we can stop at this point and prevent a release in github

pat-token: ${{ steps.generate-token.outputs.token }}
- name: Create and push version tags
run: |
git config --global user.email 41898282+github-actions[bot]@users.noreply.github.com
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does "gh label create -d " works in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checking now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so after taking a look. here we only tag right? Or do we create labels as well. AFAIK there is no gh cli way to create tags. Let me know if I am wrong here.

- name: Create test release pr
uses: ./.github/actions/create-release-pr
with:
pr-branch: "odh-release/e2e-test"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this pr-branch map to a odh-relase-${{ env.VERSION }}/e2e-test ?
otherwise just hardcoded no need pass down to action

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the create-release-pr is a reusable action used in 2 places. That is why it passes the pr-branch name.

@@ -236,7 +236,7 @@ KUSTOMIZE_INSTALL_SCRIPT ?= "https://raw.githubusercontent.com/kubernetes-sigs/k
.PHONY: kustomize
kustomize: $(KUSTOMIZE) ## Download kustomize locally if necessary.
$(KUSTOMIZE): $(LOCALBIN)
test -s $(KUSTOMIZE) || { curl -s $(KUSTOMIZE_INSTALL_SCRIPT) | sh -s -- $(subst v,,$(KUSTOMIZE_VERSION)) $(LOCALBIN); }
test -s $(KUSTOMIZE) || { curl -s $(KUSTOMIZE_INSTALL_SCRIPT) | bash -s -- $(subst v,,$(KUSTOMIZE_VERSION)) $(LOCALBIN); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason to change this? dont think we even use make in the PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm now I dont remember why it broke and I had to change it. Let me check this once

sed -i -e "s|createdAt.*|createdAt: \"$(date +"%Y-%-m-%dT00:00:00Z")\"|g" config/manifests/bases/opendatahub-operator.clusterserviceversion.yaml
sed -i -e "s|name: opendatahub-operator.v.*|name: opendatahub-operator.v$NEW_VERSION|g" config/manifests/bases/opendatahub-operator.clusterserviceversion.yaml
sed -i -e "s|version: $CURRENT_VERSION.*|version: $NEW_VERSION|g" config/manifests/bases/opendatahub-operator.clusterserviceversion.yaml
sed -i -e "s|replaces.*|replaces: opendatahub-operator.v$CURRENT_VERSION|g" config/manifests/bases/opendatahub-operator.clusterserviceversion.yaml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont think we are still using replaces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will remove it 👍

sed -i -e "s|name: opendatahub-operator.v.*|name: opendatahub-operator.v$NEW_VERSION|g" config/manifests/bases/opendatahub-operator.clusterserviceversion.yaml
sed -i -e "s|version: $CURRENT_VERSION.*|version: $NEW_VERSION|g" config/manifests/bases/opendatahub-operator.clusterserviceversion.yaml
sed -i -e "s|replaces.*|replaces: opendatahub-operator.v$CURRENT_VERSION|g" config/manifests/bases/opendatahub-operator.clusterserviceversion.yaml
sed -i -e "s|olm.skipRange:.*|olm.skipRange: \'>=$CURRENT_VERSION <$NEW_VERSION\'|g" config/manifests/bases/opendatahub-operator.clusterserviceversion.yaml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are following
olm.skipRange: '>=1.0.0 <$NEW_VERSION'
we wont need get CURRENT_VERSION at all

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed 👍

@@ -236,7 +236,7 @@ KUSTOMIZE_INSTALL_SCRIPT ?= "https://raw.githubusercontent.com/kubernetes-sigs/k
.PHONY: kustomize
kustomize: $(KUSTOMIZE) ## Download kustomize locally if necessary.
$(KUSTOMIZE): $(LOCALBIN)
test -s $(KUSTOMIZE) || { curl -s $(KUSTOMIZE_INSTALL_SCRIPT) | sh -s -- $(subst v,,$(KUSTOMIZE_VERSION)) $(LOCALBIN); }
test -s $(KUSTOMIZE) || { curl -s $(KUSTOMIZE_INSTALL_SCRIPT) | bash -s -- $(subst v,,$(KUSTOMIZE_VERSION)) $(LOCALBIN); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that because ubuntu latest does ont have sh but only bash?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that is the reason

Copy link
Contributor Author

@AjayJagan AjayJagan May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the error is

test -s /home/runner/work/opendatahub-operator/opendatahub-operator/bin/kustomize || { curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | sh -s -- 3.8.7 /home/runner/work/opendatahub-operator/opendatahub-operator/bin; }
sh: 33: Syntax error: "(" unexpected (expecting "then")

Copy link
Contributor

@ykaliuta ykaliuta May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is actually ok. The script has bash's shebang so bash should be used to execute it. Debians by default use dash as the system shell which is pretty much posix

@@ -0,0 +1,88 @@
name: "Release: GH and operatorhub publish"
on:
pull_request:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe i missed something, does previous "release-branch" runs any test?
or how it gets closed ? what if unit-test or linter failed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so when a pr is closed(after we wait for the checks), and it has a particular title ie.
if: github.event.pull_request.merged && startsWith(github.event.pull_request.title, 'ODH') && endsWith(github.event.pull_request.title, 'Release')
Then this pr is triggered

@@ -0,0 +1,40 @@
name: "Set Shared env vars"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the whole idea to have this is to get the version and track url among all jobs?

but then what happens: we have a 2.10.0 release, due to certain reason it failed in e2e test for days
and we have to make a 2.10.1 to fix some bugs. => 2 release at the same time running workflow.
after this PATCH from 2.10.1 runs, it set the VERSION to 2.10.1 + new tracker URI
when workflow for 2.10.0 runs, it reads out 2.10.1 ?

@zdtsw
Copy link
Member

zdtsw commented Apr 30, 2024

before going to update all the small detail implementation comments, I would like to have an overall understanding for the solution.
several parts bother from yesterday's review:

  • the only reason I see we have multiple workflow is 1)we need a dry-run PR to check the changes with e2e tests 2)we need the formal PR with release branch and updated new version. thus, i do not know why we have split the 2nd one into 3 flows. if the plan is one flow should only start after a successful run of the other flow, cannot this be done in side of the same one but as different jobs? isnt that much easier to pass env variable or even use steps' outputs?
  • another one is, this shared-env part. I still would like to know why exactly chose to use do it this way.
    if the dry-run workflow has inputs of version + trackerURL, cannot these be added into the PR or commit message? since you are already checking if the content of such in order to "start job" or not, these should be easier to get rather than manipulate(get mostly) actions variables

@AjayJagan
Copy link
Contributor Author

before going to update all the small detail implementation comments, I would like to have an overall understanding for the solution. several parts bother from yesterday's review:

  • the only reason I see we have multiple workflow is 1)we need a dry-run PR to check the changes with e2e tests 2)we need the formal PR with release branch and updated new version. thus, i do not know why we have split the 2nd one into 3 flows. if the plan is one flow should only start after a successful run of the other flow, cannot this be done in side of the same one but as different jobs? isnt that much easier to pass env variable or even use steps' outputs?
  • another one is, this shared-env part. I still would like to know why exactly chose to use do it this way.
    if the dry-run workflow has inputs of version + trackerURL, cannot these be added into the PR or commit message? since you are already checking if the content of such in order to "start job" or not, these should be easier to get rather than manipulate(get mostly) actions variables

So if I understand this correct, there can be 1 workflow with 3 jobs.
job 1 -> will do the dry run(if failed exit immediately)
job 2 -> will create the real pr for updating the manifests
job 3 -> will create the version updates
hmm this looks better and yea it removes the headache of sharing the env vars

I did not think in this angle #988 (comment). So I think shared env is not the way to do it. I can create a comment with a particular message and read from it. That would make things much easier.

Considering the above changes, can I start the work? let me know if this is okay
And thanks for the suggestions.

@AjayJagan AjayJagan force-pushed the release-automation-odh branch 2 times, most recently from 5af74ff to 1a20447 Compare May 1, 2024 07:48
@AjayJagan
Copy link
Contributor Author

Hey @zdtsw , I have changed to workflows and reduced them to 2 workflows with multiple jobs. Also I have made some changes in the code. Let me know if it looks good or it can be made better. Thanks :)

set -euo pipefail

update_tags(){
MANIFEST_STR=$(cat get_all_manifests.sh | grep $1 | sed 's/ //g')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the function do

sed  -i -r "/$1/s|([^:]*):([^:]*):[^:]*:(.*)|\1:\2:$2:\3|" get_all_manifests.sh

?

}

declare -A COMPONENT_VERSION_MAP=(
["\"codeflare\""]=$1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't just [codelare] work for you?

["\"odh-model-controller\""]=$11
["\"kserve\""]=$12
["\"modelregistry\""]=$13
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we will have to sync get_all_manifests.sh with the script for new components. I do not have nice solution in my mind at the moment, will think about that.

.github/scripts/wait-for-checks.sh Outdated Show resolved Hide resolved
.github/scripts/wait-for-checks.sh Outdated Show resolved Hide resolved
.github/workflows/release.yaml Outdated Show resolved Hide resolved
@AjayJagan AjayJagan marked this pull request as ready for review May 3, 2024 07:37
@AjayJagan
Copy link
Contributor Author

/retest


printf "!!An unknown error occurred!!\n"
pr_has_status $1 fail && { echo "!!PR checks failed!!"; exit 1; }
Copy link
Contributor

@ykaliuta ykaliuta May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering, since gh pr checks has exit status 0 (success) when there are no failed tests and non-0 otherwise, if it's sufficient here to just run it without extra logic? Probably, "unknown error" or "failed checks" will be clear from the log?

@VaishnaviHire
Copy link
Member

I think we should not block this for operatorhub PR automation.
Lets have the automation setup for ODH release and have a follow-up PR to update operatorhub changes.

@AjayJagan
Copy link
Contributor Author

I think we should not block this for operatorhub PR automation. Lets have the automation setup for ODH release and have a follow-up PR to update operatorhub changes.

@VaishnaviHire, with the secrets set for quay and gh app, we should be good to test this :)

@AjayJagan AjayJagan requested review from ykaliuta and zdtsw May 8, 2024 09:21
.github/scripts/validate-semver.sh Show resolved Hide resolved
.github/workflows/pre-release.yaml Show resolved Hide resolved
.github/workflows/pre-release.yaml Outdated Show resolved Hide resolved
Copy link
Member

@VaishnaviHire VaishnaviHire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link

openshift-ci bot commented May 22, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: VaishnaviHire

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 660e824 into opendatahub-io:incubation May 22, 2024
8 checks passed
bartoszmajsak pushed a commit to bartoszmajsak/opendatahub-operator that referenced this pull request May 23, 2024
* ODH release automation

* improve shell scripts

* disable community operators pr creation

* move version update to separate script
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

5 participants