
Use controller-runtime to reconstruct spark operator #2072

Merged
merged 3 commits into kubeflow:master from the controller-runtime branch on Aug 1, 2024

Conversation

ChenYi015
Contributor

@ChenYi015 ChenYi015 commented Jun 25, 2024

Purpose of this PR

Use controller-runtime to reconstruct the spark operator.

Close #547
Close #507
Close #2052

Proposed changes

Spark Operator Executable

  • Split the operator into two separate components, i.e. the controller and the webhook.

  • Use cobra to implement the spark-operator command, which has two subcommands, controller and webhook; the controller/webhook can be started with spark-operator controller start and spark-operator webhook start respectively.

    $ make build-operator
    $ bin/spark-operator --help          
    Spark operator
    
    Usage:
      spark-operator [flags]
      spark-operator [command]
    
    Available Commands:
      completion  Generate the autocompletion script for the specified shell
      controller  Spark operator controller
      help        Help about any command
      version     Print version information
      webhook     Spark operator webhook
    
    Flags:
      -h, --help   help for spark-operator
    
    Use "spark-operator [command] --help" for more information about a command.
  • The SparkApplication/ScheduledSparkApplication controllers only watch resources in the Spark job namespaces.

  • Add a finalizer to SparkApplication to ensure that all sub-resources (e.g. pods, services, ingresses) related to the application are deleted when the SparkApplication is deleted; a minimal sketch of this follows the list.
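
To illustrate the finalizer handling in the last bullet, here is a minimal controller-runtime sketch; the finalizer name and the deleteSubResources helper are assumptions for illustration only, not the operator's actual code:

package sparkapplication

import (
    "context"

    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

    "github.com/kubeflow/spark-operator/api/v1beta2"
)

// sparkAppFinalizer is an assumed finalizer name, used only for this sketch.
const sparkAppFinalizer = "sparkoperator.k8s.io/finalizer"

// reconcileFinalizer adds the finalizer while the app is live and, once the app
// is being deleted, cleans up sub-resources before removing the finalizer.
func reconcileFinalizer(ctx context.Context, c client.Client, app *v1beta2.SparkApplication) error {
    if app.DeletionTimestamp.IsZero() {
        // The app is live: make sure deletion will wait for our cleanup.
        if controllerutil.AddFinalizer(app, sparkAppFinalizer) {
            return c.Update(ctx, app)
        }
        return nil
    }
    // The app is being deleted: remove pods, services, ingress, then the finalizer.
    if err := deleteSubResources(ctx, c, app); err != nil {
        return err
    }
    if controllerutil.RemoveFinalizer(app, sparkAppFinalizer) {
        return c.Update(ctx, app)
    }
    return nil
}

// deleteSubResources is a hypothetical cleanup helper standing in for the real logic.
func deleteSubResources(ctx context.Context, c client.Client, app *v1beta2.SparkApplication) error {
    return nil
}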

Helm Chart

  • Change imagePullSecrets to image.pullSecrets.
  • All controller configurations are prefixed with controller, e.g. controller.replicaCount and controller.resyncInterval.
  • All webhook configurations are prefixed with webhook, e.g. webhook.replicaCount and webhook.failurePolicy.
  • All monitoring configurations are prefixed with prometheus, e.g. prometheus.metrics and prometheus.podMonitor.
  • Enable leader election for the controller/webhook by default to ensure that only one controller instance is active when upgrading the chart.
  • The update strategy of the controller/webhook deployments is rolling update rather than recreate.
  • Deploy the webhook server as a separate deployment and enable it by default.
  • The webhook secret will not be created and updated by the operator, and the certificates will not change during the chart upgrade/rollback process.
  • Change the default webhook timeoutSeconds from 30 to 10.
  • Change the default webhook failurePolicy from Ignore to Fail.
  • Change the default Spark job namespaces from [] to ["default"], so that the SparkApplications under the examples directory can run directly without creating RBAC resources manually.
  • Service accounts are configured with controller.serviceAccount, webhook.serviceAccount and spark.serviceAccount respectively.
  • RBAC resources are configured with controller.rbac, webhook.rbac and spark.rbac respectively.
  • logLevel will be one of info, debug and error.

Change Category

Indicate the type of change by marking the applicable boxes:

  • Bugfix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that could affect existing functionality)
  • Documentation update

Rationale

Checklist

Before submitting your PR, please review the following:

  • [ ] I have conducted a self-review of my own code.
  • [ ] I have updated documentation accordingly.
  • [ ] I have added tests that prove my changes are effective or that my feature works.
  • [ ] Existing unit tests pass locally with my changes.

Additional Notes

@ChenYi015
Contributor Author

/assign @yuchaoran2011
/assign @vara-bonthu

@yuchaoran2011
Contributor

Thanks @ChenYi015 for this major refactoring! One of the referenced issues was created by a former colleague of mine 5 years ago. Glad to finally see someone take a stab at it. Given the sheer size of the changes, more eyes are needed for the review. We could use help from fellow community members. Tagging some of them here: @bnetzi @peter-mcclonski @jacobsalway, but anyone is welcome to comment.

@jacobsalway
Member

Happy to take a look but is there a natural way we could split this PR to make it easier to review?

@ChenYi015 ChenYi015 force-pushed the controller-runtime branch from 21cbf65 to dc8cd4f on July 1, 2024 13:37
@ChenYi015 ChenYi015 marked this pull request as ready for review July 1, 2024 13:39
@google-oss-prow google-oss-prow bot requested a review from andreyvelich July 1, 2024 13:39
@ChenYi015 ChenYi015 force-pushed the controller-runtime branch from dc8cd4f to 04e18a9 on July 1, 2024 13:41
Contributor

@vara-bonthu vara-bonthu left a comment

Apart from the static code review, given the major refactoring, we need to ensure all tests pass. I am not certain about our current test coverage, but we should create a new image and test it locally if reviewers have the bandwidth. This would ensure it works with common examples without errors.

We could create a pre-release tagged as 2.0 to include other improvements as well. This will give users time to migrate to the newer version since it introduces breaking changes.

@jacobsalway
Member

Apart from the static code review, given the major refactoring, we need to ensure all tests pass. I am not certain about our current test coverage, but we should create a new image and test it locally if reviewers have the bandwidth. This would ensure it works with common examples without errors.

We could create a pre-release tagged as 2.0 to include other improvements as well. This will give users time to migrate to the newer version since it introduces breaking changes.

Not directly connected to this PR, but it's probably a good idea to get some end-to-end tests running in CI since they're currently disabled.

@ChenYi015
Contributor Author

Let me explain what I did. First, I used kubebuilder to scaffold the project:

mkdir spark-operator
cd spark-operator
kubebuilder init --domain sparkoperator.k8s.io --repo github.com/kubeflow/spark-operator

Then, I created the new API versions as follows:

$ # Create v1beta1 version (create resource only)
$ kubebuilder create api --version v1beta1 --kind SparkApplication
INFO Create Resource [y/n] y
INFO Create Controller [y/n] n
$ kubebuilder create api --version v1beta1 --kind ScheduledSparkApplication
INFO Create Resource [y/n] y
INFO Create Controller [y/n] n

$ # Create v1beta2 version (create both resource and controller)
$ kubebuilder create api --version v1beta2 --kind SparkApplication
INFO Create Resource [y/n] y
INFO Create Controller [y/n] y
$ kubebuilder create api --version v1beta2 --kind ScheduledSparkApplication
INFO Create Resource [y/n] y
INFO Create Controller [y/n] y

The structure of the project is like this:

$ tree -L 1
.
├── Dockerfile
├── Makefile
├── PROJECT       # Kubebuilder uses the PROJECT file to hold project metadata and scaffold the project.
├── README.md
├── api
├── bin
├── cmd
├── config
├── go.mod
├── go.sum
├── hack
├── internal
└── test

Then I moved the original API definitions from pkg/apis/v1beta1 to api/v1beta1 and from pkg/apis/v1beta2 to api/v1beta2 respectively. After modifying the API definitions, we can use make manifests to generate the CustomResourceDefinitions into the config/crd/bases directory.
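
For reference, a hedged schematic of what a kubebuilder-marked API type looks like (field definitions elided; the real types live under api/v1beta2); controller-gen reads markers like these when make manifests runs and writes the CRDs to config/crd/bases:

package v1beta2

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// Spec and Status fields are elided in this sketch.
type SparkApplicationSpec struct{}
type SparkApplicationStatus struct{}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status

// SparkApplication is the Schema for the sparkapplications API.
type SparkApplication struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`

    Spec   SparkApplicationSpec   `json:"spec,omitempty"`
    Status SparkApplicationStatus `json:"status,omitempty"`
}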

@ChenYi015
Contributor Author

We will define controllers for SparkApplication, ScheduledSparkApplication, MutatingWebhookConfiguration and ValidatingWebhookConfiguration. The controllers for the webhook configurations are used to patch the generated CA certificates when a mutating/validating webhook configuration is created or updated.

$ tree -d internal/controller
internal/controller
├── mutatingwebhookconfiguration
├── scheduledsparkapplication
├── sparkapplication
└── validatingwebhookconfiguration

For every controller, we need to define an event filter (predicate), an event handler and a reconciler; for detailed information, refer to the Architecture Concept Diagram. A minimal sketch of how these pieces fit together is shown below.
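
A minimal sketch (with illustrative names, not the operator's actual code) of how these three pieces are typically wired together with controller-runtime:

package sparkapplication

import (
    "context"

    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/builder"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/predicate"

    "github.com/kubeflow/spark-operator/api/v1beta2"
)

// Reconciler drives a SparkApplication through its state machine.
type Reconciler struct {
    client.Client
}

// Reconcile is called for every event that passes the predicates and handlers.
func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    app := &v1beta2.SparkApplication{}
    if err := r.Get(ctx, req.NamespacedName, app); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }
    // Inspect app.Status, run spark-submit, update status, requeue, etc.
    return ctrl.Result{}, nil
}

// SetupWithManager registers the event source, predicate (event filter) and reconciler.
func (r *Reconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        Named("spark-application-controller").
        For(&v1beta2.SparkApplication{},
            // Event filter: ignore updates that do not change the spec.
            builder.WithPredicates(predicate.GenerationChangedPredicate{})).
        Complete(r)
}

The predicate filters which events are enqueued, the event handler maps them to reconcile requests, and the reconciler carries out the actual state transitions.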

We will define webhooks for SparkApplication, ScheduledSparkApplication and Spark Pod:

internal/webhook
├── doc.go
├── scheduledsparkapplication_defaulter.go
├── scheduledsparkapplication_validator.go
├── sparkapplication_defaulter.go
├── sparkapplication_validator.go
├── sparkpod_defaulter.go
├── sparkpod_defaulter_test.go
├── suite_test.go
└── webhook.go
  • For the SparkApplication and ScheduledSparkApplication webhooks, a defaulter/validator is defined to default/validate the resource.
  • For the Spark pod webhook, a defaulter is defined to patch fields like volumes, volumeMounts and env vars into driver/executor pods (a sketch follows this list).
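
A hedged sketch of what such a pod defaulter can look like when registered through controller-runtime's webhook builder (the type name and the injected env var are illustrative, not the operator's actual mutation logic):

package webhook

import (
    "context"
    "fmt"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/runtime"
    ctrl "sigs.k8s.io/controller-runtime"
)

// SparkPodDefaulter mutates driver/executor pods before they are persisted.
type SparkPodDefaulter struct{}

// Default implements admission.CustomDefaulter and patches the incoming pod.
func (d *SparkPodDefaulter) Default(ctx context.Context, obj runtime.Object) error {
    pod, ok := obj.(*corev1.Pod)
    if !ok {
        return fmt.Errorf("expected a Pod but got %T", obj)
    }
    // Illustrative mutation only: the real defaulter patches volumes, volumeMounts,
    // env vars, etc. from the owning SparkApplication spec.
    for i := range pod.Spec.Containers {
        pod.Spec.Containers[i].Env = append(pod.Spec.Containers[i].Env,
            corev1.EnvVar{Name: "SPARK_OPERATOR_MUTATED", Value: "true"})
    }
    return nil
}

// setupSparkPodWebhook registers the defaulter with the manager's webhook server.
func setupSparkPodWebhook(mgr ctrl.Manager) error {
    return ctrl.NewWebhookManagedBy(mgr).
        For(&corev1.Pod{}).
        WithDefaulter(&SparkPodDefaulter{}).
        Complete()
}

Registering a defaulter for corev1.Pod through the builder is what produces the /mutate--v1-pod path that shows up in the webhook logs later in this thread.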

@ChenYi015
Contributor Author

ChenYi015 commented Jul 2, 2024

I did some tests as follows (go, docker and make are needed):

# Build operator image, create a kind cluster and load image to the kind cluster
make docker-build kind-create-cluster kind-load-image

Install the chart:

helm install spark-operator charts/spark-operator-chart \
    --namespace spark-operator \
    --create-namespace

Submit the example spark-pi SparkApplication and wait for it to complete:

kubectl apply -f examples/spark-pi.yaml

Check out the controller logs:

$ kubectl logs -n spark-operator spark-operator-controller-fc5bd5bbc-4mzp4
...
+ exec /usr/bin/tini -s -- /usr/bin/spark-operator controller start --zap-log-level=1 --namespaces=default --enable-ui-service=true --ingress-url-format= --controller-threads=10 --enable-batch-scheduler=false --enable-metrics=true --metrics-labels=app_type --metrics-port=10254 --metrics-endpoint=/metrics --metrics-prefix= --leader-election=true --leader-election-lock-name=spark-operator-controller-lock --leader-election-lock-namespace=spark-operator
2024-07-02T02:27:17.108Z	INFO	controller/start.go:262	Starting manager
2024-07-02T02:27:17.110Z	INFO	manager/server.go:50	starting server	{"kind": "health probe", "addr": "[::]:8081"}
I0702 02:27:17.110611      13 leaderelection.go:250] attempting to acquire leader lease spark-operator/spark-operator-controller-lock...
I0702 02:27:35.251311      13 leaderelection.go:260] successfully acquired lease spark-operator/spark-operator-controller-lock
2024-07-02T02:27:35.251Z	DEBUG	events	recorder/recorder.go:104	spark-operator-controller-fc5bd5bbc-4mzp4_db6fff2b-4d6a-4e76-b426-e0bb1d7cacba became leader	{"type": "Normal", "object": {"kind":"Lease","namespace":"spark-operator","name":"spark-operator-controller-lock","uid":"9aeacf62-a5a7-40db-8070-83aafc59b7bb","apiVersion":"coordination.k8s.io/v1","resourceVersion":"1894"}, "reason": "LeaderElection"}
2024-07-02T02:27:35.253Z	INFO	controller/controller.go:178	Starting EventSource	{"controller": "spark-application-controller", "source": "kind source: *v1beta2.SparkApplication"}
2024-07-02T02:27:35.254Z	INFO	controller/controller.go:178	Starting EventSource	{"controller": "scheduled-spark-application-controller", "source": "kind source: *v1beta2.ScheduledSparkApplication"}
2024-07-02T02:27:35.254Z	INFO	controller/controller.go:186	Starting Controller	{"controller": "scheduled-spark-application-controller"}
2024-07-02T02:27:35.254Z	INFO	controller/controller.go:178	Starting EventSource	{"controller": "spark-application-controller", "source": "kind source: *v1.Pod"}
2024-07-02T02:27:35.254Z	INFO	controller/controller.go:186	Starting Controller	{"controller": "spark-application-controller"}
2024-07-02T02:27:35.360Z	INFO	controller/controller.go:220	Starting workers	{"controller": "spark-application-controller", "worker count": 10}
2024-07-02T02:27:35.360Z	INFO	controller/controller.go:220	Starting workers	{"controller": "scheduled-spark-application-controller", "worker count": 10}
2024-07-02T02:27:37.744Z	DEBUG	sparkapplication/event_handler.go:125	SparkApplication created	{"name": "spark-pi", "namespace": "default", "state": ""}
2024-07-02T02:27:37.751Z	INFO	sparkapplication/controller.go:113	Reconciling SparkApplication	{"name": "spark-pi", "namespace": "default", "state": ""}
2024-07-02T02:27:37.762Z	DEBUG	sparkapplication/event_handler.go:141	SparkApplication updated	{"name": "spark-pi", "namespace": "default", "oldState": "", "newState": ""}
2024-07-02T02:27:37.773Z	INFO	sparkapplication/controller.go:113	Reconciling SparkApplication	{"name": "spark-pi", "namespace": "default", "state": ""}
2024-07-02T02:27:37.773Z	INFO	sparkapplication/controller.go:598	Submitting SparkApplication	{"name": "spark-pi", "namespace": "default", "state": ""}
2024-07-02T02:27:37.773Z	INFO	sparkapplication/controller.go:618	Creating web UI service for SparkApplication	{"name": "spark-pi", "namespace": "default"}
2024-07-02T02:27:37.783Z	INFO	sparkapplication/controller.go:682	Running spark-submit for SparkApplication	{"name": "spark-pi", "namespace": "default", "arguments": ["--master", "k8s://https://10.96.0.1:443", "--class", "org.apache.spark.examples.SparkPi", "--deploy-mode", "cluster", "--conf", "spark.app.name=spark-pi", "--conf", "spark.kubernetes.namespace=default", "--conf", "spark.kubernetes.submission.waitAppCompletion=false", "--conf", "spark.kubernetes.driver.pod.name=spark-pi-driver", "--conf", "spark.kubernetes.driver.label.sparkoperator.k8s.io/app-name=spark-pi", "--conf", "spark.kubernetes.driver.label.sparkoperator.k8s.io/launched-by-spark-operator=true", "--conf", "spark.kubernetes.driver.label.sparkoperator.k8s.io/submission-id=d370dcab-f5a4-447d-ae06-d1435889a025", "--conf", "spark.kubernetes.driver.container.image=spark:3.5.0", "--conf", "spark.kubernetes.authenticate.driver.serviceAccountName=spark-operator-spark", "--conf", "spark.kubernetes.driver.label.version=3.5.0", "--conf", "spark.kubernetes.executor.label.sparkoperator.k8s.io/app-name=spark-pi", "--conf", "spark.kubernetes.executor.label.sparkoperator.k8s.io/launched-by-spark-operator=true", "--conf", "spark.kubernetes.executor.label.sparkoperator.k8s.io/submission-id=d370dcab-f5a4-447d-ae06-d1435889a025", "--conf", "spark.executor.instances=1", "--conf", "spark.kubernetes.executor.container.image=spark:3.5.0", "--conf", "spark.kubernetes.executor.label.version=3.5.0", "local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar"]}
2024-07-02T02:27:39.941Z	DEBUG	sparkapplication/event_handler.go:53	Spark pod created	{"name": "spark-pi-driver", "namespace": "default", "phase": "Pending"}
2024-07-02T02:27:39.945Z	DEBUG	sparkapplication/event_handler.go:64	Spark pod updated	{"name": "spark-pi-driver", "namespace": "default", "oldPhase": "Pending", "newPhase": "Pending"}
2024-07-02T02:27:39.955Z	DEBUG	sparkapplication/event_handler.go:64	Spark pod updated	{"name": "spark-pi-driver", "namespace": "default", "oldPhase": "Pending", "newPhase": "Pending"}
2024-07-02T02:27:40.099Z	DEBUG	events	recorder/recorder.go:104	SparkApplication spark-pi was submitted successfully	{"type": "Normal", "object": {"kind":"SparkApplication","namespace":"default","name":"spark-pi","uid":"54c20cb0-0163-4661-af90-93bce8b31d8b","apiVersion":"sparkoperator.k8s.io/v1beta2","resourceVersion":"1903"}, "reason": "SparkApplicationSubmitted"}
2024-07-02T02:27:40.104Z	DEBUG	sparkapplication/event_handler.go:141	SparkApplication updated	{"name": "spark-pi", "namespace": "default", "oldState": "", "newState": "SUBMITTED"}
2024-07-02T02:27:40.109Z	INFO	sparkapplication/controller.go:113	Reconciling SparkApplication	{"name": "spark-pi", "namespace": "default", "state": "SUBMITTED"}
2024-07-02T02:27:40.114Z	DEBUG	sparkapplication/event_handler.go:141	SparkApplication updated	{"name": "spark-pi", "namespace": "default", "oldState": "SUBMITTED", "newState": "SUBMITTED"}
2024-07-02T02:27:40.121Z	INFO	sparkapplication/controller.go:113	Reconciling SparkApplication	{"name": "spark-pi", "namespace": "default", "state": "SUBMITTED"}
2024-07-02T02:27:40.712Z	DEBUG	sparkapplication/event_handler.go:64	Spark pod updated	{"name": "spark-pi-driver", "namespace": "default", "oldPhase": "Pending", "newPhase": "Running"}
2024-07-02T02:27:40.718Z	INFO	sparkapplication/controller.go:113	Reconciling SparkApplication	{"name": "spark-pi", "namespace": "default", "state": "SUBMITTED"}
2024-07-02T02:27:40.718Z	DEBUG	events	recorder/recorder.go:104	Driver spark-pi-driver is running	{"type": "Normal", "object": {"kind":"SparkApplication","namespace":"default","name":"spark-pi","uid":"54c20cb0-0163-4661-af90-93bce8b31d8b","apiVersion":"sparkoperator.k8s.io/v1beta2","resourceVersion":"1924"}, "reason": "SparkDriverRunning"}
2024-07-02T02:27:40.722Z	DEBUG	sparkapplication/event_handler.go:141	SparkApplication updated	{"name": "spark-pi", "namespace": "default", "oldState": "SUBMITTED", "newState": "RUNNING"}
2024-07-02T02:27:40.729Z	INFO	sparkapplication/controller.go:113	Reconciling SparkApplication	{"name": "spark-pi", "namespace": "default", "state": "RUNNING"}
2024-07-02T02:28:02.516Z	DEBUG	sparkapplication/event_handler.go:53	Spark pod created	{"name": "spark-pi-f41a97907145267b-exec-1", "namespace": "default", "phase": "Pending"}
2024-07-02T02:28:02.524Z	INFO	sparkapplication/controller.go:113	Reconciling SparkApplication	{"name": "spark-pi", "namespace": "default", "state": "RUNNING"}
2024-07-02T02:28:02.527Z	DEBUG	events	recorder/recorder.go:104	Executor [spark-pi-f41a97907145267b-exec-1] is pending	{"type": "Normal", "object": {"kind":"SparkApplication","namespace":"default","name":"spark-pi","uid":"54c20cb0-0163-4661-af90-93bce8b31d8b","apiVersion":"sparkoperator.k8s.io/v1beta2","resourceVersion":"1937"}, "reason": "SparkExecutorPending"}
2024-07-02T02:28:02.529Z	DEBUG	sparkapplication/event_handler.go:64	Spark pod updated	{"name": "spark-pi-f41a97907145267b-exec-1", "namespace": "default", "oldPhase": "Pending", "newPhase": "Pending"}
2024-07-02T02:28:02.538Z	DEBUG	sparkapplication/event_handler.go:141	SparkApplication updated	{"name": "spark-pi", "namespace": "default", "oldState": "RUNNING", "newState": "RUNNING"}
2024-07-02T02:28:02.539Z	DEBUG	sparkapplication/event_handler.go:64	Spark pod updated	{"name": "spark-pi-f41a97907145267b-exec-1", "namespace": "default", "oldPhase": "Pending", "newPhase": "Pending"}
2024-07-02T02:28:02.540Z	INFO	sparkapplication/controller.go:113	Reconciling SparkApplication	{"name": "spark-pi", "namespace": "default", "state": "RUNNING"}
2024-07-02T02:28:03.696Z	DEBUG	sparkapplication/event_handler.go:64	Spark pod updated	{"name": "spark-pi-f41a97907145267b-exec-1", "namespace": "default", "oldPhase": "Pending", "newPhase": "Running"}
2024-07-02T02:28:03.703Z	INFO	sparkapplication/controller.go:113	Reconciling SparkApplication	{"name": "spark-pi", "namespace": "default", "state": "RUNNING"}
2024-07-02T02:28:03.704Z	DEBUG	events	recorder/recorder.go:104	Executor [spark-pi-f41a97907145267b-exec-1] is running	{"type": "Normal", "object": {"kind":"SparkApplication","namespace":"default","name":"spark-pi","uid":"54c20cb0-0163-4661-af90-93bce8b31d8b","apiVersion":"sparkoperator.k8s.io/v1beta2","resourceVersion":"1997"}, "reason": "SparkExecutorRunning"}
2024-07-02T02:28:03.710Z	DEBUG	sparkapplication/event_handler.go:141	SparkApplication updated	{"name": "spark-pi", "namespace": "default", "oldState": "RUNNING", "newState": "RUNNING"}
2024-07-02T02:28:03.715Z	INFO	sparkapplication/controller.go:113	Reconciling SparkApplication	{"name": "spark-pi", "namespace": "default", "state": "RUNNING"}
2024-07-02T02:29:23.634Z	DEBUG	sparkapplication/event_handler.go:64	Spark pod updated	{"name": "spark-pi-f41a97907145267b-exec-1", "namespace": "default", "oldPhase": "Running", "newPhase": "Running"}
2024-07-02T02:29:23.642Z	INFO	sparkapplication/controller.go:113	Reconciling SparkApplication	{"name": "spark-pi", "namespace": "default", "state": "RUNNING"}
2024-07-02T02:29:23.835Z	DEBUG	sparkapplication/event_handler.go:64	Spark pod updated	{"name": "spark-pi-f41a97907145267b-exec-1", "namespace": "default", "oldPhase": "Running", "newPhase": "Succeeded"}
2024-07-02T02:29:23.842Z	INFO	sparkapplication/controller.go:113	Reconciling SparkApplication	{"name": "spark-pi", "namespace": "default", "state": "RUNNING"}
2024-07-02T02:29:23.842Z	DEBUG	events	recorder/recorder.go:104	Executor [spark-pi-f41a97907145267b-exec-1] completed	{"type": "Normal", "object": {"kind":"SparkApplication","namespace":"default","name":"spark-pi","uid":"54c20cb0-0163-4661-af90-93bce8b31d8b","apiVersion":"sparkoperator.k8s.io/v1beta2","resourceVersion":"2007"}, "reason": "SparkExecutorCompleted"}
2024-07-02T02:29:23.848Z	DEBUG	sparkapplication/event_handler.go:141	SparkApplication updated	{"name": "spark-pi", "namespace": "default", "oldState": "RUNNING", "newState": "RUNNING"}
2024-07-02T02:29:23.855Z	INFO	sparkapplication/controller.go:113	Reconciling SparkApplication	{"name": "spark-pi", "namespace": "default", "state": "RUNNING"}
2024-07-02T02:29:23.969Z	DEBUG	sparkapplication/event_handler.go:64	Spark pod updated	{"name": "spark-pi-f41a97907145267b-exec-1", "namespace": "default", "oldPhase": "Succeeded", "newPhase": "Succeeded"}
2024-07-02T02:29:23.974Z	DEBUG	sparkapplication/event_handler.go:64	Spark pod updated	{"name": "spark-pi-f41a97907145267b-exec-1", "namespace": "default", "oldPhase": "Succeeded", "newPhase": "Succeeded"}
2024-07-02T02:29:23.975Z	INFO	sparkapplication/controller.go:113	Reconciling SparkApplication	{"name": "spark-pi", "namespace": "default", "state": "RUNNING"}
2024-07-02T02:29:23.975Z	DEBUG	sparkapplication/event_handler.go:71	Spark pod deleted	{"name": "spark-pi-f41a97907145267b-exec-1", "namespace": "default", "phase": "Succeeded"}
2024-07-02T02:29:23.998Z	INFO	sparkapplication/controller.go:113	Reconciling SparkApplication	{"name": "spark-pi", "namespace": "default", "state": "RUNNING"}
2024-07-02T02:29:25.091Z	DEBUG	sparkapplication/event_handler.go:64	Spark pod updated	{"name": "spark-pi-driver", "namespace": "default", "oldPhase": "Running", "newPhase": "Running"}
2024-07-02T02:29:25.097Z	INFO	sparkapplication/controller.go:113	Reconciling SparkApplication	{"name": "spark-pi", "namespace": "default", "state": "RUNNING"}
2024-07-02T02:29:25.098Z	DEBUG	events	recorder/recorder.go:104	Driver spark-pi-driver completed	{"type": "Normal", "object": {"kind":"SparkApplication","namespace":"default","name":"spark-pi","uid":"54c20cb0-0163-4661-af90-93bce8b31d8b","apiVersion":"sparkoperator.k8s.io/v1beta2","resourceVersion":"2215"}, "reason": "SparkDriverCompleted"}
2024-07-02T02:29:25.110Z	DEBUG	sparkapplication/event_handler.go:141	SparkApplication updated	{"name": "spark-pi", "namespace": "default", "oldState": "RUNNING", "newState": "SUCCEEDING"}
2024-07-02T02:29:25.115Z	INFO	sparkapplication/controller.go:113	Reconciling SparkApplication	{"name": "spark-pi", "namespace": "default", "state": "SUCCEEDING"}
2024-07-02T02:29:25.122Z	DEBUG	sparkapplication/event_handler.go:141	SparkApplication updated	{"name": "spark-pi", "namespace": "default", "oldState": "SUCCEEDING", "newState": "COMPLETED"}
2024-07-02T02:29:25.128Z	INFO	sparkapplication/controller.go:113	Reconciling SparkApplication	{"name": "spark-pi", "namespace": "default", "state": "COMPLETED"}
2024-07-02T02:29:26.181Z	DEBUG	sparkapplication/event_handler.go:64	Spark pod updated	{"name": "spark-pi-driver", "namespace": "default", "oldPhase": "Running", "newPhase": "Succeeded"}
2024-07-02T02:29:26.188Z	INFO	sparkapplication/controller.go:113	Reconciling SparkApplication	{"name": "spark-pi", "namespace": "default", "state": "COMPLETED"}
2024-07-02T02:29:27.111Z	DEBUG	sparkapplication/event_handler.go:64	Spark pod updated	{"name": "spark-pi-driver", "namespace": "default", "oldPhase": "Succeeded", "newPhase": "Succeeded"}
2024-07-02T02:29:27.117Z	INFO	sparkapplication/controller.go:113	Reconciling SparkApplication	{"name": "spark-pi", "namespace": "default", "state": "COMPLETED"}

Check out the webhook logs:

$ kubectl logs -n spark-operator spark-operator-webhook-58777cdc9b-2zqb8
...
+ exec /usr/bin/tini -s -- /usr/bin/spark-operator webhook start --zap-log-level=1 --namespaces=default --webhook-secret-name=spark-operator-webhook-certs --webhook-secret-namespace=spark-operator --webhook-svc-name=spark-operator-webhook-svc --webhook-svc-namespace=spark-operator --webhook-port=9443 --leader-election=true --leader-election-lock-name=spark-operator-webhook-lock --leader-election-lock-namespace=spark-operator
2024-07-02T02:27:17.012Z	INFO	webhook/start.go:245	Syncing webhook secret	{"name": "spark-operator-webhook-certs", "namespace": "spark-operator"}
2024-07-02T02:27:17.319Z	INFO	webhook/start.go:259	Writing certificates	{"path": "/etc/k8s-webhook-server/serving-certs", "certificate name": "tls.crt", "key name": "tls.key"}
2024-07-02T02:27:17.320Z	INFO	controller-runtime.builder	builder/webhook.go:158	Registering a mutating webhook	{"GVK": "sparkoperator.k8s.io/v1beta2, Kind=SparkApplication", "path": "/mutate-sparkoperator-k8s-io-v1beta2-sparkapplication"}
2024-07-02T02:27:17.320Z	INFO	controller-runtime.webhook	webhook/server.go:183	Registering webhook	{"path": "/mutate-sparkoperator-k8s-io-v1beta2-sparkapplication"}
2024-07-02T02:27:17.320Z	INFO	controller-runtime.builder	builder/webhook.go:189	Registering a validating webhook	{"GVK": "sparkoperator.k8s.io/v1beta2, Kind=SparkApplication", "path": "/validate-sparkoperator-k8s-io-v1beta2-sparkapplication"}
2024-07-02T02:27:17.320Z	INFO	controller-runtime.webhook	webhook/server.go:183	Registering webhook	{"path": "/validate-sparkoperator-k8s-io-v1beta2-sparkapplication"}
2024-07-02T02:27:17.320Z	INFO	controller-runtime.builder	builder/webhook.go:158	Registering a mutating webhook	{"GVK": "sparkoperator.k8s.io/v1beta2, Kind=ScheduledSparkApplication", "path": "/mutate-sparkoperator-k8s-io-v1beta2-scheduledsparkapplication"}
2024-07-02T02:27:17.320Z	INFO	controller-runtime.webhook	webhook/server.go:183	Registering webhook	{"path": "/mutate-sparkoperator-k8s-io-v1beta2-scheduledsparkapplication"}
2024-07-02T02:27:17.320Z	INFO	controller-runtime.builder	builder/webhook.go:189	Registering a validating webhook	{"GVK": "sparkoperator.k8s.io/v1beta2, Kind=ScheduledSparkApplication", "path": "/validate-sparkoperator-k8s-io-v1beta2-scheduledsparkapplication"}
2024-07-02T02:27:17.320Z	INFO	controller-runtime.webhook	webhook/server.go:183	Registering webhook	{"path": "/validate-sparkoperator-k8s-io-v1beta2-scheduledsparkapplication"}
2024-07-02T02:27:17.320Z	INFO	controller-runtime.builder	builder/webhook.go:158	Registering a mutating webhook	{"GVK": "/v1, Kind=Pod", "path": "/mutate--v1-pod"}
2024-07-02T02:27:17.320Z	INFO	controller-runtime.webhook	webhook/server.go:183	Registering webhook	{"path": "/mutate--v1-pod"}
2024-07-02T02:27:17.320Z	INFO	controller-runtime.builder	builder/webhook.go:204	skip registering a validating webhook, object does not implement admission.Validator or WithValidator wasn't called	{"GVK": "/v1, Kind=Pod"}
2024-07-02T02:27:17.320Z	INFO	webhook/start.go:323	Starting manager
2024-07-02T02:27:17.320Z	INFO	manager/server.go:50	starting server	{"kind": "health probe", "addr": "[::]:8081"}
2024-07-02T02:27:17.320Z	INFO	controller-runtime.webhook	webhook/server.go:191	Starting webhook server
2024-07-02T02:27:17.320Z	INFO	webhook/start.go:361	disabling http/2
2024-07-02T02:27:17.320Z	INFO	controller-runtime.certwatcher	certwatcher/certwatcher.go:161	Updated current TLS certificate
2024-07-02T02:27:17.320Z	INFO	controller-runtime.webhook	webhook/server.go:242	Serving webhook server	{"host": "", "port": 9443}
2024-07-02T02:27:17.321Z	INFO	controller-runtime.certwatcher	certwatcher/certwatcher.go:115	Starting certificate watcher
I0702 02:27:17.321202      13 leaderelection.go:250] attempting to acquire leader lease spark-operator/spark-operator-webhook-lock...
2024/07/02 02:27:21 http: TLS handshake error from 172.18.0.2:46035: remote error: tls: bad certificate
2024/07/02 02:27:26 http: TLS handshake error from 172.18.0.2:29034: remote error: tls: bad certificate
I0702 02:27:34.320670      13 leaderelection.go:260] successfully acquired lease spark-operator/spark-operator-webhook-lock
2024-07-02T02:27:34.322Z	INFO	controller/controller.go:178	Starting EventSource	{"controller": "mutating-webhook-configuration-controller", "source": "kind source: *v1.MutatingWebhookConfiguration"}
2024-07-02T02:27:34.322Z	INFO	controller/controller.go:186	Starting Controller	{"controller": "mutating-webhook-configuration-controller"}
2024-07-02T02:27:34.321Z	DEBUG	events	recorder/recorder.go:104	spark-operator-webhook-58777cdc9b-2zqb8_f20cc8f9-a272-4946-8f17-effb10eb8546 became leader	{"type": "Normal", "object": {"kind":"Lease","namespace":"spark-operator","name":"spark-operator-webhook-lock","uid":"e5af5225-2ec7-41a4-b4b5-afdfc179bf51","apiVersion":"coordination.k8s.io/v1","resourceVersion":"1887"}, "reason": "LeaderElection"}
2024-07-02T02:27:34.323Z	INFO	controller/controller.go:178	Starting EventSource	{"controller": "validating-webhook-configuration-controller", "source": "kind source: *v1.ValidatingWebhookConfiguration"}
2024-07-02T02:27:34.323Z	INFO	controller/controller.go:186	Starting Controller	{"controller": "validating-webhook-configuration-controller"}
2024-07-02T02:27:34.433Z	INFO	controller/controller.go:220	Starting workers	{"controller": "mutating-webhook-configuration-controller", "worker count": 1}
2024-07-02T02:27:34.433Z	DEBUG	validatingwebhookconfiguration/event_handler.go:46	ValidatingWebhookConfiguration created	{"name": "spark-operator-webhook", "namespace": ""}
2024-07-02T02:27:34.434Z	INFO	controller/controller.go:220	Starting workers	{"controller": "validating-webhook-configuration-controller", "worker count": 1}
2024-07-02T02:27:34.434Z	DEBUG	mutatingwebhookconfiguration/event_handler.go:46	MutatingWebhookConfiguration created	{"name": "spark-operator-webhook", "namespace": ""}
2024-07-02T02:27:34.441Z	INFO	mutatingwebhookconfiguration/controller.go:72	Updating CA bundle of MutatingWebhookConfiguration	{"name": "spark-operator-webhook", "namespace": ""}
2024-07-02T02:27:34.441Z	INFO	validatingwebhookconfiguration/controller.go:73	Updating CA bundle of ValidatingWebhookConfiguration	{"name": "spark-operator-webhook", "namespace": ""}
2024-07-02T02:27:34.454Z	DEBUG	validatingwebhookconfiguration/event_handler.go:68	ValidatingWebhookConfiguration updated	{"name": "spark-operator-webhook", "namespace": ""}
2024-07-02T02:27:34.455Z	DEBUG	mutatingwebhookconfiguration/event_handler.go:68	MutatingWebhookConfiguration updated	{"name": "spark-operator-webhook", "namespace": ""}
2024-07-02T02:27:34.461Z	INFO	validatingwebhookconfiguration/controller.go:73	Updating CA bundle of ValidatingWebhookConfiguration	{"name": "spark-operator-webhook", "namespace": ""}
2024-07-02T02:27:34.461Z	INFO	mutatingwebhookconfiguration/controller.go:72	Updating CA bundle of MutatingWebhookConfiguration	{"name": "spark-operator-webhook", "namespace": ""}
2024-07-02T02:27:37.724Z	INFO	webhook/scheduledsparkapplication_defaulter.go:47	Defaulting ScheduledSparkApplication	{"name": "spark-pi", "namespace": "default"}
2024-07-02T02:27:37.740Z	INFO	webhook/sparkapplication_validator.go:49	Validating SparkApplication create	{"name": "spark-pi", "namespace": "default"}
2024-07-02T02:27:37.758Z	INFO	webhook/scheduledsparkapplication_defaulter.go:47	Defaulting ScheduledSparkApplication	{"name": "spark-pi", "namespace": "default"}
2024-07-02T02:27:37.759Z	INFO	webhook/sparkapplication_validator.go:62	Validating SparkApplication update	{"name": "spark-pi", "namespace": "default"}
2024-07-02T02:27:39.932Z	INFO	webhook/sparkpod_defaulter.go:84	Mutating Spark pod	{"name": "spark-pi-driver", "namespace": "default"}
2024-07-02T02:28:02.499Z	INFO	webhook/sparkpod_defaulter.go:84	Mutating Spark pod	{"name": "spark-pi-f41a97907145267b-exec-1", "namespace": "default"}
...

Finally, we can use make kind-delete-cluster to delete the test kind cluster.

@ChenYi015
Contributor Author

I am not certain about our current test coverage, but we should create a new image and test it locally if reviewers have the bandwidth. This would ensure it works with common examples without errors.

@vara-bonthu Now we can use make unit-test to see the test coverage:

$ make unit-test
Running unit tests...
        github.com/kubeflow/spark-operator/api/v1beta1          coverage: 0.0% of statements
?       github.com/kubeflow/spark-operator/hack/api-docs/template       [no test files]
        github.com/kubeflow/spark-operator/cmd/operator         coverage: 0.0% of statements
        github.com/kubeflow/spark-operator/cmd          coverage: 0.0% of statements
        github.com/kubeflow/spark-operator/cmd/operator/controller              coverage: 0.0% of statements
        github.com/kubeflow/spark-operator/internal/batchscheduler              coverage: 0.0% of statements
        github.com/kubeflow/spark-operator/cmd/operator/webhook         coverage: 0.0% of statements
ok      github.com/kubeflow/spark-operator/api/v1beta2  0.515s  coverage: 5.0% of statements
?       github.com/kubeflow/spark-operator/internal/controller  [no test files]
        github.com/kubeflow/spark-operator/internal/controller/mutatingwebhookconfiguration             coverage: 0.0% of statements
        github.com/kubeflow/spark-operator/internal/controller/validatingwebhookconfiguration           coverage: 0.0% of statements
ok      github.com/kubeflow/spark-operator/internal/batchscheduler/volcano      0.227s  coverage: 12.7% of statements
        github.com/kubeflow/spark-operator/pkg/client/clientset/versioned/typed/sparkoperator.k8s.io/v1beta2/fake               coverage: 0.0% of statements
        github.com/kubeflow/spark-operator/pkg/client/clientset/versioned               coverage: 0.0% of statements
        github.com/kubeflow/spark-operator/pkg/client/clientset/versioned/typed/sparkoperator.k8s.io/v1beta1/fake               coverage: 0.0% of statements
        github.com/kubeflow/spark-operator/pkg/client/clientset/versioned/typed/sparkoperator.k8s.io/v1beta2            coverage: 0.0% of statements
        github.com/kubeflow/spark-operator/pkg/client/clientset/versioned/typed/sparkoperator.k8s.io/v1beta1            coverage: 0.0% of statements
        github.com/kubeflow/spark-operator/pkg/client/clientset/versioned/fake          coverage: 0.0% of statements
        github.com/kubeflow/spark-operator/pkg/client/clientset/versioned/scheme                coverage: 0.0% of statements
?       github.com/kubeflow/spark-operator/pkg/client/informers/externalversions/internalinterfaces     [no test files]
        github.com/kubeflow/spark-operator/pkg/client/informers/externalversions                coverage: 0.0% of statements
?       github.com/kubeflow/spark-operator/pkg/common   [no test files]
        github.com/kubeflow/spark-operator/pkg/client/informers/externalversions/sparkoperator.k8s.io/v1beta2           coverage: 0.0% of statements
        github.com/kubeflow/spark-operator/pkg/client/informers/externalversions/sparkoperator.k8s.io/v1beta1           coverage: 0.0% of statements
        github.com/kubeflow/spark-operator/pkg/client/listers/sparkoperator.k8s.io/v1beta1              coverage: 0.0% of statements
        github.com/kubeflow/spark-operator/pkg/client/informers/externalversions/sparkoperator.k8s.io           coverage: 0.0% of statements
        github.com/kubeflow/spark-operator/pkg/client/listers/sparkoperator.k8s.io/v1beta2              coverage: 0.0% of statements
        github.com/kubeflow/spark-operator/sparkctl             coverage: 0.0% of statements
ok      github.com/kubeflow/spark-operator/internal/controller/scheduledsparkapplication        7.001s  coverage: 3.7% of statements
ok      github.com/kubeflow/spark-operator/internal/controller/sparkapplication 6.935s  coverage: 5.1% of statements
ok      github.com/kubeflow/spark-operator/internal/webhook     0.631s  coverage: 63.2% of statements
ok      github.com/kubeflow/spark-operator/internal/webhook/resourceusage       0.370s  coverage: 1.0% of statements
ok      github.com/kubeflow/spark-operator/pkg/cert     6.205s  coverage: 12.9% of statements
ok      github.com/kubeflow/spark-operator/pkg/util     0.741s  coverage: 18.9% of statements
ok      github.com/kubeflow/spark-operator/sparkctl/cmd 0.416s  coverage: 5.2% of statements

To test the operator and Helm chart with a kind cluster, just do the following:

make docker-build kind-create-cluster kind-load-image

The kind binary will be downloaded automatically to the bin directory.

@ChenYi015
Contributor Author

/hold for review

@yuchaoran2011
Contributor

We could create a pre-release tagged as 2.0 to include other improvements as well. This will give users time to migrate to the newer version since it introduces breaking changes.

I second @vara-bonthu's suggestion here. Given that this PR has completely reworked the implementation, it makes sense to mark it as the start of v2.0, and we can create a new branch for it and merge it there.

@ChenYi015
Contributor Author

@yuchaoran2011 Maybe we can make a new branch named release-2.0 as described in #1975.

@ChenYi015
Contributor Author

@yuchaoran2011 I did some benchmarks to compare the start latency (created state -> submitted state) between helm chart v1.4.5 and this one. I installed the operator with 30 worker threads and the webhook enabled, using 20 CPU cores and 16 Gi of memory, then submitted 1000 spark-pi applications concurrently. For chart v1.4.5 the average start latency is 3m53s, while for the controller-runtime version it is 3m35s, just a little bit faster. Maybe we can achieve better performance by tuning controller-runtime related parameters.

@ChenYi015 ChenYi015 force-pushed the controller-runtime branch from 58c29f7 to 1d4d4bf on August 1, 2024 03:36
@ChenYi015
Contributor Author

/hold cancel

Contributor

@yuchaoran2011 yuchaoran2011 left a comment

Now that #2089 is merged, this one is good to merge as well. Excited to see us officially kick off 2.0!
/lgtm

@google-oss-prow google-oss-prow bot added the lgtm label Aug 1, 2024
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: yuchaoran2011

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 0dc641b into kubeflow:master Aug 1, 2024
7 checks passed
ChenYi015 added a commit to ChenYi015/spark-operator that referenced this pull request Aug 1, 2024
* Use controller-runtime to reconstruct spark operator

Signed-off-by: Yi Chen <[email protected]>

* Update helm charts

Signed-off-by: Yi Chen <[email protected]>

* Update examples

Signed-off-by: Yi Chen <[email protected]>

---------

Signed-off-by: Yi Chen <[email protected]>
(cherry picked from commit 0dc641b)
google-oss-prow bot pushed a commit that referenced this pull request Aug 1, 2024
* Update helm docs (#2081)

Signed-off-by: Carlos Sánchez Páez <[email protected]>
(cherry picked from commit eca3fc8)

* Update the process to build api-docs, generate CRD manifests and code (#2046)

* Update .gitignore

Signed-off-by: Yi Chen <[email protected]>

* Update .dockerignore

Signed-off-by: Yi Chen <[email protected]>

* Update Makefile

Signed-off-by: Yi Chen <[email protected]>

* Update the process to generate api docs

Signed-off-by: Yi Chen <[email protected]>

* Update the workflow to generate api docs

Signed-off-by: Yi Chen <[email protected]>

* Use controller-gen to generate CRD and deep copy related methods

Signed-off-by: Yi Chen <[email protected]>

* Update helm chart CRDs

Signed-off-by: Yi Chen <[email protected]>

* Update workflow for building spark operator

Signed-off-by: Yi Chen <[email protected]>

* Update README.md

Signed-off-by: Yi Chen <[email protected]>

---------

Signed-off-by: Yi Chen <[email protected]>
(cherry picked from commit 779ea3d)

* Add topologySpreadConstraints (#2091)

* Update README and documentation (#2047)

* Update docs

Signed-off-by: Yi Chen <[email protected]>

* Remove docs and update README

Signed-off-by: Yi Chen <[email protected]>

* Add link to monthly community meeting

Signed-off-by: Yi Chen <[email protected]>

---------

Signed-off-by: Yi Chen <[email protected]>
Signed-off-by: jbhalodia-slack <[email protected]>

* Add PodDisruptionBudget to chart (#2078)

* Add PodDisruptionBudget to chart

Signed-off-by: Carlos Sánchez Páez <[email protected]>
Signed-off-by: Carlos Sánchez Páez <[email protected]>
Signed-off-by: Carlos Sánchez Páez <[email protected]>

* PR comments

Signed-off-by: Carlos Sánchez Páez <[email protected]>

---------

Signed-off-by: Carlos Sánchez Páez <[email protected]>
Signed-off-by: Carlos Sánchez Páez <[email protected]>
Signed-off-by: jbhalodia-slack <[email protected]>

* Set topologySpreadConstraints

Signed-off-by: jbhalodia-slack <[email protected]>

* Update README and increase patch version

Signed-off-by: jbhalodia-slack <[email protected]>

* Revert replicaCount change

Signed-off-by: jbhalodia-slack <[email protected]>

* Update README after master merger

Signed-off-by: jbhalodia-slack <[email protected]>

* Update README

Signed-off-by: jbhalodia-slack <[email protected]>

---------

Signed-off-by: Yi Chen <[email protected]>
Signed-off-by: jbhalodia-slack <[email protected]>
Signed-off-by: Carlos Sánchez Páez <[email protected]>
Signed-off-by: Carlos Sánchez Páez <[email protected]>
Co-authored-by: Yi Chen <[email protected]>
Co-authored-by: Carlos Sánchez Páez <[email protected]>
(cherry picked from commit 4108f54)

* Use controller-runtime to reconsturct spark operator (#2072)

* Use controller-runtime to reconstruct spark operator

Signed-off-by: Yi Chen <[email protected]>

* Update helm charts

Signed-off-by: Yi Chen <[email protected]>

* Update examples

Signed-off-by: Yi Chen <[email protected]>

---------

Signed-off-by: Yi Chen <[email protected]>
(cherry picked from commit 0dc641b)

---------

Co-authored-by: Carlos Sánchez Páez <[email protected]>
Co-authored-by: jbhalodia-slack <[email protected]>
YanivKunda pushed a commit to YanivKunda/spark-operator that referenced this pull request Aug 5, 2024
* Use controller-runtime to reconstruct spark operator

Signed-off-by: Yi Chen <[email protected]>

* Update helm charts

Signed-off-by: Yi Chen <[email protected]>

* Update examples

Signed-off-by: Yi Chen <[email protected]>

---------

Signed-off-by: Yi Chen <[email protected]>
sigmarkarl pushed a commit to spotinst/spark-on-k8s-operator that referenced this pull request Aug 7, 2024
* Use controller-runtime to reconstruct spark operator

Signed-off-by: Yi Chen <[email protected]>

* Update helm charts

Signed-off-by: Yi Chen <[email protected]>

* Update examples

Signed-off-by: Yi Chen <[email protected]>

---------

Signed-off-by: Yi Chen <[email protected]>
@Viktor3434

@ChenYi015 Hello!
Can you tell me how to use livenessProbe and other probes in helm values?

@ChenYi015 ChenYi015 deleted the controller-runtime branch September 23, 2024 12:02
@ChenYi015
Contributor Author

@ChenYi015 Hello! Can you tell me how to use livenessProbe and other probes in helm values?

@Viktor3434 For now, livenessProbe and readinessProbe are enabled by default and cannot be configured via helm values. I am wondering why you want them to be configurable?

@Viktor3434

@ChenYi015 Thanks, can you tell me if probes have been added for the SparkApplication resource?

@jacobsalway
Member

@Viktor3434 do you mean probes on the Spark driver and executor pods? If so, can you go into more detail on the reason for wanting to?

@Viktor3434

Viktor3434 commented Sep 24, 2024

@jacobsalway Hi! Sorry for the long wait for a reply.
I am not very familiar with Spark concepts.

We have a SparkApplication that reads data from Kafka and writes it to Greenplum.
Relatively recently there was a case where the application got stuck; perhaps probes could help, but I'm not sure.

@jacobsalway
Member

jacobsalway commented Sep 24, 2024

@Viktor3434 no problem. Spark itself doesn't have a healthcheck port that you could configure a probe for. I would suggest you scrape metrics from the driver and set up separate alerting and monitoring based on streaming progress, Kafka consumer group lag or other metrics that could give you a proxy on whether the job is progressing or stuck.

If you'd like, you can start a thread in the community Slack and tag me so we can discuss in more detail.

jbhalodia-slack added a commit to jbhalodia-slack/spark-operator that referenced this pull request Oct 4, 2024
…ubeflow#2108)

* Update helm docs (kubeflow#2081)

Signed-off-by: Carlos Sánchez Páez <[email protected]>
(cherry picked from commit eca3fc8)

* Update the process to build api-docs, generate CRD manifests and code (kubeflow#2046)

* Update .gitignore

Signed-off-by: Yi Chen <[email protected]>

* Update .dockerignore

Signed-off-by: Yi Chen <[email protected]>

* Update Makefile

Signed-off-by: Yi Chen <[email protected]>

* Update the process to generate api docs

Signed-off-by: Yi Chen <[email protected]>

* Update the workflow to generate api docs

Signed-off-by: Yi Chen <[email protected]>

* Use controller-gen to generate CRD and deep copy related methods

Signed-off-by: Yi Chen <[email protected]>

* Update helm chart CRDs

Signed-off-by: Yi Chen <[email protected]>

* Update workflow for building spark operator

Signed-off-by: Yi Chen <[email protected]>

* Update README.md

Signed-off-by: Yi Chen <[email protected]>

---------

Signed-off-by: Yi Chen <[email protected]>
(cherry picked from commit 779ea3d)

* Add topologySpreadConstraints (kubeflow#2091)

* Update README and documentation (kubeflow#2047)

* Update docs

Signed-off-by: Yi Chen <[email protected]>

* Remove docs and update README

Signed-off-by: Yi Chen <[email protected]>

* Add link to monthly community meeting

Signed-off-by: Yi Chen <[email protected]>

---------

Signed-off-by: Yi Chen <[email protected]>
Signed-off-by: jbhalodia-slack <[email protected]>

* Add PodDisruptionBudget to chart (kubeflow#2078)

* Add PodDisruptionBudget to chart

Signed-off-by: Carlos Sánchez Páez <[email protected]>
Signed-off-by: Carlos Sánchez Páez <[email protected]>
Signed-off-by: Carlos Sánchez Páez <[email protected]>

* PR comments

Signed-off-by: Carlos Sánchez Páez <[email protected]>

---------

Signed-off-by: Carlos Sánchez Páez <[email protected]>
Signed-off-by: Carlos Sánchez Páez <[email protected]>
Signed-off-by: jbhalodia-slack <[email protected]>

* Set topologySpreadConstraints

Signed-off-by: jbhalodia-slack <[email protected]>

* Update README and increase patch version

Signed-off-by: jbhalodia-slack <[email protected]>

* Revert replicaCount change

Signed-off-by: jbhalodia-slack <[email protected]>

* Update README after master merger

Signed-off-by: jbhalodia-slack <[email protected]>

* Update README

Signed-off-by: jbhalodia-slack <[email protected]>

---------

Signed-off-by: Yi Chen <[email protected]>
Signed-off-by: jbhalodia-slack <[email protected]>
Signed-off-by: Carlos Sánchez Páez <[email protected]>
Signed-off-by: Carlos Sánchez Páez <[email protected]>
Co-authored-by: Yi Chen <[email protected]>
Co-authored-by: Carlos Sánchez Páez <[email protected]>
(cherry picked from commit 4108f54)

* Use controller-runtime to reconsturct spark operator (kubeflow#2072)

* Use controller-runtime to reconstruct spark operator

Signed-off-by: Yi Chen <[email protected]>

* Update helm charts

Signed-off-by: Yi Chen <[email protected]>

* Update examples

Signed-off-by: Yi Chen <[email protected]>

---------

Signed-off-by: Yi Chen <[email protected]>
(cherry picked from commit 0dc641b)

---------

Co-authored-by: Carlos Sánchez Páez <[email protected]>
Co-authored-by: jbhalodia-slack <[email protected]>