Releases: kubeflow/spark-operator
v2.1.0
New Features
- Upgrade to Spark 3.5.3 (#2202 by @jacobsalway)
- feat: support archives param for spark-submit (#2256 by @kaka-zb)
- Allow --ingress-class-name to be specified in chart (#2278 by @jacobsalway)
- Update default container security context (#2265 by @ChenYi015)
- Support pod template for Spark 3.x applications (#2141 by @ChenYi015)
- Allow setting automountServiceAccountToken (#2298 by @Aranch)
- Allow the controller and webhook containers to run with securityContext: readOnlyRootFilesystem: true (#2282 by @npgretz)
- Use NSS_WRAPPER_PASSWD instead of /etc/passwd in the spark-operator image entrypoint.sh (#2312 by @Aakcht)
Bug Fixes
- Minor fixes to e2e test make targets (#2242 by @Tom-Newton)
- Added off-heap memory to calculation for YuniKorn gang scheduling (#2209 by @guangyu-yang-rokt)
- Add permissions to controller serviceaccount to list and watch ingresses (#2246 by @tcassaert)
- Make sure enable-ui-service flag is set to false when controller.uiService.enable is set to false (#2261 by @Roberdvs)
- omitempty corrections (#2255 by @Tom-Newton)
- Fix retries (#2241 by @Tom-Newton)
- Fix: executor container security context does not work (#2306 by @ChenYi015)
- Fix: should not add emptyDir sizeLimit conf if it is nil (#2305 by @ChenYi015)
- Fix: should not add emptyDir sizeLimit conf on executor pods if it is nil (#2316 by @Cian911)
- Truncate UI service name if over 63 characters (#2311 by @jacobsalway)
- The webhook-key-name command-line param isn't taking effect (#2344 by @c-h-afzal)
- Robustness to driver pod taking time to create (#2315 by @Tom-Newton)
Misc
- remove redundant test.sh file (#2243 by @ChenYi015)
- Bump github.com/aws/aws-sdk-go-v2/config from 1.27.42 to 1.27.43 (#2252 by @dependabot[bot])
- Bump manusa/actions-setup-minikube from 2.12.0 to 2.13.0 (#2247 by @dependabot[bot])
- Bump golang.org/x/net from 0.29.0 to 0.30.0 (#2251 by @dependabot[bot])
- Bump aquasecurity/trivy-action from 0.24.0 to 0.27.0 (#2248 by @dependabot[bot])
- Bump gocloud.dev from 0.39.0 to 0.40.0 (#2250 by @dependabot[bot])
- Add Quick Start guide to README (#2259 by @jacobsalway)
- Bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.63.3 to 1.65.3 (#2249 by @dependabot[bot])
- Add release badge to README (#2263 by @jacobsalway)
- Bump helm.sh/helm/v3 from 3.16.1 to 3.16.2 (#2275 by @dependabot[bot])
- Bump github.com/prometheus/client_golang from 1.20.4 to 1.20.5 (#2274 by @dependabot[bot])
- Bump cloud.google.com/go/storage from 1.44.0 to 1.45.0 (#2273 by @dependabot[bot])
- Run e2e tests with Kubernetes version matrix (#2266 by @jacobsalway)
- Bump aquasecurity/trivy-action from 0.27.0 to 0.28.0 (#2270 by @dependabot[bot])
- Bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.65.3 to 1.66.0 (#2271 by @dependabot[bot])
- Bump github.com/aws/aws-sdk-go-v2/config from 1.27.43 to 1.28.0 (#2272 by @dependabot[bot])
- Add workflow for releasing sparkctl binary (#2264 by @ChenYi015)
- Bump volcano.sh/apis to 1.10.0 (#2320 by @jacobsalway)
- Bump aquasecurity/trivy-action from 0.28.0 to 0.29.0 (#2332 by @dependabot[bot])
- Bump github.com/onsi/ginkgo/v2 from 2.20.2 to 2.22.0 (#2335 by @dependabot[bot])
- Move sparkctl to cmd directory (#2347 by @ChenYi015)
v2.1.0-rc.0
New Features
- Upgrade to Spark 3.5.3 (#2202 by @jacobsalway)
- feat: support archives param for spark-submit (#2256 by @kaka-zb)
- Allow --ingress-class-name to be specified in chart (#2278 by @jacobsalway)
- Update default container security context (#2265 by @ChenYi015)
- Support pod template for Spark 3.x applications (#2141 by @ChenYi015)
Bug Fixes
- Minor fixes to e2e test make targets (#2242 by @Tom-Newton)
- Added off-heap memory to calculation for YuniKorn gang scheduling (#2209 by @guangyu-yang-rokt)
- Add permissions to controller serviceaccount to list and watch ingresses (#2246 by @tcassaert)
- Make sure enable-ui-service flag is set to false when controller.uiService.enable is set to false (#2261 by @Roberdvs)
- omitempty corrections (#2255 by @Tom-Newton)
- Fix retries (#2241 by @Tom-Newton)
Misc
- remove redundant test.sh file (#2243 by @ChenYi015)
- Bump github.com/aws/aws-sdk-go-v2/config from 1.27.42 to 1.27.43 (#2252 by @dependabot[bot])
- Bump manusa/actions-setup-minikube from 2.12.0 to 2.13.0 (#2247 by @dependabot[bot])
- Bump golang.org/x/net from 0.29.0 to 0.30.0 (#2251 by @dependabot[bot])
- Bump aquasecurity/trivy-action from 0.24.0 to 0.27.0 (#2248 by @dependabot[bot])
- Bump gocloud.dev from 0.39.0 to 0.40.0 (#2250 by @dependabot[bot])
- Add Quick Start guide to README (#2259 by @jacobsalway)
- Bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.63.3 to 1.65.3 (#2249 by @dependabot[bot])
- Add release badge to README (#2263 by @jacobsalway)
- Bump helm.sh/helm/v3 from 3.16.1 to 3.16.2 (#2275 by @dependabot[bot])
- Bump github.com/prometheus/client_golang from 1.20.4 to 1.20.5 (#2274 by @dependabot[bot])
- Bump cloud.google.com/go/storage from 1.44.0 to 1.45.0 (#2273 by @dependabot[bot])
- Run e2e tests with Kubernetes version matrix (#2266 by @jacobsalway)
- Bump aquasecurity/trivy-action from 0.27.0 to 0.28.0 (#2270 by @dependabot[bot])
- Bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.65.3 to 1.66.0 (#2271 by @dependabot[bot])
- Bump github.com/aws/aws-sdk-go-v2/config from 1.27.43 to 1.28.0 (#2272 by @dependabot[bot])
- Add workflow for releasing sparkctl binary (#2264 by @ChenYi015)
v2.0.2
Bug Fixes
- Fix ingress capability discovery (#2201 by @jacobsalway)
- fix: imagePullPolicy was ignored (#2222 by @missedone)
- fix: spark-submission failed due to lack of permission by user spark (#2223 by @missedone)
- Remove cap_net_bind_service from image (#2216 by @jacobsalway)
- fix: webhook panics due to logging (#2232 by @ChenYi015)
Misc
- Bump github.com/aws/aws-sdk-go-v2 from 1.30.5 to 1.31.0 (#2207 by @dependabot[bot])
- Bump golang.org/x/net from 0.28.0 to 0.29.0 (#2205 by @dependabot[bot])
- Bump github.com/docker/docker from 27.0.3+incompatible to 27.1.1+incompatible (#2125 by @dependabot[bot])
- Bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.58.3 to 1.63.3 (#2206 by @dependabot[bot])
- Update integration test workflow and add golangci lint check (#2197 by @ChenYi015)
- Bump github.com/aws/aws-sdk-go-v2 from 1.31.0 to 1.32.0 (#2229 by @dependabot[bot])
- Bump cloud.google.com/go/storage from 1.43.0 to 1.44.0 (#2228 by @dependabot[bot])
- Bump manusa/actions-setup-minikube from 2.11.0 to 2.12.0 (#2226 by @dependabot[bot])
- Bump golang.org/x/time from 0.6.0 to 0.7.0 (#2227 by @dependabot[bot])
- Bump github.com/aws/aws-sdk-go-v2/config from 1.27.33 to 1.27.42 (#2231 by @dependabot[bot])
- Bump github.com/prometheus/client_golang from 1.19.1 to 1.20.4 (#2204 by @dependabot[bot])
- Add check for generating manifests and code (#2234 by @ChenYi015)
What's Changed
- Release v2.0.2 by @ChenYi015 in #2233
More Details
- By removing setcap 'cap_net_bind_service=+ep' from the Docker build, the container can run on a non-privileged port with all capabilities dropped. If you want to listen on a port below 1024, you can run as root, add back the NET_BIND_SERVICE capability, or build your own image with this capability set on the binary.
v2.0.1
New Features
Bug Fixes
- Update controller RBAC for ConfigMap and PersistentVolumeClaim (#2187 by @ChenYi015)
Misc
- Bump github.com/onsi/ginkgo/v2 from 2.19.0 to 2.20.2 (#2188 by @dependabot[bot])
- Bump github.com/onsi/gomega from 1.33.1 to 1.34.2 (#2189 by @dependabot[bot])
Full Changelog: v2.0.0...v2.0.1
v2.0.0
Breaking Changes
- Use controller-runtime to reconstruct spark operator (#2072 by @ChenYi015)
- feat: support driver and executor pod use different priority (#2146 by @Kevinz857)
New Features
- Support gang scheduling with Yunikorn (#2107 by @jacobsalway)
- Reintroduce option webhook.enable (#2142 by @ChenYi015)
- Add default batch scheduler argument (#2143 by @jacobsalway)
- Support extended kube-scheduler as batch scheduler (#2136 by @ChenYi015)
- Set schedulerName to Yunikorn (#2153 by @jacobsalway)
- Feature: Add pprof endpoint (#2164 by @ImpSy)
Bug Fixes
- fix: Add default values for namespaces to match usage descriptions (#2128 by @snappyyouth)
- Fix: Spark role binding did not render properly when setting spark service account name (#2135 by @ChenYi015)
- fix: unable to set controller/webhook replicas to zero (#2147 by @ChenYi015)
- Adding support for setting spark job namespaces to all namespaces (#2123 by @ChenYi015)
- Fix: e2e test fails due to webhook not ready (#2149 by @ChenYi015)
- fix: webhook not working when setting spark job namespaces to empty (#2163 by @ChenYi015)
- fix: The logger had an odd number of arguments, making it panic (#2166 by @tcassaert)
- fix the make kind-delete-cluster target to avoid accidental kubeconfig deletion (#2172 by @ImpSy)
- Add specific error in log line when failed to create web UI service (#2170 by @tcassaert)
- Account for spark.executor.pyspark.memory in Yunikorn gang scheduling (#2178 by @jacobsalway)
- Fix: spark application does not respect time to live seconds (#2165 by @ChenYi015)
Misc
- Update workflow and docs for releasing Spark operator (#2089 by @ChenYi015)
- Fix broken integration test CI (#2109 by @ChenYi015)
- Fix CI: environment variable BRANCH is missed (#2111 by @ChenYi015)
- Update Makefile for building sparkctl (#2119 by @ChenYi015)
- Update release workflow and docs (#2121 by @ChenYi015)
- Run e2e tests on Kind (#2148 by @jacobsalway)
- Upgrade to Go 1.23.1 (#2155 by @jacobsalway)
- Upgrade to Spark 3.5.2 (#2154 by @jacobsalway)
- Bump sigs.k8s.io/scheduler-plugins from 0.29.7 to 0.29.8 (#2159 by @dependabot[bot])
- Bump gocloud.dev from 0.37.0 to 0.39.0 (#2160 by @dependabot[bot])
- Update e2e tests (#2161 by @ChenYi015)
- Upgrade to Spark 3.5.2 (#2012) (#2157 by @ha2hi)
- Bump github.com/aws/aws-sdk-go-v2/config from 1.27.27 to 1.27.33 (#2174 by @dependabot[bot])
- Bump helm.sh/helm/v3 from 3.15.3 to 3.16.1 (#2173 by @dependabot[bot])
- implement workflow to scan latest released docker image (#2177 by @ImpSy)
What's Changed
- Cherry pick #2081 #2046 #2091 #2072 by @ChenYi015 in #2108
- Cherry pick #2089 #2109 #2111 by @ChenYi015 in #2110
- Release v2.0.0-rc.0 by @ChenYi015 in #2115
- Cherry pick commits for releasing v2.0.0 by @ChenYi015 in #2156
- Release v2.0.0 by @ChenYi015 in #2182
Full Changelog: v1beta2-1.6.2-3.5.0...v2.0.0
More details
This release is a major refactoring of the Spark operator and the Helm chart. It includes:
- Use controller-runtime to reconstruct the Spark operator (#547). This significantly improves the maintainability and performance of the Spark operator.
- Support multiple namespaces (#507, #2052). For example, if you set spark.jobNamespaces to ["default", "spark-operator"] (please make sure these spark job namespaces already exist before the installation), the controller and the webhook server will only watch and handle SparkApplications in these spark job namespaces:
helm install spark-operator spark-operator/spark-operator \
--version 2.0.0 \
--create-namespace \
--namespace spark-operator \
--set 'spark.jobNamespaces={default,spark-operator}'
- Support all namespaces. If you want to watch SparkApplications in all namespaces, you can set spark.jobNamespaces to []:
helm install spark-operator spark-operator/spark-operator \
--version 2.0.0 \
--create-namespace \
--namespace spark-operator \
--set 'spark.jobNamespaces={}'
However, this will not create RBAC resources for Spark in any namespace. If you want to watch all namespaces and also create RBAC resources for Spark in the default and spark-operator namespaces, put an empty entry (note the leading comma) before the named namespaces:
helm install spark-operator spark-operator/spark-operator \
--version 2.0.0 \
--create-namespace \
--namespace spark-operator \
--set 'spark.jobNamespaces={,default,spark-operator}'
- Support multiple instances. You can deploy several Spark operator instances in the same namespace or in different namespaces. For example, install two Spark operators, both in the spark-operator namespace. One is named spark-operator and handles namespace default. The other is named spark-operator-2 and handles namespace spark-operator. Make sure these instances have different release names and handle different spark job namespaces so that they do not conflict with each other:
helm install spark-operator spark-operator/spark-operator \
--version 2.0.0 \
--create-namespace \
--namespace spark-operator \
--set 'spark.jobNamespaces={default}'
helm install spark-operator-2 spark-operator/spark-operator \
--version 2.0.0 \
--create-namespace \
--namespace spark-operator \
--set 'spark.jobNamespaces={spark-operator}'
- Support Yunikorn and Kube scheduler as batch schedulers.
- Leader election is enabled by default and cannot be disabled. It ensures that only one controller instance handles SparkApplications during the install/upgrade/rollback process.
- The webhook server is enabled by default. It is used to default/validate SparkApplications and mutate Spark pods.
- The webhook secret is populated and handled properly during the install/upgrade/rollback process. The controller creates and updates it. If the secret is empty, new certificates are generated to populate it. Otherwise, the controller syncs the certificates to local disk.
- Change the default of webhook failurePolicy from Ignore to Fail. Change the default of webhook timeoutSeconds from 30 to 10. There are many issues related to the webhook, e.g. environment variables dropped and volumes not mounted. And these issues can be solv...
v2.0.0-rc.0
This is the Spark Operator v2.0.0-rc.0 pre-release.
Breaking Changes
- Use controller-runtime to reconstruct spark operator (#2072 by @ChenYi015)
Misc
- Fix CI: environment variable BRANCH is missed (#2111 by @ChenYi015)
- Fix broken integration test CI (#2109 by @ChenYi015)
- Update workflow and docs for releasing Spark operator (#2089 by @ChenYi015)
What's Changed
- Cherry pick #2081 #2046 #2091 #2072 by @ChenYi015 in #2108
- Cherry pick #2089 #2109 #2111 by @ChenYi015 in #2110
- Release v2.0.0-rc.0 by @ChenYi015 in #2115
Full Changelog: spark-operator-chart-1.4.3...v2.0.0-rc.0
More details
This pre-release is a major refactoring of the Spark operator and the Helm chart. It includes:
- Use controller-runtime to reconstruct the Spark operator (#547). This significantly improves the maintainability and performance of the Spark operator.
- Support multiple namespaces (#507, #2052). For example, if you set spark.jobNamespaces to ["default", "spark-operator"] (please make sure these spark job namespaces already exist before the installation), the controller and the webhook server will only watch and handle SparkApplications in these spark job namespaces:
helm install spark-operator spark-operator/spark-operator \
--version 2.0.0-rc.0 \
--create-namespace \
--namespace spark-operator \
--set 'spark.jobNamespaces={default,spark-operator}'
- Support multiple instances. You can deploy several Spark operator instances in the same namespace or in different namespaces. For example, install two Spark operators, both in the spark-operator namespace. One is named spark-operator and handles namespace default. The other is named spark-operator-2 and handles namespace spark-operator. Make sure these instances have different release names and handle different spark job namespaces so that they do not conflict with each other:
helm install spark-operator spark-operator/spark-operator \
--version 2.0.0-rc.0 \
--create-namespace \
--namespace spark-operator \
--set 'spark.jobNamespaces={default}'
helm install spark-operator-2 spark-operator/spark-operator \
--version 2.0.0-rc.0 \
--create-namespace \
--namespace spark-operator \
--set 'spark.jobNamespaces={spark-operator}'
- Leader election is enabled by default and cannot be disabled. It ensures that only one controller instance handles SparkApplications during the install/upgrade/rollback process.
- The webhook server is enabled by default and cannot be disabled. It is used to default/validate SparkApplications and mutate Spark pods.
- The webhook secret is populated and handled properly during the install/upgrade/rollback process. The controller creates and updates it. If the secret is empty, new certificates are generated to populate it. Otherwise, the controller syncs the certificates to local disk.
- Change the default of webhook failurePolicy from Ignore to Fail. Change the default of webhook timeoutSeconds from 30 to 10. There are many issues related to the webhook, e.g. environment variables dropped and volumes not mounted. These issues can be solved by setting webhook.failurePolicy to Fail. With that setting, the webhook server admits Spark pod creation only when there is no error.
- The controller and the webhook server are deployed as separate Kubernetes Deployments and can be scaled independently. When deploying Spark applications at a very large scale, the webhook server can become a performance bottleneck. You can address this by increasing the number of webhook server replicas:
helm install spark-operator spark-operator/spark-operator \
--version 2.0.0-rc.0 \
--create-namespace \
--namespace spark-operator \
--set webhook.replicas=5
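The new webhook defaults are ordinary Helm values, so you can override them per release. The following is a minimal sketch that restores the pre-2.0 behavior (Ignore, 30 seconds). It assumes an existing release named spark-operator in the spark-operator namespace. It also assumes that the chart exposes the webhook timeout as webhook.timeoutSeconds; these notes only spell out webhook.failurePolicy as a value path.

```shell
# Revert the webhook to the pre-2.0 defaults:
# admit Spark pods even if the webhook call errors (failurePolicy=Ignore)
# and wait up to 30 seconds for the webhook to respond.
# webhook.timeoutSeconds is an assumed value path for the webhook timeout.
# --reuse-values keeps the other values already set on this release.
helm upgrade spark-operator spark-operator/spark-operator \
--version 2.0.0-rc.0 \
--namespace spark-operator \
--reuse-values \
--set webhook.failurePolicy=Ignore \
--set webhook.timeoutSeconds=30
```

Keeping the default of Fail is usually the safer choice. With Ignore, a failing webhook lets pods through unmutated, which is the source of the dropped environment variables and unmounted volumes mentioned above.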
Some Helm values renamings and changes:
- Change imagePullSecrets to image.pullSecrets.
- All controller configurations are prefixed with controller e.g. controller.replicas and controller.workers
- All webhook configurations are prefixed with webhook e.g. webhook.replicas and webhook.failurePolicy.
- All monitoring configurations are prefixed with prometheus, e.g. prometheus.metrics and prometheus.podMonitor.
- The update strategy of the controller/webhook deployments is rolling update instead of recreate.
- Change the default spark job namespace from [] to ["default"], so the SparkApplications under the examples directory can run directly without creating RBAC resources manually.
- Service accounts are configured with controller.serviceAccount, webhook.serviceAccount and spark.serviceAccount respectively.
- RBAC resources are configured with controller.rbac, webhook.rbac and spark.rbac respectively.
- logLevel will be one of info, debug and error.
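The following sketch shows an install that uses several of the renamed values together. The paths controller.replicas, controller.workers, webhook.replicas and spark.jobNamespaces come from these notes. Several details are assumptions for illustration:
- the entry format for image.pullSecrets (a list of objects with a name field)
- the secret name my-registry-secret
- the replica and worker counts

```shell
# Install 2.0.0-rc.0 using the prefixed (renamed) Helm values.
# Assumes the spark-operator namespace already exists and contains a
# hypothetical image pull secret named my-registry-secret.
# Two controller replicas are fine because leader election keeps only one active.
helm install spark-operator spark-operator/spark-operator \
--version 2.0.0-rc.0 \
--namespace spark-operator \
--set 'image.pullSecrets[0].name=my-registry-secret' \
--set controller.replicas=2 \
--set controller.workers=20 \
--set webhook.replicas=2 \
--set 'spark.jobNamespaces={default}'
```

If you are upgrading from chart 1.4.x, you will likely need to move any overrides that used the old names (for example imagePullSecrets) to their new prefixed paths, since the old keys are not listed as supported in these notes.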
If you want to try this new pre-release with Helm, do as follows:
helm repo add spark-operator https://kubeflow.github.io/spark-operator
helm repo update
helm install spark-operator spark-operator/spark-operator \
--version 2.0.0-rc.0 \
--create-namespace \
--namespace spark-operator
Or upgrade from chart 1.4.6:
helm upgrade spark-operator spark-operator/spark-operator \
--version 2.0.0-rc.0 \
--namespace spark-operator
spark-operator-chart-1.4.6
A Helm chart for Spark on Kubernetes operator
spark-operator-chart-1.4.5
A Helm chart for Spark on Kubernetes operator
spark-operator-chart-1.4.4
A Helm chart for Spark on Kubernetes operator
spark-operator-chart-1.4.3
A Helm chart for Spark on Kubernetes operator