Commit 8b2bd44

rename all occurrences

Signed-off-by: wayner0628 <[email protected]>
wayner0628 committed Nov 21, 2024
1 parent e13babb commit 8b2bd44

Showing 76 changed files with 5,208 additions and 5,120 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG/CHANGELOG-v1.2.0-b3.md
# Flyte v1.2.0-b3 Changelog

Allow adding uploader containers to flyteadmin
139 changes: 77 additions & 62 deletions CHANGELOG/CHANGELOG-v1.3.0-b5.md

# Flyte v1.3.0-b5 Changelog

This pulls in Databricks support. Please see the [GH issue](https://github.com/flyteorg/flyte/issues/3173) for a listing of the relevant PRs.
There are other changes included in this beta release as well.

## Try it out locally

You can try out these changes "locally", as they've been included in the `flytectl demo` image for this beta release. But since the demo cluster is meant
to be an isolated, local-only cluster, you'll have to make some changes to get it to talk to a live Databricks account. You'll also need to configure
access to a real S3 bucket (as opposed to Minio, which is what the demo local cluster typically relies on).

### S3 Setup

Follow the [AWS instructions](https://docs.aws.amazon.com/powershell/latest/userguide/pstools-appendix-sign-up.html) for generating access and secret
keys that can be used to hit your S3 bucket of choice.
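Before wiring these into Flyte, it's worth confirming the keys can actually reach the bucket. A quick sanity check, assuming the AWS CLI is installed and `your-bucket` stands in for your real bucket name:

```bash
# Placeholder credentials and bucket; substitute your own values.
export AWS_ACCESS_KEY_ID=AKIAYOURKEY
export AWS_SECRET_ACCESS_KEY=YOUR+SECRET
aws s3 ls s3://your-bucket --region us-east-2
```

If this lists the bucket contents without an access error, the same values should work in the config below.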

### Flyte Demo Cluster

#### Starting the cluster

Run `flytectl demo start` with the `--image` argument:

```bash
flytectl demo start --image ghcr.io/flyteorg/flyte-sandbox-bundled:sha-e240038bea1f3bdfe2092823688d35dc78fb6e6b
```
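The bundled image can take a minute to come up. Before editing any config, you can watch the pods settle (plain `kubectl`, nothing Flyte-specific):

```bash
kubectl -n flyte get pods -w
```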

#### Configure the Demo Cluster

1. Update the Flyte configmap
```bash
kubectl -n flyte edit cm sandbox-flyte-binary-config
```
1. Update the `003-storage.yaml` section
Make the storage section look like the following. You should update the propeller `rawoutput-prefix` setting as well.
```
storage:
  type: s3
  container: "your-bucket"
  stow:
    kind: s3
    config:
      access_key_id: AKIAYOURKEY
      auth_type: accesskey
      secret_key: YOUR+SECRET
      disable_ssl: true
      region: us-east-2
```
1. Update the `010-inline-config.yaml` section

1. Under the existing `plugins` section, as a sibling to `k8s`, add
```
databricks:
  databricksInstance: dbc-abc-123.cloud.databricks.com
  entrypointFile: dbfs:///FileStore/tables/entrypoint.py
```
2. In the `k8s` section, update the `default-env-vars` section
```
- FLYTE_AWS_ACCESS_KEY_ID: AKIAYOURKEY
- AWS_DEFAULT_REGION: us-east-2
- FLYTE_AWS_SECRET_ACCESS_KEY: YOUR+SECRET
```
These are the same values as in the storage section above.
3. Add in a section for data proxy
```
storage:
  type: s3
  container: "your-bucket"
  stow:
    kind: s3
    config:
      access_key_id: AKIAYOURKEY
      auth_type: accesskey
      secret_key: YOUR+SECRET
      disable_ssl: true
      region: us-east-2
remoteData:
  region: us-east-2
  scheme: aws
  signedUrls:
    durationMinutes: 3
```
4. Enable the Databricks plugin
```
task-plugins:
  default-for-task-types:
    container: container
    container_array: k8s-array
    uploader: uploader
    ray: ray
    spark: databricks
  enabled-plugins:
    - container
    - databricks
    - ray
    - uploader
    - k8s-array
```
1. Update the Flyte deployment
```
kubectl -n flyte edit deploy sandbox-flyte-binary
```
Add an environment variable for your Databricks token to the Flyte pod:
```
- name: FLYTE_SECRET_FLYTE_DATABRICKS_API_TOKEN
  value: dapixyzxyzxyz
```
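If you'd rather mint the token from the command line than the Databricks web UI, the legacy Databricks CLI can create one. A sketch, assuming the CLI is already configured against your workspace (the comment and lifetime values are illustrative):

```bash
# Requires a configured legacy databricks CLI; copy the returned token into the env var above.
databricks tokens create --comment "flyte-demo" --lifetime-seconds 86400
```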
1. Restart the deployment
```
kubectl -n flyte rollout restart deploy sandbox-flyte-binary
```
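You can confirm the restart finished before moving on:

```bash
kubectl -n flyte rollout status deploy sandbox-flyte-binary
```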
### Databricks Code
You'll need to upload an [entrypoint](https://gist.github.com/pingsutw/482e7f0134414dac437500344bac5134) file to your DBFS (or S3). This is the same gist referenced in the primary [Databricks plugin documentation](https://github.com/flyteorg/flyte/blob/master/docs/deployment/plugin_setup/webapi/databricks.rst), which currently only covers the `flyte-core` Helm chart installation.
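One way to get the gist into DBFS is the legacy Databricks CLI; the target path just needs to match the `entrypointFile` value configured earlier. A sketch, assuming the CLI is configured and you saved the gist locally as `entrypoint.py`:

```bash
# entrypoint.py is the downloaded gist; the dbfs path must match entrypointFile above.
databricks fs cp entrypoint.py dbfs:/FileStore/tables/entrypoint.py
```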
### User Code
1. A sample `.py` file that has a simple Spark task (most of the listing is collapsed in this diff view; only the tail is visible):
```python
if __name__ == "__main__":
    # ... (collapsed in the diff view) ...
)
```

2. Build a custom image for spark clusters

```dockerfile
FROM databricksruntime/standard:11.3-LTS
ENV PATH $PATH:/databricks/python3/bin
# ... (some lines collapsed in the diff view) ...
RUN /databricks/python3/bin/pip install awscli flytekitplugins-spark==v1.3.0b5
# Copy the actual code
COPY ./ /databricks/driver
```

3. The image build command, if necessary:

```shell
docker build -t pingsutw/databricks:test -f Dockerfile .
```
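Databricks pulls this image from a registry at run time, so push it somewhere your workspace can reach (Docker Hub shown here, matching the tag above):

```bash
docker push pingsutw/databricks:test
```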

4. The `pyflyte` command to register the Flyte workflow and task:

```shell
pyflyte --config ~/.flyte/config-sandbox.yaml register --destination-dir . --image pingsutw/databricks:test databricks.py
```
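Once registration succeeds, you can launch the workflow from the console (typically http://localhost:30080/console for the demo cluster) or straight from the CLI. A sketch, assuming `databricks.py` defines a workflow named `my_databricks_job` (substitute whatever name the file actually uses):

```bash
# The workflow name is illustrative; match the one defined in databricks.py.
pyflyte --config ~/.flyte/config-sandbox.yaml run --remote --image pingsutw/databricks:test databricks.py my_databricks_job
```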
