Add gsutil script and Assay.Works specifics (CU-DBMI#1)
* changes for assayworks project specifics

* update shell script with bucket name

* detemplatize; gsutil instructions

* assay works docs context; simplify source dir
d33bs committed Jan 26, 2023
1 parent ecac36c commit 784c5ea
Showing 14 changed files with 138 additions and 22 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -202,3 +202,6 @@ cython_debug/
# dagger ignores
cue.mod/pkg
cue.mod/dagger.*

# data ignores
*.json
5 changes: 5 additions & 0 deletions .pre-commit-config.yaml
@@ -39,3 +39,8 @@ repos:
- id: terraform_validate
- id: terraform_tflint
- id: terraform_tfsec
# checking shell scripts
- repo: https://github.com/shellcheck-py/shellcheck-py
rev: v0.9.0.2
hooks:
- id: shellcheck
38 changes: 33 additions & 5 deletions README.md
@@ -1,4 +1,4 @@
# Template: Google Cloud - Cloud Storage Bucket
# Google Cloud (GC) Assay.Works Cloud Storage Bucket

```mermaid
flowchart LR
@@ -7,21 +7,37 @@ flowchart LR
```

Template for creating [Cloud Storage](https://cloud.google.com/storage/) bucket on [Google Cloud](https://cloud.google.com/) with a service account and related key to enable data or file upload and use.
Used for creating a [Cloud Storage](https://cloud.google.com/storage/) bucket on [Google Cloud](https://cloud.google.com/) with a service account and related key to enable data or file upload and use.

This repository uses [Terraform](https://developer.hashicorp.com/terraform/intro) to maintain cloud resources. See [terraform/README.md](terraform/README.md) for documentation on Terraform elements.

## 👥 Roles

See below for an overview of the roles that provide context for various parts of this repository.

- __Terraform Administrator__: this role involves administering cloud resources created with Terraform. Content under the `terraform` directory and the steps under [Tutorial: Bucket Infrastructure](#bucket-infrastructure) apply to this role.
- __Assay.Works Data Provider__: this role involves using content under `utilities` to synchronize (add, update, or remove) data to the bucket created by a Terraform Administrator. Instructions specific to this role are provided under [`utilities/README.md`](utilities/README.md).

## 🛠️ Install

See below for steps which are required for installation.

1. [Create a repository from this template](https://docs.github.com/en/repositories/creating-and-managing-repositories/creating-a-repository-from-a-template).
1. [Clone the repository](https://docs.github.com/en/repositories/creating-and-managing-repositories/cloning-a-repository)
1. [Clone the repository](https://docs.github.com/en/repositories/creating-and-managing-repositories/cloning-a-repository) to your development environment.
1. Install [Terraform](https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli)
1. Configure Terraform as necessary to your Google Cloud environment.
1. __Optional__: make changes to the script at `./utilities/gsutil_sync.sh` in preparation for synchronizing data to or from the bucket.

## :books: Tutorial

See below for a brief tutorial on how to implement the work found in this repository for your needs.
See below for brief tutorials on how to implement the work found in this repository for your needs.

### 🏗️ Bucket Infrastructure

These steps cover how to control the infrastructure found within this repository.

| <span style="text-align:left;float:left;font-weight:normal;">:exclamation: Please note: after applying the Terraform code with the steps below, a `service-account.json` file is added to the `utilities` directory. This file contains sensitive data which may enable access to your cloud resources. __This file should not be checked into source control!__</span> |
|-----------------------------------------|

1. Make adjustments to the content as necessary (for example, this readme file).
1. Fill in [terraform/variables.tf](terraform/variables.tf) with values that make sense for your initiative.
@@ -31,7 +47,19 @@ See below for a brief tutorial on how to implement the work found in this reposi

When finished with the work, optionally use the following step.

- __OPTIONAL__: Terraform destroy: : to destroy all created resources use command `terraform -chdir=terraform destroy`
- __OPTIONAL__: Terraform __destroy__: to destroy all created resources use command `terraform -chdir=terraform destroy`

### 📁 Using the Bucket

These steps cover an example of how to use the bucket with a [gsutil](https://cloud.google.com/storage/docs/gsutil) script after creating the surrounding infrastructure. These steps presume `gsutil` has already been installed.

| <span style="text-align:left;float:left;font-weight:normal;"> ⚠️ Please note: be certain that data you upload to Google Cloud abide by any data governance or privacy restrictions applicable to your environment. The steps below do not inherently check or validate that data, the bucket, or the Google Cloud environment follow these policies. </span> |
|-----------------------------------------|

1. Change directory into `./utilities`
1. Ensure the `service-account.json` key is found within the `./utilities` directory (it becomes available after the infrastructure steps are taken with Terraform).
1. Make changes to the `gsutil rsync ...` line in `gsutil_sync.sh` to specify the local data location and the target bucket.
1. Run the `gsutil_sync.sh` script (for example: `sh ./gsutil_sync.sh`).
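
The steps above rely on two files being present in `./utilities` before the sync is run. As a minimal sketch, and assuming the directory layout described in this README, a small preflight check could confirm both exist first. The function name `preflight_check` is hypothetical and is not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): verify that the
# files the sync step relies on are present before running it.
# Usage: preflight_check <utilities directory>
preflight_check() {
  dir="$1"
  # the service account key is written here by the Terraform steps
  if [ ! -f "$dir/service-account.json" ]; then
    echo "missing key: $dir/service-account.json" >&2
    return 1
  fi
  # the sync script itself
  if [ ! -f "$dir/gsutil_sync.sh" ]; then
    echo "missing script: $dir/gsutil_sync.sh" >&2
    return 1
  fi
  return 0
}
```

One possible use is `preflight_check ./utilities && (cd ./utilities && sh ./gsutil_sync.sh)`, which only runs the sync when both files are found.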

## 🧑‍💻 Development

6 changes: 3 additions & 3 deletions project.cue
@@ -67,11 +67,11 @@ import "universe.dagger.io/docker"
workdir: "/lint"
}
},
// git init for pre-commit caching
// git init for pre-commit caching
bash.#Run & {
script: contents: """
git init
"""
git init
"""
},
docker.#Copy & {
contents: filesystem
5 changes: 2 additions & 3 deletions pyproject.toml
@@ -1,11 +1,10 @@
[tool.poetry]
name = "gc-cloud-storage-bucket"
name = "gc-assayworks-bucket"
version = "0.0.1"
description = "Template for creating Cloud Storage bucket on Google Cloud."
description = "Cloud Storage bucket on Google Cloud for Assay.Works data transfer."
authors = ["d33bs <[email protected]>"]
license = "BSD-3-Clause license"
readme = "README.md"
packages = [{include = "gc_cloud_storage_bucket"}]

[tool.poetry.dependencies]
python = "^3.9"
22 changes: 21 additions & 1 deletion terraform/.terraform.lock.hcl

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

11 changes: 7 additions & 4 deletions terraform/README.md
@@ -7,12 +7,14 @@
|------|---------|
| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | ~> 1.3.5 |
| <a name="requirement_google"></a> [google](#requirement\_google) | ~> 4.50.0 |
| <a name="requirement_local"></a> [local](#requirement\_local) | ~> 2.3.0 |

## Providers

| Name | Version |
|------|---------|
| <a name="provider_google"></a> [google](#provider\_google) | 4.50.0 |
| <a name="provider_local"></a> [local](#provider\_local) | 2.3.0 |

## Modules

@@ -23,18 +25,19 @@ No modules.
| Name | Type |
|------|------|
| [google_service_account.service_account](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/service_account) | resource |
| [google_service_account_key.key](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/service_account_key) | resource |
| [google_storage_bucket.target_bucket](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/storage_bucket) | resource |
| [google_storage_bucket_iam_member.member](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/storage_bucket_iam_member) | resource |
| [google_storage_hmac_key.key](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/storage_hmac_key) | resource |
| [local_file.service_account_key](https://registry.terraform.io/providers/hashicorp/local/latest/docs/resources/file) | resource |

## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_bucket_name"></a> [bucket\_name](#input\_bucket\_name) | Name for the bucket being created | `string` | `"lab-initiative-bucket"` | no |
| <a name="input_initiative_label"></a> [initiative\_label](#input\_initiative\_label) | Label for specific initiative useful for differentiating between various resources | `string` | `"lab-initiative"` | no |
| <a name="input_bucket_name"></a> [bucket\_name](#input\_bucket\_name) | Name for the bucket being created | `string` | `"waylab-assayworks-bucket"` | no |
| <a name="input_initiative_label"></a> [initiative\_label](#input\_initiative\_label) | Label for specific initiative useful for differentiating between various resources | `string` | `"waylab-assayworks"` | no |
| <a name="input_project"></a> [project](#input\_project) | tf variables project to create the related resources in | `string` | `"cuhealthai-sandbox"` | no |
| <a name="input_region"></a> [region](#input\_region) | Region to be used with the project resources | `string` | `"us-central1"` | no |
| <a name="input_region"></a> [region](#input\_region) | Region to be used with the project resources | `string` | `"europe-west4"` | no |

## Outputs

7 changes: 4 additions & 3 deletions terraform/accounts.tf
@@ -4,7 +4,8 @@ resource "google_service_account" "service_account" {
account_id = "${var.initiative_label}-svc-account"
}

#Create the HMAC key for the associated service account
resource "google_storage_hmac_key" "key" {
service_account_email = google_service_account.service_account.email
#Create a service-account key for the associated service account
resource "google_service_account_key" "key" {
service_account_id = google_service_account.service_account.name
public_key_type = "TYPE_X509_PEM_FILE"
}
5 changes: 5 additions & 0 deletions terraform/local.tf
@@ -0,0 +1,5 @@
# tf local output
resource "local_file" "service_account_key" {
filename = "../utilities/service-account.json"
content = base64decode(google_service_account_key.key.private_key)
}
6 changes: 3 additions & 3 deletions terraform/variables.tf
@@ -7,17 +7,17 @@ variable "project" {
# Region to be used with the project resources
variable "region" {
type = string
default = "us-central1"
default = "europe-west4"
}
# Name for the bucket being created
variable "bucket_name" {
type = string
default = "lab-initiative-bucket"
default = "waylab-assayworks-bucket"
}
# Label for specific initiative
# useful for differentiating between
# various resources
variable "initiative_label" {
type = string
default = "lab-initiative"
default = "waylab-assayworks"
}
4 changes: 4 additions & 0 deletions terraform/versions.tf
@@ -6,5 +6,9 @@ terraform {
source = "hashicorp/google"
version = "~> 4.50.0"
}
local = {
source = "hashicorp/local"
version = "~> 2.3.0"
}
}
}
24 changes: 24 additions & 0 deletions utilities/README.md
@@ -0,0 +1,24 @@
# Assay.Works Data Sync Instructions

Thank you for your help in uploading data as part of this project! Please see the following instructions on uploading data to the Google Cloud bucket.

1. Ensure the `service-account.json` key is found within the same directory where the script is run.
1. Prepare data to be uploaded under the `./data` directory relative to the `gsutil_sync.sh` location.
1. Run the `gsutil_sync.sh` script (for example: `sh ./gsutil_sync.sh`).

Please reference the following directory tree structure for an example of what the path should contain:

```shell
.
├── README.md
├── data
│   └── <data to be synchronized>
├── gsutil_sync.sh
└── service-account.json
```

## Additional Notes

- __Alternative data upload path__: if an alternative data upload path is preferred, please reference and update `gsutil_sync.sh` as follows:
- Original: `gsutil rsync ./data gs://waylab-assayworks-bucket`
- Updated: `gsutil rsync <new data location> gs://waylab-assayworks-bucket`
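
As an alternative to editing the script each time the data location changes, the location could be passed as an argument. The following is only a sketch of that idea: the helper name `resolve_data_dir` is hypothetical and is not part of `gsutil_sync.sh` as provided.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): choose the local
# data location to synchronize. Uses the first argument when given,
# otherwise falls back to ./data, the default used by gsutil_sync.sh.
resolve_data_dir() {
  printf '%s\n' "${1:-./data}"
}
```

In a modified copy of the script, the sync line could then read `gsutil rsync "$(resolve_data_dir "$1")" gs://waylab-assayworks-bucket`, so that `sh ./gsutil_sync.sh <new data location>` uploads from the new location and a bare `sh ./gsutil_sync.sh` keeps the original behavior.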
Empty file added utilities/data/.gitkeep
Empty file.
24 changes: 24 additions & 0 deletions utilities/gsutil_sync.sh
@@ -0,0 +1,24 @@
#!/bin/sh
#
# This file automates how data are sync'd to a
# Google Cloud Cloud Storage bucket using a
# pre-existing service account key.
#
# Notes:
# ----------------------------------------------------
# presumes gsutil has already been installed and is
# available in the path.
# see gsutil docs for more information:
# https://cloud.google.com/storage/docs/gsutil_install

# authenticate gcloud for the service account
# note: this is the preferred method for authenticating gsutil
# see the following for more details:
# https://cloud.google.com/storage/docs/gsutil/commands/config#configuring-service-account-credentials
gcloud auth activate-service-account --key-file=./service-account.json

# synchronize data from local directory `./data`
# to bucket waylab-assayworks-bucket
# see the following for more details:
# https://cloud.google.com/storage/docs/gsutil/commands/rsync
gsutil rsync ./data gs://waylab-assayworks-bucket
