Welcome to LmBISON - RIIS Analysis
======================================

The BISON repository contains data and scripts to annotate GBIF occurrence records
with information regarding geographic location and USGS RIIS status of the record.

Current
------------

.. toctree::
   :maxdepth: 2

   pages/about
   pages/workflow

Setup AWS
------------

.. toctree::
   :maxdepth: 2

   pages/aws/aws_setup

Using BISON
------------

.. toctree::
   :maxdepth: 2

   pages/interaction/about

History
------------

.. toctree::
   :maxdepth: 2

   pages/history/year4_planB
   pages/history/year4_planA
   pages/history/year3
   pages/history/year5
   pages/history/aws_experiments

* :ref:`genindex`
About
========

The `Lifemapper BISON repository <https://github.com/lifemapper/bison>`_ is an open
source project supported by USGS award G19AC00211.

The aim of this repository is to provide a workflow for annotating and analyzing a
large set of United States specimen occurrence records for the USGS BISON project.

.. image:: ../.static/lm_logo.png
   :width: 150
   :alt: Lifemapper
Create lambda function to initiate processing
------------------------------------------------

* Create a lambda function, aws/events/bison_find_current_gbif_lambda.py, for
  execution when the trigger condition is activated.

  * The trigger condition is a file deposited in the BISON bucket.

    * TODO: change to the first of the month

  * The lambda function will delete the new file, then test for the existence of
    GBIF data for the current month.

    * TODO: change to mount GBIF data in Redshift, subset, unmount

Edit the execution role for the lambda function
------------------------------------------------

* Under Configuration/Permissions, find the Execution role Role name
  (bison_find_current_gbif_lambda-role-fb05ks88) automatically created for this
  function.
* Open it in a new window and, under Permissions policies, Add permissions:

  * bison_s3_policy
  * redshift_glue_policy

Create trigger to initiate lambda function
------------------------------------------------

* Check for the existence of new GBIF data.
* Use a blueprint, python, "Get S3 Object".
* Function name: bison_find_current_gbif_lambda
* S3 trigger:

  * Bucket: arn:aws:s3:::gbif-open-data-us-east-1

* Create a rule in EventBridge to use as the trigger:

  * Event source: AWS events or EventBridge partner events
  * Sample event: "S3 Object Created", aws/events/test_trigger_event.json
  * Creation method: Use pattern form
  * Event pattern:

    * Event Source: AWS services
    * AWS service: S3
    * Event type: Object-Level API Call via CloudTrail
    * Event Type Specifications:

      * Specific operation(s): GetObject
      * Specific bucket(s) by name: arn:aws:s3:::bison-321942852011-us-east-1

  * Select target(s):

    * AWS service

AWS lambda function that queries Redshift
--------------------------------------------

https://repost.aws/knowledge-center/redshift-lambda-function-queries

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/redshift-data/client/execute_statement.html

* Connect to a serverless workgroup (bison), namespace (bison), database name (dev).

  * When connecting to a serverless workgroup, specify the workgroup name and
    database name. The database user name is derived from the IAM identity. For
    example, arn:iam::123456789012:user:foo has the database user name IAM:foo.
    Permission to call the redshift-serverless:GetCredentials operation is also
    required.
  * The redshift:GetClusterCredentialsWithIAM permission is needed for temporary
    authentication with a role.
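
The connection described above can be sketched with boto3's redshift-data client.
This is a minimal sketch, not the project's actual lambda: the SQL and the table
name ``redshift_spectrum.occurrence`` are placeholder assumptions, and
``execute_statement`` is asynchronous, so a real function would poll
``describe_statement`` for completion::

```python
def build_count_sql(table):
    """Return a simple validation query; the table name is a placeholder."""
    return f"SELECT COUNT(*) FROM {table};"


def lambda_handler(event, context):
    # boto3 is preinstalled in the AWS Lambda Python runtime.
    import boto3

    client = boto3.client("redshift-data")
    # Submit the statement to the serverless workgroup and database named above;
    # the database user is derived from the IAM identity of the execution role.
    response = client.execute_statement(
        WorkgroupName="bison",
        Database="dev",
        Sql=build_count_sql("redshift_spectrum.occurrence"),
    )
    # The call returns immediately; use describe_statement(Id=...) to poll for
    # status and get_statement_result(Id=...) to fetch rows once it finishes.
    return {"statement_id": response["Id"]}
```
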
AWS Resource Setup
********************

Create policies and roles
===========================================================

The :ref:`bison_redshift_lambda_role` allows access to the bison Redshift
namespace/workgroup, lambda functions, EventBridge Scheduler, and S3 data.
The Trusted Relationships on this role allow each of these services to assume
the role.

The :ref:`bison_ec2_s3_role` allows an EC2 instance to access the public S3 data
and the bison S3 bucket. Its trust relationship grants AssumeRole to the ec2 and
s3 services. This role will be assigned to an EC2 instance that will initiate
computations and compute matrices.

The :ref:`bison_redshift_s3_role` allows Redshift to access public S3 data and
the bison S3 bucket, and allows Redshift to perform glue functions. Its trust
relationship grants AssumeRole to the redshift service.

Make sure that the same role granted to the namespace is used for creating an
external schema and lambda functions. When mounting external data as a redshift
table in the external schema, you may encounter an error indicating that the
"dev" database does not exist. This refers to the external database, and may
indicate that the role used by the command and/or namespace differs from the
role granted to the schema upon creation.

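
For reference, a trust relationship granting AssumeRole to the redshift service
(as described for bison_redshift_s3_role) takes roughly the following form; this
is a generic sketch, not a copy of the project's actual policy document:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "redshift.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```
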

Redshift Namespace and Workgroup
===========================================================

Namespace and Workgroup
------------------------------

A namespace is storage-related, holding database objects and users. A workgroup
is a collection of compute resources, along with related properties and
limitations:
https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-workgroup-namespace.html

External Schema
------------------------

The command below creates an external schema, redshift_spectrum, and also creates
a **new** external database "dev". It appears in the console to be the same "dev"
database that contains the public schema, but it is separate. Also note that the
IAM role used to create the schema must match the role attached to the
namespace::

    CREATE external schema redshift_spectrum
        FROM data catalog
        DATABASE dev
        IAM_ROLE 'arn:aws:iam::321942852011:role/bison_redshift_s3_role'
        CREATE external database if NOT exists;
EC2 instance creation
===========================================================

Create (Console)
--------------------------------

* Future: create and save an AMI or template for consistent reproduction.
* Via Console, without a launch template:

  * Ubuntu Server 24.04 LTS, SSD Volume Type (free tier eligible), Arm architecture
  * Instance type t4g.micro (1gb RAM, 2 vCPU)
  * Security Group: launch-wizard-1
  * 15 Gb General Purpose SSD (gp3)
  * Modify `IAM instance profile` to the role created for s3 access
    (bison_ec2_s3_role)
  * Use the security group created for this region (currently launch-wizard-1)
  * (no?) Use the bison-ec2-role for this instance
  * Assign your key pair to this instance

    * If you do not have a keypair, create one for SSH access (tied to region) on
      initial EC2 launch.
    * One chance only: download the private key (.pem file for Linux and OSX) to
      the local machine.
    * Set file permissions to 400.

  * Launch
  * Test by SSH-ing to the instance with the Public IPv4 DNS address, with the
    default user (for an ubuntu instance) `ubuntu`::

      ssh -i .ssh/<aws_keyname>.pem ubuntu@<ec2-xxx-xxx-xxx-xxx.compute-x.amazonaws.com>

Create an SSH key for Github clone
-----------------------------------------------

* Generate an SSH key::

    ssh-keygen -t ed25519 -C "bison@whereever"

* Add the public key to your Github profile:
  https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent

Install software
---------------------------

* Update apt and install unzip::

    sudo apt update
    sudo apt install unzip

* AWS Client tools

  * Follow the instructions to install the awscli package (Linux):
    https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
  * Make sure to use the instructions for the right architecture (x86 vs Arm).
  * Test by listing the contents of the bison bucket (permission granted by the
    role bison_ec2_s3_role)::

      aws s3 ls s3://bison-321942852011-us-east-1/input/

* Install docker for BISON application deployment::

    sudo apt install docker.io
    sudo apt install docker-compose-v2

* BISON code (for building the docker image during development/testing)

  * Download the BISON code repository::

      git clone https://github.com/lifemapper/bison.git

  * Edit the .env.conf (Docker environment variables) and nginx.conf (webserver
    address) files with the FQDN of the server being deployed. For
    development/testing EC2 servers, use the Public IPv4 DNS of the EC2 instance.

For API deployment
----------------------------------

SSL certificates
...................

* Install apache for getting/managing certificates, and certbot for Let's
  Encrypt certificates::

    sudo apt install apache2 certbot plocate

* Create an SSL certificate on the EC2 instance.
* For testing/development, use self-signed certificates, because Certbot will not
  create certificates for an AWS EC2 Public IPv4 DNS or an IP address.

  * Edit the compose.yml file under the `nginx` service (which intercepts all web
    requests), in `volumes`, to bind-mount the directory containing the
    self-signed certificates to /etc/letsencrypt::

      services:
        ...
        nginx:
          ...
          volumes:
            - "/home/ubuntu/certificates:/etc/letsencrypt:ro"

Configure for AWS access
---------------------------

In the home directory, create the directory and file .aws/config, with the
following content::

    [default]
    region = us-east-1
    output = json
    duration_seconds = 43200
    credential_source = Ec2InstanceMetadata
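
If instance setup is scripted, the same file can be generated with Python's
standard-library configparser; a minimal sketch using the values above (the
function name is illustrative)::

```python
import configparser
import io


def render_aws_config():
    """Render the ~/.aws/config contents shown above as an INI string."""
    cfg = configparser.ConfigParser()
    cfg["default"] = {
        "region": "us-east-1",
        "output": "json",
        "duration_seconds": "43200",
        "credential_source": "Ec2InstanceMetadata",
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()
```
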

EC2 for Workflow Tasks
---------------------------------

Credentials
..............

EC2 must be set up with a role providing temporary credentials, enabling
applications to retrieve those credentials for AWS permissions to other services
(i.e. S3). By default, the instance allows IMDSv1 or IMDSv2, though making v2
required is recommended.

Using IMDSv2, first get a token::

    TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"`

Then get top-level metadata::

    curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/

Or retrieve the role credentials in one command (here for a role named
s3access)::

    TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"` \
    && curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/iam/security-credentials/s3access
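
The same IMDSv2 exchange can be done from Python with only the standard library.
This is a sketch: the role name ``s3access`` is the example used above, and the
network calls themselves only succeed on an EC2 instance::

```python
import urllib.request

IMDS = "http://169.254.169.254/latest"


def build_token_request(ttl_seconds=21600):
    """PUT request that obtains an IMDSv2 session token."""
    return urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(token, path="meta-data/"):
    """GET request for a metadata path, authenticated with the session token."""
    return urllib.request.Request(
        f"{IMDS}/{path}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def get_role_credentials(role_name="s3access"):
    """Fetch temporary role credentials; works only on an EC2 instance."""
    token = urllib.request.urlopen(build_token_request(), timeout=2).read().decode()
    request = build_metadata_request(
        token, f"meta-data/iam/security-credentials/{role_name}")
    return urllib.request.urlopen(request, timeout=2).read().decode()
```
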

To set up config to use/assume a role, see:
https://docs.aws.amazon.com/sdkref/latest/guide/feature-assume-role-credentials.html

More info:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html

Hop Limit for AWS communication
................................

* Extend the hop limit for getting metadata about permissions to 2
  (host --> docker container --> metadata):
  https://specifydev.slack.com/archives/DQSAVMMHN/p1717706137817839
* SSH to the ec2 instance, then run::

    aws ec2 modify-instance-metadata-options \
        --instance-id i-082e751b94e476987 \
        --http-put-response-hop-limit 2 \
        --http-endpoint enabled

* Or, in the console, add the metadata tag/value HttpPutResponseHopLimit/2.