Commit

deploy: de480ab
zzeppozz committed Oct 22, 2024
0 parents, commit a6d9873
Showing 112 changed files with 12,483 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .buildinfo
@@ -0,0 +1,4 @@
# Sphinx build info version 1
# This file records the configuration used when building these files. When it is not found, a full rebuild will be done.
config: ee2edea7cd0e405e075f4650e6ba2801
tags: 645f666f9bcd5a90fca523b33c5a78b7
Binary file added .doctrees/environment.pickle
Binary file added .doctrees/index.doctree
Binary file added .doctrees/pages/about.doctree
Binary file added .doctrees/pages/aws/automation.doctree
Binary file added .doctrees/pages/aws/aws_setup.doctree
Binary file added .doctrees/pages/aws/ec2_setup.doctree
Binary file added .doctrees/pages/aws/roles.doctree
Binary file added .doctrees/pages/history/aws_experiments.doctree
Binary file added .doctrees/pages/history/year3.doctree
Binary file added .doctrees/pages/history/year4_planA.doctree
Binary file added .doctrees/pages/history/year4_planB.doctree
Binary file added .doctrees/pages/history/year5.doctree
Binary file added .doctrees/pages/interaction/aws_prep.doctree
Binary file added .doctrees/pages/interaction/debug.doctree
Binary file added .doctrees/pages/interaction/deploy.doctree
Binary file added .doctrees/pages/workflow.doctree
Empty file added .nojekyll
Empty file.
Binary file added _images/lm_logo.png
45 changes: 45 additions & 0 deletions _sources/index.rst.txt
@@ -0,0 +1,45 @@
Welcome to LmBISON - RIIS Analysis
======================================

The BISON repository contains data and scripts to annotate GBIF occurrence records
with geographic location and USGS RIIS status information.


Current
------------

.. toctree::
:maxdepth: 2

pages/about
pages/workflow

Setup AWS
------------

.. toctree::
:maxdepth: 2

pages/aws/aws_setup

Using BISON
------------

.. toctree::
:maxdepth: 2

pages/interaction/about

History
------------

.. toctree::
:maxdepth: 2

pages/history/year4_planB
pages/history/year4_planA
pages/history/year3
pages/history/year5
pages/history/aws_experiments

* :ref:`genindex`
12 changes: 12 additions & 0 deletions _sources/pages/about.rst.txt
@@ -0,0 +1,12 @@
About
========

The `Lifemapper BISON repository <https://github.com/lifemapper/bison>`_ is an open
source project supported by USGS award G19AC00211.

The aim of this repository is to provide a workflow for annotating and analyzing a
large set of United States specimen occurrence records for the USGS BISON project.

.. image:: ../.static/lm_logo.png
:width: 150
:alt: Lifemapper
68 changes: 68 additions & 0 deletions _sources/pages/aws/automation.rst.txt
@@ -0,0 +1,68 @@
Create lambda function to initiate processing
------------------------------------------------
* Create a lambda function, aws/events/bison_find_current_gbif_lambda.py, to execute
  when the trigger condition is activated; a minimal sketch of its logic follows this
  list

* The trigger condition is a file deposited in the BISON bucket

  * TODO: change to the first of the month

* The lambda function will delete the new file, and test for the existence of
  GBIF data for the current month

  * TODO: change to mount GBIF data in Redshift, subset, unmount
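
A minimal sketch of the lambda logic described above, assuming boto3 and that current
GBIF open data is published under a dated prefix; the prefix layout and handler
details are assumptions for illustration, not taken from the repository::

    # Hypothetical sketch of bison_find_current_gbif_lambda; the GBIF prefix
    # layout ("occurrence/YYYY-MM-01/") is an assumption for illustration.
    import datetime
    import boto3

    S3 = boto3.client("s3")
    GBIF_BUCKET = "gbif-open-data-us-east-1"

    def lambda_handler(event, context):
        # Delete the trigger file that was deposited in the BISON bucket.
        record = event["Records"][0]["s3"]
        S3.delete_object(Bucket=record["bucket"]["name"],
                         Key=record["object"]["key"])
        # Test for GBIF occurrence data for the current month.
        first_of_month = datetime.date.today().replace(day=1).isoformat()
        resp = S3.list_objects_v2(
            Bucket=GBIF_BUCKET,
            Prefix=f"occurrence/{first_of_month}/",
            MaxKeys=1)
        return {"gbif_data_found": resp.get("KeyCount", 0) > 0,
                "prefix_checked": first_of_month}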

Edit the execution role for the lambda function
--------------------------------------------------
* Under Configuration/Permissions, find the Execution role name
  (bison_find_current_gbif_lambda-role-fb05ks88) automatically created for this
  function
* Open the role in a new window and, under Permissions policies, Add permissions

  * bison_s3_policy
  * redshift_glue_policy

Create trigger to initiate lambda function
------------------------------------------------

* Check for the existence of new GBIF data
* Use a blueprint, python, "Get S3 Object"
* Function name: bison_find_current_gbif_lambda
* S3 trigger:

  * Bucket: arn:aws:s3:::gbif-open-data-us-east-1

* Create a rule in EventBridge to use as the trigger (a programmatic sketch follows
  this list)

  * Event source: AWS events or EventBridge partner events
  * Sample event, "S3 Object Created", aws/events/test_trigger_event.json
  * Creation method: Use pattern form
  * Event pattern

    * Event Source: AWS services
    * AWS service: S3
    * Event type: Object-Level API Call via CloudTrail
    * Event Type Specifications

      * Specific operation(s): GetObject
      * Specific bucket(s) by name: arn:aws:s3:::bison-321942852011-us-east-1

  * Select target(s)

    * AWS service

AWS lambda function that queries Redshift
--------------------------------------------

https://repost.aws/knowledge-center/redshift-lambda-function-queries

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/redshift-data/client/execute_statement.html

* Connect to a serverless workgroup (bison), namespace (bison), database name (dev)

* When connecting to a serverless workgroup, specify the workgroup name and database
  name. The database user name is derived from the IAM identity. For example,
  arn:iam::123456789012:user:foo has the database user name IAM:foo. Also, permission
  to call the redshift-serverless:GetCredentials operation is required.
* The redshift:GetClusterCredentialsWithIAM permission is needed for temporary
  authentication with a role
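
A minimal sketch of such a lambda query, assuming boto3's redshift-data client and
the workgroup/database names above; the SQL statement is a placeholder::

    # Sketch: query the bison serverless workgroup with the Redshift Data API.
    import time
    import boto3

    rsd = boto3.client("redshift-data")

    def lambda_handler(event, context):
        # The database user is derived from this function's IAM execution role.
        stmt = rsd.execute_statement(
            WorkgroupName="bison", Database="dev",
            Sql="SELECT COUNT(*) FROM public.some_table;")  # placeholder SQL
        # The Data API is asynchronous; poll until the statement finishes.
        while True:
            desc = rsd.describe_statement(Id=stmt["Id"])
            if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
                break
            time.sleep(1)
        if desc["Status"] != "FINISHED":
            raise RuntimeError(desc.get("Error", desc["Status"]))
        return rsd.get_statement_result(Id=stmt["Id"])["Records"]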
50 changes: 50 additions & 0 deletions _sources/pages/aws/aws_setup.rst.txt
@@ -0,0 +1,50 @@
AWS Resource Setup
********************

Create policies and roles
===========================================================

The :ref:`bison_redshift_lambda_role` allows access to the bison Redshift
namespace/workgroup, lambda functions, EventBridge Scheduler, and S3 data. Its
trust relationships (:ref:`bison_redshift_lambda_role_trusted_relationships`)
allow each of these services to assume the role.

The :ref:`_bison_ec2_s3_role` allows an EC2 instance to access the public S3 data and
the bison S3 bucket. Its trust relationship grants AssumeRole to ec2 and s3 services.
This role will be assigned to an EC2 instance that will initiate
computations and compute matrices.

The :ref:`_bison_redshift_s3_role` allows Redshift to access public S3 data and
the bison S3 bucket, and allows Redshift to perform glue functions. Its trust
relationship grants AssumeRole to redshift service.
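
As a sketch, this role and its trust relationship could be created with boto3 as
follows; attaching the S3 and glue permission policies is omitted here::

    # Sketch: create bison_redshift_s3_role with a trust policy granting
    # AssumeRole to the redshift service; permission policies not attached.
    import json
    import boto3

    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "redshift.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }
    iam = boto3.client("iam")
    iam.create_role(
        RoleName="bison_redshift_s3_role",
        AssumeRolePolicyDocument=json.dumps(trust_policy))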

Make sure that the same role granted to the namespace is used for creating an external
schema and lambda functions. When mounting external data as a redshift table to the
external schema, you may encounter an error indicating that the "dev" database does not
exist. This refers to the external database, and may indicate that the role used by the
command and/or namespace differs from the role granted to the schema upon creation.

Redshift Namespace and Workgroup
===========================================================

Namespace and Workgroup
------------------------------

A namespace is storage-related, with database objects and users. A workgroup is
a collection of compute resources such as security groups and other properties and
limitations.
https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-workgroup-namespace.html

External Schema
------------------------
The command below creates an external schema, redshift_spectrum, and also creates a
**new** external database "dev". It appears in the console to be the same "dev"
database that contains the public schema, but it is separate. Also note the IAM role
used to create the schema must match the role attached to the namespace::

    CREATE EXTERNAL SCHEMA redshift_spectrum
    FROM DATA CATALOG
    DATABASE 'dev'
    IAM_ROLE 'arn:aws:iam::321942852011:role/bison_redshift_s3_role'
    CREATE EXTERNAL DATABASE IF NOT EXISTS;
153 changes: 153 additions & 0 deletions _sources/pages/aws/ec2_setup.rst.txt
@@ -0,0 +1,153 @@
EC2 instance creation
===========================================================

Create (Console)
--------------------------------
* Future - create and save an AMI or template for consistent reproduction
* via Console, without launch template:

  * Ubuntu Server 24.04 LTS, SSD Volume Type (free tier eligible), Arm architecture
  * Instance type t4g.micro (1 GB RAM, 2 vCPU)
  * Security Group: launch-wizard-1
  * 15 GB General Purpose SSD (gp3)
  * Modify `IAM instance profile` to the role created for S3 access
    (bison_ec2_s3_role)
  * Use the security group created for this region (currently launch-wizard-1)
  * (no?) Use the bison-ec2-role for this instance
  * Assign your key pair to this instance

    * If you do not have a keypair, create one for SSH access (tied to region) on
      initial EC2 launch
    * One chance only: download the private key (.pem file for Linux and OSX) to
      your local machine
    * Set file permissions to 400

* Launch
* Test by SSH-ing to the instance at its Public IPv4 DNS address, with the default
  user (for an Ubuntu instance) `ubuntu`::

ssh -i .ssh/<aws_keyname>.pem ubuntu@<ec2-xxx-xxx-xxx-xxx.compute-x.amazonaws.com>

Create an SSH key for Github clone
-----------------------------------------------

* Generate an SSH key::

ssh-keygen -t ed25519 -C "bison@whereever"

* Add the public key to your Github profile,
https://docs.github.com/en/authentication/connecting-to-github-with-ssh/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent


Install software
---------------------------

* Update apt and install unzip::

sudo apt update
sudo apt install unzip

* AWS Client tools

  * Follow the instructions to install the awscli package (Linux):
    https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
  * Make sure to use the instructions for the right architecture (x86 vs Arm)
  * Test by listing the contents of the bison bucket (permission granted by the
    bison_ec2_s3_role role)::

      aws s3 ls s3://bison-321942852011-us-east-1/input/

* Install Docker for BISON application deployment::

    sudo apt install docker.io
    sudo apt install docker-compose-v2

* BISON code (for building the docker image during development/testing)

  * Download the BISON code repository::

      git clone https://github.com/lifemapper/bison.git

  * Edit the .env.conf (Docker environment variables) and nginx.conf (webserver
    address) files with the FQDN of the server being deployed. For
    development/testing EC2 servers, use the Public IPv4 DNS for the EC2 instance;
    a sketch for retrieving it follows.
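
A minimal sketch for retrieving the instance's Public IPv4 DNS name from the metadata
service (IMDSv2, described further below), using only the Python standard library::

    # Sketch: fetch this EC2 instance's public DNS name via IMDSv2.
    import urllib.request

    def imds(path, token):
        req = urllib.request.Request(
            f"http://169.254.169.254/latest/{path}",
            headers={"X-aws-ec2-metadata-token": token})
        return urllib.request.urlopen(req, timeout=2).read().decode()

    # IMDSv2 requires a session token before any metadata request.
    tok_req = urllib.request.Request(
        "http://169.254.169.254/latest/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"})
    token = urllib.request.urlopen(tok_req, timeout=2).read().decode()
    print(imds("meta-data/public-hostname", token))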

For API deployment
----------------------------------
SSL certificates
...................

* Install apache for getting/managing certificates
* Install certbot for Let's Encrypt certificates::

    sudo apt install apache2 certbot plocate

* Create an SSL certificate on the EC2 instance.
* For testing/development, use self-signed certificates (a sketch for generating one
  follows this list) because Certbot will not create certificates for an AWS EC2
  Public IPv4 DNS or an IP address.

* Edit the compose.yml file under the `nginx` service (which intercepts all web
  requests), in `volumes`, to bind-mount the directory containing self-signed
  certificates to /etc/letsencrypt::

services:
...
nginx:
...
volumes:
- "/home/ubuntu/certificates:/etc/letsencrypt:ro"

Configure for AWS access
---------------------------

In the home directory, create the directory and file .aws/config, with the following
content::

[default]
region = us-east-1
output = json
duration_seconds = 43200
credential_source = Ec2InstanceMetadata
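
To verify that credentials flow from the instance role (no access keys should be
stored on the instance), a quick boto3 check::

    # Sketch: confirm the instance role supplies credentials via IMDS.
    import boto3

    identity = boto3.client("sts").get_caller_identity()
    print(identity["Arn"])  # expect an assumed-role ARN for the instance role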


EC2 for Workflow Tasks
---------------------------------

Credentials
..............

EC2 must be set up with a role that provides temporary credentials, so that
applications on the instance gain AWS permissions to other services (e.g. S3).
By default, the instance allows IMDSv1 or IMDSv2, though making v2 required is
recommended. For example, to retrieve the temporary credentials for an attached
role named s3access::

    TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"` \
    && curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/iam/security-credentials/s3access

Using IMDSv2, first get a token::

TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"`

Then get top level metadata::

curl -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/

To set up config to use/assume a role:
https://docs.aws.amazon.com/sdkref/latest/guide/feature-assume-role-credentials.html

More info:

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html

Hop Limit for AWS communication
................................

* Extend the hop limit for getting metadata about permissions to 2, so requests can
  traverse host --> docker container --> metadata
  (https://specifydev.slack.com/archives/DQSAVMMHN/p1717706137817839)

* SSH to the EC2 instance, then run::

aws ec2 modify-instance-metadata-options \
--instance-id i-082e751b94e476987 \
--http-put-response-hop-limit 2 \
--http-endpoint enabled

* Or, in the console, set the instance metadata option HttpPutResponseHopLimit to 2