This Terraform script creates the following alarms for tagged EC2 instances:
- CPUUtilization
- mem_used_percent
- disk_used_percent
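For orientation, a single one of these alarms might look like the sketch below. This is an illustrative assumption, not the module's actual code: the resource name, naming scheme, period, evaluation settings, instance ID, and SNS ARN are all placeholders.

```hcl
# Hypothetical sketch of one CPUUtilization alarm; the module creates
# one such alarm per tagged instance. All values here are placeholders.
resource "aws_cloudwatch_metric_alarm" "cpu" {
  alarm_name          = "example-instance-CPUUtilization" # assumed naming scheme
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  period              = 300
  evaluation_periods  = 2
  threshold           = 70
  comparison_operator = "GreaterThanThreshold"

  dimensions = {
    InstanceId = "i-0123456789abcdef0" # placeholder instance
  }

  alarm_actions = ["arn:aws:sns:us-east-1:000000000000:test"] # placeholder SNS topic
}
```

The mem_used_percent and disk_used_percent alarms would follow the same shape, but read from the CloudWatch Agent's namespace rather than AWS/EC2.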
Requirements:
- Proper AWS CLI permissions to run Terraform (EC2, CloudWatch, SNS)
- One or more running EC2 instances with at least the following tags:
  - Name:Value (used to name the CloudWatch metrics)
  - Stack:Value (used to fetch the instance list)
- The CloudWatch Agent must be running on all machines to publish metrics such as mem_used_percent and the disk dimensions.
- Besides Terraform, the `jq` and `aws` CLI utilities.
JSON data will be regenerated by the `null_resource` block on every run of `terraform apply` or `terraform plan` if the `depends_on` line is uncommented in each module block. Normally you do not need to regenerate the data on every Terraform run when working manually, so the `depends_on` line can be commented out to drastically cut run time. If this code runs in a pipeline on a schedule (perhaps once or twice a week), leave it uncommented so the pipeline always has the correct instance data. Regenerating the data too often can run into AWS API rate limits.
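Concretely, the toggle described above might look like this in a module block. The module name, source path, and resource name here are assumptions for illustration:

```hcl
module "ec2_alarms" {
  source = "./modules/ec2_alarms" # assumed module path

  # Uncomment in scheduled pipelines so instance data is regenerated
  # on every run; leave commented for manual runs to avoid repeated
  # AWS API calls.
  # depends_on = [null_resource.instance_data]
}
```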
There is a high likelihood that your environment is organized differently, so edit the following lines in `instances.sh` with the correct tags for your machines:
```shell
--query 'Reservations[*].Instances[?Tags[?Key==`Environment` && (Value!=`Production` && Value!=`DR` && Value!=`Prod`)]].[InstanceId]' \
  > nonprod_instance-ids.json

--query 'Reservations[*].Instances[*].[InstanceId]' \
--filters "Name=tag:Environment,Values=DR,Production,Prod" \
  > prod_instance-ids.json
```
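The `--query` expressions above return instance IDs nested per reservation, not a flat list. One way to sanity-check the generated files is to flatten them with `jq`. The sample file and instance IDs below are made up for illustration:

```shell
# Hypothetical sample of what the describe-instances query emits:
# one inner list per reservation, each wrapping single-element ID lists.
cat > sample_instance-ids.json <<'EOF'
[
  [["i-0123456789abcdef0"], ["i-0fedcba9876543210"]]
]
EOF

# Flatten the nested structure into a plain list of instance IDs.
jq -c 'flatten' sample_instance-ids.json
```

If the flattened output is empty, the tag filters in `instances.sh` are probably not matching any instances.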
```shell
terraform init

terraform plan -var "profile=default" -var "region=us-east-1" -var "tag_name=Stack" -var "tag_value=Test" \
  -var "threshold_ec2_cpu=70" -var "threshold_ec2_mem=90" -var "threshold_ec2_disk=90" \
  -var 'sns_arn=["arn:aws:sns:us-east-1:000000000000:test"]'

terraform apply -var "profile=default" -var "region=us-east-1" -var "tag_name=Stack" -var "tag_value=Test" \
  -var "threshold_ec2_cpu=70" -var "threshold_ec2_mem=90" -var "threshold_ec2_disk=90" \
  -var 'sns_arn=["arn:aws:sns:us-east-1:000000000000:test"]'
```
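To avoid repeating the long list of `-var` flags, the same values can be kept in a variables file and passed with `-var-file`. The file name below is an assumption; the variable names match the commands above:

```hcl
# test.tfvars (assumed file name)
profile            = "default"
region             = "us-east-1"
tag_name           = "Stack"
tag_value          = "Test"
threshold_ec2_cpu  = 70
threshold_ec2_mem  = 90
threshold_ec2_disk = 90
sns_arn            = ["arn:aws:sns:us-east-1:000000000000:test"]
```

Then run `terraform plan -var-file=test.tfvars` and `terraform apply -var-file=test.tfvars`.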