Automated ELK Stack Deployment
The files in this repository were used to configure the network depicted below.
These files have been tested and used to generate an automated ELK Stack Deployment on Azure. They can be used to recreate the entire deployment pictured below. Alternatively, select portions of the YAML files may be used to install only certain pieces of it, such as Filebeat and Metricbeat.
- install-elk.yml
- filebeat-config.yml
- filebeat-playbook.yml
- metricbeat-config.yml
- metricbeat-playbook.yml
This document contains the following details:
- Description of the Topology
- Access Policies
- ELK Configuration
- Beats in Use
- Machines Being Monitored
- How to Use the Ansible Build
The main purpose of this network is to expose a load-balanced and monitored instance of DVWA, the D*mn Vulnerable Web Application.
Load balancing ensures that the application will be highly available, in addition to restricting inbound access to the network.
What aspect of security do load balancers protect?
- According to the Azure security baseline for Azure Load Balancer, a load balancer's main purpose is to distribute web traffic across multiple servers, which protects the availability of the application. In our network, the load balancer was installed in front of the VMs to help:
- protect Azure resources within virtual networks
- monitor and log the configuration and traffic of virtual networks, subnets, and NICs
- protect critical web applications
- deny communications with known malicious IP addresses
- record network packets
- deploy network-based intrusion detection/intrusion prevention systems (IDS/IPS)
- manage traffic to web applications
- minimize complexity and administrative overhead of network security rules
- maintain standard security configurations for network devices
- document traffic configuration rules
- use automated tools to monitor network resource configurations and detect changes
What is the advantage of a jump box?
- A jump box, or "jump server", is a gateway on a network used to access and manage devices in different security zones. It acts as a "bridge" between two trusted network zones and provides a controlled way to access them. Because we reach the internal VMs only through the jump box, those VMs do not need public IP addresses of their own, which improves security and keeps our Azure VMs from being exposed to the public Internet.
Integrating an Elastic Stack server allows us to easily monitor the vulnerable VMs for changes to their file systems and logs (such as failed privilege escalations and SSH login activity) and to their system metrics (such as CPU and memory usage).
What does Filebeat watch for?
- Filebeat keeps things simple by offering a lightweight (low memory footprint) way to forward and centralize logs: it watches the log files and locations we specify for changes and ships the new entries to the ELK server.
What does Metricbeat record?
- Metricbeat helps monitor servers by periodically collecting metrics from the operating system and from services running on the server, so it records machine metrics and statistics such as CPU usage, memory usage, and uptime.
The configuration details of each machine may be found below.
Name | Function | IP Address | Operating System |
---|---|---|---|
Jump-Box-Provisioner | Gateway | 44.77.55.33 ; 10.0.0.4 | Linux |
Web-1 | webserver | 10.0.0.5 | Linux |
Web-2 | webserver | 10.0.0.6 | Linux |
ELKServer | Kibana | 104.45.159.216 ; 10.1.0.4 | Linux |
RedTeam-LB | Load Balancer | 40.122.215.16 | N/A (Azure Load Balancer) |
In addition to the above, Azure has provisioned a load balancer in front of the DVWA web servers (the jump box is not behind it). The load balancer's targets are organized into availability zones: Web-1 + Web-2.
The machines on the internal network are not exposed to the public Internet.
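Only the Jump Box machine can accept SSH connections from the Internet. Access to this machine is only allowed from the following IP address: 47.185.204.83. Machines within the network can only be accessed by SSH from the Jump Box; the one other exception is the ELK server's Kibana interface (port 5601), which is also reachable from 47.185.204.83.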
A summary of the access policies in place can be found in the table below.
Name | Publicly Accessible | Allowed IP Addresses |
---|---|---|
Jump-Box-Provisioner | Yes | 47.185.204.83 |
ELKServer | Yes | 47.185.204.83:5601 |
DVWA 1 | No | 10.0.0.1-254 |
DVWA 2 | No | 10.0.0.1-254 |
Ansible was used to automate the configuration of the ELK server. No configuration was performed manually, which is advantageous because Ansible can configure new machines and update programs and configurations on hundreds of servers at once, and the process is the same whether we're managing one machine or hundreds.
What is the main advantage of automating configuration with Ansible?
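- Ansible focuses on bringing a server to a desired state of operation. Because most of its modules only make changes when the server is not already in the described state, the same playbook can be run repeatedly and produces consistent configurations across machines.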
We will configure an ELK server within our virtual network. Specifically, we will:
- Deploy a new VM on our virtual network.
- Create an Ansible play to install and configure an ELK instance.
- Restrict access to the new server.
- Create a new vNet located in the same resource group we have been using.
- Make sure this vNet is located in a new region, not the same region as our other VMs. Which region we select is not important as long as it's a different US region than our other resources; we can leave the rest of the settings at their defaults.
- In this example, Azure automatically created a new address space of 10.1.0.0/16. If our network is different (for example 10.2.0.0 or 10.3.0.0), that is fine as long as we accept the default settings; Azure automatically creates a network that will work.
- Create a peer connection between our vNets. This allows traffic to pass between our vNets and regions. The peering creates both a connection from our first vNet to our second vNet and a reverse connection from our second vNet back to our first, so traffic can pass in both directions.
  - Navigate to Virtual Network in the Azure Portal.
  - Select our new vNet to view its details.
  - Under Settings on the left side, select Peerings.
  - Click the + Add button to create a new peering.
  - Enter a unique name for the connection from our new vNet to our old vNet, such as the one depicted in the example below.
  - Choose our original RedTeam vNet in the dropdown labeled Virtual Network.
  - Leave all other settings at their defaults.
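Azure only peers vNets whose address spaces do not overlap, which is why the new vNet gets its own range such as 10.1.0.0/16. The bash sketch below checks two CIDR blocks for overlap before the peering is created. It assumes the original RedTeam vNet uses 10.0.0.0/16 (the jump box and web VMs sit at 10.0.0.x); the function names are hypothetical helpers, not part of any Azure tooling.

```bash
#!/usr/bin/env bash
# Sketch: check whether two IPv4 CIDR blocks overlap before peering two vNets.

# Convert a dotted-quad IPv4 address (e.g. 10.1.0.0) to a 32-bit integer.
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo $(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
}

# Print "first last" (as integers) for the addresses in a CIDR block.
cidr_range() {
  local ip=${1%/*} len=${1#*/}
  local mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  local first=$(( $(ip_to_int "$ip") & mask ))
  echo "$first $(( first + (1 << (32 - len)) - 1 ))"
}

# Exit status 0 if the two CIDR blocks share at least one address.
cidr_overlap() {
  local a_first a_last b_first b_last
  read -r a_first a_last <<< "$(cidr_range "$1")"
  read -r b_first b_last <<< "$(cidr_range "$2")"
  (( a_first <= b_last && b_first <= a_last ))
}
```

For example, `if cidr_overlap 10.0.0.0/16 10.1.0.0/16; then echo "choose another range"; else echo "ok to peer"; fi` reports that the two ranges used here can be peered.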
- Create a new Ubuntu VM in our virtual network with the following configurations:
  - The VM must have a public IP address.
  - The VM must be added to the new region in which we created our new vNet. We want to make sure we select our new vNet and allow a new basic Security Group to be created for this VM.
  - The VM must use the same SSH keys as our web server VMs. These should be the SSH keys that were created on the Ansible container running on our jump box.
- After creating the new VM in Azure, verify that it works as expected by connecting via SSH from the Ansible container on our jump box VM:
  - ssh sysadmin@<jump-box-provisioner>
  - sudo docker container list -a
  - sudo docker start goofy_wright && sudo docker attach goofy_wright
- Copy the SSH key from the Ansible container on our jump box by running:
  cat id_rsa.pub
- Configure the new VM using that SSH key.
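To make the SSH check fail fast instead of stopping at a password prompt, the connection can be tested non-interactively from the Ansible container. This is a minimal sketch, assuming the ELK VM's private IP is 10.1.0.4 (from the table above). The helper name and the SSH_BIN override are hypothetical; BatchMode and ConnectTimeout are standard OpenSSH client options.

```bash
#!/usr/bin/env bash
# Sketch: test key-based SSH login without ever waiting on a password prompt.
# SSH_BIN is a hypothetical override (defaults to ssh) used only for testing.
ssh_key_login_ok() {
  local target=$1
  # BatchMode=yes: fail instead of prompting; ConnectTimeout=5: give up on an
  # unreachable host after 5 seconds. 'true' is the remote command to run.
  "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$target" true
}
```

From the container, `ssh_key_login_ok sysadmin@10.1.0.4 && echo "key login works"` prints the message only if the new VM accepted the container's key.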
In this step, we have to:
- Add our new VM to the Ansible hosts file.
- Create a new Ansible playbook to use for our new ELK virtual machine.
- From our Ansible container, add the new VM to Ansible's hosts file: run nano /etc/ansible/hosts and add our ELK VM's IP address under an [elk] group (the group that install-elk.yml targets), followed by ansible_python_interpreter=/usr/bin/python3.
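Because install-elk.yml targets 'hosts: elk', the inventory needs an [elk] group. The sketch below appends that group to an INI-style inventory only if it is not already present, so running it twice is harmless. It assumes the ELK VM's private IP is 10.1.0.4 (from the table above); the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: add an [elk] group to an INI-style Ansible inventory, once.
add_elk_to_inventory() {
  local inventory=${1:-/etc/ansible/hosts} elk_ip=${2:-10.1.0.4}
  # Skip if an exact "[elk]" line already exists (or keep going if the
  # file does not exist yet; the append below will create it).
  if grep -qxF '[elk]' "$inventory" 2>/dev/null; then
    return 0
  fi
  printf '\n[elk]\n%s ansible_python_interpreter=/usr/bin/python3\n' \
    "$elk_ip" >> "$inventory"
}
```

After running `add_elk_to_inventory` inside the Ansible container, `ansible elk -m ping` is a quick way to confirm Ansible can reach the new VM.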
In the play below, which forms the header of the YAML file, I named the playbook after its main goal by setting the keyword 'name:' to "Configure Elk VM with Docker" and targeted the inventory group "elk" with 'hosts:'. Next, I defined the user account for the SSH connection by setting the keyword 'remote_user:' to "sysadmin", then enabled privilege escalation by setting the keyword 'become:' to "true".
The playbook implements the following tasks:
---
- name: Configure Elk VM with Docker
  hosts: elk
  remote_user: sysadmin
  become: true
  tasks:
In this task, the apt package manager module installs docker.io. The keyword 'update_cache:' is set to "yes" to refresh the apt package index (the equivalent of apt-get update) before installing; this is necessary to install docker.io successfully in this case. Next, the keyword 'state:' is set to "present" to ensure that the package is installed.
    # Use apt module
    - name: Install docker.io
      apt:
        update_cache: yes
        name: docker.io
        state: present
In this task, the apt module installs 'python3-pip' (pip3), the standard package manager used to install and maintain Python packages. The keyword 'force_apt_get:' is set to "yes" to force the use of apt-get instead of aptitude, and the keyword 'state:' is set to "present" to ensure that the package is installed.
    # Use apt module
    - name: Install pip3
      apt:
        force_apt_get: yes
        name: python3-pip
        state: present
In this task, the pip module installs the 'docker' Python package (the Docker SDK for Python), which Ansible's docker_container module needs on the target machine; 'state: present' ensures that it is installed.
    # Use pip module
    - name: Install Docker python module
      pip:
        name: docker
        state: present
In this task, the sysctl module configures the target virtual machine (i.e., the ELK server VM) so that Elasticsearch can use more memory. On newer versions of Elasticsearch, the default maximum number of virtual memory areas (65530) is likely too low and results in the following error: "elasticsearch | max virtual memory areas vm.max_map_count [65530] likely too low, increase to at least [262144]". We therefore increase vm.max_map_count to 262144 with the sysctl module (keyword 'value:' set to "262144"). The keyword 'state:' is set to "present" to ensure the setting is applied. The sysctl command modifies Linux kernel variables at runtime; the module writes the new value to /etc/sysctl.conf, and 'reload:' is set to "yes" so that the file is reloaded and the value takes effect immediately. Because the value is stored in /etc/sysctl.conf, it also survives a restart of the VM.
    # Use sysctl module
    - name: Use more memory
      sysctl:
        name: vm.max_map_count
        value: "262144"
        state: present
        reload: yes
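After the playbook runs, the value can be read back on the ELK VM to confirm the sysctl task took effect. This is a minimal sketch; the helper name is hypothetical, and it reads /proc/sys/vm/max_map_count, where Linux exposes this kernel setting.

```bash
#!/usr/bin/env bash
# Sketch: confirm vm.max_map_count meets Elasticsearch's minimum (262144).
# Reads /proc/sys/vm/max_map_count by default; another file can be passed in.
max_map_count_ok() {
  local file=${1:-/proc/sys/vm/max_map_count} required=262144 current
  read -r current < "$file" || return 1
  echo "vm.max_map_count=${current} (need >= ${required})"
  # Exit status 0 only if the current value is high enough.
  (( current >= required ))
}
```

On the ELK VM, `max_map_count_ok || echo "re-run install-elk.yml"` flags a value that is still at the 65530 default.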
In this task, the docker_container module downloads and launches our ELK container. The image is pulled from the Docker Hub repository. The keyword 'image:' is set to "sebp/elk:761": "sebp" is the creator of the image (Sebastien Pujadas), "elk" is the image name, and "761" is the tag for version 7.6.1. The keyword 'state:' is set to "started" to start the container upon creation. The keyword 'restart_policy:' is set to "always" so that the container restarts when we restart our ELK VM; without it, we would have to restart the container manually after every reboot. The keyword 'published_ports:' lists the 3 ports used by our Elastic Stack configuration: "5601" is the port used by Kibana, "9200" is the default port Elasticsearch uses for requests, and "5044" is the default port Logstash listens on for incoming Beats connections (we go over the Beats we installed in the section "Target Machines & Beats").
    # Use docker_container module
    - name: download and launch a docker elk container
      docker_container:
        name: elk
        image: sebp/elk:761
        state: started
        restart_policy: always
        published_ports:
          - 5601:5601
          - 9200:9200
          - 5044:5044
In this task, the systemd module enables the docker service to start on boot by setting the keyword 'enabled:' to "yes".
    # Use systemd module
    - name: Enable service docker on boot
      systemd:
        name: docker
        enabled: yes
Now we can launch and expose the container by running:
ansible-playbook install-elk.yml
The following screenshot displays the result of running install-elk.yml.
SSH to our ELK VM with ssh sysadmin@10.1.0.4 and run sudo docker ps.
The following screenshot displays the result of running docker ps after successfully configuring the Elastic Stack instance.
Log into the ELK server and manually launch the ELK container with:
sudo docker start elk
then run curl http://localhost:5601/app/kibana and confirm that it returns HTML.
The following screenshot displays the result of running curl after starting the ELK container.
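Kibana may take a while to answer after 'sudo docker start elk', so a single curl can fail even when the container is healthy. The sketch below polls until it gets an HTTP 200. The function name, retry count and delay are illustrative assumptions, and it assumes Kibana returns 200 on /app/kibana once it is ready.

```bash
#!/usr/bin/env bash
# Sketch: poll a URL until it answers with HTTP 200 or the retries run out.
wait_for_http() {
  local url=$1 tries=${2:-30} delay=${3:-5} code i
  for ((i = 1; i <= tries; i++)); do
    # -s hides progress, -o discards the body, -w prints only the status code
    # ("000" if the connection failed). '|| true' keeps a failed curl from
    # aborting scripts that run with 'set -e'.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url" || true)
    if [[ $code == 200 ]]; then
      echo "up after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "no HTTP 200 from ${url} after ${tries} attempts" >&2
  return 1
}
```

On the ELK VM, `wait_for_http http://localhost:5601/app/kibana 30 5` waits up to about two and a half minutes before giving up.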
This step restricts access to the ELK VM using Azure's network security groups (NSGs). We need to add our public IP address to a whitelist, just as we did when restricting access to the jump box.
In the ELK VM's Network Security Group, add an inbound rule that allows our host's public IP address to reach Kibana on port 5601.
Then open a web browser to http://<your.ELK-VM.External.IP>:5601/app/kibana
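The same NSG rule can also be scripted with the Azure CLI instead of the portal. In this sketch the resource group, NSG name, rule name and priority are hypothetical placeholders; 47.185.204.83 is the allowed address from the access table. Setting DRY_RUN=1 only prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Sketch: allow one source IP to reach Kibana (TCP 5601) through an NSG.
# RESOURCE_GROUP and NSG_NAME must be set; rule name/priority are examples.
allow_kibana_from() {
  local my_ip=$1
  local -a cmd=(
    az network nsg rule create
    --resource-group "${RESOURCE_GROUP:?set RESOURCE_GROUP}"
    --nsg-name "${NSG_NAME:?set NSG_NAME}"
    --name Allow-Kibana-5601
    --priority 300
    --direction Inbound
    --access Allow
    --protocol Tcp
    --source-address-prefixes "$my_ip"
    --destination-port-ranges 5601
  )
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "${cmd[*]}"   # show what would run
  else
    "${cmd[@]}"
  fi
}
```

For example: `RESOURCE_GROUP=<your-rg> NSG_NAME=<elk-vm-nsg> allow_kibana_from 47.185.204.83`. Building the command as an array keeps each argument intact without extra quoting.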
This ELK server is configured to monitor the following machines:
- Web-1 (DVWA 1) | 10.0.0.5
- Web-2 (DVWA 2) | 10.0.0.6
I have installed the following Beats on these machines:
- Filebeat
- Metricbeat
These Beats allow us to collect the following information from each machine:
Filebeat: Filebeat detects changes to the filesystem. I use it to collect system logs; more specifically, I use it to detect SSH login attempts and failed sudo escalations.
We will create the configuration files filebeat-config.yml and metricbeat-config.yml, after which we will create the Ansible playbook files for both of them.
Once we have these files on our Ansible container, edit them as specified:
- The username is elastic and the password is changeme.
- Scroll to line #1106 and replace the IP address with the IP address of our ELK machine:
    output.elasticsearch:
      hosts: ["10.1.0.4:9200"]
      username: "elastic"
      password: "changeme"
- Scroll to line #1806 and replace the IP address with the IP address of our ELK machine:
    setup.kibana:
      host: "10.1.0.4:5601"
- Save both files, filebeat-config.yml and metricbeat-config.yml, into /etc/ansible/files/. Note that the playbooks below copy them from /etc/ansible/roles/install-filebeat/files/ and /etc/ansible/roles/install-metricbeat/files/, so either save them there or adjust each playbook's 'src:' path to match.
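The two IP edits above can also be made without an editor. This is a minimal sketch using GNU sed (the -i in-place syntax differs on macOS). It assumes the lines are uncommented and shaped like 'hosts: ["<ip>:9200"]' and 'host: "<ip>:5601"'; lines in any other shape are left untouched. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: point a Beat config's Elasticsearch and Kibana hosts at the ELK VM.
set_elk_hosts() {
  local file=$1 elk_ip=${2:-10.1.0.4}
  # 1st expression: '  hosts: ["<anything>:9200"]' -> '  hosts: ["<elk_ip>:9200"]'
  # 2nd expression: '  host: "<anything>:5601"'    -> '  host: "<elk_ip>:5601"'
  # The leading indentation is captured in \( \) and written back as \1.
  sed -i \
    -e "s/^\([[:space:]]*hosts: \)\[\"[^\"]*:9200\"]/\1[\"${elk_ip}:9200\"]/" \
    -e "s/^\([[:space:]]*host: \)\"[^\"]*:5601\"/\1\"${elk_ip}:5601\"/" \
    "$file"
}
```

Usage: `for f in /etc/ansible/files/filebeat-config.yml /etc/ansible/files/metricbeat-config.yml; do set_elk_hosts "$f" 10.1.0.4; done`.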
Next, create two playbooks that install Filebeat and Metricbeat: filebeat-playbook.yml and metricbeat-playbook.yml.
Run nano filebeat-playbook.yml and enter the Filebeat playbook template below, which also enables the filebeat service on boot:
---
- name: Install and Launch Filebeat
  hosts: webservers
  become: yes
  tasks:
    # Use command module
    - name: Download filebeat .deb file
      command: curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.4.0-amd64.deb
    # Use command module
    - name: Install filebeat .deb
      command: dpkg -i filebeat-7.4.0-amd64.deb
    # Use copy module
    - name: Drop in filebeat.yml
      copy:
        src: /etc/ansible/roles/install-filebeat/files/filebeat-config.yml
        dest: /etc/filebeat/filebeat.yml
    # Use command module
    - name: Enable and Configure System Module
      command: filebeat modules enable system
    # Use command module
    - name: Setup filebeat
      command: filebeat setup
    # Use command module
    - name: Start filebeat service
      command: service filebeat start
    # Use systemd module
    - name: Enable service filebeat on boot
      systemd:
        name: filebeat
        enabled: yes
- Run ansible-playbook filebeat-playbook.yml
Verify that the playbook completed by navigating back to the Filebeat installation page on the ELK server GUI.
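As a command-line cross-check of the GUI step, the indices the Beats have created can be listed through Elasticsearch's _cat/indices API. The sketch assumes Elasticsearch on the ELK VM is reachable at http://10.1.0.4:9200 from where it runs (for example the Ansible container, across the vNet peering), that no authentication is enforced, and that the Beats write to their default filebeat-* and metricbeat-* index names. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: count Elasticsearch indices whose names start with "<beat>-".
count_beat_indices() {
  local es_url=${1:-http://10.1.0.4:9200} beat=${2:-filebeat} out
  # h=index prints one index name per line; -f makes HTTP errors produce no
  # output, and '|| true' treats an unreachable server as "no indices".
  out=$(curl -sf --max-time 5 "${es_url}/_cat/indices/${beat}-*?h=index" || true)
  if [[ -z $out ]]; then
    echo 0
  else
    printf '%s\n' "$out" | wc -l | tr -d ' '
  fi
}
```

A count of 0 means nothing has been indexed yet or Elasticsearch could not be reached; passing metricbeat as the second argument checks the Metricbeat playbook the same way.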
Metricbeat: Metricbeat detects changes in system metrics, such as CPU usage and memory usage.
Run nano metricbeat-playbook.yml and enter the Metricbeat playbook template below, which also enables the metricbeat service on boot:
---
- name: Install and Launch Metricbeat
  hosts: webservers
  become: true
  tasks:
    # Use command module
    - name: Download metricbeat
      command: curl -L -O https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-7.4.0-amd64.deb
    # Use command module
    - name: install metricbeat
      command: dpkg -i metricbeat-7.4.0-amd64.deb
    # Use copy module
    - name: drop in metricbeat config
      copy:
        src: /etc/ansible/roles/install-metricbeat/files/metricbeat-config.yml
        dest: /etc/metricbeat/metricbeat.yml
    # Use command module
    - name: enable and configure docker module for metric beat
      command: metricbeat modules enable docker
    # Use command module
    - name: setup metric beat
      command: metricbeat setup
    # Use command module
    - name: start metric beat
      command: service metricbeat start
    # Use systemd module
    - name: Enable service metricbeat on boot
      systemd:
        name: metricbeat
        enabled: yes
- Run ansible-playbook metricbeat-playbook.yml
Verify that this playbook completed by navigating back to the Metricbeat installation page on the ELK server GUI.
Next, I want to verify that filebeat and metricbeat are actually collecting the data they are supposed to and that my deployment is fully functioning.
To do so, I have implemented 3 tasks:
- Generate a large number of failed SSH login attempts and verify that Kibana is picking up this activity.
- Generate high CPU usage on my web servers and verify that Kibana picks up this data.
- Generate a large number of web requests to my web servers and make sure that Kibana is picking them up.
To generate these attempts, I intentionally tried to connect to my Web-1 web server from the Jump Box instead of from my Ansible container (the server can't verify my private key outside of the container). All the ELK Stack scripts used here can be found in Elk_Stack_scripts.sh.
To do so, I used the following short script to automate 1000 failed SSH login attempts:
for i in {1..1000}; do ssh sysadmin@10.0.0.5; done
Next, we check Kibana to see if the failed attempts were logged:
I can see that all the failed attempts were detected and sent to Kibana.
Now let's break down the syntax of my previous short script:
- 'for' begins the 'for' loop.
- 'i in' creates a variable named 'i' that will hold each number in our list.
- '{1..1000}' creates a list of 1000 numbers, each of which will be given to our 'i' variable.
- ';' separates the portions of our 'for' loop when written on one line.
- 'do' indicates the action taken by each loop.
- 'ssh sysadmin@10.0.0.5' is the command run by 'do'.
- ';' separates the portions of our 'for' loop when it's written on one line.
- 'done' closes the 'for' loop.
Now I can run the same short script with a few modifications to test that filebeat is logging all failed attempts on all web servers where filebeat was deployed.
I want to run a command that will attempt to SSH into each of my web servers in turn and keep repeating until I stop it:
while true; do for i in {5..6}; do ssh sysadmin@10.0.0.$i; done; done
Now let's break down the syntax of my previous short script:
- 'while' begins the 'while' loop.
- 'true' will always be equal to true, so this loop will never stop unless we force quit it.
- ';' separates the portions of our 'while' loop when it's written on one line.
- 'do' indicates the action taken by each loop.
- 'for i in' creates a variable named 'i' that will hold each number in our list.
- '{5..6}' creates a list of numbers (5 and 6), each of which will be given to our 'i' variable.
- 'ssh sysadmin@10.0.0.$i' is the command run by the inner 'do'. It passes in the '$i' variable so the ssh command will be run against each server, i.e., 10.0.0.5 and 10.0.0.6 (Web-1, Web-2).
- 'done; done' closes the inner 'for' loop and then the outer 'while' loop.
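The multi-server loop above runs until it is interrupted. The sketch below is a bounded variant that makes a fixed number of passes over the given hosts. It assumes, as the author's test implies, that the web VMs accept public-key authentication only, so each attempt from the jump box is rejected. BatchMode, StrictHostKeyChecking=accept-new (OpenSSH 7.6 or later) and ConnectTimeout are standard OpenSSH client options; the function name and the SSH_BIN override are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: make ROUNDS passes of SSH login attempts over the given hosts.
# SSH_BIN is a hypothetical override (defaults to ssh) used only for testing.
failed_ssh_rounds() {
  local rounds=$1 host r
  shift
  for ((r = 1; r <= rounds; r++)); do
    for host in "$@"; do
      # BatchMode: never prompt for a password; accept-new: record an unknown
      # host key instead of aborting before authentication is attempted.
      # '|| true' because the rejected login is the expected outcome.
      "${SSH_BIN:-ssh}" -o BatchMode=yes -o StrictHostKeyChecking=accept-new \
        -o ConnectTimeout=5 "sysadmin@${host}" true || true
    done
  done
}
```

From the jump box, `failed_ssh_rounds 500 10.0.0.5 10.0.0.6` makes 1000 attempts in total and then stops on its own.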
Next, I want to confirm that metricbeat is functioning. To do so, I will run a Linux stress test, generating high CPU usage on my web servers (Web-1, Web-2) and confirming that Kibana is collecting the data.
- From my Jump Box, I start my Ansible container with the following command:
  sudo docker start goofy_wright && sudo docker attach goofy_wright
- Then, SSH from my Ansible container to Web-1.
- Install the stress package with the following command:
  sudo apt install stress
- Run stress with the following command and let the stress test run for a few minutes:
  sudo stress --cpu 1
- Note: The stress program will run until we quit with Ctrl+C.
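To cross-check the Kibana graphs from inside the VM, CPU utilization can be sampled directly from /proc/stat while stress runs (for example in a second SSH session to Web-1). This is a minimal Linux-only sketch; the function name is hypothetical, and it treats idle plus iowait time as "not busy".

```bash
#!/usr/bin/env bash
# Sketch: overall CPU busy percentage over an interval, from /proc/stat.
cpu_busy_percent() {
  local interval=${1:-1} i total1=0 total2=0 idle1 idle2 dt di
  local -a s1 s2
  # First line of /proc/stat: cpu user nice system idle iowait irq softirq steal ...
  read -r -a s1 < /proc/stat
  sleep "$interval"
  read -r -a s2 < /proc/stat
  # Sum the first eight counters; guest time is already counted in user/nice.
  for ((i = 1; i <= 8; i++)); do
    total1=$((total1 + ${s1[i]:-0}))
    total2=$((total2 + ${s2[i]:-0}))
  done
  idle1=$((s1[4] + s1[5]))   # idle + iowait
  idle2=$((s2[4] + s2[5]))
  dt=$((total2 - total1))
  di=$((idle2 - idle1))
  if ((dt <= 0)); then
    echo 0
  else
    echo $(( 100 * (dt - di) / dt ))
  fi
}
```

Running `cpu_busy_percent 5` while `sudo stress --cpu 1` is active should show a clear rise compared with an idle VM; on a single-vCPU VM it should approach 100.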
Next, I viewed the Metrics page for that VM in Kibana and compared the two web servers to see the difference in CPU usage, which confirmed that metricbeat is capturing the increase in CPU usage caused by our stress command:
Another view of the CPU usage metrics Kibana collected:
Generate a large number of web requests to both web servers and make sure that Kibana is picking them up.
This time we will generate a large number of web requests directed at one of my web servers. To do so, I will use wget to launch a DoS attack.
- Log into my Jump Box Provisioner:
  ssh sysadmin@<jump-box-provisioner>
- We need to add a new firewall rule to allow my Jump Box (10.0.0.4) to connect to my web servers over HTTP on port 80. To do so, I add a new Inbound Security Rule to the Red-Team Network Security Group.
- Run the following command to download the file index.html from the Web-1 VM:
  wget 10.0.0.5
- Output of the command:
- Confirm that the file has been downloaded with the ls command:
  sysadmin@Jump-Box-Provisioner:~$ ls
  index.html
- Next, run the wget command in a loop to generate a very high number of web requests. I will use the while loop:
  while true; do wget 10.0.0.5; done
As a result, the Load, Memory Usage and Network Traffic metrics all spiked, as seen below:
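After stopping the wget command, I can see that thousands of index.html files were created (as seen below).
wget names each repeated download index.html.1, index.html.2, and so on, so I can clean them up with the following command, which removes only those files rather than everything in the directory:
rm index.html*
Now if we use ls again, the directory is a lot cleaner: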
I can also avoid creating the index.html files by adding the -O flag to my command, which specifies a single destination file that all the downloaded index.html pages are concatenated and written to.
Since I don't want to save the index.html files, I will not write them to a regular file but instead send them to /dev/null, a special device file that discards everything written to it.
I use the following command to do that:
while true; do wget 10.0.0.5 -O /dev/null; done
Now, if I want to perform the wget DoS requests on all my web servers, I can reuse the command I used to generate failed SSH login attempts on all my web servers, tweaked to send wget requests to all web servers instead:
while true; do for i in {5..6}; do wget -O /dev/null 10.0.0.$i; done; done
Note that we need to press CTRL + C to stop the wget requests since I am using the while loop.
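Like the SSH loops, these wget loops run until Ctrl+C. The bounded sketch below sends a fixed number of requests to each host, discards the bodies, and reports how many succeeded. The -q, -t, -T and -O options are standard wget options; the function name and the WGET_BIN override are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: send COUNT HTTP requests to each host and count the successes.
# WGET_BIN is a hypothetical override (defaults to wget) used only for testing.
http_load() {
  local count=$1 ok=0 host i
  shift
  for ((i = 1; i <= count; i++)); do
    for host in "$@"; do
      # -q: quiet; -t 1: no retries; -T 5: 5-second timeout;
      # -O /dev/null: discard the downloaded page instead of saving index.html.
      if "${WGET_BIN:-wget}" -q -t 1 -T 5 -O /dev/null "http://${host}/"; then
        ok=$((ok + 1))
      fi
    done
  done
  echo "${ok} of $((count * $#)) requests succeeded"
}
```

From the jump box, `http_load 1000 10.0.0.5 10.0.0.6` sends 2000 requests and stops on its own, so no cleanup of index.html files is needed afterwards.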
My Elastic Stack server is now functioning and correctly monitoring my load-balanced exposed DVWA web application.
© Trilogy Education Services, a 2U, Inc., Instructor Jerry Arnold and TAs; Matt McNew, Jansen Russell, Micheal Stephenson.
© The University of Texas at Austin Boot Camp, The Cybersecurity program.