The Challenge
When using the Elastic Stack, I've found that Elasticsearch and Beats handle load-balancing well, but Logstash... not so much, since it doesn't support clustering. The trouble starts when you have end devices that can't run a Beats agent (which could otherwise send to two or more Logstash servers). To get around this, you would typically:
- Set up any one of the Logstash servers as the syslog/event destination
  - Pro: Only one copy of the data to maintain
  - Con: What if that server or Logstash input goes down?
- Set up multiple Logstash servers as the syslog/event destinations
  - Pro: More likely to receive the logs during a Logstash server or input outage
  - Con: Duplicate copies of the logs to deal with
A third option, which I've developed and laid out below, keeps all of the pros and none of the cons of the above approaches, providing a highly-available and load-balanced Logstash implementation. This solution scales well, too. Let's get started.
Prerequisites
For this proof-of-concept solution, I started with a very minimal configuration:
- Two virtual machines within the same layer 2 domain (inside VMware Fusion)
  - CentOS 7 64-bit
  - Logstash 6.4.2
  - Java
  - Keepalived
  - IP Virtual Server (ipvsadm)
- Host machine to generate some traffic (which will generate sample logs)
  - Mac OSX
  - nc
Log Server Configuration
OS install
For this, I simply created a small VMware Fusion virtual machine using the CentOS 7 Minimal ISO as my installation source (this one in particular). The rest of the machine creation is pretty straightforward. (Note: I did change from NAT to Wi-Fi networking, as I was having very strange issues with NAT networking.)
After starting the virtual machine, the install process will begin. This is where you can just do a basic install, but I chose a few options that hit close to home with my day job:
- Partition the disk manually if you intend to apply a security policy (automatic partitioning would otherwise trigger a security policy violation that keeps the install from proceeding)
- Configure static addressing (my Wi-Fi network within Fusion is 192.168.1.0/24 with a 192.168.1.1 gateway)
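If you go the static-addressing route as I did, the interface file ends up looking something like the sketch below. Treat this as an assumption-laden example: the interface name (ens33) and the addresses simply match the lab network used throughout this post, so substitute your own.
- sudo vi /etc/sysconfig/network-scripts/ifcfg-ens33
# Example static config for the first Logstash VM (adjust for your network)
DEVICE=ens33
NAME=ens33
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.1.210
PREFIX=24
GATEWAY=192.168.1.1
DNS1=192.168.1.1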
Application Install
From here, let the machine reboot and SSH in (it's a much better experience than using the console via Fusion, in my opinion). Some packages can now be added.
- First, install the Logstash and load-balancing prerequisite packages:
- sudo yum -y install java tcpdump ipvsadm keepalived
- Next, install Logstash per Elastic's best practices:
- sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
- sudo vi /etc/yum.repos.d/logstash.repo
[logstash-6.x]
name=Elastic repository for 6.x packages
baseurl=https://artifacts.elastic.co/packages/6.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
- sudo yum -y install logstash
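As a quick sanity check before going further, you can confirm that the packages actually landed (these are just standard query commands, nothing specific to this setup):
- java -version
- rpm -q logstash keepalived ipvsadm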
Logstash configuration
There's no way to show off every possible Logstash configuration (that's some research for you :) ), so I'll just set up a simple one for testing our highly-available Logstash cluster:
- This is a bit different from a stock install, but the Logstash monitoring API will need to be exposed beyond localhost so the keepalived health checks (covered later) can reach it:
- sudo vi /etc/logstash/logstash.yml
- Uncomment http.host and set it to the server's IP address
- Uncomment http.port and set it to just 9600 (rather than the default 9600-9700 range) so the API port is predictable (a sketch of the result follows the pipeline config below)
- The input and output configuration for Logstash is next (you can change the filename to something else... unless you agree). For this testing, I'm just setting up a raw UDP listener on port 5514 and writing to a file in /tmp.
- sudo vi /etc/logstash/conf.d/ryanisawesome.conf
input {
udp {
host => "192.168.1.210" # server's IP
port => 5514
id => "udp-5514"
}
}
output {
file {
path => "/tmp/itworked.txt"
codec => json_lines
}
}
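For reference, once the API settings are changed, the relevant lines of logstash.yml on the first server end up looking roughly like this (192.168.1.210 is just the address used in this walkthrough):
- sudo grep ^http /etc/logstash/logstash.yml
http.host: "192.168.1.210"
http.port: 9600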
SELinux Tweaks
There are a few settings that need to be changed to allow keepalived and ipvsadm to work properly.
- Set the nis_enabled SELinux boolean to allow keepalived to call scripts which will access the network
- sudo setsebool -P nis_enabled=1
- Allow IP forwarding and binding to a nonlocal IP address
- sudo vi /etc/sysctl.conf
net.ipv4.ip_forward = 1
net.ipv4.ip_nonlocal_bind = 1
- If you chose the DISA STIG Policy during the VM build, comment out "net.ipv4.ip_forward = 0" (yes... this is a finding if this system is not a router. But once ipvsadm is running it IS a router. So we're all good ;) )
- sudo sysctl -p
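To verify that the SELinux boolean and both kernel parameters actually took, a quick check looks like the following; getsebool should report nis_enabled --> on, and both sysctl values should come back as 1:
- getsebool nis_enabled
- sysctl net.ipv4.ip_forward net.ipv4.ip_nonlocal_bind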
Keepalived
Here's where the real bread-and-butter of this setup lies: keepalived. This application is typically used to provide a virtual IP shared between two or more servers; if the primary server goes down, the second (backup) server picks up the IP to avoid any substantial downtime. That's not a bad solution as far as high availability goes, but it means only one server is processing our logs at any given time. We can do better.
Another feature of keepalived is virtual_server. With this, you can configure a listening port on our virtual IP, and any data received there is forwarded to a pool of real servers using a load-balancing method of your choosing. The configuration looks something like this:
- sudo vi /etc/keepalived/keepalived.conf
# Global Configuration
global_defs {
notification_email {
notification@domain.org
}
notification_email_from keepalived@domain.org
smtp_server localhost
smtp_connect_timeout 30
router_id LVS_MASTER
}
# describe virtual service ip
vrrp_instance VI_1 {
# initial state
state MASTER
interface ens33
# arbitrary unique number 0..255
# used to differentiate multiple instances of vrrpd
virtual_router_id 1
# for electing MASTER, highest priority wins.
# to be MASTER, make 50 more than other machines.
priority 100
authentication {
auth_type PASS
auth_pass secret42
}
virtual_ipaddress {
192.168.1.230/24
}
}
# describe virtual Logstash server
virtual_server 192.168.1.230 5514 {
delay_loop 5
lb_algo rr
lb_kind NAT
ops
protocol UDP
real_server 192.168.1.210 5514 {
MISC_CHECK {
misc_path "/bin/python /etc/keepalived/inputstatus.py 192.168.1.210 udp-5514"
}
}
real_server 192.168.1.220 5514 {
MISC_CHECK {
misc_path "/bin/python /etc/keepalived/inputstatus.py 192.168.1.220 udp-5514"
}
}
}
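Once keepalived is up on both nodes (we'll start the services later on), one simple way to confirm the VRRP advertisements are actually flowing is tcpdump, which we installed earlier. VRRP is IP protocol 112; ens33 is just the interface name on my VMs, so adjust as needed:
- sudo tcpdump -ni ens33 ip proto 112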
Logstash Health Checks
You'll probably notice a reference to inputstatus.py in the above configuration. Keepalived needs to run an external script to determine whether the configured "real server" is eligible to receive the data. That's typically pretty easy with TCP: if a SYN, SYN/ACK, ACK handshake succeeds, we can assume the service is listening. This is not an option with a Logstash UDP input, as nothing is sent back to confirm that the service is listening. What can be used instead is the Logstash API. The following script simply makes an API call for the node's stats, parses the resulting list of inputs, and exits normally if the input we're looking for is up.
- sudo vi /etc/keepalived/inputstatus.py
#!/bin/python
import sys
import json
import urllib2

# Usage: inputstatus.py <IP> <input-id>
if len(sys.argv) != 3:
    print "This script needs 2 arguments!: inputstatus.py IP input-id"
    sys.exit(1)

# Ask the Logstash API for this node's stats and pull out the list of inputs
res = urllib2.urlopen('http://' + sys.argv[1] + ':9600/_node/stats').read()
inputs = json.loads(res)['pipelines']['main']['plugins']['inputs']

# Exit 0 (healthy) only if the requested input id is present
match = False
for input in inputs:
    if sys.argv[2] == input['id']:
        match = True

if match:
    sys.exit(0)
else:
    sys.exit(1)
Of course, you would need a MISC_CHECK entry like the ones above for every input if you have Logstash listening on multiple ports, but cut and paste is easy. Just look at /var/log/messages to ensure that these scripts are exiting properly. If you see a line like "Oct 30 09:44:58 stash1 Keepalived_healthcheckers[16141]: pid 16925 exited with status 1", either the script failed or a particular input is not up. Since this error message isn't the most descriptive, you'll have to manually test or view each input on each host to see which one it is. You can manually test the Logstash inputs (once that service is running) by issuing:
- /bin/python /etc/keepalived/inputstatus.py <IP> <input-id>
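The script is silent on success, so when testing by hand I also check the exit code immediately afterward (0 means the input id was found, 1 means it wasn't). If you want to see exactly which input ids Logstash is reporting, you can hit the same stats endpoint with curl; the IP and input id below are just the ones used in this walkthrough:
- /bin/python /etc/keepalived/inputstatus.py 192.168.1.210 udp-5514; echo $?
- curl -s http://192.168.1.210:9600/_node/stats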
Firewall Rules
Sure, we could just disable firewalld... but we did just expose our API to anything that can reach this machine, so we need to lock this down a bit better. Don't worry, the rules are pretty straightforward. (Note: replace '192.168.1.111' with the host that is sending logs to Logstash, and '192.168.1.210', '192.168.1.220', and '192.168.1.230' with the two Logstash servers and the virtual IP address, in that order.)
- sudo firewall-cmd --permanent --add-rich-rule='rule family=ipv4 source address=192.168.1.210/32 protocol value=vrrp accept'
- sudo firewall-cmd --permanent --add-rich-rule='rule family=ipv4 source address=192.168.1.220/32 protocol value=vrrp accept'
- sudo firewall-cmd --permanent --add-rich-rule='rule family=ipv4 source address=192.168.1.210/32 destination address=192.168.1.220/32 port port=9600 protocol=tcp accept'
- sudo firewall-cmd --permanent --add-rich-rule='rule family=ipv4 source address=192.168.1.220/32 destination address=192.168.1.210/32 port port=9600 protocol=tcp accept'
- sudo firewall-cmd --permanent --add-rich-rule='rule family=ipv4 source address=192.168.1.111/32 destination address=192.168.1.230/32 port port=5514 protocol=udp accept'
- sudo firewall-cmd --permanent --add-rich-rule='rule family=ipv4 source address=192.168.1.111/32 destination address=192.168.1.210/32 port port=5514 protocol=udp accept'
- sudo firewall-cmd --permanent --add-rich-rule='rule family=ipv4 source address=192.168.1.111/32 destination address=192.168.1.220/32 port port=5514 protocol=udp accept'
- sudo firewall-cmd --reload
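After the reload, it's worth confirming that the rich rules actually stuck:
- sudo firewall-cmd --list-rich-rules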
The Second Logstash server
Shut down the Logstash server virtual machine; it's much easier to clone this one and make a few configuration changes than to step through the whole process all over again.
Now that it's shut down, clone it in Fusion.
Boot the clone (leaving the first VM powered off for now) and make the following changes in the VM console:
- Set hostname
- sudo hostnamectl set-hostname stash2
- Set IP address
- sudo vi /etc/sysconfig/network-scripts/ifcfg-<interface>
- Change IPADDR to appropriate IP address
- sudo systemctl restart network
- Change Logstash listening IPs
- sudo vi /etc/logstash/logstash.yml
- Change http.host to stash2's IP address
- sudo vi /etc/logstash/conf.d/ryanisawesome.conf
- Change host to stash2's IP address
- Swap the unicast_src_ip and unicast_peer IP addresses (if you're using unicast VRRP in your keepalived.conf)
- sudo vi /etc/keepalived/keepalived.conf
- Reboot
- sudo reboot now
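Once stash2 comes back from its reboot, a quick sanity check from the console that the hostname and IP changes took can save some head-scratching later (nothing fancy, just the standard tools):
- hostnamectl
- ip a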
Now you should be able to start the original virtual machine back up (in my case, stash1).
Putting It All Together
We've finally reached the point where we can fire up all the services and test out the HA Logstash configuration. On each Logstash VM:
- sudo systemctl enable logstash
- sudo systemctl start logstash
- sudo systemctl enable keepalived
- sudo systemctl start keepalived
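If you'd like a quick read on whether both services came up cleanly before digging into any logs, systemctl can tell you directly:
- sudo systemctl status logstash keepalived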
You can monitor that Logstash is up by viewing the output of:
- sudo ss -nltp | grep 9600
If you have no output, it's not up yet. If it doesn't come up after a few minutes, check /var/log/logstash/logstash-plain.log for any error messages. Personally, I like to "tail -f" this file right after starting Logstash to make sure everything is working properly (plus it looks cool to anyone looking over your shoulder as all that nerdy text flies by).
On each machine, you can now check that ipvsadm and keepalived are configured properly and playing nicely together. You should be able to run the following commands and get similar output (your IPs may be different, but you should see TWO real servers):
- ip a
- Only ONE of the two servers should have the virtual IP assigned (by default, the one with the higher IP address since the priority is the same and this is the tie-breaker when using VRRP)
- sudo ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
UDP 192.168.1.230:5514 rr ops
-> 192.168.1.210:5514 Masq 1 0 0
-> 192.168.1.220:5514 Masq 1 0 0
To test that load balancing is happening, the sample log source (in my case, my host operating system) will need to send some data over UDP 5514 to the virtual IP address. To do this, I'm going to use netcat (but really anything that can send data manually over UDP will work... including PowerShell).
- for i in $(seq 1 4); do echo "testing..." | nc -u -w 1 192.168.1.230; done
What I just did was send four test messages to the virtual IP. If everything worked properly, the virtual server will have received the messages and load-balanced them, in round-robin fashion, across the two real servers, so each server's /tmp/itworked.txt file should contain two of them. Let's check it out on each server.
- cat /tmp/itworked.txt
{"host":"192.168.1.111","@timestamp":"2018-11-04T17:34:37.065Z","message":"testing...\n","@version":"1"}
{"host":"192.168.1.111","@timestamp":"2018-11-04T17:34:39.038Z","message":"testing...\n","@version":"1"}
Success! Both servers received two messages!
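For extra confirmation from the load balancer itself, you can also look at the per-real-server packet counters. With round-robin and one-packet scheduling, ActiveConn will likely stay at zero, but the stats counters for both real servers should climb as the test messages arrive:
- sudo ipvsadm -ln --stats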