
Using Splunk in a distributed cluster for Security Analysis

Created: 04 Mar 2013 • Updated: 06 Mar 2013
Krishnan Narayan


Splunk is a popular tool used by many companies for monitoring and log aggregation. It is also very useful for data analytics: its big-data capabilities extract far more information than was previously possible, especially in a distributed setup, where pulling data from individual endpoints can be trivial but correlating all of that data in a meaningful way is hard.

With Splunk's plugin framework we can do much more than just aggregate logs from different sources. Examples include a Google Maps integration for GeoIP capabilities, Cisco plugins for working with Cisco network-level data sets aggregated by the base Splunk framework, and a Palo Alto Networks plugin for analyzing traffic logs.

In this brief overview, we will go over the steps involved in setting up Splunk in a distributed environment. This aspect of Splunk is not particularly well documented on their site[1].

In a distributed setup, Splunk uses components called 'Forwarders'[2], which are agents that transfer information to a pre-configured Splunk server. Various topologies are possible with different types of forwarders; more details can be found here[3].

For this article we will use the Universal Forwarder deployed in a Data Consolidation topology, which is one of the most common configurations for Splunk.

Use Case

  • Deploy a simple web application with port 80 open in the cloud (AWS)
  • Deploy Splunk Server in the cloud with controlled access. 
  • Deploy forwarders to the compute instances hosting the sample web application to monitor apache logs on these instances. 
  • Apply Google Maps plugin as an example on the data aggregated from the web tier instances. (GeoIP functionality and monitor source of traffic to the web app via Splunk)


The deployment procedure can be broken down into 4 stages.

Web Tier

A simple web application is deployed with Apache using AWS OpsWorks[4], which enables complete automation as part of a DevOps workflow. A simple banking scenario was used to create the web page. More details about the deployment can be found in the references.

Splunk Server Deployment

A Splunk server was set up in a dedicated compute instance by downloading the tarball from here[5]. After untarring the archive, you can start Splunk with:

$> splunk/bin/splunk start

At the end of this operation you should be able to reach the Splunk web interface at http://<host>:<port>/ (port 8000 by default).

For this instance we ensure that only our IP range has access to this server, since a Splunk server deployed in the cloud has complete visibility into all the related compute instances. This can be done by editing the security group's source IP range.

Configuring Forwarders

For Splunk to work in a truly distributed fashion, we should be able to extract logs from the different instances in the cluster. Towards this end we can use forwarders.

The forwarder for Linux can be downloaded from here.

Once we untar the archive, we can start the Splunk forwarder the same way we started the Splunk server:

$> splunkforwarder/bin/splunk start

However, several configuration steps are required before we start the forwarder:

# Step 1: Point this forwarder at the Splunk server
[uname@host splunkforwarder]$ bin/splunk add forward-server <splunk-server-hostname>:<port>

### verify at this stage that your etc/system/local/outputs.conf looks like:
#defaultGroup = default-autolb-group
#server = <splunk-server-hostname>:<port>
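Filled in with concrete values, outputs.conf would look something like the sketch below. The hostname is a made-up example, and 9997 is Splunk's conventional receiving port; substitute your own server and port.

```ini
# etc/system/local/outputs.conf on the forwarder (example values)
[tcpout]
defaultGroup = default-autolb-group

[tcpout:default-autolb-group]
server = splunk-server.example.com:9997
```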

# Step 2: Add resources to monitor to update the Splunk Server. (Tell forwarder to watch specific resources)
[uname@host splunkforwarder]$ vim etc/system/local/inputs.conf
## Add resources to monitor as follows (append the monitor stanzas to inputs.conf)
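The monitor stanzas themselves are omitted above; for the Apache access log used in this article, a minimal sketch would be the following (the log path matches the search used later, and the sourcetype is an assumption):

```ini
# Appended to etc/system/local/inputs.conf on the forwarder
[monitor:///etc/httpd/logs/simpleapp-access.log]
disabled = 0
sourcetype = access_combined
```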

# Step 3: Add the following lines to the Splunk server's $SPLUNK_HOME/etc/system/local/inputs.conf
# disabled = 0
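The server must also be listening for forwarder traffic. A sketch of the receiving stanza, assuming the conventional receiving port 9997 (match whatever port you configured in Step 1):

```ini
# On the Splunk server: $SPLUNK_HOME/etc/system/local/inputs.conf
[splunktcp://9997]
disabled = 0
```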

# Step 4: Restart the Splunk server and the forwarders.
[uname@host splunk]$ bin/splunk restart
# Restart the Splunk forwarders on the two compute instances
[uname@host splunkforwarder]$ bin/splunk restart
# Monitor $SPLUNK_HOME/var/log/splunk/splunkd.log on all instances where a Splunk server or forwarder is deployed
# to catch any issues that may arise during this connection.
## Ensure that the security groups are configured correctly for these instances so the two can communicate.

If everything was set up correctly, we should be able to see the forwarder in the Splunk server's management console. A useful application for monitoring this setup process is the Deployment Monitor[6], which detects all forwarders, indexers, etc., and provides other useful metrics for a distributed deployment of Splunk.

Snapshot of Splunk configured in a distributed setup with forwarder deployed on host 'tarlet':

We should also see data coming in from the forwarder:

Google Maps Integration

At this stage we have completely set up Splunk to extract access logs from the web tier for the sample web application. We can now use plugins from the Splunk app store[7] to analyze the data.

In this example, we will use the Google Maps application to leverage its GeoIP features: we extract IPs from the aggregated Apache access logs and get a geographical distribution of the sources of the web application traffic.

  • First, we add the Google Maps Application from the Apps page[8]. Splunk Server may restart in this process. 
  • Click on Apps in the top right corner and from the drop down menu choose the Google maps application. 
  • In the search bar enter the following:
  • source="/etc/httpd/logs/simpleapp-access.log" | rex "(?<IP_add>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})" | geoip IP_add
  • On running the search, we see a geographic distribution of the count of IP addresses from each region. The plugin does a GeoIP lookup on the IP addresses extracted by the search regex and plots them on the map, giving us the unique locations from which we receive web app traffic.
  • This search can now be scheduled to run periodically, exported to a custom dashboard, and/or used to generate reports.
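As a sanity check outside Splunk, the same dotted-quad pattern that the rex stage uses can be tried against a sample access log line with grep. The log line below is fabricated for illustration:

```shell
# A made-up apache access log line
line='93.184.216.34 - - [04/Mar/2013:10:00:00 +0000] "GET / HTTP/1.1" 200 512'
# Extract the client IP with the same four-octet pattern the rex command uses
echo "$line" | grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
```

Only the leading client IP matches, since the timestamp and protocol fields never contain four dot-separated octets.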


Security Advantage

This framework provides significant support for log analysis and advanced correlation between activities in an ecosystem, helping us examine usage patterns, anomalous behavior, etc., and allowing greater sophistication in security analysis. A cloud ecosystem has numerous moving parts, and monitoring each of them in a disconnected way does not make for effective analysis. Correlating events and activities within the infrastructure can help security admins and Ops enforce pre-emptive measures to secure their assets in the cloud.

To conclude, some of the advantages of using Splunk are:

  • Log/activity aggregation with different topologies to support horizontal and vertical scaling. 
  • Advanced indexing features enable quick log correlation for search.
  • Search also supports multilevel processing using pipe and logical operators. Search also incorporates functionality added by different applications.
  • Expanding framework with Apps that can handle specific data sets. (Cisco router logs etc). 
  • A step towards a 'single pane of glass' view for any kind of environment, be it on-prem or cloud.


[1] Splunk Documentation,

[2] Types of Forwarders,

[3] Forwarding Topologies,

[4] Deploying a Web Application in AWS using OpsWorks,

[5] Download Splunk,

[6] Splunk Deployment Monitor,

[7] Splunk Apps,

[8] Google Maps Application,