by Don Parker and Ryan Wegner
Consider how a preprocessor can be used to introduce learning into our intrusion detection system (IDS). One can use the problem defined in Part I of this article, where the IDS is encouraged to adapt to changes in the type of traffic seen and alert administrators if the traffic is anomalous.
Before Snort, or any IDS, is able to identify what is considered anomalous, it has to learn what normal traffic looks like on the network where it is deployed. In artificial intelligence (AI) this is called establishing the baseline, or training. The IDS observes the traffic for some period of time and gathers statistics, which it later uses to compare the observed traffic against the expected traffic. If the network traffic is significantly different from the usual traffic, an alert can be generated to tell the user that something strange is happening.
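As a rough illustration of the baseline idea, the sketch below keeps a running mean and variance for one traffic feature (for example, bytes per request) and flags values that fall far from the mean. It is not taken from the preprocessor source: the type Baseline, the function names, and the parameters k (how many standard deviations count as "significantly different") and min_samples (how long to train before alerting) are all assumptions made for this example, and the running update used is Welford's method.

```c
#include <assert.h>
#include <stddef.h>

/* Running statistics for one traffic feature (hypothetical type). */
typedef struct {
    unsigned long n;  /* number of samples seen so far */
    double mean;      /* running mean */
    double m2;        /* running sum of squared deviations from the mean */
} Baseline;

/* Fold one observation into the baseline (Welford's update). */
static void baseline_update(Baseline *b, double x)
{
    assert(b != NULL);
    b->n++;
    double delta = x - b->mean;
    b->mean += delta / (double)b->n;
    b->m2 += delta * (x - b->mean);
}

/* Sample variance; 0 until at least two samples exist. */
static double baseline_variance(const Baseline *b)
{
    assert(b != NULL);
    return (b->n < 2) ? 0.0 : b->m2 / (double)(b->n - 1);
}

/* Return 1 if x is more than k standard deviations from the mean
 * (k must be non-negative). Nothing is reported until min_samples
 * observations have been learned. The comparison is done on squared
 * values so that no square root is needed. */
static int baseline_is_anomalous(const Baseline *b, double x,
                                 double k, unsigned long min_samples)
{
    assert(b != NULL);
    if (b->n < min_samples)
        return 0;  /* still training */
    double d = x - b->mean;
    return d * d > k * k * baseline_variance(b);
}
```

During the training period every observed value is passed to baseline_update; afterwards, baseline_is_anomalous can be consulted before each update. Good values for k and min_samples depend on the network and would have to be tuned.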
Since this is meant to be a proof of concept, let's consider an IDS that is monitoring Web traffic where traffic is expected to be coming in and out of one or more Web servers on port 80. This is a very trivial setup; however, it makes the preprocessor much easier to explain, test, and demonstrate.
Outline of the preprocessor
The preprocessor looks at every single packet. In this example, the preprocessor determines whether the packet is part of an established connection and on port 80; otherwise the packet is ignored. One of the primary requirements for preprocessors is speed: you do not want a preprocessor chewing up any more cycles than absolutely necessary. So, if these two conditions aren't met, the routine exits immediately.
Once the preprocessor identifies traffic on port 80, it determines which of four possible scenarios has occurred:
- LOCAL_IP:PORT -> INTERNET_IP:80
- LOCAL_IP:80 -> INTERNET_IP:PORT
- INTERNET_IP:PORT -> LOCAL_IP:80
- INTERNET_IP:80 -> LOCAL_IP:PORT
If you are inclined to look at the code, you will find that these situations are highlighted and routines for handling each situation are nested in an if else statement.
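The four scenarios can be told apart with a handful of comparisons. The following is a minimal sketch, not the code from the preprocessor itself: FlowKey, FlowKind, classify_flow, and the net/mask description of the local network are hypothetical names introduced here, and addresses are assumed to be IPv4 values in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HTTP_PORT 80

/* Hypothetical summary of the fields the classifier needs. */
typedef struct {
    uint32_t src_ip;
    uint16_t src_port;
    uint32_t dst_ip;
    uint16_t dst_port;
} FlowKey;

typedef enum {
    FLOW_IGNORED,            /* not one of the four scenarios */
    FLOW_LOCAL_CLIENT_REQ,   /* LOCAL_IP:PORT -> INTERNET_IP:80 */
    FLOW_LOCAL_SERVER_RESP,  /* LOCAL_IP:80 -> INTERNET_IP:PORT */
    FLOW_INET_CLIENT_REQ,    /* INTERNET_IP:PORT -> LOCAL_IP:80 */
    FLOW_INET_SERVER_RESP    /* INTERNET_IP:80 -> LOCAL_IP:PORT */
} FlowKind;

/* 1 if ip falls inside the local network net/mask. */
static int ip_is_local(uint32_t ip, uint32_t net, uint32_t mask)
{
    return (ip & mask) == (net & mask);
}

/* Decide which scenario a port 80 packet belongs to. Traffic that
 * stays entirely inside or entirely outside the local network is
 * ignored, which is a simplification chosen for this sketch. */
static FlowKind classify_flow(const FlowKey *k, uint32_t net, uint32_t mask)
{
    assert(k != NULL);
    int src_local = ip_is_local(k->src_ip, net, mask);
    int dst_local = ip_is_local(k->dst_ip, net, mask);

    if (k->dst_port == HTTP_PORT) {
        if (src_local && !dst_local) return FLOW_LOCAL_CLIENT_REQ;
        if (!src_local && dst_local) return FLOW_INET_CLIENT_REQ;
    }
    if (k->src_port == HTTP_PORT) {
        if (src_local && !dst_local) return FLOW_LOCAL_SERVER_RESP;
        if (!src_local && dst_local) return FLOW_INET_SERVER_RESP;
    }
    return FLOW_IGNORED;
}
```

Returning an enum keeps the early exit cheap and lets the caller dispatch to one handler per scenario with a switch statement.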
LOCAL_IP:PORT -> INTERNET_IP:80
In this case, one of the machines protected by the IDS is browsing the Internet. Security professionals might not be too worried about the content; however, they would like the IDS to learn what a client request typically looks like on this network. In general, a client request will be quite small. The majority are GET requests. The GET requests should average a relatively small number of bytes compared to the response of the server on the Internet. If the IDS observes larger requests, this might indicate someone tunneling out of a network on port 80. This is not necessarily a threat to the internal network, but it is something that an IT security specialist should be aware of. The intention is that the IDS learns which clients are likely to be sending large requests to servers and informs the IT security professional so that they can decide if such activity is normal.
LOCAL_IP:80 -> INTERNET_IP:PORT
This indicates that a server on your internal network is responding to a client on the Internet. If your internal network has Web servers, the activity itself is probably not an issue. It is strange, however, to see this activity coming from a machine that is not a Web server. When you configure Snort, there is a section of the Snort.conf file that allows you to list your servers in variables. IT security specialists should be encouraged to check for themselves which IP addresses are running Web servers and list them in Snort.conf. If you open the Snort.conf file (it is very important to configure Snort properly in order to get the best performance out of it) you will see a series of var declarations:
# List of DNS servers on your network:
var DNS_SERVERS $HOME_NET
# List of SMTP servers on your network:
var SMTP_SERVERS $HOME_NET
# List of Web servers on your network:
var HTTP_SERVERS $HOME_NET
# List of telnet servers on your network:
var TELNET_SERVERS $HOME_NET
This can sometimes change over time as new Web servers are added to a network or IP addresses change. Using the preprocessor discussed here, the IDS can learn which IP addresses are most likely to be Web servers by observing the internal port 80 traffic being sent out to the Internet. If such activity exists, the IDS can track it. The IDS will start to profile certain IP addresses and make guesses about whether an IP address represents a Web server or not. Once an IDS observes the activity it should check to see if the source IP address is a known Web server. If it is, then the packets can be ignored. However, if the activity is coming from a box that has not been identified as a Web server, the IDS should monitor the activity and alert the system administrator. This idea in itself is not new. The interesting idea presented here is to have the IDS get feedback from the IT security specialist and to adapt to the network. The IDS will indicate if it suspects a new Web server at some IP address and in turn the IT security specialist verifies that the IP address actually has a Web server. If it does, then the IDS can add the IP address to the list of known Web servers, adjusting its own configuration and thus adapting to the network.
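The feedback loop described above could be modeled roughly as follows. This is only a sketch under stated assumptions: the table layout, the function names, the fixed table size, and the threshold of 20 packets before an alert is raised are all invented for illustration and do not come from the preprocessor source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HOSTS 64          /* arbitrary capacity for this sketch */
#define SUSPECT_THRESHOLD 20  /* arbitrary: port 80 packets before alerting */

typedef enum { HOST_SUSPECT, HOST_KNOWN_SERVER } HostState;

typedef struct {
    uint32_t ip;
    HostState state;
    unsigned long port80_packets;  /* packets seen leaving from port 80 */
    int alerted;                   /* 1 once the operator has been told */
} HostProfile;

typedef struct {
    HostProfile hosts[MAX_HOSTS];
    size_t count;
} ServerTable;

static HostProfile *table_find(ServerTable *t, uint32_t ip)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->hosts[i].ip == ip)
            return &t->hosts[i];
    return NULL;
}

static HostProfile *table_add(ServerTable *t, uint32_t ip, HostState s)
{
    if (t->count >= MAX_HOSTS)
        return NULL;  /* table full */
    HostProfile *h = &t->hosts[t->count++];
    h->ip = ip;
    h->state = s;
    h->port80_packets = 0;
    h->alerted = 0;
    return h;
}

/* Call for each LOCAL_IP:80 -> INTERNET_IP:PORT packet. Returns 1
 * exactly once per unknown host, when its count reaches the threshold,
 * so the caller can alert the operator. Known servers are ignored. */
static int observe_local_port80(ServerTable *t, uint32_t src_ip)
{
    assert(t != NULL);
    HostProfile *h = table_find(t, src_ip);
    if (h == NULL)
        h = table_add(t, src_ip, HOST_SUSPECT);
    if (h == NULL || h->state == HOST_KNOWN_SERVER)
        return 0;
    h->port80_packets++;
    if (!h->alerted && h->port80_packets >= SUSPECT_THRESHOLD) {
        h->alerted = 1;
        return 1;
    }
    return 0;
}

/* Operator feedback: the host at ip really is a Web server. */
static void confirm_web_server(ServerTable *t, uint32_t ip)
{
    assert(t != NULL);
    HostProfile *h = table_find(t, ip);
    if (h == NULL)
        h = table_add(t, ip, HOST_KNOWN_SERVER);
    if (h != NULL)
        h->state = HOST_KNOWN_SERVER;
}
```

At startup, confirm_web_server would be called once for every address already listed as a Web server; later calls represent the IT security specialist confirming a suspected server, after which its traffic is no longer reported.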
The operator of the IDS does not even need to know, at first, where the Web servers are on a network (this shouldn't be encouraged, of course) and can simply deploy the IDS and let it figure out where the Web servers are. Feedback from the IT security specialist is desirable, but not necessary. By tracking this type of port 80 activity the IDS can build a list of likely Web servers and flag those that demonstrate anomalous Web activity. For example, it could alert on hosts that appear to be producing valid Web traffic on the local network but are only active for two hours a day, from 2:00am to 4:00am, while ignoring Web servers that operate regularly during business hours.
INTERNET_IP:PORT -> LOCAL_IP:80
Similar to the situation discussed above, the preprocessor can verify that the destination is in fact a Web server. Also, it can verify that the GET requests being made by the machine outside are not large when compared to the responses from the local server. If very large requests, atypical for the target server, are coming into your network, it might indicate someone trying to upload a malicious executable onto one of your machines.
INTERNET_IP:80 -> LOCAL_IP:PORT
From a security perspective, this situation is normal. Other content-based Snort rules will check to make sure there is nothing malicious in the payload of the Internet server response. The only thing that may be of interest in this traffic is the average response size. Tracking the average response size can help ensure that the responses are typically larger than the requests. This is more valuable for demonstrating that the IDS is adapting to the network than it is practical, given the wide variety of possible requests from an Internet IP address.
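One way to track average response sizes is a small fixed-size window of the most recent responses, against which client requests can then be compared. The sketch below assumes a window of eight responses and treats any request larger than the windowed average as suspicious; the window length, the names, and the comparison rule are illustrative choices, not details from the preprocessor source.

```c
#include <assert.h>
#include <stddef.h>

#define RESP_WINDOW 8  /* arbitrary window length for this sketch */

/* Circular buffer of the most recent response sizes, in bytes. */
typedef struct {
    unsigned sizes[RESP_WINDOW];
    size_t next;        /* slot that will be overwritten next */
    size_t filled;      /* number of valid slots */
    unsigned long sum;  /* sum of the valid slots */
} ResponseWindow;

/* Record one response size, evicting the oldest when full. */
static void window_add(ResponseWindow *w, unsigned size)
{
    assert(w != NULL);
    if (w->filled == RESP_WINDOW)
        w->sum -= w->sizes[w->next];  /* drop the oldest sample */
    else
        w->filled++;
    w->sizes[w->next] = size;
    w->sum += size;
    w->next = (w->next + 1) % RESP_WINDOW;
}

/* Average of the buffered response sizes; 0 if nothing is buffered. */
static double window_mean(const ResponseWindow *w)
{
    assert(w != NULL);
    return w->filled ? (double)w->sum / (double)w->filled : 0.0;
}

/* 1 if a request is larger than the recent average response.
 * Nothing is flagged until the window has filled once. */
static int request_is_suspicious(const ResponseWindow *w, unsigned request_size)
{
    assert(w != NULL);
    if (w->filled < RESP_WINDOW)
        return 0;  /* still learning */
    return (double)request_size > window_mean(w);
}
```

A window, rather than an all-time average, lets the comparison follow gradual changes in the traffic, at the cost of forgetting older behavior.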
Implementation of the preprocessor
Let's get into some implementation details. This section outlines the format of a typical Snort preprocessor and describes what the methods do. Pseudo code and method headers should help the reader understand the preprocessor internals. For more detail, you can download the preprocessor source code, which is a work in progress.
The header file in most preprocessors is very simple. You can see that the header in ours contains one function prototype:
extern void SetupIdentRogueHTTP();
The function itself is contained in the source file and its purpose will be discussed shortly. However, it has to be there in order for Snort to register the preprocessor.
As with most C code, the preprocessor source file starts with a variety of include statements. Some of the includes are specific to Snort development, like plugbase.h. In general, it is not easy to code from scratch. It is much more efficient to start with the same headers that other modules use (especially when dealing with an open source application) and figure out later whether you need them. You can also always run an Internet search for more information on headers.
There are two major function prototypes. One is the init function:
extern void IdentRogueHttpInit(u_char *);
The other is the function that will process each packet:
extern void checkHttp(Packet *);
The names of the functions don't really matter all that much, so long as the rest of the code calls them properly. The init method should, of course, contain the word "init" at the end.
The init method is meant to perform three tasks: parse the arguments passed in via the character pointer argument, take any last-minute setup steps on the data structures used in the preprocessor, and finally, link the preprocessor function into a list of functions. In our preprocessor example, a call is made to:
This method just reads in all of the information that the IDS already knows about the network's Web servers. Then, it calls:
The name pretty much says it all. It adds the method that is going to be used to parse the packets to the preprocessor list. The method named in the argument will be called each time a packet is intercepted.
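The three tasks of the init method, and the idea of linking a function into a list that is walked for every packet, can be illustrated with a small stand-alone model. Nothing below is Snort's API: DemoPacket, demo_add_handler, demo_dispatch, demo_init, and demo_check_http are stand-ins invented for this sketch, and the "verbose" argument is a made-up option.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the host's packet structure. */
typedef struct {
    unsigned short src_port;
    unsigned short dst_port;
} DemoPacket;

typedef void (*PacketHandler)(DemoPacket *);

#define MAX_HANDLERS 16
static PacketHandler handler_list[MAX_HANDLERS];
static size_t handler_count;

/* Stand-in for the host's "add a function to the preprocessor list". */
static int demo_add_handler(PacketHandler fn)
{
    if (fn == NULL || handler_count >= MAX_HANDLERS)
        return -1;
    handler_list[handler_count++] = fn;
    return 0;
}

/* What the host does per packet: call every registered handler. */
static void demo_dispatch(DemoPacket *p)
{
    for (size_t i = 0; i < handler_count; i++)
        handler_list[i](p);
}

static unsigned long port80_seen;  /* preprocessor state */
static int verbose;                /* set from the init arguments */

/* The per-packet function: exit early unless port 80 is involved. */
static void demo_check_http(DemoPacket *p)
{
    assert(p != NULL);
    if (p->src_port != 80 && p->dst_port != 80)
        return;
    port80_seen++;
}

/* The init function performs the three tasks in order. */
static void demo_init(const char *args)
{
    /* 1. Parse the argument string. */
    verbose = (args != NULL && strstr(args, "verbose") != NULL);
    /* 2. Set up the data structures the preprocessor uses. */
    port80_seen = 0;
    /* 3. Link the per-packet function into the handler list. */
    demo_add_handler(demo_check_http);
}
```

After demo_init has run, every call to demo_dispatch reaches demo_check_http, which mirrors how the function named at registration time ends up being called for each intercepted packet.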
The method checkHttp is basically the brains of the preprocessor. You can see from the source code and the method prototype that it accepts a pointer to a packet structure as an argument. Below is some pseudo code that describes the example preprocessor. Be prepared: the claim is that this is intelligent, but it just looks like a bunch of if else statements:
    if (packet is not TCP)
        return
    if (srcport is not 80 and destport is not 80)
        return

    if (destport is 80 and destIP is on local network)
        if (destIP is a known Web server)
            # we expect this activity, so exit.
        if (destIP is not a known Web server)
            # we're going to track this.

    if (destport is 80 and destIP is not on local network)
        # our client is browsing the Internet
        if (the requests are unusually large)
            # This is an anomaly, learn about it by noting it
            # We use our buffered http response sizes for comparison

    if (srcport is 80 and srcIP is on local network)
        if (srcIP is a known Web server)
            # we could track packet size, but should probably just exit
        if (srcIP is not a known Web server)
            # then it's weird -- track it and learn if this is
            # a new Web server or an anomaly

    if (srcport is 80 and srcIP is not on local network)
        # buffer these to compare them with the GET requests
        # being sent by the clients on the network
It doesn't seem very complex, but it shouldn't, for two reasons. First, pseudo code is meant to be easier to follow than the real code anyway; it's the transition step from idea to C code. Second, this is a proof of concept, but hopefully it has your gears turning, figuring out how you could expand on this idea. The important thing to remember is that the system is observing traffic and learning to change its behavior, based on the presence of Web traffic from IPs that aren't supposed to be Web servers.
Let's consider for a moment what kind of Snort rules you could write to handle the same kind of activity. There should be a rule for each of four scenarios. The first line of the Snort rules would be something like:
- alert TCP $HOME_NET any -> any 80
- alert TCP $HOME_NET 80 -> any any
- alert TCP any any -> $HOME_NET 80
- alert TCP any 80 -> $HOME_NET any
Right off the bat, there is an issue. IP addresses on our home net should be considered; however, machines that are already in the variable $HTTP_SERVERS should not raise any concerns. Let's take the first rule and elaborate on it:
alert TCP $HOME_NET any -> any 80 (msg: "Large http request";
flow:to_server,established; dsize:>$SOME_VARIABLE; sid: xxx; rev: 1;)
Well, that's pretty close to the desired result: traffic from the network out to an HTTP server where the traffic is larger than expected. How large? Larger than some predefined variable. This rule can adapt, somewhat. The IT security specialist has to determine a value for $SOME_VARIABLE and adjust it as necessary. He or she would also have to accommodate cases where a specific client legitimately performs that sort of request. For example, if a client tends to use HTTP as a mechanism for uploading pictures to a server, the IDS should be made aware of that. The authors don't claim to know all the tricks of Snort rule making, but if you want to maintain state information and learn dynamically from specific traffic packets, more is needed to support the rules. The learning preprocessor alleviates some of the work required of the IT security specialist to achieve the same goal.
Arguably, you could devise a series of Snort rules that could deal with the above situations, using some pretty complex alerting systems. Note, however, that the whole goal of this article is to demonstrate how you could introduce additional intelligence into your IDS and allow it to adapt to the network. This situation was chosen because it is relatively simple and a decent solution, in the form of a preprocessor, can be explained clearly.
So, hopefully this article has piqued your interest in making an IDS smarter using a preprocessor. This article really only scratched the surface of how artificial intelligence can lend a hand with making your IDS smarter. It also focused a lot on preprocessors in Snort as a method to implement more intelligence. The intelligence proposed here consisted of modeling machines in the network to determine if they were Web servers performing a legitimate service, as well as checking to see if clients were producing anomalous Web requests. A lot of research has been performed on artificial intelligence. For example, the work of Andres Arboleda and Charles Bedon (under the direction of Siler Amador) produced a preprocessor using neural networks in order to identify port scan traffic.
Another example is the work of Stefanos Koutsoutos, Ioannis Christou and Sofoklis Efremidis. They are working on an IDS for network-initiated attacks using a hybrid neural network. Also, Gunes Kayacik, Nur Zincir-Heywood and Malcolm Heywood from Dalhousie University are doing a great deal of work on genetic programming and its use in anomaly detection. These are just a few examples and there are plenty more.
Preprocessors are a powerful feature of Snort and they have a lot of potential for increasing the intelligence of your IDS. Hopefully, by understanding the basics behind some rudimentary AI concepts and some of the advanced features of Snort, you will be able to think of new approaches for increasing intelligence in your IDS. Or, if you are a coder, experiment with your own preprocessors.
That concludes our paper on artificial intelligence and the impact it could have on the ability of an intrusion detection system to fulfill its mandate. We hope you enjoyed the article and welcome your feedback.
This article originally appeared on SecurityFocus.com -- reproduction in whole or in part is not allowed without expressed written consent.