
Machine Learning: Symantec’s Past, Present, and Future

Beyond the buzzwords: Here’s how powerful algorithms are creating strong protection for users

The term ‘Machine Learning’ was coined in 1959 by Arthur Samuel, a pioneer in the field who developed a revolutionary computer program at IBM that could play checkers against a human opponent and get better as it did.

Today that may seem quaint, but when Samuel’s program was first demonstrated, so the story goes, IBM’s stock jumped 15 points, overnight.

The field continues to amaze (and promise profits). Yet across the cyber security industry, machine learning is often a buzz-phrase, casually thrown about without any real understanding of what it actually is, what it truly can do, or what it requires. It may be currently hailed as security’s ‘shiny new toy,’ but for many it is just a newly-purchased hammer and now everything looks like a nail.

At Symantec, however, machine learning is seen a bit differently. A bit more deeply.

According to Eric Chien, Distinguished Engineer and Technical Director of Symantec’s Security Technology and Response (STAR) Division, “the reality is we've been doing machine learning for years.”            

At its simplest level, cyber security machine learning involves feeding large amounts of data about both malicious and legitimate files into an algorithm. The algorithm outputs a ‘classifier’ that can then be used to look at something it has never seen before, whether a file, a URL, or a situation on an endpoint, and determine if it is good or bad. Previously, writing a classifier was always work done by human analysts, but machine learning allows it to be done automatically, without a human needing to write the program. The machine becomes the analyst.
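As a rough illustration of that train-then-classify flow, here is a minimal sketch using a nearest-centroid classifier. The three features (entropy, imported API count, signed flag), the sample values, and the choice of algorithm are all assumptions made for the example; the article does not say which features or algorithms Symantec uses.

```python
from statistics import mean

# Hypothetical labeled samples: (entropy, imported_api_count, is_signed).
# Features and values are illustrative assumptions, not real telemetry.
TRAINING = [
    ((7.8, 3, 0), "malicious"),
    ((7.5, 5, 0), "malicious"),
    ((4.2, 40, 1), "legitimate"),
    ((5.0, 55, 1), "legitimate"),
]


def train(samples):
    """Build a classifier from labeled samples (nearest-centroid)."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)

    # One centroid per label: the column-wise mean of its feature vectors.
    centroids = {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }

    def classify(features):
        """Return the label whose centroid is closest to the new sample."""
        def squared_distance(centroid):
            return sum((a - b) ** 2 for a, b in zip(features, centroid))

        return min(centroids, key=lambda label: squared_distance(centroids[label]))

    return classify


# The algorithm outputs a classifier, which is then applied to unseen files.
classifier = train(TRAINING)
print(classifier((7.9, 2, 0)))
print(classifier((4.8, 48, 1)))
```

The point of the sketch is the shape of the process: no one hand-writes the decision rule, it falls out of the labeled data.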

But the ‘learning’ part has to be done just right.

To make machine learning effective, you have to keep revisiting your work, retraining your algorithms over and over to produce newer and better classifiers as attackers keep retesting their threats against them. Unfortunately, the bad guys are using machines to revisit their work, too, devising automated scripts to constantly poke and probe and look for vulnerabilities.

“Ultimately, what we do is build out a graph. We call it a spidering system that automatically combs trillions of security events to connect the dots and find the paths that are most interesting, and then build out an entire attack chain, so we can quickly get the picture of an attack,” says Alejandro Borgia, Vice President of the Security Technology and Response (STAR) Division. “That technology is something no one else has today. It used to take our analysts upwards of two weeks to manually run a query for a particular indicator of interest. We took that two-week process down to minutes. Now they can run thousands and, in parallel, check for the latest attack as it's unfolding.”
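To make the idea of connecting the dots concrete, here is a toy sketch that treats security events as nodes in a graph and uses breadth-first search to recover a chain from an initial event to a later one. The event names, the edges, and the use of breadth-first search are assumptions for illustration only; the article does not describe how the spidering system is actually built.

```python
from collections import deque

# Hypothetical event graph: an edge means "this event led to that one".
EDGES = {
    "phishing_email": ["macro_doc_opened"],
    "macro_doc_opened": ["powershell_spawned"],
    "powershell_spawned": ["payload_downloaded", "registry_modified"],
    "payload_downloaded": ["c2_beacon"],
    "software_update": ["service_restarted"],
}


def attack_chain(edges, start, target):
    """Return the shortest path of events from start to target, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for next_event in edges.get(node, []):
            if next_event not in seen:
                seen.add(next_event)
                queue.append(path + [next_event])
    return None


chain = attack_chain(EDGES, "phishing_email", "c2_beacon")
print(" -> ".join(chain))
```

Events with no connection to the target, such as the routine software update here, never appear in the chain, which is what lets an analyst see the picture of an attack rather than a pile of unrelated alerts.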

Creating an effective classifier requires not merely ‘large’ but truly vast amounts of data. A small data set doesn’t create a full enough picture and leads to mistakes. According to Adam Bromwich, Senior Vice President of the STAR Division, you need “an insane amount of data.”

This is where size truly matters. Symantec’s Global Information Network (GIN) collects telemetry from 175 million endpoints, 80 million web proxy users, and 63 million email users, generating over 8 billion reputation requests per day and over 20 trillion security events per year.

In addition to its products and control points, Symantec monitors nearly 100 different industry feeds daily, not blindly trusting, of course, but evaluating them to determine exactly what genuinely is a threat and what is not.

It is that massive data volume that gives Symantec’s STAR Team the visibility to determine not only what is ‘bad’ but what is ‘good’, and knowing what is good is almost more important, according to Bromwich.

“If we know all the good stuff out there, we can be a lot more aggressive. Every endpoint, every web gateway, and now every mobile device, is sending us telemetry, and we're able to use our analytics capabilities to determine whether something is likely good or not. Once we identify likely good, we can block the rest.”
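A simplified sketch of that "identify likely good, block the rest" idea follows. It uses prevalence across distinct endpoints as a stand-in for reputation; that signal, the threshold, and the sample hashes are assumptions for the example, and a production reputation system would weigh far more than this.

```python
from collections import Counter

# Hypothetical telemetry: (endpoint_id, file_hash) sightings.
SIGHTINGS = [
    ("e1", "aaa"), ("e2", "aaa"), ("e3", "aaa"),
    ("e1", "aaa"),  # repeat sighting on the same endpoint, counted once
    ("e1", "zzz"),
]


def likely_good(sightings, min_endpoints=3):
    """Return hashes seen on at least min_endpoints distinct endpoints."""
    distinct = set(sightings)  # drop repeat sightings per endpoint
    counts = Counter(file_hash for _, file_hash in distinct)
    return {h for h, n in counts.items() if n >= min_endpoints}


def decide(file_hash, good):
    """Default-deny: allow only what is already known to be likely good."""
    return "allow" if file_hash in good else "block"


good = likely_good(SIGHTINGS)
print(decide("aaa", good))
print(decide("zzz", good))
```

The more complete the picture of what is good, the more aggressively everything else can be blocked without disrupting legitimate software.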

The ‘old world’ was simply about detecting malware, understanding malware, reverse-engineering malware. The way Symantec detects advanced attacks today is by driving in ‘insane’ amounts of telemetry and running advanced analytics on it. By generating intelligence about what's really happening on any one machine, Symantec can share information about a potential attack—its features—across the entire Symantec product line, keeping everyone safe.

“Machine learning makes our endpoint solutions that much smarter. It makes our network solutions smarter,” says Brian Witten, Senior Director of Development. “But we're also leveraging intelligence from all our products to create a Security Operation Center (SOC) workbench where we can help the SOC analysts be almost superhuman with bionic intelligence reinforced by machines operating at a scale that people can't really wrap their head around. By doing all of that we multiply the effectiveness of everybody on our Symantec team.”

Back to basics for a moment: The more features you have, the better your classifier. But generating new features is time-consuming; you have to generate the idea, measure it, build the model, and then prove it out. Yes, machine learning allows you to automatically generate and discover new features, reducing the time it takes to create new classifiers and apply them to a model that can actually run on an endpoint. That’s great.

But in today’s threat landscape, a classifier has only milliseconds to decide if a file is clean or malicious. Rudimentary machine learning is not enough, and so Symantec has elevated its machine learning into the realm of AI (Artificial Intelligence), moving to a more advanced level known as ‘deep learning.’

A prime example is Symantec’s Insight Technology, developed in the Research Labs to look at file characteristics, the variables of which constantly change. Insight instantaneously applies what it gleans to change its behavior, to compensate. Not only does it determine what files might be dangerous, it extrapolates where else those dangerous files might be found and what other Symantec clients might be affected. It learns.

And just where is Symantec taking what it learns? To the cloud. We—all of us—are the Cloud Generation, where users, data, devices need access everywhere: On-prem, in the cloud, and in between.

But there is another part to Symantec’s machine learning story, a very important one.

When Symantec acquired Blue Coat, Inc., in 2016, it added visibility from the leading provider of entry point security for enterprises and governments around the world. With Symantec already the worldwide leader in endpoint protection, adding in Blue Coat means an exponential increase in protection. “Putting their network visibility with our endpoint computer device visibility allows us to not only prevent hundreds of thousands more attacks,” Chien said, “but to actually uncover brand new, previously unknown attack campaigns, just by combining our intelligence.”

The integration of these two networks means they work as one now, blocking nearly 1 million more threats per day, responding essentially as one massive global platform. 

Now, multiply that protection again with the newly updated Symantec Endpoint Protection 14 which, according to Bromwich, “is changing the game of software. It's innovation at a level that is designed to throw a huge nuclear bomb on the attacker. They won't even know what happened.”

New Ways to Confound Attackers

As a ‘game changer,’ SEP 14 offers multiple advanced cloud-based technologies. “If you turn it on, it's going to start blocking,” says Bromwich. “The cloud really enables us to give you the ability to lock things down.”

The first big advance is ‘high-intensity detection,’ which gives the user the ability to ‘crank up’ the machine learning within the product to fundamentally lock down their machine, making it impossible for an attacker to latch on. Bromwich again: “There's no way they can iterate. There's no way they can just change things or get it in through a different exploit or through a different channel. Those things get cut off, and it's going to take attackers a very long time to be able to figure out how to get around that.”

The second advance is ‘memory exploit mitigation,’ which offers much more protection against zero-day exploits right out of the gate. Simply put, new exploits will fail.

The third advance creates a hardened environment on the machine via the cloud. It takes a lockdown technology that typically runs on servers and adapts it to the endpoint, where a user may need to do different things than a server might. The new technology essentially puts the endpoint in a container, making sure it's safe.

There’s also a fourth advance that is a bit crafty … Deception.

Deception is not a hardening-type technology, but is about tricking an attacker, especially advanced attackers who do manage to break in and start ‘looking around.’ Deception puts fake items on the machine that the attackers trip over, not knowing they are fake. “It essentially turns your entire endpoint into a honeypot network to spot bad things happening,” offers Bromwich. “It's a tried-and-true technique in every type of war, and it's certainly a powerful one in cyber war. We felt it was really important to bring that to our portfolio.”
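The tripwire logic behind deception can be sketched in a few lines. The decoy paths, the event format, and the process names below are hypothetical and chosen only to show the principle: no legitimate user or program has any reason to touch a decoy, so any access to one is worth an alert. This is not a description of how SEP 14 implements the feature.

```python
# Hypothetical decoy files planted on an endpoint.
DECOYS = {
    "C:/Users/admin/passwords.xlsx",
    "C:/backup/vpn_credentials.txt",
}

# Hypothetical file-access events observed on that endpoint.
EVENTS = [
    {"process": "winword.exe", "path": "C:/Users/admin/report.docx"},
    {"process": "unknown.exe", "path": "C:/backup/vpn_credentials.txt"},
]


def decoy_alerts(access_events, decoys=DECOYS):
    """Return every event that touched a decoy item."""
    return [event for event in access_events if event["path"] in decoys]


for alert in decoy_alerts(EVENTS):
    print(f"ALERT: {alert['process']} touched decoy {alert['path']}")
```

Because the decoys have no real use, this kind of check produces very few false alarms compared with rules that have to judge ordinary activity.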

The cold new reality is that most attacks now are programmatic. No longer is it some black hat sitting there typing, typing, typing, endlessly trying to trick you. Instead the bad actors run their scams and schemes through algorithms. They now have a machine running a program trying to figure out how to penetrate your systems, your firewall, even your iPhone.

It is now about algorithm versus algorithm.

But there is still the human side. Us. The systems we use are merely extensions of who we are, decent people simply trying to run our lives and businesses on these machines we have become so dependent on.

Matthew Barnes, Vice President of Cyber Security Services, views it this way: “We take in 150 billion events per day from our customers. From that information we try to answer one simple question: is the customer protected, yes or no? If the customer's protected, awesome. If the customer is not protected, the question becomes how quickly do you get from ‘no’ to ‘yes’?”

Symantec prides itself on its people and their constant innovation, on their endless striving to help individual customers, enterprises, and governments make their systems more secure. No doubt machine learning is a cornerstone of the future of cyber security, but to Symantec it is simply another arrow in the quiver of their Integrated Cyber Defense.

Symantec Enterprise Blogs

About the Author

Joshua Abramson

Symantec Cyber Security Staff Writer

Joshua is accountable for brand and messaging for Symantec Enterprise Security. He's the co-creator of Symantec's Innovations portal, which highlights our cyber warriors and the work they do behind the scenes.
