For Data to Survive, It Must ADAPT…

Adapt

Now that I’ve annoyed you by suggesting that network security will over time become irrelevant given lost visibility due to advances in OS protocol transport and operation, allow me to give you another nudge towards the edge and further reinforce my theories with some additionally practical data-centric security perspectives.

If any form of network-centric security solution is to succeed in adding value over time, the mechanics of applying policy and effecting disposition on flows as they traverse the network must be made on content in context.  That means we must get to a point where we can make “security” decisions based upon information and its “value” and classification as it moves about.

It’s not good enough to only make decisions on how flows/data should be characterized and acted on with the criteria being focused on the 5-tupule (header,) signature-driven profiling or even behavioral analysis that doesn’t characterize the content in context of where it’s coming from, where it’s going and who (machine, virtual machine and “user”) or what (application, service) intends to access and consume it.

In the best of worlds, we like to be able to classify data before it makes its way though the IP stack and enters the network and use this metadata as an attached descriptor of the ‘type’ of content that this data represents.  We could do this as the data is created by applications (thick or thin, rich or basic) either using the application itself or by using an agent (client-side) that profiles the data prior to storage or transmission.

Since I’m on my Jericho Forum kick lately, here’s how they describe how data ought to be controlled:

Access to data should be controlled by security attributes of the data itself.

  • Attributes can be held within the data (DRM/Metadata) or could be a separate system.
  • Access / security could be implemented by encryption.
  • Some data may have “public, non-confidential” attributes.
  • Access and access rights have a temporal component.

You would probably need client-side software to provide this
functionality.  As an example, we do this today with email compliance solutions that have primitive versions of
this sort of capability that force users to declare the classification
of an email before you can hit the send button or even the document info that can be created when one authors a Word document.

There are a bunch of ERM/DRM solutions in play today that are bandied about being sold as “compliance” solutions, but there value goes much deeper than that.  IP Leakage/Extrusion prevention systems (with or without client-side tie-ins) try to do similar things also.

Ideally, this metadata would be used as a fixed descriptor of the content that permanently attaches itself and follows that content around so it can be used to decide what content should be “routed” based upon policy.

If we’re not able to use this file-oriented static metadata, we’d like then for the “network” (or something in/on it) to be able to dynamically profile content at wirespeed and characterize the data as it moves around the network from origin to destination in the same way.

So, this is where Applied Data & Application Policy Tagging (ADAPT) comes in.  ADAPT is an approach that can make use of existing and new technology to profile and characterize content (by using content matching, signatures, regular expressions and behavioral analysis in hardware or software) to then apply policy-driven information “routing” functionality as flows traverse the network by using an 802.1 q-in-q VLAN tags (open approach) or applying a proprietary ADAPT tag-header as a descriptor to each flow as it moves around the network.

Think of it like a VLAN tag the describes the data within the packet/flow which is defined as seen fit;

The ADAPT tag/VLAN is user defined and can use any taxonomy that best suits the types of content that is interesting; one might use asset classification such as “confidential” or uses taxonomies such as “HIPAA” or “PCI” to describe what is contained in the flows.  One could combine and/or stack the tags, too.  The tag maps to one of these arbitrary categories which could be fed by interpreting metadata attached to the data itself (if in file form) or dynamically by on-the-fly profiling at the network level.

As data moves across the network and across what we call boundaries (zones) of trust, the policy tags are parsed and disposition effected based upon the language governing the rules.  If you use the “open” version using the q-in-q VLAN’s, you have something on the order of 4096 VLAN IDs to choose from…more than enough to accomodate most asset classification and still leave room for VLAN usage.  Enforcing the ACL’s can be done by pretty much ANY modern switch that supports q-in-q, very quickly.

Just like an ACL for IP addresses or VLAN policies, ADAPT does the same thing for content routing, but using VLAN ID’s (or the proprietary ADAPT header) to enforce it.

To enable this sort of functionality, either every switch/router in the network would need to either be q-in-q capable (which is most switches these days) or ADAPT enabled (which would be difficult since you’d need every network vendor to support the protocols.)  You could use an overlay UTM security services switch sitting on top of the network plumbing through which all traffic moving from one zone to another would be subject to the ADAPT policy since each flow has to go through said device.

Since the only device that needs to be ADAPT aware is this UTM security service switch (see the example below,) you can let the network do what it does best and utilize this solution to enforce the policy for you across these boundary transitions.  Said UTM security service switch needs to have an extremely high-speed content security engine that is able to characterize the data at wirespeed and add a tag to the frame as it moves through the switching fabric and processed prior to popping out onto the network.

Clearly this switch would have to have coverage across every network segment.  It wouldn’t work well in virtualized server environments or any topology where zoned traffic is not subject to transit through the UTM switch.

I’m going to be self-serving here and demonstrate this “theoretical” solution using a Crossbeam X80 UTM security services switch plumbed into a very fast, reliable, and resilient L2/L3 Cisco infrastructure.  It just so happens to have a wire-speed content security engine installed in it.  The reason the X-Series can do this is because once the flow enters its switching fabric, I own the ultimate packet/frame/cell format and can prepend any header functionality I like onto the structure to determine how it gets “routed.”

Take the example below where the X80 is connected to the layer-3 switches using 802.1q VLAN trunked interfaces.  I’ve made this an intentionally simple network using VLANs and L3 routing; you could envision a much more complex segmentation and routing environment, obviously.

AdaptjpgThis network is chopped up into 4 VLAN segments:

  1. General Clients (VLAN A)
  2. Finance & Accounting Clients (VLAN B)
  3. Financial Servers (VLAN C)
  4. HR Servers (VLAN D)

Each of the clients/servers in the respective VLANs default routes out to an IP address which belongs to the firewall cluster IP addresses which is proffered by the firewall application modules providing service in the X80.

Thus, to get from one VLAN to another VLAN, one must pass through the X80 and profiled by this content security engine and whatever additional UTM services are installed in the chassis (such as firewall, IDP, AV, etc.)

Let’s say then that a user in VLAN A (General Clients) attempts to access one or more resources in the VLAN D (HR Servers.)

Using solely IP addresses and/or L2 VLANs, let’s say the firewall and IPS policies allow this behavior as the clients in that VLAN have a legitimate need to access the HR Intranet server.  However, let’s say that this user tries to access data that exists on the HR Intranet server but contains personally identifiable information that falls under the governance/compliance mandates of HIPAA.

Let us further suggest that the ADAPT policy states the following:

Rule  Source                Destination            ADAPT Descriptor           Action
==============================================================

1        VLAN A             VLAN D                    HIPAA, Confidential        Deny
IP.1.1               IP.3.1

2        VLAN B             VLAN C                    PCI                                 Allow
IP.2.1             IP.4.1

Using rule 1 above, as the client makes the request, he transits from VLAN A to VLAN D.  The reply containing the requested information is profiled by the content security engine which is able to  characterize the data as containing information that matches our definition of either “HIPAA or Confidential” (purely arbitrary for the sake of this example.)

This could be done by reading the metadata if it exists as an attachment to the content’s file structure, in cooperation with an extrusion prevention application running in the chassis, or in the case of ad-hoc web-based applications/services, done dynamically.

According to the ADAPT policy above, this data would then be either silently dropped, depending upon what “deny” means, or perhaps the user would be redirected to a webpage that informs them of a policy violation.

Rule 2 above would allow authorized IP’s in VLANs to access PCI-classified data.

You can imagine how one could integrate IAM and extend the policies to include pseudonymity/identity as a function of access, also.  Or, one could profile the requesting application (browser, for example) to define whether or not this is an authorized application.  You could extend the actions to lots of stuff, too.

In fact, I alluded to it in the first paragraph, but if we back up a step and look at where consolidation of functions/services are being driven with virtualization, one could also use the principles of ADAPT to extend the ACL functionality that exists in switching environments to control/segment/zone access to/from virtual machines (VMs) of different asset/data/classification/security zones.

What this translates to is a workflow/policy instantiation that would use the same logic to prevent VM1 from communicating with VM2 if there was a “zone” mis-match; as we add data classification in context, you could have various levels of granularity that defines access based not only on VM but VM and data trafficked by them.

Furthermore, assuming this service was deployed internally and you could establish a trusted CA with certs that would support transparent MITM SSL decrypts, you could do this (with appropriate scale) with encrypted traffic also.

This is data-centric security that uses the network when needed, the host when it can and the notion of both static and dynamic network-borne data classification to enforce policy in real-time.

/Hoff

[Comments/Blogs on this entry you might be interested in but have no trackbacks set:

MCWResearch Blog

Rob Newby’s Blog

Alex Hutton’s Blog

Security Retentive Blog

  1. May 31st, 2007 at 23:26 | #1

    This is very cool. The user policy and application profiling stuff you talk about at the end is already being done by my old colleagues at Vormetric, and that's also very cool. The ADAPT stuff sounds very similar to what Njini are doing over in the UK (also pretty cool, I suggest taking a look).
    Either way, this is just completely spot on. I'd love to see this working.

  2. June 1st, 2007 at 02:20 | #2

    Data Centric Security… Yeuch

    Rational Security: For Data to Survive, It Must ADAPT… All this data-centric security stuff sounds really good in principle, but to be honest I'm not buying it, for a couple of reasons. One: there's no widely agreed on DRM open…

  1. No trackbacks yet.