Does Centralized Data Governance Equal Centralized Data?
Over the last few months, I’ve been trying to construct a palette of blog entries that communicates the need for a holistic network-, host- and data-centric approach to information security and information survivability architectures.
I’ve been paying close attention to the dynamics of the DLP/CMF market/feature positioning as well as what’s going on in enterprise information architecture with the continued emergence of WebX.0 and SOA.
That’s why I found this Computerworld article by Jay Cline very interesting: it focused on the need for a centralized data governance function within an organization in order to manage the risk associated with the information management lifecycle (which includes security and survivability). The article went on to discuss how roles within the organization, namely the CIO/CTO, will evolve in parallel.
The three primary indicators for this evolution were summarized as:
1. Convergence of information risk functions
2. Escalating risk of information compliance
3. Fundamental role of information
Nothing terribly earth-shattering here, but the exclamation point of this article to enable a
centralized data governance organization is a (gasp!) tricky combination of people, process
"How does this all add up? Let me connect the dots: Data must soon become centralized,
its use must be strictly controlled within legal parameters, and information must drive the
business model. Companies that don’t put a single, C-level person in charge of making
this happen will face two brutal realities: lawsuits driving up costs and eroding trust in the
company, and competitive upstarts stealing revenues through more nimble use of centralized
Let’s deconstruct this a little because I totally get the essence of what is proposed, but
there are some realities that must be discussed. Working backwards:
- I agree that data and its use must be strictly controlled within legal parameters.
- I agree that a single, C-level person needs to be accountable for the data lifecycle.
- However, whilst I don’t disagree that it would be fantastic to centralize data,
I think it’s a nice theory but the wrong universe.
Interestingly, Richard Bejtlich focused his response to the article on this very notion, but I can’t get past a couple of issues, some of them technical and some of them business-related.
There’s a confusing mish-mash alluded to in Richard’s blog of "second home" data repositories that maintain copies of data and somehow also magically enforce data control and protection schemes outside of the repository, while simultaneously allowing the flexibility of data creation "locally." The competing theme for me is that centralization of data is really irrelevant (it’s convenient), but what you really need is the (and you’ll excuse the lazy use of a politically-charged term) "DRM" functionality to work irrespective of where the data is created, stored, or used.
Centralized storage is good (and selfishly so for someone like Richard) for performing forensics and auditing, but it’s not necessarily technically or fiscally efficient and doesn’t necessarily align to an agile business model.
The timeframe for the evolution of this data centralization was not really established,
but we don’t have the most difficult part licked yet: either the application of the accompanying
metadata describing the information assets we wish to protect OR the ability to uniformly classify those assets and
enforce controls over their creation, distribution, utilization and destruction.
Now we’re supposed to be able to magically centralize all our data, too? I know that large organizations have embraced the notion of data warehousing, but it’s not the underlying data stores I’m truly worried about. What concerns me is the combination of data from multiple silos within the data warehouses and its distribution to multi-dimensional analytic consumers.
You may be able to protect a DB’s table, row, column or a file, but how do you apply a policy to a distributed ETL function across multiple datasets and paths?
ATAMO? (And Then A Miracle Occurs)
What I find intriguing about this article is that this so-described pendulum effect of data centralization (data warehousing, BI/DI) and resource centralization (data center virtualization, WAN optimization/caching, thin client computing) seems to be on a direct collision course with the way in which applications and data are being distributed by Web2.0/Service Oriented architectures and their delivery underpinnings: rich(er) client-side technologies such as mash-ups and AJAX…
So what I don’t get is how one balances centralizing data when today’s emerging infrastructure
and information architectures are constructed to do just the opposite: distribute data, processing
and data re-use/transformation across the Enterprise. We’ve already let the data genie out of the bottle and now we’re trying to cram it back in? (*please see below for a perfect illustration)
I ask this again within the scope of deploying a centralized data governance organization and its associated technology and processes within an agile business environment.
P.S. I expect that a certain analyst friend of mine will be emailing me in T-Minus 10, 9…
*Here’s a perfect illustration of the futility of centrally storing "data." Click on the image and notice the second bullet item…: