Cyber Defense Magazine September 2022

The Implications of Zero Trust for Data

Zero Trust is a hot topic in network security. For those not familiar, zero trust is the “never trust, always verify” premise applied to every device, with an eye to protecting the corporate network. In many ways, this architectural approach represents the ultimate security posture.

That said, most zero trust approaches today have a flaw. Two, actually: people and data.

The people flaw might be colloquially termed “the insider threat problem.” In short, how do you protect against rogue actors (or good actors that have been phished)? With the right credentials, that actor has the keys to the kingdom.

The data problem is even more pernicious: how do you protect PII, confidential and classified information without creating data silos? Most larger companies today use some form of a data lake where they ideally collect and physically co-locate everything – structured and unstructured, batch and continuously streaming, classified and unclassified, basically all sorts of complex data. There is no way to block, say, a social security number contained in a piece of unstructured data without blocking (siloing) the whole file. These data silos can wreak havoc on analytics, data science, and artificial intelligence (AI) initiatives, especially in sectors with a heavy dose of sensitive data, such as financial services, life sciences, healthcare, and of course government.

The problem with previous approaches to zero trust is that they're applied at the network and file level, not the data level. In that sense, they're a blunt instrument: you either have access or you don't, and the data itself remains insecure. The irony is that zero trust pushes for perimeterless security, yet network-based approaches alone simply establish a bolt-on zero trust perimeter around your data storage and then slice that data up to try to maintain the appropriate level of security.

Let’s examine what this means in regard to people and data access.

The People Problem

You may think that with zero trust, your data is locked down and your highly confidential data is safe. But is it? Only authorized users have access, after all. Yet those authorized users include all your database administrators and your helpdesk staff, any of whom may be on contract, and thus more transitory than typical employees and subject to less scrutiny. Any of these people (employees or contractors) could be phished. Or have a virus on their computer.

Still feeling safe?

Even with zero trust, there can still be issues in configuration and policy management. Anyone who’s dealt with common cloud security policies knows these can be onerous to apply to a large and varied set of data and services. An administrator sets up a new cloud database, only to discover it can’t communicate with the policy engine or web servers. The natural inclination is to just change settings to “allow” … and now everything works, but your data is open to the internet. Are you certain that all those loopholes have been closed?

The Data Problem

Regardless of zero trust, for most organizations today, data is secured by segmenting it – in other words, creating data silos. Again, this is a blunt, all-or-nothing approach, especially when it comes to unstructured data.

Take a spreadsheet, for example, where two workers, Bob and Alice, need access. Both have credentials and are working from a trusted device. Alice is authorized to view all the data in the spreadsheet, even the confidential information. Bob, however, doesn't have clearance to see the sensitive data, so he needs to work on a copy of the spreadsheet with that information removed. Now you have two copies of the same file. Even worse, once Bob updates his copy, someone has to reconcile the changes. This happens over and over across the organization.

Having to silo confidential information can have a significant impact on data science, analytics, and AI, particularly if this data is of mixed sensitivities. Either it becomes off-limits to the people and algorithms that could use it, or the organization has to effectively duplicate storage, management, AI/ML pipelines, etc.

Integrating Zero Trust at the Data Level

The traditional network-centric approach to zero trust does not address these issues. But what if we implemented zero trust, along with attribute-based access controls (ABAC), at the data level instead? What would this look like?

All data would have security labels applied on write, i.e., immediately protected at ingestion. The system should be able to handle all data types – structured or unstructured, streaming or static – in their raw forms, retaining the data’s original structure to ensure greater flexibility and scalability.
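To make labeling-on-write concrete, here is a minimal sketch in Python. The label names, the ingest() function, and the classification rule are all illustrative assumptions for this article, not a specific product's API:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical security labels; a real deployment would use its own taxonomy.
PII = "PII"
CONFIDENTIAL = "CONFIDENTIAL"
PUBLIC = "PUBLIC"

@dataclass
class LabeledRecord:
    """A piece of data plus the security labels attached to it at write time."""
    payload: dict
    labels: set = field(default_factory=set)
    ingested_at: str = ""

def ingest(payload: dict) -> LabeledRecord:
    """Label data the moment it is written, before anything can read it."""
    labels = {PUBLIC}
    # Toy classification rule: a real system would run richer detectors
    # (regexes, ML classifiers, upstream markings) at this point.
    if "ssn" in payload:
        labels |= {PII, CONFIDENTIAL}
    return LabeledRecord(
        payload=payload,
        labels=labels,
        ingested_at=datetime.now(timezone.utc).isoformat(),
    )

record = ingest({"name": "Alice", "ssn": "000-00-0000"})
print(sorted(record.labels))  # ['CONFIDENTIAL', 'PII', 'PUBLIC']
```

The essential point is that the labels travel with the record from the moment it is written, so nothing downstream ever sees unlabeled data.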

Attribute-based access control allows resources to be protected by a policy that takes users’ attributes and credentials into account, not just their roles, and can express more complex rules. Unlike the more common role-based access control (RBAC), which uses coarse-grained roles and privileges to manage access, ABAC is considered the next generation of access control because it’s “dynamic, context-aware and risk-intelligent.” When ABAC is used to protect data at a fine-grained level, data segregation is no longer necessary. These access controls can be applied at the level of the dataset, column, row (based on its attributes), document/file, and even individual paragraphs. In this scenario, people see only the data they need (and are authorized) to see, even if they’re looking at the same file.
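As a rough illustration of the difference (the attribute names and decision rule below are invented for this sketch, not drawn from any particular policy engine), an ABAC decision might weigh a user's clearances and request context against the labels on the data:

```python
from dataclasses import dataclass

@dataclass
class User:
    id: str
    clearances: set          # e.g. {"PII", "CONFIDENTIAL"}
    department: str
    on_trusted_device: bool  # context supplied by the zero trust layer

def can_read(user: User, labels: set) -> bool:
    """ABAC-style decision: attributes and context, not just a role name."""
    # Every label on the data must be covered by the user's clearances...
    has_clearance = labels <= user.clearances | {"PUBLIC"}
    # ...and the request context must also satisfy policy (here, device trust).
    return has_clearance and user.on_trusted_device

alice = User("alice", {"PII", "CONFIDENTIAL"}, "finance", True)
bob = User("bob", set(), "finance", True)

labels = {"PUBLIC", "PII", "CONFIDENTIAL"}
print(can_read(alice, labels))  # True
print(can_read(bob, labels))    # False
```

A role-based check would stop at "is this user a DBA?"; the attribute-based version also asks what the data is labeled and under what conditions the request was made.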

Let’s look at our earlier examples through the lens of zero trust for data. A data analyst could upload sensitive information that would be immediately labeled. Even the database administrator would not be able to view this information – they can manage the system resources, but not view the confidential data therein. Zero trust.

It gets even more interesting when considering the spreadsheet being managed by our friends Alice and Bob. Only one copy of the spreadsheet exists; both Bob and Alice can be looking at it and working on it, but each sees and has access to only the data appropriate to their credentials. Technically, Bob would not even know he’s not seeing all the data. Again, zero trust.
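One way to picture this single-copy behavior (again a toy sketch with invented labels and values, not how any specific product stores spreadsheets) is a per-user view derived at read time:

```python
# Toy sketch: one stored spreadsheet whose cells carry security labels, with
# per-user views derived at read time. Names and labels are illustrative only.

def can_read(clearances: set, labels: set) -> bool:
    """Attribute check: every label on the cell must be covered by the user."""
    return labels <= clearances | {"PUBLIC"}

# The single authoritative copy; each cell has its own label set.
spreadsheet = {
    ("A", 1): {"value": "Acme Corp",   "labels": {"PUBLIC"}},
    ("B", 1): {"value": "000-00-0000", "labels": {"PII", "CONFIDENTIAL"}},
    ("C", 1): {"value": "renewal due", "labels": {"PUBLIC"}},
}

def view_for(clearances: set) -> dict:
    """Build a per-user view; cells the user cannot read simply never appear."""
    return {
        cell: data["value"]
        for cell, data in spreadsheet.items()
        if can_read(clearances, data["labels"])
    }

alice_view = view_for({"PII", "CONFIDENTIAL"})  # sees all three cells
bob_view = view_for(set())                      # sees only the two PUBLIC cells
print(alice_view)
print(bob_view)
```

There is nothing to reconcile afterward, because Bob never had a separate, redacted copy to edit.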

The Implications of Zero Trust for Data

So, what would this mean for an organization and its data?

First, that data – all data, across mixed sensitivities – would be better protected. Because silos are eliminated, all data can be co-located, improving efficiency and making information immediately available for use. And because we now have fine-grained control, we can even apply zero trust and ABAC to search, so that all data, regardless of sensitivity, can be readily indexed and found; users see only the results they’re authorized to see. Data scientists, meanwhile, can focus on the objectives of their AI and analytics work instead of the infrastructure.
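Here is a sketch of what ABAC-filtered search could look like; the index shape, labels, and document names are all hypothetical:

```python
# Toy sketch of ABAC-aware search: index all data once, then filter results
# per user at query time. Index shape and labels are illustrative only.

index = [
    {"doc_id": "q3-report",    "text": "quarterly revenue summary", "labels": {"PUBLIC"}},
    {"doc_id": "payroll-2022", "text": "employee payroll and SSNs", "labels": {"PII", "CONFIDENTIAL"}},
]

def search(query: str, clearances: set) -> list:
    """Match against the full index, but return only hits the user may see."""
    return [
        doc["doc_id"]
        for doc in index
        if query.lower() in doc["text"]
        and doc["labels"] <= clearances | {"PUBLIC"}
    ]

print(search("payroll", {"PII", "CONFIDENTIAL"}))  # ['payroll-2022']
print(search("payroll", set()))                    # [] -- the hit exists, but is never shown
```

Everything is indexed once; authorization is enforced at query time, so unauthorized users never even learn that a matching document exists.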

If this sounds like fantasy, it’s not. It’s actually the approach that prominent three-letter government agencies use when they have to work with data of mixed sensitivities. That zero-trust-for-data approach is now making its way into commercial and government organizations of all types, and it promises to have a major impact on how we work with – and protect – data going forward.

*This article first appeared in Cyber Defense Magazine’s e-Magazine, September 2022 edition.