How data is organized directly influences its security, availability, usability, and flexibility. Getting the most value out of your complex and sensitive data requires a careful balance of how it is structured and controlled. In the Goldilocks zone of data, the data organization is “just right” because all the data is findable, searchable, and analyzable in a single location while also highly secure.
Consider how systems that are too unstructured make it difficult to find meaningful information, while systems that are too structured are less adaptable to data changes and dirty data, and unable to support cross-dataset analytics due to reliance on disparate data silos. Likewise, systems that unduly restrict data access compromise availability, searchability, and flexibility, whereas having data overly available puts security at risk.
Most current data structures fall short of these goals. For instance:
- A relational database is highly structured and searchable but is less flexible.
- File or object storage services do not index deeply enough into files and objects to make all data searchable, and typically only offer security controls at a coarse-grained level on entire files, folders, or buckets.
By contrast, the Goldilocks zone specifically offers the following for all data in one system:
- Storage of any type of unstructured file and records from structured sources like databases
- Ability to index and make all values and text in files or records searchable
- Automatic tracking of schemas for all files and records with optional enforcement for flexibility
- Access control at multiple levels: entire datasets, individual files and records, or parts of records
Thus, the Goldilocks zone of data adds just enough structure to make the data both secure and searchable, so it is available, usable, and adds value to the organization. Systems in this zone easily adapt to changing missions and changing data, providing the flexibility to bring in new datasets, new data, and new security groups without extra effort.
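To make the idea of access control at multiple levels more concrete, here is a minimal sketch in Python. It is purely illustrative and does not represent KDP's API: the `Dataset` and `Record` classes, the label strings, and the `visible_records` function are all hypothetical names, and the sketch assumes access is granted by simple membership of a label in the user's set of labels.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    values: dict                         # field name -> value
    label: Optional[str] = None          # record-level label (None = no extra restriction)
    field_labels: dict = field(default_factory=dict)  # field name -> label


@dataclass
class Dataset:
    name: str
    label: str                           # dataset-level label
    records: list = field(default_factory=list)


def visible_records(dataset, user_labels):
    """Return only what the user's labels permit at each level."""
    # Level 1: the whole dataset
    if dataset.label not in user_labels:
        return []
    result = []
    for rec in dataset.records:
        # Level 2: the individual record
        if rec.label is not None and rec.label not in user_labels:
            continue
        # Level 3: parts of the record (individual fields)
        visible = {
            name: value
            for name, value in rec.values.items()
            if rec.field_labels.get(name) is None
            or rec.field_labels[name] in user_labels
        }
        result.append(visible)
    return result


patients = Dataset("patients", "medical", [
    Record({"name": "Ann", "ssn": "123"}, field_labels={"ssn": "pii"}),
    Record({"name": "Bob"}, label="restricted"),
])

print(visible_records(patients, {"medical"}))
print(visible_records(patients, {"medical", "pii", "restricted"}))
```

In this sketch, a user holding only the `medical` label sees Ann's record without the `ssn` field and does not see Bob's record at all, while a user holding all three labels sees everything. No data is rewritten when a new label is introduced; only the labels attached to data and users change.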
Are you in the zone?
Only data structures in the Goldilocks zone provide both physical control and indexing of all data, even unstructured data, so it is searchable and flexible. They also enable fine-grained authorization, ensuring that every query complies with the applicable rules and regulations and that the right data is returned to the individual requestor and handled appropriately for its sensitivity. Koverse Data Platform (KDP) is an example of a system that is in the Goldilocks zone.
The following four tests can help to judge whether your data is in the zone:
- Is all the data available for searching? If not, then you’re not in the Goldilocks zone. With KDP, every field of every record is automatically indexed and available for searching upon ingestion.
- If a new field suddenly shows up in a dataset being ingested, how much work is needed to get it into the system and make it searchable? If any human effort is required for data transformation or cleaning, or if the change consumes excessive system resources, such as for reindexing, then you’re not in the Goldilocks zone. In KDP, the amount of extra effort needed to bring in a new field is zero: the system treats new fields just like any other field being ingested.
- If a new type of data sensitivity shows up in a dataset being ingested, how much work is needed to get it applied in the system? If changes are needed to existing data or any reindexing or extra processing is required, then you’re not in the Goldilocks zone. With KDP, adding a new security group simply requires ensuring the data is labeled properly and granting the right people access to it.
- Is it possible to join or otherwise process different datasets in the system? If you cannot join, aggregate, or otherwise process entire datasets to create a new dataset without specific effort, or you can but the data comparison takes an inordinate amount of time and system resources, then you’re not in the Goldilocks zone. With KDP, you can process any dataset with any other dataset at scale.
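The first two tests can be illustrated with a small sketch in Python. This is not how KDP is implemented; `TinyStore` and its methods are hypothetical names, and the sketch assumes a simple in-memory inverted index with whitespace tokenization. It shows the general technique: every field of every record is indexed at ingestion, the schema is tracked as a side effect, and a field that appears for the first time needs no migration or reindexing of earlier records.

```python
from collections import defaultdict


class TinyStore:
    """Toy store that indexes every field of every record on ingestion."""

    def __init__(self):
        self.records = []
        self.index = defaultdict(set)    # (field, token) -> ids of matching records
        self.schema = defaultdict(set)   # field -> type names observed so far

    def ingest(self, record):
        rec_id = len(self.records)
        self.records.append(record)
        for name, value in record.items():
            # Track the schema automatically; unknown fields are simply added.
            self.schema[name].add(type(value).__name__)
            # Index each lowercase token of the value under its field name.
            for token in str(value).lower().split():
                self.index[(name, token)].add(rec_id)
        return rec_id

    def search(self, token, field=None):
        """Find records containing the token, optionally in one field only."""
        token = token.lower()
        ids = set()
        for (name, tok), rec_ids in self.index.items():
            if tok == token and (field is None or name == field):
                ids |= rec_ids
        return [self.records[i] for i in sorted(ids)]


store = TinyStore()
store.ingest({"name": "Ann Lee", "city": "Seattle"})
# A new field, "badge", shows up mid-stream; no extra step is needed.
store.ingest({"name": "Bob", "city": "Denver", "badge": "gold"})

print(store.search("gold"))
print(sorted(store.schema))
```

Because the index is keyed by field name and token, the new `badge` field becomes searchable the moment its record is ingested, and the entries for earlier records are left untouched.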
To learn more, contact us today, or try it free for 30 days at koverse.com/get-started.