Simply put, Koverse makes your existing big data infrastructure investment perform at the speed your organization requires. Here is what that speed means for each role:


  • CIO: Can meet the on-demand needs of the business – now
  • Chief Data Officer: Can aggregate data, regardless of structure, realizing the promise of big data – fast
  • Developer: Has a new window into data and can operationalize applications – automatically
  • CISO: Can protect the integrity of data and meet governance requirements – simultaneously
  • Engineer: Can be assured that existing investments in hardware and storage practices are not impacted

Koverse lets you bring in all your data, of any type, and enables you to discover the information it contains, run advanced analytics on day one, and quickly build web applications on the results.

1. Collect and Secure

Import data from external sources – relational databases, remote file systems, streaming sources and more – then organize and secure the data in collections. Grant access using role-based controls.

Storing data in a centralized data lake for analysis consolidates storage and reduces overall costs. Questions that previously could not be asked, because data sets were kept physically separate for lack of access controls, can now be answered.
Strong security controls make it easy to comply with policy and legal data regulations. Data of varying sensitivities are logically isolated. Koverse can integrate with existing directories where users and groups are managed.

Case study: A military organization requires the ability to tightly control access to a variety of sensitive data sets. Using Koverse to label collections as well as individual records, all data sets are consolidated into one system and integrated with a public-key infrastructure (PKI) to allow access to authorized individuals. Data are protected at all times and query audits make verifying compliance straightforward.

Koverse ships with connectors to common data sources including:

  • Relational Databases (MySQL, SQL Server, Postgres)
  • Filesystems (FTP, HDFS, SFTP)
  • Data Streams (Kafka, Twitter API)

In addition, Koverse understands many common file formats including:

  • CSV
  • XML
  • JSON
  • Microsoft Office
  • HTML

Imports can be configured to run on a schedule or continuously. Koverse can be extended to connect to new sources and file formats using the Koverse SDK.

All data are stored as records in Apache Accumulo, a NoSQL database, and protected using Accumulo’s column visibility feature, which allows each data element to be labeled with the authorizations required to read it.
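
To make the labeling idea concrete, the sketch below evaluates a simplified visibility expression against a set of user authorizations. It is an illustration only, not the Accumulo or Koverse API: the function name `can_read` and the example labels are assumptions, and the grammar is reduced to labels combined with `&`, `|`, and parentheses (mixed operators are expected to be parenthesized, so evaluation simply runs left to right).

```python
import re


def can_read(expression, authorizations):
    """Return True if the authorizations satisfy a simplified visibility expression.

    Hypothetical helper for illustration; assumes a well-formed expression.
    """
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|()]", expression)
    pos = 0

    def parse_term():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_expr()
            pos += 1  # skip the closing ")"
            return value
        return token in authorizations

    def parse_expr():
        nonlocal pos
        value = parse_term()
        # Combine terms left to right; mixed operators must be parenthesized.
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            op = tokens[pos]
            pos += 1
            rhs = parse_term()
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    if not tokens:
        return True  # assumption: an unlabeled element is readable by anyone
    return parse_expr()


# Example: only elements whose label the user satisfies are returned.
elements = [("report-1", "secret&(usa|uk)"), ("report-2", "public")]
visible = [name for name, label in elements if can_read(label, {"public"})]
```

With the example data, a user holding only the `public` authorization sees `report-2` but not `report-1`.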

2. Profile Data

Automatically discover schema including field names, types, and distribution of values to learn what is in your data. Identify opportunities for analysis and detect any data quality issues.

Discovering data quality issues and mapping varying schemas can cause prohibitive delays in data availability. Koverse provides immediate visibility into data sets, whether structured, semi-structured, or unstructured. Noisy and problematic data that were previously kept out of the data warehouse can be staged and processed into useful information.

Case study: A major web property wishes to evaluate whether two data sets can be combined to disambiguate customer identifiers. Loading the data and using built-in automatic profiling allows users to report on the degree to which the two data sets overlap and the quality issues present. Using automatically detected schemas, data sets are joined and the next steps for improving the intersection of records are made clear.

Koverse provides insight into every element in each data set including:

  • Field names
  • Field presence
  • Value types
  • Cardinality estimate
  • Top 20 values

This information is always available and presented in intuitive ways to help analysts get up to speed quickly. Data profiles also inform follow-on analytics, making analysis smarter and more resilient.
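
As a rough illustration of what such a profile contains, the sketch below computes field presence, value types, cardinality, and top values for a small in-memory list of records. The function name `profile` and the record layout are assumptions made for this example, not Koverse's implementation. It counts distinct values exactly, whereas a system working at scale would more likely use an estimator, and it assumes field values are hashable scalars.

```python
from collections import Counter, defaultdict


def profile(records, top_n=20):
    """Summarize each field seen across a list of dict records."""
    total = len(records)
    presence = Counter()           # how many records contain each field
    types = defaultdict(Counter)   # value type names per field
    values = defaultdict(Counter)  # value frequencies per field

    for record in records:
        for field, value in record.items():
            presence[field] += 1
            types[field][type(value).__name__] += 1
            values[field][value] += 1

    return {
        field: {
            "presence": presence[field] / total,
            "types": dict(types[field]),
            "cardinality": len(values[field]),  # exact count, not an estimate
            "top_values": values[field].most_common(top_n),
        }
        for field in presence
    }


sample_records = [
    {"name": "a", "age": 1},
    {"name": "b"},
    {"name": "a", "age": "x"},
]
summary = profile(sample_records)
```

In this sample the `age` field appears in two of three records with mixed types, which is the kind of quality issue a profile is meant to surface.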

In addition, for every data set, Koverse maintains a representative sample that can be accessed at any time for processing in external tools that aren’t designed for massive scalability, such as Excel, and statistical tools like R and IPython.
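
The text does not say how the sample is maintained. One common technique for keeping a fixed-size uniform sample of a data set of unknown length is reservoir sampling, sketched below as an assumption rather than a description of Koverse internals. The function name and parameters are illustrative.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of up to k items from an iterable (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


small_sample = reservoir_sample(range(1000), 10, seed=1)
```

The appeal of this approach is that the sample can be updated as records arrive, without a second pass over the full data set.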

3. Index and Search

Index all or selected fields within collections to make them searchable. Auto-suggest and search across collections or in specific collections, across fields or within specific fields. Search full text, ranges of numbers and dates, or geo-spatial regions.

Without search-enabling indexes, data sets remain sequestered in cumbersome files and folders. Search brings information required for decision making within the reach of analysts in less than a second. Analytical results can be looked up interactively and analysts can always go back to the raw records that fed result sets, which is crucial for verifiability, auditing, and confident decision making.

Case study: An intelligence organization requires a solution to bring together various types of intel into one system, including reports, sensor data, and geographical tracks. Koverse provides a comprehensive index and search capability for all of these data types as part of a broader platform. New applications that previously took six months to develop can now be created and integrated into the system in two weeks, building on Koverse’s search API.

Koverse indexes several types of data, including:

  • Full text using the Apache Lucene library
  • Numbers
  • Dates
  • IP Addresses
  • Geographical points

Range queries can span multiple dimensions. For example, an application can search over four ranges simultaneously: latitude, longitude, time, and elevation.
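
The semantics of such a query can be sketched as a conjunction of inclusive range checks, one per dimension. The example below uses a plain linear scan over in-memory points purely to show what the query means. A real index exists precisely to avoid scanning every record, and the `Point` type and `range_query` function are hypothetical names for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    lat: float
    lon: float
    time: float       # e.g. seconds since the epoch
    elevation: float  # e.g. metres


def range_query(points, ranges):
    """Return points inside every range; ranges maps a field name to (low, high)."""
    return [
        p
        for p in points
        if all(low <= getattr(p, field) <= high for field, (low, high) in ranges.items())
    ]


tracks = [
    Point(lat=38.9, lon=-77.0, time=100.0, elevation=50.0),
    Point(lat=38.9, lon=-77.0, time=900.0, elevation=50.0),
    Point(lat=47.6, lon=-122.3, time=150.0, elevation=20.0),
]
hits = range_query(
    tracks,
    {"lat": (38.0, 39.0), "lon": (-78.0, -76.0), "time": (0.0, 500.0), "elevation": (0.0, 100.0)},
)
```

Only the first point satisfies all four ranges; the second falls outside the time window and the third outside the latitude and longitude bounds.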

All indexes live in Apache Accumulo tables and are secured using the same column visibilities that are applied to the original records. This provides massive scalability and assured access control.

4. Analyze

Once opportunities are identified, combine, summarize and analyze one or more collections using transforms – multi-stage distributed processes. Store analytical output as a new, searchable collection.

Solving storage and search problems is insufficient if information is too low-level and voluminous to be readily understood. Koverse Transforms provide a way to combine, aggregate, and summarize data records into high-level information that can be applied to the decision at hand. Transforms provide a way for data scientists to package analytics into reusable data flows, enabling their efforts to be scaled across the organization.

Case study: A retailer wishes to know where to build new stores using not just its internal data but also public records such as U.S. Census records. Using Koverse Transforms to combine these data sets and build a predictive model of projected revenue per location, analysts are able to deliver a clear path forward for new store development.

All analysis in Koverse is performed via transforms. Transforms consist of multi-stage MapReduce jobs and can be configured to take parameters so that, once an analytical algorithm is developed, it can be reused by non-developers simply via configuration.
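
The shape of a parameterized, multi-stage job can be sketched with a small in-memory simulation of map, group, and reduce. This is not the Koverse transform API or the Hadoop API; `run_stage`, `big_spenders_by_store`, the record fields, and the `min_total` parameter are all assumptions chosen for illustration.

```python
from collections import defaultdict


def run_stage(records, mapper, reducer):
    """Run one map/group/reduce stage in memory."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [out for key, values in groups.items() for out in reducer(key, values)]


def big_spenders_by_store(purchases, min_total):
    """Two-stage job: total spend per customer, then a ranked list per store."""

    def map_purchase(p):
        yield (p["store"], p["customer"]), p["amount"]

    def reduce_total(key, amounts):
        store, customer = key
        yield {"store": store, "customer": customer, "total": sum(amounts)}

    def map_total(t):
        # The configurable parameter decides which customers qualify.
        if t["total"] >= min_total:
            yield t["store"], (t["total"], t["customer"])

    def reduce_rank(store, entries):
        ranked = sorted(entries, reverse=True)  # highest total first
        yield {"store": store, "customers": [customer for _, customer in ranked]}

    totals = run_stage(purchases, map_purchase, reduce_total)
    return run_stage(totals, map_total, reduce_rank)


purchases = [
    {"store": "s1", "customer": "a", "amount": 60},
    {"store": "s1", "customer": "a", "amount": 50},
    {"store": "s1", "customer": "b", "amount": 200},
    {"store": "s1", "customer": "c", "amount": 10},
    {"store": "s2", "customer": "d", "amount": 5},
]
ranked = big_spenders_by_store(purchases, min_total=100)
```

Once the two stages are written, a non-developer only needs to change `min_total` to rerun the same analysis with a different threshold, which is the reuse-by-configuration idea described above.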

Because Koverse provides a common data abstraction, transforms can be written once and applied to a variety of input collections. Koverse’s transform API reduces the amount of boilerplate and configuration developers need to write compared to the off-the-shelf Apache Hadoop API, and all transforms can be easily ported to the vanilla Apache Hadoop API if necessary.


5. Interact

Deliver analytical results to many decision makers throughout the organization by embedding analytical results into interactive web applications.

Koverse embeds analytical results into interactive web applications so that hundreds of users can quickly look up the information they need to make decisions throughout the day. Going beyond reports and technical records, Koverse applications provide the interactivity required to gain deep insight into problems and the speed to act while the information is relevant.

Case study: A service provider wishes to know when a customer was once a frequent customer but has since stopped using its service. A ranked list of customers matching this profile is pre-computed for each individual store and made available for quick lookups via a simple web application developed in one day using Koverse’s JavaScript SDK. Store employees can access these lists at any time for promotional purposes and use this information at the point of sale during face-to-face interactions.

Koverse provides application developers with a REST API, an Apache Thrift API, and a JavaScript SDK. API methods include:

  • Query
  • Set up workflows
  • Control jobs
  • Control access
  • Administer

Deployment Options

Deploy Koverse on premises or in the cloud, whichever option is right for the organization.