
Shape Data Automatically
Get the Right Information in the Right Format

Analysts typically spend 80% of their time preparing data and just 20% analyzing it. BigDataMonkey reverses that ratio. BigDataMonkey’s automated data shaping platform lets the analyst ingest, prepare and combine data on Hadoop without ETL or programming. Data is ingested into Hadoop in raw form from databases, files or cloud services, profiled to analyze its content, structure and semantics, then cleansed, integrated and restructured automatically to match the desired format. BigDataMonkey’s visual workspace automatically shapes data to support discovery, analysis, modeling or warehousing. See how Automated Data Shaping works. View our Infographic.


BigDataMonkey Automatically Shapes Data on Hadoop with: Tools and Talent


The Data Bottleneck

Most of the effort in analytics is not analyzing data but rather sourcing, preparing and combining data. Teams of analysts, engineers and ETL specialists build complex processes and queries to get data clean and in the desired format. Big data with inconsistent structures makes the problem worse. The time, cost and resources required to shape data into a usable structure are the largest bottleneck in data analysis, modeling and applications. At BigDataMonkey, our goal is to eliminate this bottleneck.

 

What is Data Shaping?

Data comes in many shapes and sizes. Organizations have hundreds of data sources including enterprise applications, custom applications, databases, files, cloud services, third party information and much more. Each has different structures, keys, codes and calculations. Most have quality issues. The challenge is to take inconsistent data with differing structures from disparate sources and shape it into a clean, consistent, combined set of useful rows and columns that can drive analysis and modeling. That is data shaping.
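To make the idea concrete, here is a minimal hand-written sketch in PySpark (not BigDataMonkey itself) of what shaping looks like when done manually: two sources with different column names and types are standardized and combined into one clean table. All paths, column names and mappings below are hypothetical examples.

    # Illustrative only: a hand-written PySpark sketch of "data shaping",
    # standardizing two inconsistent sources into one clean, combined table.
    # All paths, column names and mappings are hypothetical examples.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("data-shaping-sketch").getOrCreate()

    # Source 1: CRM export with one naming convention
    crm = spark.read.option("header", True).csv("hdfs:///raw/crm_accounts.csv")
    crm_shaped = crm.select(
        F.col("AcctID").alias("customer_id"),
        F.trim(F.col("AcctName")).alias("customer_name"),
        F.upper(F.col("Region")).alias("region"),
    )

    # Source 2: billing system with different names and types
    billing = spark.read.json("hdfs:///raw/billing/*.json")
    billing_shaped = billing.select(
        F.col("cust_no").cast("string").alias("customer_id"),
        F.col("annual_spend").cast("double").alias("annual_spend"),
    )

    # Combine into one consistent, analysis-ready set of rows and columns
    shaped = (
        crm_shaped.join(billing_shaped, "customer_id", "left")
        .dropDuplicates(["customer_id"])
        .fillna({"annual_spend": 0.0})
    )
    shaped.write.mode("overwrite").parquet("hdfs:///shaped/customers")

Every rename, cast and join above is a decision an analyst or engineer would otherwise make by hand, for every source and every new question.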

 

So Many Uses, So Many Shapes

A technology company wants to identify high-value prospects based on sales, demographic, contract, usage and third party list information. They need to shape customer, contact and product data from enterprise, custom and cloud applications along with web logs and list providers into a structure to feed a predictive model. A hospital wants to identify patients likely to be readmitted based on diagnoses, clinical, lab, medication and socio-economic data. An oil and gas company needs to identify equipment that needs adjustment or maintenance based on sensor readings combined with geological, equipment, rig, maintenance and historical data. These are just a few real-world examples.

 

Too Much Time, Too Much Effort

As the stories above show, all organizations share the need for shaped data, yet it can take months to shape data to solve these types of problems. Analysts and other data specialists have to profile, query, visualize and inspect the data using various analysis and quality tools. It is iterative, tedious and time-consuming. Programmers and ETL specialists then build and test ETL processes that support production use. This manual process is untenable in a data-driven world. We must solve more problems using more data with fewer people in less time.

BigDataMonkey Automates Data Shaping

BigDataMonkey provides an innovative browser based visual workspace that allows the analyst to automatically shape data to match the desired structure. Data is ingested into Hadoop in raw form from databases, files or cloud services. It is profiled to analyze its content, structure and semantics. It is cleansed, integrated and restructured automatically to match the format needed for discovery, analysis, modeling or warehousing. All without the need for ETL, a database or programming.
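As a rough illustration of the profiling step, and not a description of BigDataMonkey’s internal implementation, the following PySpark sketch computes the kind of content and structure statistics (types, null rates, distinct counts) an automated shaper could use to drive cleansing and restructuring. The input path is hypothetical.

    # Illustrative profiling sketch: summarize the content and structure of a
    # raw dataset (column types, null rates, distinct counts). The input path
    # is a hypothetical location for previously ingested raw data.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("profiling-sketch").getOrCreate()
    raw = spark.read.parquet("hdfs:///raw/ingested_source")

    total = max(raw.count(), 1)
    for name, dtype in raw.dtypes:
        stats = raw.agg(
            F.sum(F.col(name).isNull().cast("int")).alias("nulls"),
            F.countDistinct(name).alias("distinct"),
        ).first()
        null_rate = (stats["nulls"] or 0) / total
        # High distinct counts hint at keys; high null rates flag quality issues.
        print(f"{name:20s} {dtype:12s} null_rate={null_rate:.2%} distinct={stats['distinct']}")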

 

Any Data in Any Format

What if it took minutes or hours to shape data? What if you could quickly try many different shapes with any information from any source to see what provided the most insight or was the most predictive? What could you discover if you had easy access to the data you wanted in the format you needed? BigDataMonkey automatically maps your data and relationships, recommending filters and transformations, to present an integrated view of the shaping options. You select what you need and how you want it, and BigDataMonkey shapes the data as requested.

 

Learning To Improve

BigDataMonkey’s patent-pending technology learns from the data and how it is used. The more data is shaped, the better the algorithms perform. Leveraging Hadoop to perform semantic and schema analysis, BigDataMonkey automatically reveals relationships between different data sets. As data is profiled, cleansed and shaped, these automated processes are validated and the learned behaviors are reinforced for use in future shaping. With usage, shaping becomes more automated and more consistent.
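The patent-pending algorithms themselves are not shown here, but the general idea of schema and semantic analysis can be sketched in a few lines of Python: score candidate relationships between data sets by comparing column names and the overlap of their values. Everything below, including the sample tables, is hypothetical.

    # Illustrative sketch of schema and semantic analysis: score candidate
    # relationships between two data sets by comparing column names and the
    # overlap of their values. A real system would run this at Hadoop scale
    # and refine the scores as users confirm or reject suggested joins.
    from difflib import SequenceMatcher

    def name_similarity(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def value_overlap(values_a, values_b):
        a, b = set(values_a), set(values_b)
        return len(a & b) / max(len(a | b), 1)  # Jaccard overlap

    def candidate_joins(table_a, table_b, threshold=0.5):
        """table_a / table_b: dict mapping column name -> sample values."""
        candidates = []
        for col_a, vals_a in table_a.items():
            for col_b, vals_b in table_b.items():
                score = 0.5 * name_similarity(col_a, col_b) + \
                        0.5 * value_overlap(vals_a, vals_b)
                if score >= threshold:
                    candidates.append((col_a, col_b, round(score, 2)))
        return sorted(candidates, key=lambda c: -c[2])

    # Hypothetical samples drawn from two different sources
    crm = {"customer_id": ["C1", "C2", "C3"], "region": ["EAST", "WEST", "EAST"]}
    billing = {"cust_no": ["C2", "C3", "C4"], "amount": [120, 80, 45]}
    print(candidate_joins(crm, billing))   # suggests customer_id <-> cust_no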

 

Shaping the Data Lake

The Data Lake is an emerging best practice: collect all relevant information in Hadoop to build a comprehensive repository that includes the big data and raw detail typically missing from a warehouse. The Data Lake can feed the warehouse and enable new uses including predictive modeling, automated decisioning and new analytics applications. BigDataMonkey is the ideal platform to supply and access data in the Data Lake and shape it to feed your business intelligence, analytics, modeling and warehousing platforms.

Learn how BigDataMonkey Shapes Information in the Data Lake >