BigDataMonkey Shapes the Data Lake

Before Hadoop, putting every type of information in one place was impractical. With so many systems and structures, it was too hard and too costly. Things have changed. BigDataMonkey and Hadoop now provide an economical, scalable platform that can capture, store and shape all your data in the “Data Lake”. The Data Lake collects new data sources like weblogs, messaging, media, social and sensor data and combines them with structured data from enterprise and cloud applications, along with outside information services. But the value comes not from what goes into the Data Lake but from what you can get out of it. BigDataMonkey lets you get exactly what you want in the shape you need.

Insight from All Your Information

BigDataMonkey provides an intuitive drag-and-drop environment to capture and shape information in the Data Lake. As data is added or changed, BigDataMonkey automatically ingests the new information from files, streams, databases or other sources. Data in any format can be added easily, without months or years of up-front planning. BigDataMonkey profiles the data to assess its structure, cleanses it to improve consistency, and compares it with other data to determine how it relates. It then automatically combines it with other data into the shapes needed by your analytic tools, data applications or data warehouse. BigDataMonkey can create any shape with any data, so you can easily get insight from all the information in your Data Lake.

The Value of Any and All Information

Organizations share a common complaint about their information: they do not have everything they need. Data warehouses promised to provide all this information. However, the planning required to select and structure data before loading means new data takes too long to reach the warehouse. When it gets there, the necessary detail is often missing. And new sources like weblogs, messaging, social and sensor data are not supported. With any and all types of data available in the Data Lake, you can finally access what you need.


Take Action on Information

Getting real value from information requires action. Traditional business intelligence solutions based on summarized information have helped improve decision making. But the promise of big data lies in automating decisions with predictive models that spot trends and events in large volumes of detailed data. These models are hooked directly to websites, applications and machines to provide real-time recommendations and automated decision making, and they require much more data than current tools and warehouses provide.


New Analytics Require Detail

New predictive models require detailed, time-series, transactional data that captures every interaction, log entry and sensor reading. These large data sets need to be filtered, validated and cleansed to ensure decisions are based on relevant and accurate information, especially since decisions can occur without human review. This detail must be put in context by combining it with enterprise data. It needs to be stored and accessible for long periods so patterns and critical events can be detected. A Data Lake running on a low-cost, scalable platform like Hadoop is the only practical approach to capture this detail economically without creating large replicated silos of big data.
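To make the filter, validate, cleanse and combine steps concrete, here is a minimal Python sketch. It is an illustration only, not how BigDataMonkey works internally. The record fields (`sensor_id`, `timestamp`, `value`), the valid range, the `assets` lookup and both function names are assumptions made for the example.

```python
from datetime import datetime


def clean_readings(raw_readings, valid_range=(-40.0, 125.0)):
    """Filter, validate and cleanse raw sensor readings (hypothetical schema).

    Drops records with unparsable timestamps or values, values outside
    valid_range, and duplicates of the same (sensor_id, timestamp) pair.
    Assumes timestamps are ISO 8601 strings without time zone offsets.
    """
    low, high = valid_range
    seen = set()
    cleaned = []
    for record in raw_readings:
        try:
            timestamp = datetime.fromisoformat(record["timestamp"])
            value = float(record["value"])
        except (KeyError, TypeError, ValueError):
            continue  # malformed record: skip it
        if not low <= value <= high:
            continue  # out-of-range (or NaN) reading: skip it
        key = (record.get("sensor_id"), timestamp)
        if key in seen:
            continue  # duplicate reading: keep the first copy only
        seen.add(key)
        cleaned.append(
            {"sensor_id": record.get("sensor_id"), "timestamp": timestamp, "value": value}
        )
    cleaned.sort(key=lambda reading: reading["timestamp"])
    return cleaned


def add_context(readings, assets):
    """Combine readings with enterprise reference data (here, a site name)."""
    return [
        {**reading, "site": assets.get(reading["sensor_id"], "unknown")}
        for reading in readings
    ]


raw = [
    {"sensor_id": "s1", "timestamp": "2024-01-01T00:01:00", "value": "21.5"},
    {"sensor_id": "s1", "timestamp": "2024-01-01T00:00:00", "value": "20.0"},
    {"sensor_id": "s1", "timestamp": "2024-01-01T00:01:00", "value": "21.5"},
    {"sensor_id": "s2", "timestamp": "not a time", "value": "19.0"},
    {"sensor_id": "s2", "timestamp": "2024-01-01T00:00:00", "value": "999"},
    {"sensor_id": "s2", "timestamp": "2024-01-01T00:02:00", "value": "n/a"},
]
cleaned = clean_readings(raw)
enriched = add_context(cleaned, {"s1": "Plant A"})
```

In this sample, only the two distinct, well-formed `s1` readings survive, and each one picks up its site from the reference data.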


Lakes Must Be Shaped

A common concern with a Data Lake is that it can look more like a mud pit than a lake. Disparate, raw, inconsistent data can be dumped into Hadoop without anyone understanding the content, and the result can become unwieldy and unusable. BigDataMonkey automatically profiles the data sets, then cleans them and relates them to one another. Users see what is there and how it fits together, so they can combine information into useful shapes. The shaping process is easily operationalized and scheduled so it happens automatically. Access to raw, unclean data can be controlled to prevent unauthorized use. Shaped data that is clean and ready can be regularly published and distributed.
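As a rough picture of what profiling and relating data sets involves, the Python sketch below summarizes each column of a data set and then looks for columns in two data sets that share values. It is a simplified illustration under stated assumptions, not BigDataMonkey's actual algorithm. The function names, the sample data, the value-overlap score and the 0.5 threshold are all hypothetical.

```python
from collections import Counter


def infer_type(value):
    """Classify a raw string as 'empty', 'int', 'float' or 'text'."""
    text = value.strip()
    if not text:
        return "empty"
    for kind, parser in (("int", int), ("float", float)):
        try:
            parser(text)
            return kind
        except ValueError:
            continue
    return "text"


def profile(rows):
    """Summarize each column: type counts, empty count and distinct values."""
    summary = {}
    for row in rows:
        for column, value in row.items():
            stats = summary.setdefault(
                column, {"types": Counter(), "empty": 0, "distinct": set()}
            )
            kind = infer_type(value)
            if kind == "empty":
                stats["empty"] += 1
            else:
                stats["types"][kind] += 1
                # Light cleansing: trim whitespace and ignore case.
                stats["distinct"].add(value.strip().lower())
    return summary


def related_columns(profile_a, profile_b, threshold=0.5):
    """Pair up columns whose distinct values overlap by at least threshold."""
    pairs = []
    for col_a, stats_a in profile_a.items():
        for col_b, stats_b in profile_b.items():
            left, right = stats_a["distinct"], stats_b["distinct"]
            if not left or not right:
                continue
            # Share of the smaller column's values found in the other column.
            score = len(left & right) / min(len(left), len(right))
            if score >= threshold:
                pairs.append((col_a, col_b, score))
    return pairs


weblogs = [
    {"user_id": "101", "page": "/home"},
    {"user_id": "102", "page": "/pricing"},
    {"user_id": " 101 ", "page": ""},
]
customers = [
    {"id": "101", "name": "Acme"},
    {"id": "102", "name": "Globex"},
    {"id": "103", "name": "Initech"},
]
weblog_profile = profile(weblogs)
links = related_columns(weblog_profile, profile(customers))
```

Here the weblog `user_id` column and the customer `id` column share values, so they surface as a candidate relationship. A production profiler would also need to handle dates, sampling of very large files and fuzzy matches.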

Start with a Pond

Creating a Data Lake is a process. The key to success is to start with a pond: a few data sets that can be shaped to deliver a quick win. Get value from the pond as it grows. Identify high-value information that is not accessible today, or a combination of data that can be shaped to provide new insights. Marketing is the most common first customer. Use the demonstrated value to justify the investment and approach. With the low cost of Hadoop running in the cloud or on commodity hardware, the initial investment can be surprisingly small.


Cost Savings Pay for the Lake

In an era of shrinking IT budgets, the Data Lake actually saves money. Processing and storage costs using traditional data warehouse and RDBMS technologies are typically in the tens of thousands of dollars per terabyte. By comparison, costs on Hadoop are in the hundreds of dollars per terabyte. That’s roughly a 100x cost reduction. By transitioning ETL and data shaping from expensive infrastructure to BigDataMonkey and Hadoop, you free up capacity and reduce licensing and hardware costs. These cost savings usually cover the costs of creating the Data Lake, so you actually get more information for less money.
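The arithmetic behind the 100x figure can be checked with a short Python sketch. The per-terabyte prices and the 50 TB volume below are assumed sample points within the ranges quoted above, not measured or quoted prices.

```python
# Assumed sample points within the ranges quoted in the text.
WAREHOUSE_COST_PER_TB = 20_000.0  # "tens of thousands of dollars per terabyte"
HADOOP_COST_PER_TB = 200.0        # "hundreds of dollars per terabyte"


def platform_cost(terabytes, cost_per_tb):
    """Total cost in dollars for a given data volume."""
    return terabytes * cost_per_tb


def cost_ratio(expensive_per_tb, cheap_per_tb):
    """How many times more the expensive platform costs per terabyte."""
    return expensive_per_tb / cheap_per_tb


volume_tb = 50  # hypothetical data volume
warehouse_total = platform_cost(volume_tb, WAREHOUSE_COST_PER_TB)
hadoop_total = platform_cost(volume_tb, HADOOP_COST_PER_TB)
ratio = cost_ratio(WAREHOUSE_COST_PER_TB, HADOOP_COST_PER_TB)

print(f"Warehouse: ${warehouse_total:,.0f}")  # Warehouse: $1,000,000
print(f"Hadoop:    ${hadoop_total:,.0f}")     # Hadoop:    $10,000
print(f"Ratio:     {ratio:.0f}x")             # Ratio:     100x
```

With other price points inside the same ranges, the ratio shifts, which is why the 100x figure is best read as an order-of-magnitude estimate.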


The Lake and the Warehouse

One ongoing debate is the role of the Data Lake compared with the Data Warehouse. While the Data Warehouse and other RDBMS technologies do not efficiently handle new types of data, they do support efficient indexing, enforce consistency and handle BI use cases that Hadoop does not currently support. Today, most organizations use the warehouse to provide reference and summary data to the Data Lake. BigDataMonkey can combine data from the warehouse with new big data sources into usable shapes that support new analytics use cases. In turn, many organizations use the Data Lake to shape clean, summarized data to feed the warehouse.


Shape New Opportunities

Easy access to relevant information opens up whole new business opportunities. How your customers use your products and services can reveal what else they would pay for. This information is already available from your systems, logs, sensors, databases and files. Shape this information to feed predictive models and analytic applications, and your customers can gain tremendous value from the resulting insights and innovations. Optimize their operations, tell them what’s likely to happen, or identify new opportunities for them. These improvements can help your customers, enhance your products, and create new revenue streams for you.