Data Refinery


Extracting Crude Data and Refining it for Analysis

The most critical step in creating value with Big Data is Data Refinery.


Executive Summary 

Within the past decade, the oil and gas (“O&G”) industry has been at the forefront of the concept known as data refinery. The Merriam-Webster dictionary defines refinery as a “place where the unwanted substances in something (such as oil or sugar) are removed: a place where something is refined.” To take massive amounts of large, complex, and fast-moving data and refine that information into something viable for company use, companies need to implement the right tools. The “viable use of data” has captured the attention of executives outside the industry and revealed a significant opportunity for data advancement. The purpose of this white paper is to show how Cyberhill Partners, LLC (“Cyberhill”) has taken the problem of data refinery and applied the right solution.

Corporations have vast quantities of data cycling through their systems each day: data that is constantly changing, continuously growing, and fast moving. On its own, this data is meaningless unless the proper management tools are in place to transform the different types of data, whether structured (database records), unstructured (documents), or semi-structured (log data). Within major industries such as O&G, aerospace, finance, health care, non-profits, information technology, law enforcement, manufacturing, and retail, large amounts of data are collected from multiple sources within or attached to the company. The data collected ranges from production figures to patient records and financial data, depending on the industry. Health care, for example, can produce large amounts of data ranging from financial records to patient records. In addition, each of these industries can gather data from outside sources such as news articles, contract proposals, social media, and statistical trends to provide valuable insights for decision-makers.

Data Equity: Extracting Value from Big Data

With the proper tools and maintenance, each of these industries can gain immeasurable value from untapped data. As the data continues to grow, change, and gain new sources, its value grows. Realizing that value involves the following stages:

Data Refinery Plan: Before the first piece of data is extracted, a plan must be in place to ensure success. The plan should define where data resides at rest and map where and when it moves. From this data refinery plan, data relationships, mappings, and syntaxes are defined, allowing data scientists and engineers to begin planning the processing and storage of viable data. The business plan helps shape the data refinery plan by aligning business goals with the desired data processes. Business rules for big data should include data retention policies and rules for sorting data for trend analysis.

Data Extraction/Integration: During this phase, data is extracted from the previously defined data repositories and integrated. The ideal state is typically real-time or near-real-time data, that is, viewing data that has just been produced. As a safeguard, managed keys are used to encrypt the ingested data at rest and as it moves through the company’s networks. Cloud and on-premises environments are carefully architected to follow best practices for data integration and to provide a user-friendly experience.

Refinement: Gasoline-based engines do not sift through crude to find the distillates that make them run, so why should your business? Despite the complexity of the processes involved in refining crude, the benefits of a refined product far outweigh the costs. Businesses are most agile when real-time information is at their fingertips, providing detailed insight for crucial decisions. Data refinement must fit the needs of the business to provide true value.

Production: Just like oil, data is refined to the point where it is fractionated and distilled into usable information. Once the data has passed through the Data Refinery systems and processes, it is ready for business use. When additional requirements arise, the process starts over and the new requirement is added to it. Agility is a key component of business today. Tools need to be able to respond to the changing needs of a business as it matures and adapts.
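The four stages above can be illustrated with a small sketch. It is not Cyberhill's implementation or any vendor's API; every name in it (SourcePlan, extract, refine, produce, the "site" and "barrels" fields) is a hypothetical chosen for illustration. It assumes the plan carries a retention window, that extraction keeps only records inside that window, that refinement drops incomplete records and normalizes the rest, and that production aggregates them for reporting.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class SourcePlan:
    """One entry of a hypothetical data refinery plan."""
    name: str             # illustrative source name
    kind: str             # "structured", "unstructured", or "semi-structured"
    retention_days: int   # retention policy taken from the business rules


def extract(records, plan, today):
    """Extraction: keep only records inside the plan's retention window."""
    cutoff = today - timedelta(days=plan.retention_days)
    return [r for r in records if r["day"] >= cutoff]


def refine(records):
    """Refinement: drop incomplete records and normalize the remaining fields."""
    refined = []
    for r in records:
        if r.get("barrels") is None:
            continue  # unusable record, removed like an unwanted substance
        refined.append({
            "day": r["day"],
            "site": r["site"].strip().upper(),
            "barrels": float(r["barrels"]),
        })
    return refined


def produce(records):
    """Production: aggregate refined records into per-site totals."""
    totals = {}
    for r in records:
        totals[r["site"]] = totals.get(r["site"], 0.0) + r["barrels"]
    return totals


def run_refinery(records, plan, today):
    """Run the stages in order: extract, refine, produce."""
    return produce(refine(extract(records, plan, today)))
```

When a new requirement appears, only the affected stage changes (for example, a new normalization rule in refine), and the same run_refinery call is executed again.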

With the tools and expertise from Cyberhill Partners, companies can unlock the value of big data. We can assist in writing data warehouse procedures, data analytics and processing, and custom reporting. Experience in data refinery yields a significantly higher success rate and more value from sourced data. The analytics tool on top of the data is critical: some solutions support only static data, while Cyberhill Partners has experience with tools that support near-real-time data viewing.

Big Data and Cyberhill Partners

To meet the needs of our clients, Cyberhill has partnered with multiple data refinery giants, including Azure Data Factory, Amazon Redshift, and ThoughtSpot. Wherever you are in your data refinery process, Cyberhill Partners can help you navigate to the solution that best meets the needs of your business while ensuring all the power of data refinery is at your fingertips.

Cyberhill Partners provides the opportunity to immerse your business in big data and helps map business trends. For example, big data plays a role in business intelligence and insights. These insights allow executive leaders to make effective business decisions. Based on these decisions, information trickles down to the lowest level to decrease or increase production, ingest financials, determine the allocation of resources, and so on. Cyberhill Partners’ focus depends on the needs of our clients and, based on the information provided, we create the data syntax that allows for real-time business intelligence.

Cyberhill Partners’ principal tenets of refining data ensure success at all levels:

Real-time Data Ingest: Supporting real-time or near-real-time data requires underlying systems to have real-time functionality that allows for the massive processing of data within milliseconds.

Complex & Controlled Data Flow: Handling complex data flows requires scalable, flexible data flow controls and code, along with the tools to support them.

Scheduling and Monitoring: Scheduling data extraction is important because some data is refreshed periodically while other data arrives in real time. Monitoring these schedules and extractions for errors and for the files transferred is also critical to a successful Data Refinery.

Storage and Data Management: Repositories which can capture and store massive data in a highly scalable and secure form are vital.
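As an illustration of the scheduling and monitoring tenet, the sketch below runs periodic extraction jobs against a simulated clock and records errors and files transferred for each run. It is a minimal example under stated assumptions, not a description of any particular scheduler product: ExtractionJob, RunRecord, run_due_jobs, and summarize are hypothetical names, and real-time sources, which would be streamed rather than polled, are outside its scope.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ExtractionJob:
    """A hypothetical periodic extraction job."""
    name: str
    interval_s: int               # refresh period, in seconds
    extract: Callable[[], int]    # returns the number of files transferred
    next_run: int = 0             # next time (in seconds) the job is due


@dataclass
class RunRecord:
    """Monitoring record for one attempted extraction."""
    job: str
    at: int
    files: int
    error: Optional[str]


def run_due_jobs(jobs: List[ExtractionJob], now: int, log: List[RunRecord]) -> None:
    """Run every job whose schedule is due and record the outcome."""
    for job in jobs:
        if now < job.next_run:
            continue
        try:
            files = job.extract()
            log.append(RunRecord(job.name, now, files, None))
        except Exception as exc:
            # A failed extraction is recorded, not raised, so other jobs still run.
            log.append(RunRecord(job.name, now, 0, str(exc)))
        job.next_run = now + job.interval_s


def summarize(log: List[RunRecord]) -> dict:
    """Roll the run records up into the figures an operator would monitor."""
    return {
        "runs": len(log),
        "files": sum(r.files for r in log),
        "errors": [r.job for r in log if r.error is not None],
    }
```

Passing the current time in as an argument, rather than reading a clock inside the function, keeps the schedule logic easy to test. A production system would more likely rely on an established scheduler and alerting service.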

By using big data to gain new insights, corporations can create enhanced business value that gives them a competitive edge. To create this value, companies need to learn how to manipulate data, mine it, and determine whether the information is valuable. The advent of advanced data algorithms, significant Data Factory toolsets, and the application of software engineering to the data problem have given Cyberhill Partners a mature Data Refinery business model.


In short, corporations that use big data to gain crucial insights create business value and a competitive edge. Doing so requires determining what information is valuable, then manipulating, mining, and refining the data to meet the needs of the organization. By going through these steps, corporations can make effective business decisions based on real-time data. Cyberhill Partners can help determine which tools best suit the needs of your business and make that data work for you.