Data Warehouse
What it is and why it matters
A data warehouse (or enterprise data warehouse) stores large amounts of data that has been collected and integrated from multiple sources. Because organizations depend on this data for analytics and reporting, the data needs to be consistently formatted and easily accessible – two qualities that define data warehousing and make it essential to today’s businesses.
History of the Data Warehouse
In the 1970s and '80s, data began to proliferate and organizations needed an easy way to store and access their information. Computer scientist Bill Inmon, who’s considered the father of data warehousing, began to define the concept in the 1970s and is credited with coining the term “data warehouse.” He published Building the Data Warehouse, a book lauded as a fundamental source on data warehousing technology, in 1992. Inmon’s definition of the data warehouse takes a “top-down” approach, where a centralized repository is established first, and then data marts – which contain specific subsets of data – are created within that repository.
Ralph Kimball, another technology expert who published The Data Warehouse Toolkit in the mid '90s, took a slightly different tack with his “bottom-up” approach, where individual data marts are developed first and later integrated to create a data warehouse.
Data warehousing remains relevant today, yet it’s evolving as the industry changes to accommodate cloud computing and real-time analytics. One emerging data storage tool that's similar to a data warehouse is a data lake, which was brought about by disruptive low-cost technologies such as Apache Hadoop. Data lakes are often used to take in unrestricted streams of data and store them as is, without first processing the data or building schemas.
Why are data warehouses important?
Organizations depend on data to make informed decisions, and data warehouses matter because they are where that data is stored, integrated and made accessible. Data warehouses can:
- Store large amounts of data in a central database – and in a standard format.
- Integrate data from many different sources and standardize it, so it’s ready for analytics or reporting.
- Maintain historical records, since they can store months or even years of data.
- Keep data secure by storing it in a single location. Access can be granted only to those who need specific data.
- Provide quick, easy access to data to enable faster business decisions.
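As a rough illustration of the integration and standardization point above, the sketch below maps records from two sources into one shared format. The two sources (a CRM export and a web shop export), their field names and their date formats are all invented for this example and do not describe any real system.

```python
from datetime import datetime

# Hypothetical records from two source systems with different conventions.
crm_rows = [{"CustomerName": "Ada Lovelace", "SignupDate": "03/15/2021"}]
shop_rows = [{"name": "grace hopper", "signup": "2021-04-02"}]


def standardize_crm(row):
    """Map a CRM record to the shared format (title-case name, ISO date)."""
    signup = datetime.strptime(row["SignupDate"], "%m/%d/%Y").date()
    return {
        "name": row["CustomerName"].title(),
        "signup_date": signup.isoformat(),
        "source": "crm",
    }


def standardize_shop(row):
    """Map a web shop record to the same shared format."""
    signup = datetime.strptime(row["signup"], "%Y-%m-%d").date()
    return {
        "name": row["name"].title(),
        "signup_date": signup.isoformat(),
        "source": "shop",
    }


def integrate(crm, shop):
    """Combine both sources into one consistently formatted list."""
    return [standardize_crm(r) for r in crm] + [standardize_shop(r) for r in shop]


unified = integrate(crm_rows, shop_rows)
```

Once every record has the same field names and date format, downstream analytics or reporting code no longer needs to know which system a record came from.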
Data Warehouses vs. Other Storage Systems
While data warehousing is a common storage solution for data, it's not the only solution. Here’s how data warehouses compare to similar types of technology.
Data Warehouse
- Stores a large amount of enterprise data encompassing several subject areas.
- Can be difficult to build.
- Large size.
- Data is structured and ready to use for analytics or reporting.

Data Mart
- Stores a smaller amount of data; data typically covers a single subject area and is used by one department, such as marketing or sales.
- Faster and easier to build than a data warehouse.
- Limited memory.
- Data is structured and ready to use for analytics or reporting.

Data Lake
- Stores a large amount of raw data.
- Data remains in its unaltered state until it’s needed.
- Enables users to query smaller, more relevant and more flexible data sets.
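To make the comparison more concrete, here is a minimal sketch of the three ideas side by side. It assumes a toy schema and toy records invented for this example, and it models the data mart as a database view, which is a simplification: in practice a data mart is often a separate database.

```python
import json
import sqlite3

# Warehouse: one structured table covering several subject areas
# (hypothetical schema and values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (subject TEXT, metric TEXT, value REAL)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("sales", "revenue", 1200.0),
        ("marketing", "ad_spend", 300.0),
        ("marketing", "leads", 45.0),
    ],
)

# Data mart: a single-subject subset, modeled here as a view
# that only the marketing department would use.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT metric, value FROM facts WHERE subject = 'marketing'"
)
mart_rows = conn.execute(
    "SELECT metric, value FROM marketing_mart ORDER BY metric"
).fetchall()

# Data lake: raw records are kept unaltered; structure is applied
# only when the data is read.
raw_lake = [
    '{"event": "click", "page": "/home"}',
    '{"event": "purchase", "total": 19.99}',
]


def read_events(lines, event_type):
    """Parse raw JSON lines at query time and keep one event type."""
    return [rec for rec in map(json.loads, lines) if rec.get("event") == event_type]


purchases = read_events(raw_lake, "purchase")
```

The warehouse and the mart both return rows that are already structured, while the lake records have to be parsed and filtered at the moment they are needed.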
How It Works
A data warehouse begins with the data itself, which is collected from both internal and external sources. Data is typically stored in a data warehouse through an extract, transform and load (ETL) process, where information is extracted from the source, transformed into a clean, consistent format and then loaded into a warehouse. Businesses perform this process on a regular basis to keep data updated and prepared for the next step.
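The ETL process described above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the CSV content, the `orders` table and the cleaning rules are assumptions made up for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real pipeline would read from files,
# APIs or operational databases.
SOURCE_CSV = """order_id,customer,amount,order_date
1, alice ,19.99,2023-01-05
2,BOB,5.00,2023-01-06
3,carol,,2023-01-07
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and normalize names, convert types, drop incomplete rows."""
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip rows with a missing amount
        clean.append(
            (
                int(row["order_id"]),
                row["customer"].strip().title(),
                float(row["amount"]),
                row["order_date"],
            )
        )
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, text=SOURCE_CSV):
    """Run the three steps in order."""
    load(conn, transform(extract(text)))


conn = sqlite3.connect(":memory:")
run_etl(conn)
```

Because the load step uses `INSERT OR REPLACE` keyed on `order_id`, running `run_etl` again on a regular schedule refreshes existing rows instead of duplicating them.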
When an organization is ready to use its data for analytics or reporting, the focus shifts from data warehousing to business intelligence tools. Technologies including visual analytics and data exploration are used to help businesses gain important insights from data.
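Business intelligence tools ultimately issue queries against the structured data in the warehouse. As a small sketch of that step, the example below aggregates revenue by region; the `sales` table and its values are hypothetical, and SQLite stands in for the warehouse.

```python
import sqlite3

# Hypothetical warehouse table that an earlier load step has already filled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, sale_month TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "2023-01", 100.0),
        ("North", "2023-02", 150.0),
        ("South", "2023-01", 80.0),
    ],
)


def revenue_by_region(conn):
    """Return total revenue per region, the kind of summary a report would show."""
    return conn.execute(
        "SELECT region, SUM(revenue) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()


report = revenue_by_region(conn)
```

A visual analytics tool would typically run a query like this behind the scenes and render the result as a chart or dashboard.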
Read More About This Topic
- Key questions to kick off your data analytics projects. There’s no single blueprint for starting a data analytics project. Technology expert Phil Simon suggests these 10 questions as a guide.
- What is a data lake & why does it matter? As containers for multiple collections of data in one convenient location, data lakes allow for self-service access, exploration and visualization. In turn, businesses can see and respond to new information faster.
- 5 data management best practices to help you do data right. Follow these 5 data management best practices to make sure your business data gives you great results from analytics.
- Data lineage: Making artificial intelligence smarter. Learn how data lineage plays a vital role in understanding data, making it a foundational principle of AI.