Data warehouse

What it is and why it matters

A data warehouse (or enterprise data warehouse) stores large amounts of data that has been collected and integrated from multiple sources. Because organizations depend on this data for analytics and reporting, it must be consistently formatted and easily accessible. These two qualities define data warehousing and make it essential to today's businesses.

History of the data warehouse

In the 1970s and '80s, data use became more widespread, and organizations needed an easy way to store and access their information. Computer scientist Bill Inmon, who is regarded as the father of data warehousing, began mapping out the concept in the 1970s and is credited with coining the term "data warehouse." His book Building the Data Warehouse, published in 1992, was praised as a foundational source on data warehousing technology. Inmon's definition of the data warehouse takes a "top-down" approach: a central repository is established first, and data marts, which contain specific subsets of the data, are then created within it.

Ralph Kimball, another technology expert, published The Data Warehouse Toolkit in the mid-'90s and took a slightly different tack with his "bottom-up" approach, in which individual data marts are developed first and later integrated with one another to form a data warehouse.

Data warehousing is still relevant today, but it continues to evolve as the industry adapts to cloud computing and real-time analytics. A newer data storage technology that resembles a data warehouse is the data lake, which grew out of disruptive, low-cost technologies like Apache Hadoop. Data lakes are often used with free-flowing data streams and store data without processing it or building schemas first.

E-commerce retailer enhances customer engagement with cloud-based analytics and AI

With a rapidly growing business and an increasingly dispersed workforce, 1-800-FLOWERS.COM turned to SAS® Viya® hosted on Azure to obtain a more flexible, scalable infrastructure. To get data ready for analytics, the company first consolidates its databases and feeds them into Snowflake, a cloud-based data warehouse.

Data warehousing in today’s world

A data warehouse often means the difference between informed decisions and data chaos. Learn how and why data warehouses and related technologies are being used in our world today.

Faster data, faster insights

For some workloads, a data warehouse and ETL process are the best approach for getting insights from data. Many businesses today use this method, often in conjunction with newer technologies – like streaming data, virtualization and data catalogs.

What is a customer data platform?

Customer data platforms (CDPs) are related to data warehouses. They pull in first-party customer data from multiple sources, like transactional databases, call centers and more. See how they work and why they matter.

Data lake: What, why and how

A data lake rapidly ingests data and gives decision makers self-service access, exploration and visualization. Perfect for storing unstructured big data like tweets, images, voice and streaming data, data lakes are a common data source for machine-learning applications.

What is a data catalog?

Searching for big data across the business can waste valuable time. A data catalog uses metadata to help users quickly search an organization’s entire data landscape.

Who's using data warehouses?

Banking

Banks use data warehouses for governance and to help ensure they comply with regulations. In banking, different lines of business create multiple operational systems that lead to scattered, inconsistent data. Mergers and acquisitions complicate the issue. With data warehouses, banks can access trusted data and use it for reporting and analytics.

Manufacturing

Manufacturers use data warehouses to access and integrate data from a variety of sources. For example, the data warehouse often stores manufacturing data related to product quality that’s collected from various sources – like call centers, news sites, social media forums or service calls.

Health care

Health care organizations need secure access to standardized data that’s aggregated from a variety of systems, such as clinical, employee, patient and financial operations. After analyzing this trusted data, they are better positioned to optimize operations and resources, provide coordinated care and ensure good health outcomes for all.

Public sector

Governments manage and store all types of crucial – often sensitive – public sector data. This vast data pours in from individual citizens, communities, local, regional and national agencies, government contractors and others. A data warehouse securely stores all this information so it’s ready to use for policymaking and other critical decisions.

"With multiple programs, we quickly found ourselves in a situation with data living in silos. To understand how we were doing as an institute, we needed to find a better, more efficient way to integrate and manage multiple data sources across the enterprise."
Bonnie Chapman-Beers, Director of Evaluation and Innovation, Institute for Veterans & Military Families

How a data warehouse works

A data warehouse starts with the data itself, which is collected and integrated from both internal and external sources. Business users access this standardized data in a warehouse so they can use it for analytics and reporting. Business intelligence tools help them explore the data to make better-informed business decisions.

Data is typically stored in a data warehouse through an extract, transform and load (ETL) process. The information is extracted from the source, transformed into high-quality data (for example, by cleansing and standardizing it) and then loaded into the warehouse. Businesses perform this process on a regular basis to keep data updated and prepared for the next step.
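The three ETL steps can be illustrated with a minimal sketch. It assumes a small in-memory list standing in for the source systems and an in-memory SQLite database standing in for the warehouse. The record layout, the `sales` table and the function names are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from operational systems.
SOURCE_ROWS = [
    {"customer": " Alice ", "amount": "19.90", "date": "2024-01-05"},
    {"customer": "BOB", "amount": "5", "date": "2024-01-05"},
    {"customer": "alice", "amount": "not-a-number", "date": "2024-01-06"},
]


def extract():
    """Extract: read raw records from the source (here, an in-memory list)."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: standardize names, parse amounts, drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows whose amount cannot be parsed
        clean.append((row["customer"].strip().title(), amount, row["date"]))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    load(conn, transform(extract()))


warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
run_etl(warehouse)
loaded = warehouse.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)  # [('Alice', 19.9), ('Bob', 5.0)]
```

In practice, a scheduler would run a job like `run_etl` on a regular basis, and the transform step would apply whatever quality rules the organization defines. The validation shown here is only one possible rule.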

When an organization is ready to use its data for analytics or reporting, the focus shifts from data warehousing to business intelligence (BI) tools. BI technologies like visual analytics and data exploration help organizations glean important insights from their business data. On the back end, it’s important to understand how the data warehouse architecture organizes data and how the database execution model optimizes queries – so developers can write data applications with reasonably high performance.

In addition to a traditional data warehouse and ETL process, many organizations use a variety of other methods, tools and techniques for their workloads. For example:

  • Data pipelines can be used to populate cloud data warehouses, which can be fully managed by the organization or by the cloud provider.
  • Continuously streaming data can be stored in a cloud data warehouse.
  • A centralized data catalog is helpful in uniting metadata, making it easier to find data and track its lineage.
  • Data warehouse automation tools get new data into warehouses faster.
  • Data virtualization solutions create a logical data warehouse so users can view the data from their choice of tools.
  • Online analytical processing (OLAP) is a way of representing data that has been summarized into multidimensional views and hierarchies. When used with an integrated ETL process, it allows business users to get reports without IT assistance.
  • An operational data store (ODS) holds a subset of near-real-time data that’s used for operational reporting or notifications.
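To make the OLAP item in the list above more concrete, here is a minimal sketch of summarizing fact rows into views at different levels of a dimension hierarchy. The fact rows, the dimension names and the `roll_up` helper are hypothetical. A real OLAP engine would precompute and index such aggregates rather than scan a list.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, quarter, product, revenue)
FACTS = [
    ("EMEA", "Q1", "widgets", 100.0),
    ("EMEA", "Q1", "gadgets", 50.0),
    ("EMEA", "Q2", "widgets", 70.0),
    ("APAC", "Q1", "widgets", 30.0),
]
DIMENSIONS = ("region", "quarter", "product")


def roll_up(facts, dims):
    """Sum revenue grouped by the chosen dimensions (an empty list gives the grand total)."""
    positions = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # revenue is the last field
    return dict(totals)


by_region = roll_up(FACTS, ["region"])
by_region_quarter = roll_up(FACTS, ["region", "quarter"])
grand_total = roll_up(FACTS, [])

print(by_region)          # {('EMEA',): 220.0, ('APAC',): 30.0}
print(by_region_quarter)  # {('EMEA', 'Q1'): 150.0, ('EMEA', 'Q2'): 70.0, ('APAC', 'Q1'): 30.0}
```

Moving from `by_region` to `by_region_quarter` corresponds to drilling down one level, and the reverse direction is a roll-up.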

Why are data warehouses important?

Enterprise data warehouses are vital because they integrate and store, in a central database and standard format, all the valuable data organizations use for enterprise decision-making. In turn, organizations can avoid the unpredictable results of taking an ad hoc approach to data access and integration. Data warehouses:

  • Maintain historical data records – storing months or even years’ worth of information.
  • Keep data secure by storing it in a single location where only those who need specific data are granted access to it.
  • Provide easy access to high-quality data, which enables faster, more-informed business decisions.
  • Make big data available for basic reporting as well as for advanced analytics, like machine learning and natural language processing.

Comparison: Data warehouse, data mart and data lake

Data warehouse

  • Purpose: Stores a large amount of enterprise data encompassing several subject areas across the business.
  • Advantages: Holds vast amounts of data from across the business in one place.
  • Disadvantages: Can be difficult to build.
  • Outcome: Data is structured and ready to use for analytics or reporting.

Data mart

  • Purpose: Stores a smaller amount of data, typically covering a single subject area that’s used by one department (like marketing or sales).
  • Advantages: Is faster and easier to build than a data warehouse.
  • Disadvantages: Has limited scope and capacity, so it can’t store as much information as a data warehouse.
  • Outcome: Data is structured and ready to extract for analytics or reporting.

Data lake

  • Purpose: Stores a large amount of raw data in its native format – ideal for unstructured big data like tweets, images, voice and streaming data.
  • Advantages: Rapidly ingests data and gives business users fast, self-service access, exploration and visualization capabilities.
  • Disadvantages: Does not provide data that is standardized, deduplicated, quality-checked or transformed.
  • Outcome: Data stays in its raw format and can be repurposed – multiple metadata tags can be assigned for the same data.
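The comparison above can be sketched in a few lines. This is only an illustration under stated assumptions: an in-memory SQLite table stands in for the warehouse, a view over it stands in for a data mart, and a plain list of tagged byte payloads stands in for a data lake. The table, view, tag names and the `find_by_tag` helper are all hypothetical.

```python
import json
import sqlite3

# Data warehouse: structured rows covering several subject areas.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_facts (department TEXT, metric TEXT, value REAL)")
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("marketing", "campaign_clicks", 1200.0),
        ("sales", "orders", 85.0),
        ("marketing", "leads", 40.0),
        ("finance", "invoices", 310.0),
    ],
)

# Data mart: a single-subject slice of the warehouse, modeled here as a view.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT metric, value FROM warehouse_facts WHERE department = 'marketing'"
)
mart_rows = conn.execute(
    "SELECT metric, value FROM marketing_mart ORDER BY metric"
).fetchall()
print(mart_rows)  # [('campaign_clicks', 1200.0), ('leads', 40.0)]

# Data lake: raw payloads kept in their native format, each with several metadata tags.
lake = [
    {"raw": b'{"user": "a1", "text": "great service"}', "tags": {"social", "json"}},
    {"raw": b"\x89PNG placeholder bytes", "tags": {"image"}},
    {"raw": b"call transcript: order arrived late", "tags": {"voice", "text"}},
]


def find_by_tag(objects, tag):
    """Return the raw payloads whose metadata includes the given tag."""
    return [obj["raw"] for obj in objects if tag in obj["tags"]]


# Structure is applied only when the data is read, not when it is stored.
social_posts = find_by_tag(lake, "social")
first_post = json.loads(social_posts[0])
print(first_post["text"])  # great service
```

The warehouse and mart return rows that are already structured, while the lake hands back raw bytes that the reader must interpret. That is the trade-off the lists above describe.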

SAS® Data Management

Data stored in a data warehouse doesn’t deliver value unless it’s managed well. With data management technology from SAS, you can transform big data into big opportunities with data integration, data governance, event stream processing and data quality technologies.