Data management backgrounder
What it is – and why it matters
You’ve done enough research to know that data management is an important first step in dealing with big data or starting any analytics project. But you’re not too proud to admit that you’re still confused about the differences between master data management and data federation.
Or maybe you know these terms by heart. And you feel like you’ve been explaining them to your boss or your business units over and over again.
Either way, we’ve created the primer you’ve been looking for. Print it out, post it to the team bulletin board, or share it with your mom so she can understand what you do. And remember, a data management strategy should never focus on just one of these areas. You need to consider them all.
Data Access
- what is it? Data is only an asset if you can get to it. Data access refers to an organization’s ability to get to and retrieve information from any source. Data access technologies, such as database drivers or document converters, make this step as easy and efficient as possible so you can spend your time using the data – not just trying to find it.
- why is it important? The data that an organization might need can exist in many places – in spreadsheets, text files, databases, emails, business applications, Web pages and social media feeds. Without a good way to access data from these sources, collecting the information becomes a nightmare. Though it is a commonly forgotten element of data management, good data access technology is essential for organizations to extract useful data from any data storage mechanism and format that they have. Without it, trying to get the data you need is like walking into a vast, sprawling library with row after row of bookshelves and being told to look for a specific printed sentence with no instructions, no map, no organization, and no one to help you.
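As a rough illustration of what data access technology does, the sketch below reads rows from two different kinds of sources and hands them back in one common shape. It uses only Python's standard library. The sample data, the `customers` table and the helper names `rows_from_csv` and `rows_from_database` are assumptions made up for this example, not part of any particular product.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export, held in a string to keep the sketch self-contained.
CSV_TEXT = "name,city\nAna Ruiz,Anytown\nBen Cole,Springfield\n"


def rows_from_csv(text):
    """Read CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def rows_from_database(conn, query):
    """Run a query and return each result row as a dict keyed by column name."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Hypothetical source 2: an in-memory SQLite database standing in for a business application.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.execute("INSERT INTO customers VALUES ('Cy Park', 'Anytown')")
conn.commit()

# Both sources now look the same to the code that uses the data.
all_rows = rows_from_csv(CSV_TEXT) + rows_from_database(
    conn, "SELECT name, city FROM customers"
)
for row in all_rows:
    print(row["name"], row["city"])
```

The point of the design is that the rest of your code never needs to know whether a row came from a file or a database.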
Data Quality
- what is it? Data quality is the practice of making sure data is accurate and usable for its intended purpose. Just like ISO 9000 quality management in manufacturing, data quality should be applied at every step of a data management process: from the moment data is accessed, through each point where it is integrated with other data, up to the point just before it is published, reported on or referenced at another destination.
- why is it important? It is quite easy to store data, but what is the value of that data if it is incorrect or unusable? A simple example is a file with the text “123 MAIN ST Anytown, AZ 12345” in it. Any computer can store this information and provide it to a user, but without help, it can’t determine that this record is an address, which part of the address is the state, or whether mail sent to the address will even get there. Correcting a single record manually is easy, but just try to perform this process for hundreds, thousands or even millions of records! It’s much faster to use a data quality solution that can parse, standardize and verify records in an automated, consistent way. Doing so at every step greatly reduces risks like sending mail to a customer’s incorrect address.
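To make the address example concrete, here is a minimal sketch of parsing and standardizing that record. The regular expression, the short list of street suffixes and the small set of state codes are assumptions chosen for illustration. A real data quality solution would rely on full postal reference data, and this sketch cannot tell you whether mail would actually be delivered.

```python
import re

# Assumed, deliberately tiny reference data; a real solution would use full postal references.
STREET_SUFFIXES = "ST|AVE|RD|BLVD|DR|LN"
KNOWN_STATES = {"AZ", "CA", "NC", "NY", "TX"}

# Street = house number + name + known suffix; then city, two-letter state, five-digit ZIP.
ADDRESS_PATTERN = re.compile(
    rf"^(?P<street>\d+\s+.+?\s+(?:{STREET_SUFFIXES}))\.?\s+"
    r"(?P<city>[^,]+),\s*(?P<state>[A-Za-z]{2})\s+(?P<zip>\d{5})$",
    re.IGNORECASE,
)


def parse_address(raw):
    """Parse a one-line US-style address and standardize its casing.

    Returns a dict of parts, or None when the text does not fit the pattern.
    """
    match = ADDRESS_PATTERN.match(raw.strip())
    if match is None:
        return None
    return {
        "street": match["street"].title(),
        "city": match["city"].strip().title(),
        "state": match["state"].upper(),
        "zip": match["zip"],
    }


def looks_valid(address):
    """Very rough check: the record parsed and its state code is in the known set."""
    return address is not None and address["state"] in KNOWN_STATES


record = parse_address("123 MAIN ST Anytown, AZ 12345")
print(record, looks_valid(record))
```

Because the same function runs on every record, a million addresses get exactly the same treatment as one.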
Data Integration
- what is it? Once you have accessed the data, what do you do with it? A pretty common next step is to combine it with other data to present the unified results. Data integration is the process that defines the steps to do this, and data integration tools help you design and automate the steps that do this work. The most common types of data integration tools are known as ETL, which stands for extract, transform and load, and ELT, which stands for extract, load and transform. Today, data integration isn’t limited to movements between databases. With the availability of in-memory servers, you might be loading data straight into memory, which bypasses the traditional database altogether.
- why is it important? Data integration is what allows organizations to create blended combinations of data that are ultimately more useful for making decisions. For example, one set of data might include a list of all customer names and their addresses. Another set of data might be a list of online activity and the customer names. By itself, each set of data is relevant and can tell you something important. But when you integrate elements of both data sets, you can start to answer questions like, “Who are my best customers?” “What is the next best offer?” Combining some key information from each set of data would allow you to create the best customer experience.
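The customer example above can be sketched in a few lines. The two lists, their field names and the `integrate` function are hypothetical, and "best customer" is assumed here to mean highest total spend. In practice you would usually join on a stable customer ID instead of a name, since names are rarely unique.

```python
from collections import defaultdict

# Hypothetical data set 1: customer names and addresses.
customers = [
    {"name": "Ben Cole", "address": "9 Oak Ave, Anytown, AZ 12345"},
    {"name": "Ana Ruiz", "address": "123 Main St, Anytown, AZ 12345"},
]

# Hypothetical data set 2: online activity, keyed by customer name.
activity = [
    {"name": "Ana Ruiz", "page": "/boots", "purchase": 120.0},
    {"name": "Ana Ruiz", "page": "/socks", "purchase": 15.0},
    {"name": "Ben Cole", "page": "/hats", "purchase": 30.0},
]


def integrate(customers, activity):
    """Combine both data sets and rank customers by total spend (transform step)."""
    totals = defaultdict(float)
    for event in activity:
        totals[event["name"]] += event["purchase"]
    merged = [{**c, "total_spend": totals.get(c["name"], 0.0)} for c in customers]
    return sorted(merged, key=lambda row: row["total_spend"], reverse=True)


ranked = integrate(customers, activity)
for row in ranked:
    print(row["name"], row["total_spend"])
```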
Data Federation
- what is it? Data federation is a special kind of data integration. The ETL and ELT types of data integration combine data and then store it elsewhere for use, traditionally in a data mart or data warehouse. But what if you simply want to look at the combined results without having to move and store the data beforehand? Data federation lets you do just that, giving you access to the combined data at the time it is requested.
- why is it important? While many ETL and ELT data integration tools can run very fast, their results can only ever represent a snapshot of what happened at a certain point in time when the process ran. With data federation, a result is generated based on what the sources of data look like at the time the result is requested. This allows for a timelier and potentially more accurate view of information. Imagine you’re buying a gift for your loved one at the store. As you check out, you receive an offer for another item that complements the gift you’ve chosen and happens to be something your loved one would enjoy. Even better – the item is in stock in the same store. Thanks to real-time analysis of next-best offer data and location data, the retailer enhances your shopping experience by delivering a convenient, relevant offer to you at the right time and the right place.
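The difference between a stored snapshot and a federated result can be sketched as follows. The two dictionaries stand in for live source systems, and the names `offers_source`, `inventory_source`, `snapshot_etl` and `federated_offer` are assumptions invented for this example. Real federation tools query remote databases, but the timing idea is the same.

```python
# Hypothetical live sources: which item to offer next, and current stock levels.
offers_source = {"gloves": "scarf"}
inventory_source = {"scarf": 3}


def snapshot_etl():
    """ETL style: combine the sources once and store the result for later use."""
    return {
        item: {"offer": offer, "in_stock": inventory_source.get(offer, 0) > 0}
        for item, offer in offers_source.items()
    }


def federated_offer(item):
    """Federation style: combine the sources at the moment the result is requested."""
    offer = offers_source.get(item)
    if offer is None:
        return None
    return {"offer": offer, "in_stock": inventory_source.get(offer, 0) > 0}


snapshot = snapshot_etl()      # built while scarves were still in stock
inventory_source["scarf"] = 0  # the last scarf sells after the snapshot was taken

print("snapshot :", snapshot["gloves"])
print("federated:", federated_offer("gloves"))
```

The snapshot still says the scarf is available, while the federated lookup reflects the current stock level.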
Data Governance
- what is it? Data governance is the exercise of decision-making authority over the processes that manage your organization’s data. Or to put it another way, it’s making sure that your data strategy is aligned to your business strategy.
- why is it important? Data governance starts by asking general business questions and developing policies around the answers: How does your organization use its data? What are the constraints you have to work within? What is the regulatory environment? Who is responsible for the data? Once the answers to these questions are known, rules that enforce them can be defined. Examples of such rules might be defining which data users can access, defining which users can change the data versus simply view it, and defining how exceptions to rules are handled. Data governance tools can then be used to control and manage the rules, trace how they are handled, and deliver reports for audit purposes.
The auditability aspect of this is perhaps the most vital, as the organization’s leaders have to sign off on the accuracy of financial reports to governance boards, shareholders, customers and governmental bodies. It’s a heavy responsibility and one that carries the risk of censure, heavy fines and even legal action if not handled correctly.
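Rules like the ones described above, along with the audit trail they produce, can be sketched in a small amount of code. The roles, the `customer` data set, the `POLICY` table and both function names are hypothetical. This is only meant to show the shape of a view-versus-change rule and an audit record, not how any specific governance tool works.

```python
from datetime import datetime, timezone

# Hypothetical policy: which role may perform which action on which data set.
POLICY = {
    "analyst": {"customer": {"view"}},
    "steward": {"customer": {"view", "change"}},
}


def check_access(role, dataset, action, audit_log):
    """Apply the policy and record the decision so it can be audited later."""
    allowed = action in POLICY.get(role, {}).get(dataset, set())
    audit_log.append(
        {
            "time": datetime.now(timezone.utc).isoformat(),
            "role": role,
            "dataset": dataset,
            "action": action,
            "allowed": allowed,
        }
    )
    return allowed


def denied_requests(audit_log):
    """Report every request the policy refused, for example for an audit review."""
    return [entry for entry in audit_log if not entry["allowed"]]


log = []
print(check_access("analyst", "customer", "view", log))
print(check_access("analyst", "customer", "change", log))
print(len(denied_requests(log)), "denied request(s) on record")
```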
Master Data Management
- what is it? Master data management (MDM) is a set of processes and technologies that defines, unifies and manages all of the data that is common and essential to all areas of an organization. This master data is typically managed from a single location, often called a master data management hub. The hub acts as a common access point for publishing and sharing this critical data throughout the organization in a consistent manner.
- why is it important? Simple: It ensures that different users are not using different versions of the organization’s common, essential data. Without MDM, a customer who buys insurance from an insurer might continue to receive marketing solicitations to buy insurance from the same insurer. This happens when the information managed by the customer relationship database and the marketing database isn’t linked together, leading to two different records of the same person – and a confused and irritated customer.
With master data management, all organizational systems and data sources can be linked together and managed consistently on an ongoing basis to make sure that any master data used by the organization is always consistent and accurate. In the big data world, MDM can also automate how to use certain data sources, what types of analytical models to apply, what context to apply them in and the best visualization techniques for your data.
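The insurance example can be sketched as a tiny master data hub. The record layouts, the choice of a normalized email address as the matching key, and the names `build_master` and `should_solicit` are all assumptions for illustration. Real MDM products use much richer matching than a single field.

```python
# Hypothetical records for the same person held in two separate systems.
crm_records = [
    {"name": "Ana Ruiz", "email": "Ana.Ruiz@example.com", "has_policy": True},
]
marketing_records = [
    {"name": "A. Ruiz", "email": " ana.ruiz@example.com "},
]


def match_key(email):
    """Assumed matching rule: the same email, ignoring case and stray spaces."""
    return email.strip().lower()


def build_master(crm, marketing):
    """Link records from both systems into one master record per person."""
    master = {}
    for source, records in (("crm", crm), ("marketing", marketing)):
        for rec in records:
            entry = master.setdefault(
                match_key(rec["email"]),
                {"name": rec["name"], "sources": [], "has_policy": False},
            )
            entry["sources"].append(source)
            entry["has_policy"] = entry["has_policy"] or rec.get("has_policy", False)
    return master


def should_solicit(master, email):
    """Only market insurance to people who do not already hold a policy."""
    entry = master.get(match_key(email))
    return entry is None or not entry["has_policy"]


master = build_master(crm_records, marketing_records)
print(len(master), "master record(s)")
print(should_solicit(master, "ana.ruiz@example.com"))
```

Once both systems read from the linked record, the marketing side can see that this person is already a customer.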