How Data Management Works
As long as businesses have collected data, they’ve had to manage it to avoid the conundrum of “garbage in, garbage out.” As the volumes, types and sources of data soar, so does the need to process data in real time, and managing data well remains a top priority for business success. The sections below cover some of the core data management technologies.
Data Access
Data access is the ability to access (or retrieve) information from any source, wherever it’s stored. Certain technologies, like database drivers and document converters, help make this task easier and more efficient.
Why is it important?
Important data resides in many places – text files, databases, emails, data lakes, web pages and social media feeds. Good access technology lets you extract useful data from any type of data storage mechanism or format that’s available, so you can spend more time using the data – not just trying to find it.
Data Integration
Data integration (DI) is a process that combines different types of data to present unified results. With data integration tools, you can design and automate steps to do this work. ETL (extract, transform and load) and ELT (extract, load and transform) are two common DI approaches; they differ in whether the data is transformed before or after it is loaded into the target system.
Why is it important?
Data integration creates blended combinations of data, which is useful when making decisions. Because it combines elements of multiple, individual data sets, integrated data can reveal new insights and help you answer different questions.
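As a minimal sketch of the extract, transform and load steps, the Python example below blends two small, made-up data sets into one unified result. The sample records, the field names and the `extract`, `transform` and `load` helpers are assumptions for illustration only; they do not represent any particular DI product.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: customer records exported as CSV text.
CUSTOMERS_CSV = """customer_id,name
1,Ana
2,Ben
"""

# Hypothetical source 2: order records from another system.
ORDERS = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 5.5},
    {"customer_id": 2, "amount": 12.0},
]


def extract():
    """Extract: read raw rows from both sources."""
    customers = list(csv.DictReader(io.StringIO(CUSTOMERS_CSV)))
    return customers, ORDERS


def transform(customers, orders):
    """Transform: total the orders per customer and attach the customer name."""
    totals = {}
    for order in orders:
        key = order["customer_id"]
        totals[key] = totals.get(key, 0.0) + order["amount"]
    return [
        (int(c["customer_id"]), c["name"], totals.get(int(c["customer_id"]), 0.0))
        for c in customers
    ]


def load(rows, conn):
    """Load: write the unified rows into a target table."""
    conn.execute(
        "CREATE TABLE customer_totals (customer_id INTEGER, name TEXT, total REAL)"
    )
    conn.executemany("INSERT INTO customer_totals VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory target, for illustration
load(transform(*extract()), conn)
result = conn.execute(
    "SELECT name, total FROM customer_totals ORDER BY customer_id"
).fetchall()
print(result)
```

In an ELT variant of the same sketch, the raw rows would be loaded first and the aggregation would run inside the target database.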
Data Quality
Data quality is the practice of making sure data is accurate and usable for its intended purpose. This starts the moment data is accessed and continues through various integration points with other data – including the point just before it’s published or reported.
Why is it important?
Poor data quality can lead to costly mistakes. Data that’s outdated, unreliable, incomplete or not fit for its intended purpose will not be trusted – causing problems across the organization. A data quality solution that can standardize, parse and verify in an automated, consistent way reduces those risks.
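The standardize, parse and verify steps can be sketched in a few lines of Python. This is an illustrative example under simplifying assumptions: names arrive as "Last, First", valid phone numbers have exactly 10 digits, and the helper names (`standardize_phone`, `parse_name`, `verify_record`) are hypothetical rather than taken from any data quality product.

```python
import re


def standardize_phone(raw):
    """Standardize: keep digits only, then format 10-digit numbers consistently."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 10:
        return None  # cannot be standardized under the 10-digit assumption
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def parse_name(raw):
    """Parse: split 'Last, First' into separate, trimmed fields."""
    last, _, first = raw.partition(",")
    return {"first": first.strip().title(), "last": last.strip().title()}


def verify_record(record):
    """Verify: return a list of problems found in one record."""
    problems = []
    if not record["name"]["first"] or not record["name"]["last"]:
        problems.append("incomplete name")
    if record["phone"] is None:
        problems.append("invalid phone")
    return problems


# Made-up input records with inconsistent formatting.
raw_records = [
    {"name": "smith,  JANE", "phone": "(919) 555-0100"},
    {"name": "Doe", "phone": "555-01"},
]

cleaned = []
for raw in raw_records:
    record = {
        "name": parse_name(raw["name"]),
        "phone": standardize_phone(raw["phone"]),
    }
    record["problems"] = verify_record(record)
    cleaned.append(record)

for record in cleaned:
    print(record)
```

Running the same rules on every record, rather than fixing values by hand, is what makes the checks consistent and repeatable.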
Data Governance
Data governance is a framework of people, policies, processes and technologies that define how you manage your organization’s data. With data governance software, you can define the rules that enforce your policies – helping align your data and business strategies.
Why is it important?
Governance is usually driven by the need to comply with regulations and standards, such as the current expected credit losses (CECL) accounting standard or the General Data Protection Regulation (GDPR). Through governance policies, you can define what data users can access, who can change (versus view) data, and how to handle exceptions. Data governance tools help you control and manage rules, trace how they’re handled and deliver reports for audits.
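One way to picture such a policy is as a lookup of which roles may view or change which data sets, with every decision recorded for later audits. The roles, data set names and the `is_allowed` function below are hypothetical; this is a sketch of the idea, not the interface of any governance tool.

```python
# Hypothetical policy: which roles may view or change which data sets.
POLICY = {
    "analyst": {"view": {"sales", "customers"}, "change": set()},
    "steward": {"view": {"sales", "customers"}, "change": {"customers"}},
}

# Every decision is recorded so it can be reported during an audit.
audit_log = []


def is_allowed(role, action, dataset):
    """Check the policy and log the decision. Unknown roles are denied."""
    allowed = dataset in POLICY.get(role, {}).get(action, set())
    audit_log.append(
        {"role": role, "action": action, "dataset": dataset, "allowed": allowed}
    )
    return allowed


decisions = [
    is_allowed("analyst", "view", "sales"),
    is_allowed("analyst", "change", "customers"),
    is_allowed("steward", "change", "customers"),
]
print(decisions)

# A simple audit report: every request that was denied.
denied = [entry for entry in audit_log if not entry["allowed"]]
print(denied)
```

Keeping the rules in one data structure, separate from the code that enforces them, makes it easier to review and change policy without touching application logic.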
Business Glossaries, Lineage and Metadata
Use a business glossary to set data definitions and owners, integrate workflows and flag issues, and visualize lineage and relationships. Data lineage traces data’s path from its origins to its current location, tracking key technical and business details along with metadata (data about the data).
Why is it important?
Working together, these tools help promote collaboration and align business and IT. When you’re notified of potential issues, you can address them early, before they cause bigger problems. You can also explore data relationships and conduct impact analysis with these tools.
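Lineage and impact analysis can both be viewed as walks over a graph of data sets. The sketch below assumes a small, made-up set of lineage edges (`FEEDS`) and two hypothetical helpers: `upstream` answers "where did this data come from?" and `downstream` answers "what is affected if this changes?".

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the data sets listed as its value.
FEEDS = {
    "crm_export": ["customers_clean"],
    "web_orders": ["orders_clean"],
    "customers_clean": ["sales_mart"],
    "orders_clean": ["sales_mart"],
    "sales_mart": ["revenue_report"],
}


def downstream(start):
    """Impact analysis: every data set affected if `start` changes."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for child in FEEDS.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def upstream(target):
    """Lineage: every data set that `target` is derived from."""
    # Invert the edges so each data set maps to its direct sources.
    parents = {}
    for source, children in FEEDS.items():
        for child in children:
            parents.setdefault(child, []).append(source)
    seen, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(downstream("web_orders")))
print(sorted(upstream("revenue_report")))
```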
Data Preparation
Data preparation is a task that prepares data for analytics. It involves combining data from various sources, then cleansing and transforming it. If done via a self-service interface, business users can access and manipulate the data they need with minimal training – and without asking IT for help.
Why is it important?
Good models depend on good data preparation, but preparation is a time-consuming task. Good data preparation tools shorten that work, so data professionals can quickly access, cleanse, transform and structure data for any analytic purpose. The result: higher productivity, better decisions and greater agility.
Augmented Data Management
This approach uses artificial intelligence or machine learning techniques to make processes like data quality, metadata management and data integration self-configuring and self-tuning. For example, SAS can:
- Generate a list of suggestions for how to improve data. Actions taken over time will continue to improve results.
- Profile data and automatically find personal information, which can be flagged to influence behavior – such as only allowing specified users to access personal data in a table.
- Suggest data transformations, then suggest improvements over time using machine learning – done via a discovery engine that analyzes data and metadata.
- Provide recommendations to users and suggest next-best actions during the data preparation process.
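To make the profiling idea concrete, here is a deliberately simple sketch that flags columns whose values look like email addresses or phone numbers, then masks those columns for users who are not on an allow list. It uses crude regular expressions in place of machine learning, and the patterns, the 0.8 threshold and the function names are all assumptions for illustration; it does not describe how SAS implements this capability.

```python
import re

# Crude, illustrative patterns; real profiling uses far more robust detection.
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE = re.compile(r"^\+?[\d\s().-]{7,}$")


def profile_columns(rows, threshold=0.8):
    """Flag a column when most non-empty values look like emails or phone numbers."""
    flagged = set()
    columns = rows[0].keys() if rows else []
    for column in columns:
        values = [str(r[column]) for r in rows if r[column] not in (None, "")]
        if not values:
            continue
        hits = sum(1 for v in values if EMAIL.match(v) or PHONE.match(v))
        if hits / len(values) >= threshold:
            flagged.add(column)
    return flagged


def visible_row(row, flagged, user, allowed_users):
    """Return the row unchanged for allowed users; mask flagged columns otherwise."""
    if user in allowed_users:
        return dict(row)
    return {k: ("***" if k in flagged else v) for k, v in row.items()}


# Made-up table and allow list.
rows = [
    {"order_id": 101, "email": "ana@example.com", "phone": "919-555-0100"},
    {"order_id": 102, "email": "ben@example.com", "phone": "(919) 555-0101"},
]
flagged = profile_columns(rows)
allowed_users = {"privacy_officer"}

print(sorted(flagged))
print(visible_row(rows[0], flagged, "analyst", allowed_users))
print(visible_row(rows[0], flagged, "privacy_officer", allowed_users))
```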
More About How Data Management Works Today
- Data management for artificial intelligence (AI) and machine learning (ML). Many business processes rely on AI, which is the science of training systems to emulate human tasks through learning and automation. For example, AI and ML techniques are often used in making loan and credit decisions, medical diagnoses and retail offers. With AI and ML, it’s more important than ever to have well-managed data that you understand and trust – because if bad data feeds algorithms that adapt based on what they learn, mistakes can multiply quickly.
- Data management for the Internet of Things (IoT). The data that gushes from sensors embedded in IoT devices is often referred to as streaming data. Data streaming, or event stream processing, involves analyzing real-time data on the fly. This is accomplished by applying logic to the data, recognizing patterns in the data and filtering it for multiple uses as it flows into an organization. Fraud detection, network monitoring, e-commerce and risk management are popular applications for these techniques.
- Bidirectional metadata management. Bidirectional metadata management shares and connects metadata between different systems. SAS, for example, is committed to being part of the open metadata community through its involvement in the ODPi Egeria project – which underscores the need for metadata standards to promote responsible data exchange across varied technology environments.
- Data fabric and semantic layer. The term data fabric describes an organization’s diverse data landscape – where vast amounts and types of data are managed, processed, stored and analyzed, using a variety of methods. The semantic layer plays an important role in the data fabric. Like a business glossary, the semantic layer is a way to link data to commonly defined business terms used across the organization.
- Data management and open source. Open source refers to a computing program or infrastructure in which the source code is publicly available for use and modification by a community of users. Using open source can speed development efforts and reduce costs. And data professionals can thrive if they can work in the programming language and environment of their choice.
- Data federation/virtualization. Data federation is a special kind of virtual data integration that lets you look at combined data from multiple sources without needing to move and store the combined view in a new location. So, you can access combined data exactly when you request it. Unlike ETL and ELT tools that show a snapshot at a point in time, data federation generates results based on what the data sources look like at the time of the request. This gives a timelier and potentially more accurate view of the information.
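The difference between a stored snapshot and a federated view can be sketched with two independent in-memory SQLite databases standing in for separate source systems. The table names, sample rows and the `combined_view` function are assumptions for illustration; real federation tools typically push queries down to the sources rather than combining rows in application code.

```python
import sqlite3

# Two hypothetical, independent source systems.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
billing.execute("INSERT INTO invoices VALUES (1, 40.0)")


def combined_view():
    """Query both sources at request time and combine the rows in memory."""
    names = dict(crm.execute("SELECT id, name FROM customers"))
    totals = {}
    for customer_id, amount in billing.execute(
        "SELECT customer_id, amount FROM invoices"
    ):
        totals[customer_id] = totals.get(customer_id, 0.0) + amount
    return {names[cid]: totals.get(cid, 0.0) for cid in sorted(names)}


# ETL-style: compute the combined result once and store the copy.
stored_snapshot = combined_view()

# A source changes after the snapshot was taken.
billing.execute("INSERT INTO invoices VALUES (2, 15.0)")

# Federation-style: recompute from the sources at the time of the request.
live_result = combined_view()

print(stored_snapshot)
print(live_result)
```

The stored copy still reflects the sources as they were when it was built, while the federated result includes the new invoice.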
Data Management Solutions
Trusted data leads to trusted analytics – which is important for the success of every business. And trusted data starts with having a solid data management strategy supported by proven data management technology. Our data management solutions include all the capabilities you need to access, integrate, clean, govern and prepare your data for analytics – including advanced analytics like artificial intelligence and machine learning. It’s all part of a single, integrated platform. Learn how to transform your analytics programs into big opportunities.