Enterprise Data Management — Identifying Data Issues: Vijayant Yadav

Introduction

In the previous article, we talked about the different types of enterprise data. Here, I will talk about identifying, classifying, and quantifying data issues. We will look at the major classifications of data issues, the parameters used to measure data quality, and the parameters used to analyze data sources.

Data issues classification

An organization's data can suffer from many different issues that limit the organization's ability to use it fully for growth, and these issues are often difficult to identify and classify. To embark on the journey of improved data management, it is important to do both. Broadly, data issues fall into four classifications:

Classifications of data issues

Data Silos: These occur when the data or information collected by a department, function, or application is isolated and not accessible across the organization.

Silos can arise for various reasons: organizational structure, a culture of treating every department as separate, the lack of a common technology platform, and so on.

Data Governance: Simply put, the aim of data governance is to give people and teams across the organization the minimal yet sufficient access to data they need to conduct their business. It is about ensuring the guaranteed and secure availability of data across the enterprise.

When data leadership is lacking and its importance is not understood, governance issues can arise in the organization: data unavailable to the right people, unauthorized access, data breaches, and so on.

Data Inconsistency: This occurs when different values exist for the same attribute or entity of a business process. It may result from a lack of data harmonization, poor data integration, or overlapping technology implementations across processes (a simple detection check is sketched at the end of this section).

Inefficient Data Processes: This is generally seen in bigger organizations: as the business grows over time, processes become more complex, resulting in inefficient data processes despite the use of best-of-breed technologies and tools.

In most cases, data management issues are combinations of these four, and they are rarely as clear-cut as defined above. For example, a data inconsistency issue may arise from a lack of data governance, where a person who is neither trained nor authorized to update a value is nevertheless able to do so. Or data silos may be created by inefficient data processes, as when a new process introduced during a business expansion is not properly integrated with existing data. There are endless scenarios that can create data management issues, but with proper understanding and analysis, their root causes can be identified.
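To make the inconsistency check concrete, here is a minimal Python sketch of how one might flag entities whose attribute values disagree between two systems. The system names, record layouts, and sample values are hypothetical, purely for illustration.

    # Minimal sketch: flag entities whose attribute values disagree across systems.
    # System names, record layouts, and sample values are hypothetical.
    crm_records = {
        "CUST-001": {"email": "a.shah@example.com"},
        "CUST-002": {"email": "b.rao@example.com"},
    }
    billing_records = {
        "CUST-001": {"email": "a.shah@example.com"},
        "CUST-002": {"email": "rao.b@example.com"},  # out of sync with the CRM
    }

    def find_inconsistencies(source_a, source_b, attribute):
        """Return entity IDs whose attribute value differs between the two sources."""
        shared_ids = source_a.keys() & source_b.keys()
        return [eid for eid in sorted(shared_ids)
                if source_a[eid].get(attribute) != source_b[eid].get(attribute)]

    print(find_inconsistencies(crm_records, billing_records, "email"))  # ['CUST-002']

In practice such checks run as part of data integration or reconciliation jobs, but even a simple comparison like this makes the scale of an inconsistency problem measurable.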

Measuring the data quality

An organization always needs to measure the quality of the data it possesses. Measurement plays an important role in understanding the current state of its data management processes and the scope for improvement.

Objective analysis parameters for data quality

Data quality can be measured against these five objective criteria:
1. Accuracy
2. Consistency
3. Completeness
4. Conformity
5. Timeliness

Accuracy: It refers to the degree of correctness of the data values stored for an object. For 100% accuracy, a data value must be the right value and must be represented in an unambiguous form.

Consistency: It means that data across all systems reflects the same information and is in sync across the enterprise.

Completeness: It is the degree of comprehensiveness expected from the data in order to obtain the requisite information. Completeness is a measure over mandatory attributes and is independent of optional data.

Conformity: It refers to the degree to which the data adheres to standardized data definitions such as data type, size, and format. For example, every date across the organization is stored in the format "mm/dd/yyyy".

Timeliness: It refers to whether information is available when it is expected and needed. Timeliness of data is very important. For example, a delay in learning that stock has fallen below threshold levels can disrupt your supply chain operations.
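As an illustration of how such criteria can be quantified, here is a minimal Python sketch that scores completeness and conformity for a handful of records. The field names, the mandatory-attribute list, and the sample rows are assumptions made for the example; only the mm/dd/yyyy convention comes from the text above.

    # Minimal sketch: score completeness and conformity for a list of records.
    # Field names, the mandatory-attribute list, and sample rows are hypothetical.
    import re

    DATE_FORMAT = re.compile(r"^\d{2}/\d{2}/\d{4}$")  # the mm/dd/yyyy standard above
    MANDATORY = ["customer_id", "order_date"]

    records = [
        {"customer_id": "CUST-001", "order_date": "03/14/2024"},
        {"customer_id": "CUST-002", "order_date": "2024-03-15"},  # non-conforming date
        {"customer_id": None,       "order_date": "03/16/2024"},  # missing mandatory value
    ]

    def completeness(rows):
        """Share of mandatory attribute slots that are actually populated."""
        filled = sum(1 for row in rows for field in MANDATORY if row.get(field))
        return filled / (len(rows) * len(MANDATORY))

    def conformity(rows):
        """Share of populated dates that match the standard format."""
        dates = [row["order_date"] for row in rows if row.get("order_date")]
        return sum(1 for d in dates if DATE_FORMAT.match(d)) / len(dates)

    print(f"completeness: {completeness(records):.0%}")  # completeness: 83%
    print(f"conformity:   {conformity(records):.0%}")    # conformity: 67%

Scores like these give a baseline: re-running the same checks after a cleanup effort shows whether quality is actually improving.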

Measuring and improving data quality requires time and resources. To get a high RoI on that effort, it is therefore also important to analyze the data against these three standards:
1. Relevance
2. Comprehension
3. Objectivity

Relevance: The data being analyzed should be relevant to the intended business purposes and directly related to the goals of your analysis.

Comprehension: Data should be in a format the business can comprehend and put to further use. If the sales numbers are correct, complete, and consistent but fail to provide the information business executives are looking for, they are of no use.

Objectivity: It is linked to the reliability of the data source and the method of data collection. It measures the ability to arrive at the same result from a given collection method, regardless of the medium through which the data is gathered. A real-world example is a standardized template for collecting customer experience ratings, rather than having a person ask the customer subjective questions.

Data source analysis

Once we are able to identify data issues and to classify and quantify the data across the parameters above, we need to define and design a data management solution. To do that, every data source is analyzed on these parameters:

Parameters to analyze data sources

Volume: It refers to the amount of data generated by each data source.

Velocity: It refers to the rate at which data is produced. Some data may be produced on a daily basis, while in other cases data may arrive as a continuous 24x7 stream.

Veracity: It refers to the quality of the data. It helps in estimating the effort required to cleanse the data before it can be consumed.

Variety: It refers to the format of the data. Whether the data is structured, semi-structured, or unstructured has to be considered, as it directly impacts the capabilities required to process it.
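As a sketch of how these four parameters can feed design decisions, the profile below captures them per data source. All names, values, and the streaming rule are illustrative assumptions, not prescriptions.

    # Minimal sketch: a per-source profile covering the four parameters above.
    # All names, values, and the streaming rule are illustrative assumptions.
    from dataclasses import dataclass

    @dataclass
    class DataSourceProfile:
        name: str
        volume_gb_per_day: float  # Volume: how much data the source generates
        velocity: str             # Velocity: e.g. "daily batch" or "24x7 stream"
        veracity_score: float     # Veracity: estimated quality, 0.0 to 1.0
        variety: str              # Variety: structured / semi-structured / unstructured

        def needs_streaming_infrastructure(self) -> bool:
            """Continuous feeds call for real-time ingestion rather than batch loads."""
            return self.velocity == "24x7 stream"

    sensor_feed = DataSourceProfile("plant-sensors", 120.0, "24x7 stream", 0.7, "semi-structured")
    erp_export = DataSourceProfile("erp-orders", 2.5, "daily batch", 0.95, "structured")

    for source in (sensor_feed, erp_export):
        print(source.name, "-> streaming needed:", source.needs_streaming_infrastructure())

Profiling every source this way turns the infrastructure questions in the next paragraph into lookups rather than guesswork.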

All these factors help in designing data infrastructure that properly meets the demands of data processing. With a correct understanding of the above parameters for each data source, it becomes easy to answer questions about database sizing, the need for a data lake, big data infrastructure, NoSQL databases, real-time data consumption, and so on.

With a step-by-step, objective approach, we can identify, classify, and quantify our data issues. Based on that analysis, we can leverage existing technological capabilities to address them and build a robust, scalable, and flexible data management solution for an enterprise of any scale. There are multiple solutions in the market that perform the same or similar activities; however, every solution has its own strengths and weaknesses, which may not be apparent in the initial phases of implementation. Over time, as data volumes grow, sub-optimal solutions start showing signs of increased effort and reduced efficiency. It is therefore important to define and design a data strategy that addresses the organization's current data processes, requirements, and pain points, as well as the future goals of the organization and its industry.
