A major component of Six Sigma is data, or data mining. In fact, acquiring and storing good quality data is the key to a successful Six Sigma initiative.
Several months ago, I wrote about certifying data through Six Sigma. I discussed establishing data standards through service level agreements (SLAs) and administering them through an organized data governance structure. Before you can do this, however, you need to measure your data against some of the following typical metrics: accuracy/precision, completeness, reliability, availability, timeliness/freshness, consistency, and uniqueness.
Now, let’s assume that you have certified data and you’re well on your way to your Six Sigma project. How then do you ensure that you have quality data through and through?
You can ensure data quality (DQ) in two ways: off-line and in-line.
The off-line DQ process is run outside of the certified data production process, while the in-line DQ process is run in synchronization with the certified data production process. The relationship between the two DQ processes is shown in figures and comparative analysis here.
Once the DQ checks are in place, it is time to act on the results. Two processes make your DQ measurable:
1. Scoring: This process evaluates the captured metric data to produce a measurement (score) of data quality. The score is published with the data and is available for reporting, so the end data consumer can understand how much confidence to place in the data.
2. Monitoring and control: This process focuses on capturing the metric data produced by the measurement processes and acting on it. The emphasis here is on data quality and process improvement. It is a straightforward process for determining a course of action based on a set of parameters and rules. During this process, the following sequential steps are executed:
- Collect: The monitored data points are collected and stored. The storage may be temporary or persistent.
- Classify: They are then classified and categorized based on the type of check performed, the priority of the data quality check, and user-selected data quality attributes.
- Detect: Rules are executed based on the classification of the data quality data points. If a data quality fault is detected, an action is taken.
- Act/control: If a fault is detected, a sequence of one or more actions is initiated. These may include providing e-mail notification, fixing the fault, aborting the data quality job stream or continuing with the job stream while noting exceptions.
- Log: The detected fault and the resulting actions are stored in a log file that can be used for auditing or analysis.
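The five steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the names `DataPoint`, `HIGH_PRIORITY_CHECKS`, and `monitor`, the 5% tolerance, and the rule that high-priority faults abort the job stream are all hypothetical choices, not part of any particular DQ tool.

```python
from dataclasses import dataclass

# Hypothetical: which check types count as high priority.
HIGH_PRIORITY_CHECKS = {"record_count"}


@dataclass
class DataPoint:
    check_type: str   # kind of check performed, e.g. "record_count"
    value: float      # value observed in the current run
    expected: float   # reference value the check compares against
    priority: str = "unclassified"


def classify(point: DataPoint) -> DataPoint:
    """Classify: assign a priority based on the type of check."""
    point.priority = "high" if point.check_type in HIGH_PRIORITY_CHECKS else "low"
    return point


def detect(point: DataPoint, tolerance: float) -> bool:
    """Detect: True when the relative deviation exceeds the tolerance."""
    if point.expected == 0:
        return point.value != 0
    return abs(point.value - point.expected) / abs(point.expected) > tolerance


def act(point: DataPoint) -> str:
    """Act/control: abort on high-priority faults, otherwise note an exception."""
    return "abort" if point.priority == "high" else "continue_with_exception"


def monitor(raw_points, tolerance=0.05):
    """Run collect, classify, detect, act, and log over (type, value, expected) tuples."""
    collected = [DataPoint(*p) for p in raw_points]  # Collect
    log = []                                         # Log (in memory here)
    for point in collected:
        classify(point)
        if detect(point, tolerance):
            action = act(point)
            log.append({
                "check": point.check_type,
                "priority": point.priority,
                "value": point.value,
                "expected": point.expected,
                "action": action,
            })
            if action == "abort":
                break  # stop the job stream; later points are not processed
    return log
```

Calling `monitor([("byte_count", 900.0, 1000.0), ("record_count", 80.0, 100.0)])` would log two faults, the first continuing with an exception and the second aborting. In a production setting the log would more likely be written to a persistent file or table than kept in a list.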
By going through these steps, you are not only gathering data but also evaluating the kind of data you have. With high-quality data, you can then apply statistical process control and other statistical tools en route to improving your processes.
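The scoring process described earlier can also be made concrete. The sketch below computes two of the metrics named at the start of this article, completeness and uniqueness, and combines them into a single score. The function names, the record layout, and the weighted-average formula are assumptions made for illustration; a real scoring scheme would be defined by your SLAs.

```python
def completeness(records, field):
    """Fraction of records that have a non-empty value for `field`."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def uniqueness(records, key):
    """Fraction of records whose `key` value is distinct."""
    if not records:
        return 0.0
    return len({r.get(key) for r in records}) / len(records)


def quality_score(metrics, weights):
    """Weighted average of metric values, each assumed to lie in [0, 1]."""
    total_weight = sum(weights.values())
    return sum(metrics[name] * weights[name] for name in weights) / total_weight


# Hypothetical sample data: one missing balance and a repeated id.
records = [
    {"id": 1, "balance": 10.0},
    {"id": 2, "balance": None},
    {"id": 2, "balance": 5.0},
    {"id": 2, "balance": 7.5},
]

metrics = {
    "completeness": completeness(records, "balance"),
    "uniqueness": uniqueness(records, "id"),
}
score = quality_score(metrics, {"completeness": 1.0, "uniqueness": 1.0})
```

A score like this can be stored alongside the data set so that reports show it next to the figures it qualifies.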
Below is an example of a real-world Six Sigma DQ process.
A major U.S. financial institution is well on its way to implementing a data quality/certification process across all of its enterprise data, which currently comprises 80 sources. The first step in this process was driven by the bank’s regulatory compliance requirements. The bank needed to supply Sarbanes-Oxley compliant demand deposit transaction data to its finance data warehouse, where data is aggregated, analyzed and used for reports to management, regulators and investors. As part of the assurance process, this data was processed through a data hub where the mainframe-supplied demand deposit transactions were extracted, converted, and transformed into a form usable by the bank’s financial accounting system. In addition, a robust monitoring and control process was used to implement the tight rules and thresholds required by the bank for detecting potential faults during both profile and process checks:
- Collect: The numbers of records and bytes are captured after key lookup and aggregation steps. A user-defined check of total average monthly balance is also calculated and monitored.
- Classify: These data quality checks are classified as high priority/alert checks.
- Detect: Values are compared to a moving average of previous months’ values. If a value deviates from that average by more than 5%, a fault alarm is raised.
- Act/control: If a fault is detected, an alarm is raised and a message is sent to both system operators and data analysts familiar with demand deposit data transactions. Further processing is stopped until a resolution is reached, which may be a decision to continue processing or to correct errors and re-initiate the transformation process.
- Log: The data quality check point values, the fault and any subsequent actions are logged.
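The detect step in this example, comparing a value against a moving average and flagging deviations above 5%, might look like the following. The three-month window, the function names, and the sample figures are assumptions; the source does not say how many previous months the bank averaged over.

```python
def moving_average(history, window=3):
    """Average of the most recent `window` values in `history`."""
    if not history:
        raise ValueError("history must contain at least one value")
    recent = history[-window:]
    return sum(recent) / len(recent)


def check_against_history(current, history, window=3, threshold=0.05):
    """Compare `current` to the moving average and flag large deviations."""
    baseline = moving_average(history, window)
    if baseline == 0:
        raise ValueError("baseline is zero; relative deviation is undefined")
    deviation = abs(current - baseline) / abs(baseline)
    return {
        "baseline": baseline,
        "deviation": deviation,
        "fault": deviation > threshold,  # True raises the fault alarm
    }


# Hypothetical monthly record counts (in thousands) for previous months.
previous_months = [100.0, 102.0, 98.0]

within_limits = check_against_history(104.0, previous_months)
out_of_limits = check_against_history(106.0, previous_months)
```

Here the baseline is 100, so a value of 104 deviates by 4% and passes, while 106 deviates by 6% and raises a fault. The same function could be applied to byte counts or to the total average monthly balance.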
Source: Six Sigma Data Quality Processes