Data Quality and Machine Learning Readiness Test

It is no secret that machine learning is all the rage. And no wonder. It offers the potential to address areas that, so far anyway, have eluded traditional technologies.

At the same time, machine learning algorithms are no better than the data used to train and feed them. Indeed, the quality standards for data used by machine learning algorithms far exceed most other quality requirements. See “If Your Data is Bad, Your Machine Learning Tools are Useless” (Redman, hbr.org, April 1, 2018). All this means some fairly steep organizational changes will be required as well.

These concerns motivated this “Data Quality and Machine Learning Readiness Test.” It is designed to help you understand the most important issues, baseline where you are, and sort out which issues you must address in the short term. The test also aims to help you get started on a program that will stand you in good stead for years to come. It may take ten years and hundreds of millions of dollars to become world-class. So, over time, you should ask yourself what the desired end state for your company is. Remember, the goal is to obtain business benefit, not to score well on this test.


Step 1:
Form the test team.
We recommend that you form an internal test team, guided by an independent, skilled professional. Your internal team should consist of five to nine people with a diversity of skills, including data scientists, data quality professionals, senior managers, technologists and those likely to be impacted. The people or groups taking this test should be senior enough to understand the overall data program within their organization or company.
Step 2:
Take the test.
The test consists of ten “criteria.” For each criterion, there are five statements, each reflecting a different rating of where you stand against the criterion. On one end of the scale is “unaware,” meaning you don’t fully understand the criterion, why it is important, or what you need to do to meet it. Don’t be afraid to rate yourself here. Machine learning, and especially its data quality requirements, are new and unfamiliar to most. On the other end of the scale is “world-class,” reflecting long-time efforts, great results, and lots of learning (and almost certainly plenty of failures!) along the way. The statements in between reflect steps along the way.
Step 3:
Revisit your scores.

While the Data Quality and Machine Learning Readiness Assessment is a test, no one is keeping score.

You will do yourself and your organization far more harm than good by rating yourself higher than you really are. 

Step 4:
Interpret results and take appropriate action.

Your success will be driven more by your lowest score than by the average. Thus, take a careful look at the criteria where you rated yourself a 0. These areas imperil your effort. So first, you must educate yourself about what is required. Do not be embarrassed; many organizations are in similar situations, especially early on. But to be clear, DO NOT PROCEED UNTIL YOU HAVE MADE YOURSELF FULLY AWARE OF WHAT YOU MUST DO.

Once you are aware of every element of the broad scope of effort required, take a look at the criteria where you rated yourself 1. Obviously, this is the best you can score if you are just starting out. You need to gain some experience. In such situations, many are tempted to start narrow “pilot studies” or “alpha tests.” These often focus on gaining some experience with the technologies used to create predictive models. Unfortunately, this will not help you gain the experience you need across the full range of criteria. While you can keep your first efforts small, they should be broad, embracing all areas where you need to gain experience.
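For teams that record their ratings in a spreadsheet or script, the triage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the criterion names are placeholders (the test defines its own ten criteria), and we assume the five statements map to whole-number ratings from 0 (“unaware”) to 4 (“world-class”).

```python
from statistics import mean

# Placeholder names and ratings for illustration only; they are not the
# test's actual criteria. Assumed scale: 0 ("unaware") to 4 ("world-class").
example_ratings = {
    "criterion_01": 2, "criterion_02": 0, "criterion_03": 1,
    "criterion_04": 3, "criterion_05": 1, "criterion_06": 2,
    "criterion_07": 2, "criterion_08": 0, "criterion_09": 3,
    "criterion_10": 1,
}


def triage(ratings):
    """Group criteria by the action suggested for ratings of 0 and 1."""
    for name, score in ratings.items():
        # Whole-number ratings only: no scoring between statements.
        if score not in range(5):
            raise ValueError(f"{name}: rating must be 0, 1, 2, 3 or 4")

    educate_first = sorted(n for n, s in ratings.items() if s == 0)
    gain_experience = sorted(n for n, s in ratings.items() if s == 1)
    return {
        "lowest": min(ratings.values()),    # drives success more than the average
        "average": mean(ratings.values()),  # reported only for comparison
        "educate_first": educate_first,     # rated 0: become aware before proceeding
        "gain_experience": gain_experience, # rated 1: small but broad first efforts
        "ready_to_proceed": not educate_first,
    }


if __name__ == "__main__":
    for key, value in triage(example_ratings).items():
        print(f"{key}: {value}")
```

The sketch reports the lowest rating alongside the average precisely because the two can tell different stories: a respectable average can hide one or two criteria rated 0, and those are the ones to address first.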

Note: We are fully aware how difficult it is to make yourself aware, gain experience, and figure out how to get to the next level. Indeed, as our cautions make clear, even evaluating where you stand is difficult. In fact, you will often be tempted to score yourself between ratings. Don’t do that! If you are in doubt about which of two ratings applies, choose the lower one. If you find later that you are as good as you thought, it’s easy to adjust. Do not hesitate to seek professional help!

Step 5:
Use the Readiness Test on an ongoing basis.

It will likely take five to seven years before your data quality and machine learning program advances to a point where you are really happy with it. (Note: You should, of course, expect visible business benefits more quickly, say within two years.) So be patient. Re-rate yourself twice a year. And take careful notes. For example, you may find yourself admitting “we thought we were prepared to address quality (or change), but that turned out to be much harder than we expected.” This is healthy. Finally, once you score all 2s and 3s, you should ask yourself “how good do we need to be?” The answer will certainly depend on your desired business results and competitive position.