Techniques to improve data quality
Data quality is an important part of any visualisation; when analysing data we want our analysis to be based upon robust and reliable information, resulting in confidence in actions taken. Poor data quality can undermine this confidence and as such reduce the potential benefits any analysis can bring to your organisation. This blog post looks at some of the causes of poor data quality and some of the techniques you can use within Tableau to support sustained improvements in data quality.
What can cause poor data quality?
Data Quality Teams
It drives me mad when I see organisations having 'Data Quality teams'; employing staff whose sole responsibility is raising the awareness of data quality. In my opinion this creates a parent-child culture whereby staff perceive that the Data Quality Team will either fix their data quality errors for them or will prompt them when they have data quality errors to correct. If Business Intelligence software, such as Tableau, is implemented effectively it can support staff to receive automated data quality reports and as such empower individuals to take personal accountability for correcting their own errors.
Disconnected reporting - The black box syndrome
If I had a pound for every time a member of staff has said to me 'that number cannot be right' I would be a very rich data geek! In the past you would have imagined I had a magical black box under my desk that would randomly generate performance figures. In reality the only reason someone states 'that number cannot be right' is because they do not understand how the performance has been calculated and as such they have little confidence in the accuracy of reporting.
Often the underlying reason for this mistrust is that data quality reporting and performance reporting are kept separate within an organisation. An individual would access their performance in one report and then access their data quality validation lists in another. By creating this artificial disconnect between performance and data quality you immediately hand individuals the 'get out of jail free card' of doubting the construction of the report rather than acting upon what the report is telling them.
Reports are refreshed inconsistently
Many organisations have performance reports released or refreshed on a different schedule to that of data quality reports. For example your data quality reports may be refreshed daily but your performance reports may be updated weekly or monthly. By having inconsistent refresh schedules you will be encouraging confusion, doubt and mistrust. If an individual wants to validate a performance figure they will be unable to do so if the validation lists are based on more recent data than the performance reports are based upon.
There is a negative perception of data and reporting
When my organisation first implemented Tableau I sent out a questionnaire to our staff asking for their views on how data was used, at the time, within the organisation. The overwhelming feedback was one of negativity and fear. Data was perceived to be the realm of managers, with managers receiving reports and using this position of power to judge people; criticising mistakes and basing statements and actions on aggregated data. When asked about data quality staff stated they only ever received feedback about the quality of their data entry through their manager and, in the majority of cases, this feedback criticised errors and rarely celebrated success.
This sparked a light bulb moment for me when planning our implementation of Tableau: how can we expect people to correct errors when we aren't giving them direct access to data that will enable them to act in a proactive fashion? It is the equivalent of expecting someone to avoid going into debt without having any access to a bank statement or summary of their balance. This negative perception of data fails to empower individuals and doesn't encourage proactive validation of data, so is it any surprise that for some people data validation is a long way down their list of priorities?
What techniques can be taken to support improvements in data quality?
Do not despair! There are techniques you can use to support improvements in data quality, and not only improvements but sustained improvements:
Refresh your data sources as frequently as possible
Most of our visualisations update daily, based upon an overnight refresh of our internal Data Warehouse. By ensuring data is updated overnight you are allowing your users to monitor the effectiveness of any actions they have taken the previous day to correct data quality errors. This has really empowered our end users and given them an ever increasing sense of pride; allowing them to monitor improvements and ensure the actions they have taken have had the desired results on their data.
Combine performance and data quality into single workbooks
Where possible try to combine your performance and data quality into a single visualisation, allowing users to understand their performance and then immediately reconcile this performance through the data quality validation list. This will empower individuals not only to act upon data quality errors but also to have confidence in how performance figures are calculated. Why not take this one step further and use Tableau actions to enable a user to interact with their performance analysis and update their data quality validation lists based upon what they have clicked? For example the visualisation below analyses patient waiting times. It is possible for a user to click on the performance bar chart, which summarises the number of patients waiting for each week band, and the validation list will automatically update to filter patients relating to the week band the user has clicked upon.
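Tableau handles this interaction for you, but the logic behind it can be sketched in a few lines of Python. This is only an illustration: the `WaitingPatient` record, the four-week band width and the patient IDs are all assumptions made up for the example, not part of our workbook. The point it shows is that the bar total and the filtered validation list come from the same rows, so the two always reconcile.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class WaitingPatient:
    patient_id: str      # hypothetical identifier
    weeks_waiting: int   # whole weeks the patient has waited so far


def week_band(weeks_waiting: int, band_width: int = 4) -> str:
    """Label a wait with its band, for example 5 weeks falls in '4-7 weeks'."""
    lower = (weeks_waiting // band_width) * band_width
    return f"{lower}-{lower + band_width - 1} weeks"


def band_counts(patients):
    """The numbers behind the performance bar chart: patients per week band."""
    return Counter(week_band(p.weeks_waiting) for p in patients)


def validation_list(patients, clicked_band):
    """The patient rows shown once a user clicks a single bar."""
    return [p for p in patients if week_band(p.weeks_waiting) == clicked_band]


# Made-up sample data for illustration only.
patients = [
    WaitingPatient("P001", 2),
    WaitingPatient("P002", 5),
    WaitingPatient("P003", 6),
    WaitingPatient("P004", 13),
]

counts = band_counts(patients)
selected = validation_list(patients, "4-7 weeks")

# The bar height and the length of the filtered list are the same figure.
print(counts["4-7 weeks"], [p.patient_id for p in selected])
```

Because both functions derive the band from the same field, a user who doubts a bar can count the rows in the list beneath it and arrive at the same number.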
An additional advantage of providing data quality validation lists on screen, alongside performance data, is that users can then subscribe to this view and receive the data in their corporate email account every day, week or month (in my opinion the more frequent the better).
There is one other method for providing users with access to their data quality validation lists alongside their performance data; using the Tableau 'view data' functionality. This method works perfectly well, allowing a user to click on a performance figure and to view the underlying data, however, in my opinion, there are some limitations:
A user cannot subscribe to data quality lists if the data isn't displayed on screen. Yes, a user can click on the visualisation and 'view data', but this adds extra steps before the end user can act upon the data quality validation list and also requires them to have network access to interact with Tableau.
You cannot order the fields in the 'view data' pop up. As such a user may have to scroll through multiple data fields in order to find the information they require to act upon their data quality errors.
If you are fortunate enough to have Tableau Server within your organisation, make sure you make the most of it! By utilising subscriptions you enable individuals to effectively self-manage and self-serve their data quality requirements. When combined with the previous approach of embedding data quality validation lists within your performance reports you will be enabling individuals to receive personalised data quality lists in their corporate email account at a frequency of their choosing. I was amazed at how empowered this made our end users feel; all of a sudden they were in control, they were receiving their own data and they were choosing to act upon it.
To put this into some perspective, at present we have 4,500 active subscriptions across our user base of 3,400 users. The moment we implemented subscriptions was the moment we went from being perceived as an IT service, supplying people with data, to a support service, working in partnership with individuals to make their working lives easier.
Make it easy for individuals to correct errors; using Tableau actions to 'click through to source systems'
If your source system is web based it may be possible to utilise Tableau actions to allow a user to click through from Tableau to the source system, allowing them to instantly act upon data quality validation lists; I learnt this technique from colleagues at Berkshire Healthcare NHS Foundation Trust and our users absolutely love it!
In a dashboard, ensure the worksheet you wish to apply the action to has the required field within the detail of the worksheet. For example, when applying actions we want to use the field 'Patient RIO ID' and pass this field from Tableau into the web link of our source system. Click Dashboard, Actions and then Add a URL action. In the URL action pop up window click on the sheet the action is applied to and then select Menu from the 'run action on' list of options.
In the URL section of the action screen enter the URL of where you want to take the end user to within your source system. In the example below I have embedded the Tableau field 'Patient RIO ID' into the URL; this is where the magic starts! When a user clicks on the record of an individual patient within the validation list of the Tableau report, the ID of this patient is passed through to the source system, not only loading the source system within a new internet window but loading the actual record relating to that patient!
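To make the substitution concrete, here is a minimal Python sketch of what happens to the URL when a user clicks a row. Tableau URL actions mark an embedded field with angle brackets, such as `<Patient RIO ID>`; everything else here is an assumption for illustration, including the `source-system.example` address and the `build_click_through_url` helper, so swap in the real record URL of your own source system.

```python
import re
from urllib.parse import quote

# Hypothetical address: replace with your own source system's record URL.
URL_TEMPLATE = "https://source-system.example/patient?id=<Patient RIO ID>"


def build_click_through_url(template: str, row: dict) -> str:
    """Replace each <Field Name> placeholder with the URL-encoded row value."""

    def substitute(match: re.Match) -> str:
        field_name = match.group(1)
        # Encode the value so spaces or slashes in an ID cannot break the link.
        return quote(str(row[field_name]), safe="")

    return re.sub(r"<([^<>]+)>", substitute, template)


# The row a user clicked in the validation list (made-up ID).
row = {"Patient RIO ID": "1234567"}
url = build_click_through_url(URL_TEMPLATE, row)
print(url)
```

The helper raises a `KeyError` if the template names a field that is not in the row, which mirrors the requirement above that the field must be present in the detail of the worksheet.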
Again it's about making it as easy as possible for users to act upon the data quality information you are providing them with. To ensure our users can easily identify when a click through is available in our Tableau reports we have branded this functionality 'Click and Correct' and created a logo that is visible within the report.
Add value to data quality validation lists by including additional information within the lists
By adding value to data quality validation lists you will be increasing the usefulness of these lists to your end users and as such achieving a greater engagement with the process. Work with your end users to understand what additional information would be useful in supporting them to validate their data quality lists and embed these additional fields within your visualisation.
For example we provide end users with lists of patients who have yet to receive a first appointment and as such are waiting for their treatment to commence. In addition to providing a list of all the patients waiting for a first appointment we also include the following additional fields within the visualisation: the date of any future booked appointment and the date of any previously non-attended appointment. This additional information allows the services to gain the maximum possible intelligence from the data quality validation list and as such the process becomes easier for them to complete. The easier you can make a process the more likely you are to achieve your desired end result.
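The two extra fields described above can be derived from an appointments table. The sketch below is one way to do it in Python, under stated assumptions: the `Appointment` record, the status values `"booked"` and `"did_not_attend"`, and the output column names are all hypothetical, and in practice this logic would more likely live in the data warehouse query that feeds the workbook.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Appointment:
    patient_id: str
    appointment_date: date
    status: str  # hypothetical values: "booked" or "did_not_attend"


def enrich_waiting_list(waiting_ids, appointments, today):
    """Add the next booked and the most recent non-attended appointment dates."""
    rows = []
    for patient_id in waiting_ids:
        own = [a for a in appointments if a.patient_id == patient_id]
        future = [
            a.appointment_date
            for a in own
            if a.status == "booked" and a.appointment_date >= today
        ]
        missed = [
            a.appointment_date
            for a in own
            if a.status == "did_not_attend" and a.appointment_date < today
        ]
        rows.append(
            {
                "patient_id": patient_id,
                # None means there is nothing to show in that column.
                "next_booked": min(future) if future else None,
                "last_non_attended": max(missed) if missed else None,
            }
        )
    return rows


# Made-up sample data for illustration only.
today = date(2024, 6, 1)
appointments = [
    Appointment("P001", date(2024, 6, 10), "booked"),
    Appointment("P001", date(2024, 5, 2), "did_not_attend"),
    Appointment("P001", date(2024, 4, 1), "did_not_attend"),
]
rows = enrich_waiting_list(["P001", "P002"], appointments, today)
```

A patient with neither kind of appointment still appears in the list with empty columns, which is itself useful information for the service validating the record.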
Support individuals to understand how to correct errors by including links to training material within the report
Your users are human and mistakes happen; often mistakes happen due to a lack of understanding of how data should be entered into the system or what the consequences of poor data quality are. By embedding training material within your reports you are supporting them to understand the action they need to take to correct the error and, more importantly, embedding knowledge within the individual to help them avoid errors recurring in the future.
Within my organisation we have done this through an initiative referred to as 'Link and learn'; embedding a hyperlink within our Tableau workbooks that directs the user through to online support material that provides a step by step guide describing what an individual needs to do to correct data quality errors.
Look forward as well as looking historically
It is a given that the ultimate aim of providing people with information regarding their data quality is for them to retrospectively correct the errors and as such improve the quality of your data; however, from an end user point of view, one of the most useful approaches you can take is to provide information on forthcoming potential performance breaches, in addition to retrospective data quality errors.
Whilst providing information on forthcoming potential breaches will not resolve any historical data quality errors, it will help your users plan for future events and as such deliver a more proactive service to your customers or service users. By having a forward view embedded into your data quality visualisations you will be encouraging users to view the report and as such increasing user engagement with data quality reports.
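As a rough illustration of a forward view, the Python sketch below lists patients whose wait has not yet breached a target but will do so within a chosen horizon. The 126-day target, the 14-day horizon, the `upcoming_breaches` helper and the sample referrals are all assumptions for the example; substitute the thresholds that apply to your own metric.

```python
from datetime import date, timedelta


def upcoming_breaches(referrals, today, target_days=126, horizon_days=14):
    """Return (patient_id, breach_date) pairs that fall within the horizon.

    referrals maps a patient ID to the date their wait started. Waits that
    have already breached are excluded, as they belong on the retrospective
    list rather than the forward view.
    """
    horizon_end = today + timedelta(days=horizon_days)
    results = []
    for patient_id, referred_on in referrals.items():
        breach_date = referred_on + timedelta(days=target_days)
        if today <= breach_date <= horizon_end:
            results.append((patient_id, breach_date))
    # Soonest breach first, so the most urgent patients head the list.
    return sorted(results, key=lambda pair: pair[1])


# Made-up sample data for illustration only.
today = date(2024, 6, 1)
referrals = {
    "P001": today - timedelta(days=120),  # breaches in 6 days
    "P002": today - timedelta(days=130),  # already breached
    "P003": today - timedelta(days=50),   # well inside the target
}
forward_view = upcoming_breaches(referrals, today)
```

Sorting by breach date keeps the list actionable: a user can work from the top down and stop when the dates are far enough away.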
Change people's perceptions of data by celebrating success within your visualisations; rewarding positive performance and encouraging learning across teams; a concept known as 'positive deviance'. Within my organisation we have achieved this through an initiative known as 'Fabulous Friday'; a dedicated visualisation that is incorporated into our dashboards and lists all fully compliant teams for an individual metric. On a Friday I then select one metric and share the 'Fabulous Friday' visualisation as a PDF with all of our Tableau users. This has created a user community that now associates Tableau with celebrating success and as such has helped increase engagement and ultimately support a culture whereby individuals now actively want to correct data quality errors so they are included as part of the weekly celebration.
For further information on positive deviance refer to https://en.wikipedia.org/wiki/Positive_deviance
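The list behind a 'Fabulous Friday' style view is simple to derive: a team qualifies only if every one of its records passes the chosen metric. The Python sketch below shows that rule under assumptions of my own; the `fully_compliant_teams` helper, the team names and the record layout are hypothetical and not taken from our workbook.

```python
def fully_compliant_teams(records):
    """Return the sorted names of teams whose every record is compliant.

    records is an iterable of (team_name, is_compliant) pairs for one metric.
    """
    all_compliant = {}
    for team, is_compliant in records:
        # A single non-compliant record rules the team out for this metric.
        all_compliant[team] = all_compliant.get(team, True) and is_compliant
    return sorted(team for team, ok in all_compliant.items() if ok)


# Made-up sample data for illustration only.
records = [
    ("Team A", True),
    ("Team A", True),
    ("Team B", True),
    ("Team B", False),
    ("Team C", True),
]
celebrated = fully_compliant_teams(records)
print(celebrated)
```

A team with no records for the metric never appears in the output, so it is worth deciding separately whether such teams should be celebrated or followed up.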
What difference does it make?
A massive difference! The chart below shows one of the Key Performance Indicators (KPIs) for my organisation. We have always been compliant in terms of performance for this KPI; however, historically we have always struggled to achieve a near 100% compliance rate, primarily due to poor data quality. Prior to Tableau, we always took the approach of 'managerial nagging' to resolve data quality issues; managers reacting to performance deterioration and asking staff to validate their performance. Within two months of implementing Tableau subscriptions for individual staff, performance improved to 99% and this high level of compliance has been sustained for the last 18 months. This was achieved with no additional managerial resource; it was achieved thanks to staff acting upon the personalised visualisations they were receiving through Tableau.
To achieve sustained improvements in data quality within your organisation do all you can to make it as easy as possible for individuals to gain direct access to timely and clear data quality reports, do all you can to allow them to act upon this data, and do all you can to celebrate when improvements are achieved. Tableau is a fantastic tool to support you to achieve all of these goals; however, it is only when the technology of Tableau is combined with positive culture and behaviours that the real benefits can be maximised and embedded into the DNA of your organisation.