Imagine unwrapping your favourite chocolate bar to find that instead of the delicious chocolate treat you were expecting, you get a bitter mouthful of something horrible. They’ve forgotten the sugar – yikes! This very rarely happens because, like all good products, they get randomly sampled to check the quality is the same throughout.

Just as with those chocolate bars, we want to make sure there’s sugar in all of our donors’ data. It’s only when the data is reliable that people will trust the brand and consume it!

OK, so the chocolate metaphors stop here. For the last few weeks, here in the Publish What You Fund office, we’ve been working hard sampling hundreds of documents for the 2018 Aid Transparency Index.

In this instance, sampling means that the team manually checks specific indicators published via an organisation’s IATI files. We do this to verify the information within the file is properly tagged and meets the definition of the indicator. Although our automatic quality testing tools are essential in order to check if – and for how many of its activities – an organisation publishes information to IATI, sampling is key to verifying the data.

How does sampling work and what is new with the 2018 methodology?

Sampling is conducted for 16 out of our 35 indicators. Five of these indicators represent organisation-level information, such as organisation strategy or audits. The other eleven indicators refer to project-related information, for example the objectives or the results of a particular development intervention.

One objective of our recent methodology review was to raise the bar for data quality and to put the data user in focus. To ensure this, we’ve made sampling more rigorous by:

  • Increasing the sample size. The sampled activities are selected randomly, with 50% having to meet our criteria in order to pass. This step comes after our tracker tool has tested all files and activities published by an organisation to IATI.
  • Sampling only current projects, to ensure the data is up-to-date.
  • Adding titles and descriptions to the list of indicators sampled.
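
The 50% pass rule above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not our actual tooling: the `meets_criteria` flag stands in for the manual judgement made by the team, and the sample size is a placeholder.

```python
import random

def indicator_passes(activities, sample_size=20, pass_threshold=0.5, seed=None):
    """Randomly sample activities for one indicator and apply the pass rule:
    at least half of the sampled activities must meet the indicator's criteria.

    `meets_criteria` stands in for a manual judgement recorded per activity.
    """
    if not activities:
        return False
    rng = random.Random(seed)
    n = min(sample_size, len(activities))
    sampled = rng.sample(activities, n)
    passing = sum(1 for a in sampled if a["meets_criteria"])
    return passing / n >= pass_threshold

# Hypothetical data: in practice each flag comes from a manual check.
activities = [{"meets_criteria": i % 2 == 0} for i in range(40)]
print(indicator_passes(activities, sample_size=10, seed=1))
```

Seeding the random generator makes a given sample reproducible, which matters when results are shared back with the organisation being assessed.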

We sample data at two stages in the data collection process. The first round allows an organisation to see how it can improve its information and to identify any common mistakes. By the time we sample for the second time, at the end of the data collection process, we usually see a solid improvement. The process helps us advocate for better quality data, with the aim of benefitting data users.

We’ve had a lot of debate in the office about common issues with the data as we have been going through it.

Insights from Sampling

The good news is: there is more IATI data for us to sample than in previous years, and many organisations pass a significant number of indicators. The bad news is: there is still a lot of work to do to improve data quality and make it user friendly.

When going through the sampling process, we put our data user hats on. Talking with the rest of the team, we have collectively found three major issues with the data:

1. Failing to provide the very basic information
The sampling process showed that adding titles and descriptions to the list of sampled indicators was worthwhile. Titles and descriptions are the most basic pieces of information; they provide an important entry point for any data user. If they are missing or obscure, they become a barrier to anyone who might want to look further. Yet a number of donors publish titles and descriptions that are useless to the user: they are too short, they include unexplained acronyms, or they use “aid jargon”. In some instances, no titles or descriptions are provided at all.
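
These basic gaps can also be spotted automatically. The sketch below assumes the IATI 2.x layout, where titles and descriptions carry their text in `<narrative>` child elements; the 20-character threshold is an arbitrary illustration, not part of our methodology.

```python
import xml.etree.ElementTree as ET

def basic_info_issues(activity_xml):
    """Flag missing or very short titles/descriptions in a single
    <iati-activity> element (IATI 2.x puts text in <narrative> children)."""
    activity = ET.fromstring(activity_xml)
    issues = []
    for field in ("title", "description"):
        texts = [n.text or "" for n in activity.findall(f"{field}/narrative")]
        combined = " ".join(t.strip() for t in texts).strip()
        if not combined:
            issues.append(f"missing {field}")
        elif len(combined) < 20:  # arbitrary threshold for "too short"
            issues.append(f"{field} too short: {combined!r}")
    return issues

# A made-up activity with a jargon-heavy title and no description at all.
xml = """<iati-activity>
  <title><narrative>WASH prog. Ph2</narrative></title>
</iati-activity>"""
print(basic_info_issues(xml))
```

A check like this catches the mechanical failures (empty or one-word fields); judging whether a title is genuinely meaningful still needs a human reader, which is why we sample manually.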

2. Unclear and inconsistent documentation
The differences in data quality between donors often stem from the way they document, tag and label their data. When activities span a five-year period, for example, multiple tenders, contracts, evaluations or progress reports can be published every year, or several times in the same year. Sometimes documents tagged for specific indicators do not in fact include the relevant information. We have also found several unlabelled documents tagged for one indicator, with no pointer to the most recent or relevant one. This confronts the user with a lot of “noisy” data, forcing them to invest more time to make sense of it. The real issues occur when different pieces of information do not match up. This problem became most obvious while sampling information on sub-national locations: the descriptions name one location (usually the capital city) while the location coordinates and/or contextual documents indicate that the project is really somewhere else. Such inaccurate recording doesn’t just waste time; it impedes transparency efforts.
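
Publishers could flag their own unlabelled documents before we do. The sketch below assumes IATI 2.x `<document-link>` elements with `<category>` codes and `<title><narrative>` labels; the category code used is only an example, not a statement about any particular indicator mapping.

```python
import xml.etree.ElementTree as ET

def unlabelled_documents(activity_xml, category_code):
    """List document links tagged with a given IATI document category code
    that carry no title, leaving users to guess which one is relevant."""
    activity = ET.fromstring(activity_xml)
    unlabelled = []
    for link in activity.findall("document-link"):
        codes = {c.get("code") for c in link.findall("category")}
        if category_code not in codes:
            continue
        titles = [n.text or "" for n in link.findall("title/narrative")]
        if not any(t.strip() for t in titles):
            unlabelled.append(link.get("url"))
    return unlabelled

# Two documents tagged with the same (example) category; only one is labelled.
xml = """<iati-activity>
  <document-link url="http://example.org/report1.pdf">
    <category code="A07"/>
  </document-link>
  <document-link url="http://example.org/report2.pdf">
    <title><narrative>2017 evaluation</narrative></title>
    <category code="A07"/>
  </document-link>
</iati-activity>"""
print(unlabelled_documents(xml, "A07"))
```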

3. Donors need to consistently publish current data
Another change to the methodology that has helped push for data quality is the sampling of current projects only. It has revealed two major flaws in donors’ data. First, in many cases the documents that donors provide are not current, ranging from outdated country strategies, to old project appraisals from previous project phases, to evaluations conducted several years ago. Second, some of the projects we sampled stated they were still in implementation even though no recent transactions were recorded and the project dates suggested the activity had ended years ago. This suggests the data hasn’t been updated to reflect the end of the project cycle, and donors have not provided any other explanation for why these projects are still marked as in implementation. Both issues highlight the importance of updating data consistently, regardless of the Index cycle, so as to provide users with accurate and usable information.
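
Publishers can catch this status/date mismatch themselves before an Index cycle. The sketch below assumes IATI 2.x conventions (activity-status code "2" for implementation; activity-date types "3" and "4" for planned and actual end dates); the one-year quiet period for transactions is an arbitrary illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date

def stale_implementation(activity_xml, today, max_quiet_days=365):
    """Flag an activity that claims to be in implementation even though
    its end date has passed and no recent transactions are recorded."""
    activity = ET.fromstring(activity_xml)
    status = activity.find("activity-status")
    if status is None or status.get("code") != "2":  # "2" = implementation
        return False
    # type "3" = planned end date, type "4" = actual end date (IATI 2.x)
    ends = [d.get("iso-date") for d in activity.findall("activity-date")
            if d.get("type") in ("3", "4") and d.get("iso-date")]
    ended = any(date.fromisoformat(d) < today for d in ends)
    tx_dates = [date.fromisoformat(t.get("iso-date"))
                for t in activity.findall("transaction/transaction-date")
                if t.get("iso-date")]
    quiet = not tx_dates or (today - max(tx_dates)).days > max_quiet_days
    return ended and quiet

# A made-up activity: still "in implementation", but it ended in 2015
# and records no transactions at all.
xml_stale = """<iati-activity>
  <activity-status code="2"/>
  <activity-date type="3" iso-date="2015-06-30"/>
</iati-activity>"""
print(stale_implementation(xml_stale, today=date(2018, 2, 1)))  # True
```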

So what next?

  • We have shared the sampling results with all 45 donors being assessed in the Index. They now have the chance to improve their data until March 9th. 
  • We are engaging with donors through emails, calls and meetings to ensure that they have the best support in making their data user-friendly and transparent. So far, a large majority of donors have been in contact, each with specific questions on their sampling results and how to make their data better. Please email us if you still have any questions.
  • The Data Quality Tester is always available to donors to keep testing their own data against the Index methodology. It is free and open source!

The sampling process has shown that the mere publication of data is not enough. Donors have to take responsibility for the quality of their information. Only through ensuring ‘sweetness’ across all the data published will there truly be space for development effectiveness and accountability.