A methodology for merging IATI and CRS data
Our women’s economic empowerment team wanted to include as many international funding flows as possible when analysing development assistance data. In this blog, Benjamin Honey explores the pros and cons of merging the two main data sources and describes the methodology we employed.
In October 2020, Publish What You Fund embarked on a multi-year project to track international funding for women’s economic empowerment, women’s financial inclusion, women’s empowerment collectives and gender integration in three focus countries (Kenya, Nigeria, and Bangladesh). To maximise the number of different funding flows we could include in our analysis, we set out to merge data from the two main sources of development assistance data: the Creditor Reporting System (CRS) of the Organisation for Economic Co-operation and Development’s Development Assistance Committee (OECD-DAC), and the International Aid Transparency Initiative (IATI). Here we explore some of the benefits of merging these datasets and outline some of the difficulties we experienced in the process.
The first and most obvious reason for merging CRS and IATI data is to analyse a larger range of activities than either source covers on its own. The CRS database holds data from OECD-DAC member countries. Adding IATI data let us also analyse activities further down the aid chain, for example activities reported by non-governmental organisations (NGOs) and women’s funds, by non-traditional donors, and by large philanthropies. More organisations publish to the IATI Standard than to the CRS. Even so, some major donors still publish their data only to the CRS database. Figures 1 and 2 illustrate the top six donors that publish to only one of the sources, along with their total global disbursements and commitments in 2020. Merging IATI and CRS data ensured that as many major international donors as possible were included in our research.
A brief word on duplication
To ensure that our final merged dataset didn’t include the same activities twice (once from CRS and once from IATI), we identified donors present in both data sources. For each of these donors, we removed its activities from whichever source had the less rich data. The full methodology of this process can be found here. Table 1 shows the total project count and funding amount for our merged dataset (duplicate projects removed) for Ethiopia, alongside the corresponding figures from the IATI and CRS databases. As the table shows, our merged database contained $11bn more total funding than the IATI data alone, and $22bn more than the CRS data alone.
| | IATI | CRS | Merged (duplicates removed) |
|---|---|---|---|
| Total value (US$) | $52,619,085,380 | $41,717,285,776 | $63,932,604,302 |
| Number of titles | 16,493 | 19,464 | 25,488 |
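The headline differences quoted above can be checked directly against Table 1. The short Python sketch below assumes the table's columns are, in order, IATI, CRS and the merged dataset. That is the ordering implied by the $11bn and $22bn gaps. The names `TABLE_1`, `extra_funding_bn` and `shortfall_pct` are illustrative only.

```python
# Table 1 figures, assuming the column order is IATI, CRS, merged
# (the order implied by the $11bn and $22bn differences in the text).
TABLE_1 = {
    "IATI": {"total_usd": 52_619_085_380, "titles": 16_493},
    "CRS": {"total_usd": 41_717_285_776, "titles": 19_464},
    "Merged": {"total_usd": 63_932_604_302, "titles": 25_488},
}


def extra_funding_bn(source: str) -> float:
    """Additional funding (US$ billions) in the merged dataset versus one source."""
    gap = TABLE_1["Merged"]["total_usd"] - TABLE_1[source]["total_usd"]
    return gap / 1e9


def shortfall_pct(source: str, field: str) -> float:
    """How much smaller one source is than the merged dataset, in percent."""
    merged = TABLE_1["Merged"][field]
    return 100 * (merged - TABLE_1[source][field]) / merged


if __name__ == "__main__":
    print(f"Extra vs IATI: ${extra_funding_bn('IATI'):.1f}bn")
    print(f"Extra vs CRS:  ${extra_funding_bn('CRS'):.1f}bn")
    print(f"CRS only: {shortfall_pct('CRS', 'titles'):.1f}% fewer titles, "
          f"{shortfall_pct('CRS', 'total_usd'):.1f}% less funding")
```

Run as a script, it reports about $11.3bn extra versus IATI and $22.2bn versus CRS. It also shows that relying on CRS alone would mean about 23.6% fewer titles and 34.7% less funding, close to the rounded figures quoted in this post.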
Given our project’s research focus, we wanted to use the OECD gender marker in our analysis. When deciding which data source to keep for each donor, we therefore compared three things: the number of projects, the total funding amount, and the number of projects carrying the OECD gender marker. As can be seen in figure 2, the OECD gender marker is used much more widely in CRS data than in IATI data. However, as Table 1 shows, using only CRS data would have left us with 25% fewer projects and 35% less total funding to analyse. By merging CRS and IATI data, we created a database tailored to the needs of our project. It has a high number of projects and a large volume of funding, while also maximising the number of gender-marked projects. The principles we used to make data quality decisions around the gender marker would apply equally to any other variable of particular importance to a project, such as SDG targets, project descriptions, or other policy markers.
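The per-donor decision described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure we actually followed. The `Activity` record and its fields are hypothetical, and donor names are assumed to be normalised already. The text does not specify how the three criteria were weighed against each other, so the lexicographic rule used here is also an assumption. It prefers more gender-marked projects, then more projects, then more funding, with ties going to CRS.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    """One reported activity (hypothetical, simplified record)."""
    donor: str           # donor name, assumed already normalised across sources
    source: str          # "IATI" or "CRS"
    value_usd: float     # funding amount in US dollars
    gender_marked: bool  # True if the activity carries an OECD gender marker


def summarise(activities, donor, source):
    """Comparison criteria for one donor in one source, in priority order."""
    rows = [a for a in activities if a.donor == donor and a.source == source]
    return (
        sum(a.gender_marked for a in rows),  # number of gender-marked projects
        len(rows),                           # number of projects
        sum(a.value_usd for a in rows),      # total funding
    )


def merge(activities):
    """Keep only one source for each donor that appears in both IATI and CRS."""
    iati_donors = {a.donor for a in activities if a.source == "IATI"}
    crs_donors = {a.donor for a in activities if a.source == "CRS"}
    keep = {}
    for donor in iati_donors & crs_donors:
        iati = summarise(activities, donor, "IATI")
        crs = summarise(activities, donor, "CRS")
        # Tuples compare element by element, so gender-marked projects decide
        # first, then project count, then funding; ties go to CRS (assumption).
        keep[donor] = "IATI" if iati > crs else "CRS"
    # Donors found in only one source are kept as they are.
    return [a for a in activities if a.donor not in keep or a.source == keep[a.donor]]


if __name__ == "__main__":
    sample = [
        Activity("Donor A", "IATI", 100.0, False),
        Activity("Donor A", "CRS", 100.0, True),
        Activity("Donor B", "IATI", 50.0, True),
        Activity("Donor C", "CRS", 75.0, False),
    ]
    for a in merge(sample):
        print(a.donor, a.source)
```

With this sample, Donor A is kept from CRS because its single CRS activity is gender-marked. Donors B and C appear in only one source, so they are kept unchanged. Swapping in a different variable of interest, such as an SDG target field, would only change what `summarise` counts.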
Disadvantages and difficulties in the merging process
Merging IATI and CRS data for our three focus countries took our team of researchers many months, and we came across a number of challenges along the way. Our full methodology document describes these challenges in detail; the most notable were the following:
- Identifying duplicate donors across sources when donor and agency names don’t match: For example, the United Nations Population Fund publishes its activities under the name ‘UNFPA’ in CRS, whereas in IATI it publishes under ‘United Nations Population Fund’. Organisation names are inconsistent in many other cases like this. We therefore needed in-depth knowledge of donors’ publishing structures to correctly identify every case of duplication.
- Donors changing their names: For example, the replacement of the UK Department for International Development (DFID) by the UK Foreign, Commonwealth & Development Office (FCDO) posed a challenge when matching duplicate donors across the data sources.
- CRS and IATI have different publication schedules: CRS data is published only annually, with a minimum publication delay of 11 months, whereas IATI includes current data and is updated in real time. This is a disadvantage compared with using IATI data alone, because we can only analyse trends up to the most recent year for which CRS data is available.
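The first two challenges are essentially a name-matching problem, and the third limits which years can be compared. Below is a minimal Python sketch assuming a hand-built alias table. `canonical_donor` maps spelling variants, including a renamed donor, to a single key before donors are compared across sources. `analysis_years` truncates the period at the latest year covered by CRS. The alias entries, the canonical label `UK FCDO` and the helper names are hypothetical illustrations, not the lookup we actually used.

```python
import re

# Hypothetical alias table built by hand: normalised name variant -> canonical key.
# Entries illustrate the UNFPA and DFID/FCDO cases described above.
ALIASES = {
    "unfpa": "United Nations Population Fund",
    "united nations population fund": "United Nations Population Fund",
    "dfid": "UK FCDO",
    "uk department for international development": "UK FCDO",
    "fcdo": "UK FCDO",
    "uk foreign commonwealth and development office": "UK FCDO",
}


def _normalise(name: str) -> str:
    """Lower-case, spell out '&', drop anything that is not an ASCII letter,
    digit or space, and collapse whitespace."""
    text = name.lower().replace("&", " and ")
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return " ".join(text.split())


def canonical_donor(name: str) -> str:
    """Map a reported donor name to one key shared by IATI and CRS.

    Names missing from ALIASES are returned unchanged (apart from trimming),
    so they only match if both sources spell them identically.
    """
    return ALIASES.get(_normalise(name), name.strip())


def analysis_years(iati_years, crs_years):
    """Years available for trend analysis on the merged data.

    Because CRS lags behind IATI, any year after the latest CRS year is dropped.
    """
    last_crs = max(crs_years)
    return sorted(y for y in set(iati_years) | set(crs_years) if y <= last_crs)


if __name__ == "__main__":
    print(canonical_donor("UNFPA") == canonical_donor("United Nations Population Fund"))
    print(canonical_donor("UK Foreign, Commonwealth & Development Office"))
    print(analysis_years([2019, 2020, 2021, 2022], [2018, 2019, 2020]))
```

An explicit alias table is used rather than fuzzy string matching. The reason is that pairs like ‘UNFPA’ and ‘United Nations Population Fund’ share almost no characters, so matching them depends on knowing each donor’s publishing structure rather than on string similarity.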
You can view the analysis we carried out using these merged datasets on our Women’s Economic Empowerment project page here, and the full methodology for merging the data here.