Publish What You Fund

The Global Campaign for Aid and Development Transparency

Download all IATI data, lightning fast!

By Andy Lulham | May 29, 2018 | Blog

On Thursday, John Adams (IATI TAG chair) asked on IATI Discuss:

What’s the current recommended way to download the entire IATI dataset in XML? Separate files are OK.

By Friday, I’d made a Minimum Viable Product:

[Screenshot: first iteration]

By Monday, it looked a bit more polished:

[Screenshot: second iteration]

You can view the site here.

Wait, what? But… Why?

IATI Data Dump provides a downloadable zip file of all XML data on the IATI registry, updated daily. While the raw XML is around 7 gigabytes, it compresses down to just 350 megabytes (a whopping 95% saving!). And (without getting too technical) because this takes one HTTP request instead of ~6,000 (one per dataset), it is muuuuch faster to download. So with a broadband internet connection, you can download the lot in under a minute.
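To make the numbers above concrete, here is a minimal Python sketch. It checks the saving arithmetic (7 GB down to 350 MB) and shows how the XML files inside a single zip can be listed once it has been downloaded. The file paths inside the demo archive are hypothetical; the real archive's layout may differ.

```python
from __future__ import annotations

import io
import zipfile


def saving_percent(raw_bytes: int, compressed_bytes: int) -> float:
    """Percentage of space saved by compression."""
    return 100.0 * (1 - compressed_bytes / raw_bytes)


def list_xml_members(zip_bytes: bytes) -> list[str]:
    """Return the names of all .xml files inside a zip held in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return sorted(n for n in archive.namelist() if n.endswith(".xml"))


def build_demo_zip() -> bytes:
    """Build a tiny stand-in archive (hypothetical layout, not real IATI data)."""
    xml = "<iati-activities>" + "<iati-activity/>" * 1000 + "</iati-activities>"
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("data/publisher-a/dataset-1.xml", xml)
        archive.writestr("data/publisher-b/dataset-2.xml", xml)
    return buffer.getvalue()


if __name__ == "__main__":
    # The figures quoted in the post: ~7 GB raw, ~350 MB zipped.
    print(round(saving_percent(7_000_000_000, 350_000_000), 1))
    print(list_xml_members(build_demo_zip()))
```

Repetitive XML compresses very well, which is why one zip of everything ends up so much smaller than the sum of the individual files.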

A raw data dump is a really basic requirement for a new IATI datastore. I.e.:

As an analyst,
I need access to all data on the IATI registry, unprocessed and unfiltered,
so that I can analyse it holistically in order to generate insights.

Or even:

As an IATI tool developer,
I need access to all data on the IATI registry, unprocessed and unfiltered,
so that I can process it before presenting it to a user.

It’s so basic, in fact, that most IATI tools and portals already implement it. d-portal does it (†), OIPA does it, the IATI Dashboard does it, the IATI Datastore does it. So at the moment, all of these tools (and lots more) visit the registry and make a list of every publisher and the location of every dataset for each publisher. Then each tool visits the servers of every publisher, downloading each dataset individually. None of these tools makes its unprocessed and unfiltered output available as a bulk download. So rather than duplicating the work, why not do it once and share?
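The crawl that each tool repeats can be sketched roughly as follows. This is an illustration, not the code of any of the tools named above. It assumes the registry exposes a CKAN-style `package_search` endpoint at the URL below and that each dataset record carries `name`, `organization` and `resources` fields; treat the URL and field names as assumptions to check against the registry's own documentation. The `get_json` parameter is a hypothetical hook so the paging logic can be tried without touching the network.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Tuple

# Assumed endpoint (CKAN-style search API); verify before relying on it.
REGISTRY_SEARCH_URL = "https://iatiregistry.org/api/3/action/package_search"


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_dataset_urls(
    get_json: Callable[[str], dict] = fetch_json,
    rows: int = 1000,
) -> Iterator[Tuple[str, str, str]]:
    """Yield (publisher, dataset name, file URL) for every registered dataset."""
    start = 0
    while True:
        query = urllib.parse.urlencode({"start": start, "rows": rows})
        page = get_json(f"{REGISTRY_SEARCH_URL}?{query}")["result"]["results"]
        if not page:
            break  # no more datasets to list
        for dataset in page:
            publisher = (dataset.get("organization") or {}).get("name", "unknown")
            for resource in dataset.get("resources", []):
                if resource.get("url"):
                    yield publisher, dataset["name"], resource["url"]
        # Advance by what was actually returned, in case the server caps `rows`.
        start += len(page)
```

Every tool that does this ends up making thousands of follow-up requests, one per yielded URL, which is the work a shared dump would save.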

Hold on… Doesn’t this create a single point of failure, Andy?

How perceptive of you! Yes, that’s certainly true. But note that with the IATI Registry API, we already have a single point of failure (and indeed we’ve hit upon this problem recently). The difference here, though, is that we have a fallback option – downloading every dataset individually. IATI Data Dump just provides a speedy shortcut.

Is it finished?

It’s never finished! But you’re welcome to use it. This is intended more as an illustration of a feature that the proposed IATI datastore could provide.

In the short term, the big piece that’s missing is a clear log of what happened when fetching the data. Perhaps a publisher’s data is mysteriously missing from the zip. Where did it go? It’s likely their server had a problem and was unreachable. But this information should be made available somewhere. I’ve made a ticket for that; I’ll address it very soon.
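One possible shape for such a log, sketched in Python: record one entry per dataset saying whether the fetch succeeded and, if not, why. This is a guess at what the feature could look like, not the implementation behind the ticket. The names `FetchLogEntry`, `fetch_with_log` and `log_to_json` are hypothetical, and a real crawler would also write each payload to disk.

```python
import json
import urllib.request
from dataclasses import asdict, dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class FetchLogEntry:
    dataset: str
    url: str
    ok: bool
    error: Optional[str] = None  # populated only when the fetch failed


def download(url: str) -> bytes:
    """Download a single dataset file."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def fetch_with_log(
    datasets: Iterable[Tuple[str, str]],
    fetch: Callable[[str], bytes] = download,
) -> List[FetchLogEntry]:
    """Try every (name, url) pair and record the outcome instead of failing silently."""
    log: List[FetchLogEntry] = []
    for name, url in datasets:
        try:
            fetch(url)  # a real crawler would save the returned bytes here
            log.append(FetchLogEntry(name, url, ok=True))
        except Exception as exc:  # unreachable server, timeout, bad URL, ...
            log.append(FetchLogEntry(name, url, ok=False, error=f"{type(exc).__name__}: {exc}"))
    return log


def log_to_json(log: List[FetchLogEntry]) -> str:
    """Serialise the log so it can be published next to the zip."""
    return json.dumps([asdict(entry) for entry in log], indent=2)
```

Publishing a file like this alongside the zip would let anyone see at a glance which datasets are missing and what went wrong when fetching them.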


†: In fact, d-portal previously relied on the IATI Datastore for this. But it didn’t scale well, so they switched to downloading the data directly.

Publish What You Fund. China Works, 100 Black Prince Road, London, SE1 7SJ
UK Company Registration Number 07676886 (England and Wales); Registered Charity Number 1158362 (England and Wales)