Publish What You Fund

The Global Campaign for Aid and Development Transparency

Download all IATI data, lightning fast!

By Andy Lulham | May 29, 2018 | Blog

On Thursday, John Adams (IATI TAG chair) asked on IATI Discuss:

What’s the current recommended way to download the entire IATI dataset in XML? Separate files are OK.

By Friday, I’d made a Minimum Viable Product:

[Screenshot: first iteration]

By Monday, it looked a bit more polished:

[Screenshot: second iteration]

You can view the site here.

Wait, what? But… Why?

IATI Data Dump provides a downloadable zip file of all XML data on the IATI registry, updated daily. While the raw XML is around 7 gigabytes, it compresses down to just 350 megabytes (a whopping 95% saving!). And (without getting too technical) because it takes one HTTP request instead of ~6,000 (one per dataset), it is muuuuch faster to download. So with a fast broadband internet connection, you can download the lot in under a minute.
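For anyone who wants to check the sums, here is a rough back-of-the-envelope sketch in Python. The sizes are the rounded figures quoted above; the 50 Mbit/s link speed is purely an assumption for illustration, and the transfer time ignores protocol overhead and server-side limits.

```python
def saving_percent(raw_mb: float, compressed_mb: float) -> float:
    """Percentage of the raw size saved by compression."""
    return (1 - compressed_mb / raw_mb) * 100


def download_seconds(size_mb: float, mbit_per_s: float) -> float:
    """Idealised transfer time: megabytes to megabits, divided by link speed."""
    return size_mb * 8 / mbit_per_s


RAW_MB = 7000             # "around 7 gigabytes" of raw XML
COMPRESSED_MB = 350       # "just 350 megabytes" once zipped
ASSUMED_MBIT_PER_S = 50   # assumption: a 50 Mbit/s broadband connection

print(f"saving: {saving_percent(RAW_MB, COMPRESSED_MB):.0f}%")
print(f"download: {download_seconds(COMPRESSED_MB, ASSUMED_MBIT_PER_S):.0f} seconds")
```

At the assumed 50 Mbit/s the zip arrives in roughly 56 seconds, so "under a minute" holds for connections at about that speed or faster.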

A raw data dump is a really basic requirement for a new IATI datastore. I.e.:

As an analyst,
I need access to all data on the IATI registry, unprocessed and unfiltered,
so that I can analyse it holistically in order to generate insights.

Or even:

As an IATI tool developer,
I need access to all data on the IATI registry, unprocessed and unfiltered,
so that I can process it before presenting it to a user.

It’s so basic, in fact, that most IATI tools and portals already implement it. d-portal does it (†), OIPA does it, the IATI Dashboard does it, the IATI Datastore does it. So at the moment, all of these tools (and lots more) visit the registry and make a list of every publisher, along with the location of every dataset for each publisher. Then each of them visits the servers of every publisher, downloading each dataset individually. None of these tools makes the unprocessed and unfiltered output available as a bulk download. So rather than duplicating the work, why not do it once and share?
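The "do it once and share" idea can be sketched roughly as follows. This is not the actual IATI Data Dump code, just a minimal illustration. It assumes the registry listing has already been turned into a list of dicts with hypothetical "publisher", "name" and "url" keys, and it takes the download function as a parameter so that it can be swapped out.

```python
import io
import urllib.request
import zipfile
from typing import Callable, Iterable


def fetch_over_http(url: str) -> bytes:
    """Download one dataset from a publisher's server."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def build_dump(datasets: Iterable[dict], fetch: Callable[[str], bytes]) -> bytes:
    """Fetch every dataset once and pack the results into a single zip.

    `datasets` is a hypothetical listing: dicts with "publisher",
    "name" and "url" keys. `fetch` maps a URL to the raw bytes.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for dataset in datasets:
            try:
                xml = fetch(dataset["url"])
            except Exception:
                # Publisher's server unreachable: skip this dataset, keep going.
                continue
            # Archive layout (publisher/name.xml) is an assumption.
            archive.writestr(f"{dataset['publisher']}/{dataset['name']}.xml", xml)
    return buffer.getvalue()
```

Calling `build_dump(listing, fetch_over_http)` once a day and publishing the resulting bytes would mean that downstream tools need a single request each, instead of every tool repeating the whole crawl.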

Hold on… Doesn’t this create a single point of failure, Andy?

How perceptive of you! Yes, that’s certainly true. But note that with the IATI Registry API, we already have a single point of failure (and indeed we’ve hit upon this problem recently). The difference here, though, is that we have a fallback option – downloading every dataset individually. IATI Data Dump just provides a speedy shortcut.
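In code, the shortcut-with-fallback pattern might look something like this. Both callables are hypothetical placeholders: one would fetch the bulk zip, the other would do the slow dataset-by-dataset crawl.

```python
from typing import Callable, Tuple


def get_all_data(
    from_dump: Callable[[], bytes],
    from_publishers: Callable[[], bytes],
) -> Tuple[str, bytes]:
    """Try the fast bulk download first; fall back to the slow route.

    Returns which route was used along with the data it produced.
    """
    try:
        return "dump", from_dump()
    except OSError:
        # Network failures (including urllib's URLError) are OSError subclasses.
        return "individual", from_publishers()
```

If the bulk download raises a network error, the caller still gets the data, only more slowly, which is the sense in which the dump is a shortcut rather than a new dependency.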

Is it finished?

It’s never finished! But you’re welcome to use it. This is intended more as an illustration of a feature that the proposed IATI datastore could provide.

In the short term, the big piece that’s missing is a clear log of what happened when fetching the data. Perhaps a publisher’s data is mysteriously missing from the zip. Where did it go? It’s likely their server had a problem and was unreachable. But this information should be made available somewhere. I’ve made a ticket for that; I’ll address it very soon.
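As a rough idea of what such a log could contain, here is a minimal sketch. It is not the implementation behind that ticket; the entry fields ("url", "status", "bytes", "detail") are assumptions chosen for illustration, and the download function is passed in as a parameter.

```python
import json
from typing import Callable, Dict, Iterable, List, Tuple


def fetch_with_log(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
) -> Tuple[Dict[str, bytes], List[dict]]:
    """Fetch each URL, recording one log entry per attempt."""
    fetched: Dict[str, bytes] = {}
    log: List[dict] = []
    for url in urls:
        try:
            body = fetch(url)
        except Exception as error:
            # Record why the dataset is missing instead of dropping it silently.
            log.append({
                "url": url,
                "status": "error",
                "detail": f"{type(error).__name__}: {error}",
            })
            continue
        fetched[url] = body
        log.append({"url": url, "status": "ok", "bytes": len(body)})
    return fetched, log


def log_as_json(log: List[dict]) -> str:
    """Serialise the log so it can be published alongside the zip."""
    return json.dumps(log, indent=2)
```

With something like this published next to the zip, a publisher whose data has gone missing could look up their dataset URL and see whether the server timed out, returned an error, or was never attempted.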


†: In fact, d-portal previously relied on the IATI Datastore for this. But it didn’t scale well, so they switched to downloading the data directly.

Publish What You Fund. China Works, 100 Black Prince Road, London, SE1 7SJ
UK Company Registration Number 07676886 (England and Wales); Registered Charity Number 1158362 (England and Wales)