In early May 2024, Anastasia Bektimirova and Allan Nixon at think tank UK Onward published a blog post suggesting that “The Government should establish a British Library for Data”. This was part of a wider plan to secure a leading place for the UK in frontier technologies and the future global economy. Just six weeks later in mid-June 2024, the Labour party included a commitment to create a National Data Library in their manifesto. They won an election on that manifesto in early July.
Two months from posting to policy is a fantastic achievement.
But policies are not actions and the new UK government has a lot to do. Despite tax rises, it has very little spare money to do it with. The public would barely notice the government missing this commitment. They wouldn’t notice at all if the commitment were met by a token gesture such as requiring The British Library or the Alan Turing Institute to rebadge some existing work and staff as a National Data Library.
A winning manifesto commitment does not guarantee action.
The policy paper to policy paper amplification pipeline.
On the eve of Labour’s election win, the calls for that action began. Gavin Freeguard published a blog post with some suggestions on how to think about a data library. This included a useful link to a Tony Blair Institute for Global Change report published in late May. That report was a thorough proposal for a National Data Trust which, though focused on health, had very strong National Data Library vibes.
The policy paper to policy paper amplification pipeline was now in full flow.
Then came the PDFs. The ESRC and The Wellcome Trust put out a technical white paper challenge and five groups responded with their visions for a National Data Library.
Eleven things to read, plus the Tony Blair Institute’s closely related work, plus the UK’s AI Action Plan which had either morphed into or always been called the AI Opportunities Action Plan and which, in section 1.2, made its own suggestions for what a National Data Library should do.
Because it’s 2025, I’ve put them all into Google’s fantastic NotebookLM product, listened to a podcast of two AI-Californians discussing the documents, asked lots of questions, including a few silly ones, and used another AI to check lots of what you’re about to read.
Let’s go.
What I read
From the first blog post to the most recent paper the single word that best summarises what I read was “London”. This might seem surprising since all fourteen documents carefully avoid suggesting a location for the new National Data Library they support, but I can explain.
The cover image and the first image of the blog post that first proposes a National Data Library is an AI-generated image of a cathedral-like library full of servers, with a large main window looking out over the recognisable skyline of the City of London.
To my best guess, a guess backed up by my preferred AI assistant Le Chat Pro by Mistral, 11 of the 14 documents were written and published in London by authors who mostly live and work in London. The exceptions are Oxford and Swindon (twice). The most recent piece, published by the Tony Blair Institute for Global Change in London, and written mostly by authors who live in London, is signed by 11 people. Of these 11, 10 live and work mostly in London. The other is in Cambridge.
In reading I found it useful to get confirmations, especially via James O’Malley and Gavin Freeguard, that no-one in the UK government yet has a clear vision for what the National Data Library they’ve promised should be or do.
Beyond that, and as I digested the approximately 300 A5 pages of text I’d read over a few weeks, and discussed them with colleagues and AI assistants, the following positive and negative opinions emerged.
Positively,
I most liked the documents which avoided using the phrase “user-centred”.
My favourite proposal was the Bennett Institute for Applied Data Science in Oxford’s submission. In keeping with the location of the institute it was in the minority of documents to use Oxford commas. More seriously, it valued a focus on action and output over vaguer talk about being “user-centred” and joining things up. Ben Goldacre and the team’s work on OpenSAFELY raises the value of their opinions enormously.
The UK AI Action Plan backed up this preference for a focus on action and output with a clear proposal that the UK government should “rapidly identify at least 5 high-impact public datasets it will seek to make available to AI researchers and innovators”. Great suggestion. My top ask is the UK’s Bus Open Data archives, but I’ve got a longer list.
The Governing in the Age of AI report by Anastasia Bektimirova and the Tony Blair Institute for Global Change most clearly and thoroughly made the case for a network of National Data Librarians who would be embedded across government to accelerate the release of data, maintain it, and ensure it is accessible and visible to those wanting to use it for analysis. I agree.
The HDR UK White Paper’s suggestion of the National Data Library providing “data wrangling services” backed this point up well from another angle.
I of course enjoyed the TBI’s inclusion of a proposal for a national ID card. They are right, it would make data easier for more people to use, and I don’t think we’re sacrificing much privacy by making it easier for data the UK government already has about us to be used for good purposes. It’s being used for the bad purposes we worry about already.
Negatively,
I found the constant pleas to break down silos, join up data across government, work in partnership, and pursue lots of other feel-good but vague ideas frustrating. I was reminded so often of Joe Hill’s blog post on the dangers of “Hubs Theory of Everything” and “Everythingism” that I read it again about five times. Linked data across departments is important, but there’s a lot that we can do more quickly and more easily.
I was unconvinced by any of the proposed commercial models. I have never seen data marketplaces of government data work. The Rail Data Marketplace is the latest in a list of failures, and other data portals like The Consumer Data Research Centre, which limit commercial use of their data, even data that was released under an open licence, seem to do little better. The only data I can imagine being sold is NHS data, but doing so would unavoidably favour precisely the big and often foreign companies that would cause most concern among the public. I think a National Data Library will be a large cost to taxpayers and the case should be made for it on those terms, probably as an alternative to other research and development funding.
A lot of the proposed action of a National Data Library around engaging with citizens and preserving public trust felt outdated. I am increasingly convinced that the public are more concerned with governments being efficient and accountable to them than they are with privacy and trust earned beyond elections.
Suggestions that a National Data Library should provide compute capacity to analysts felt outdated. I suspect that in almost all cases the limiting cost factor on the public use of very large datasets will be the cost of the government releasing that data within inefficient government IT contracts and not the cost of entrepreneurs analysing the data.
So what do I think the National Data Library should be?
What should a National Data Library be?
My starting position is to oppose the creation of any new national institution in Britain. Our central government is the largest it has been since WW2 and we have the most centralised government in the world. This centralisation contributes to the UK having the weakest national economy and by far the poorest large non-capital cities in North Europe and North America. Further growing the UK central government and its institutions, no matter how arms-length they are claimed to be, is likely, on average and versus the counterfactual, to decrease the prosperity of Britain and of Britons.
I would oppose the creation of new national institutions in Britain even more strongly if they are likely to be headquartered in London. The disproportionately high spending on public research and development and national institutions in the capital is a subsidy to companies in South East England and a substantial dispreference to prosperity in the rest of the UK. For at least fifteen years the UK government has created and funded new national institutions focusing on data, tech, and AI almost exclusively in London and has thus incentivised companies in these sectors to relocate to and grow faster in South East England. This has made our country weaker and poorer and I could not support any further institution that deepened that damage.
But if a National Data Library were created somewhere else, I might support it.
It should,
Build a small team, primarily of technologists with a focus on data analysis and of public servants and legal experts with national and local government data experience.
Back this team with more than enough funding to rapidly identify the 5 high-impact public datasets to make available to AI researchers and innovators as proposed in the UK’s AI Action Plan and get this data released by assisting the current data owners with technical, legal, and budget concerns.
Build a small data hosting service and offer to provide free hosting to public bodies claiming that data sharing costs are a barrier to them sharing data.
Celebrate successful users of public data, especially those in the private sector and in academia, and fund the promotion of their work online, on podcasts, at events at the library, at international events, and in the media.
Promote the existence of public datasets and set challenges to use them in productive ways.
Provide visiting workspace and paid fellowships from 3 months to 3 years to people from public sector organisations looking to embed in the National Data Library as a National Data Librarian.
Offer visiting workspace, mentoring, and advice sessions to academics, businesses, and public sector workers looking to work with data.
Gather and collate requests for public sector data release from academics, businesses, and the public sector and work to get it released to them.
Build on the successful engagement approach of ARIA and work in the open, on the web, outside of the constraints of the gov.uk style guide and domain limitations.
Be based in a single physical location outside of South East England, not a distributed or hub and spoke model.
I would argue strongly for that location to be Leeds for two big reasons,
NHS data was highlighted as the most obvious big source of value to be unlocked by a National Data Library. Over 90% of the UK’s GP data is held by two companies based in Leeds, TPP and EMIS, which won that position in a competitive market. NHS England is based in Leeds and the functions that take over from it are likely to remain based in Leeds. Many of the NHS’s data sharing technologies and platforms are developed and maintained in Leeds by companies such as BJSS. These organisations, especially TPP and EMIS, were key to the success of OpenSAFELY, the widely celebrated example of the excellent data use that a National Data Library is designed to encourage. If getting more value from health data is a core motivation for setting up a National Data Library then picking any other location is a decision to ignore these market signals and miss an opportunity to prove that British national institutions back excellence wherever it emerges, not just near to where they overwhelmingly already exist.
The long-standing promise by the British Library to establish a presence in Leeds remains unfulfilled and their recently announced and enormous expansion plans in London suggest they lack interest. The British Library coming to Leeds has long been an anchor for large plans around innovation and regeneration and much of The Leeds Innovation Arc is already under construction. National government investment such as the National Data Library is key to achieving the potential of this plan for Britain’s fourth largest city and offers the last hope of saving the Grade-I listed Temple Works in the city centre, which is a perfect location for a Kings Cross style development.
Especially following the cancellation of HS2 and NPR, the Temple Works site, a Grade-I listed building in central Leeds, can probably only be saved with public money and investment of the type long promised by The British Library in a Northern site to rival its headquarters in St. Pancras.
But if someone else wants to make the case for Birmingham, Manchester, Liverpool, Newcastle or Glasgow, I’m keen to read it.
Thanks for reading. You may agree with me. You may not. As always, I really appreciate coherent disagreement, especially short blog posts or long comments below. I think that calling me silly or daft is a waste of everyone’s time.