A cityscape of Manchester at night.

Clusters and sectors.

Tom Forth.

A country like Australia doesn’t have to worry much about economic clusters.

Adelaide, Melbourne, Sydney, and Brisbane on the south-east coast are each separated from the next by over 400 miles. Perth is 2000 miles to the west of them all. With small exceptions, mining for example, these cities operate independently of each other and independently of the vast rural parts of the country in between them. Australia has cities and it has economic clusters and the two overlap almost perfectly.

Britain’s cities are closer together; Sydney’s single metro line is the same length as the distance from Manchester to Sheffield for example. They are so close that they can share workers, colleges, universities, airports, arenas, research centres, and businesses much more easily.

Rather than acting economically as individual cities, British cities tend to merge and mingle into clusters of activity.

Let’s think this through with an example.

AstraZeneca.

AstraZeneca is a large British pharmaceutical company. Its headquarters is in Cambridge. When Cambridge South station opens right next door to its main building later this year it will be 50 minutes from there to its commercial headquarters in Kings Cross, London.

That commercial headquarters is right next to the UK’s largest centre of public sector biomedical research at The Crick Institute, the UK’s newest scientific funding agency, ARIA, and the UK’s national AI institute, The Alan Turing Institute. In addition, there’s Novo Nordisk, GSK, Google, the British Library, the Wellcome Trust, and much more. And on every one of these sites, and on almost every train and cafe between them, there are graduates and researchers (past and current) from Imperial College, UCL, and Cambridge University, with experience living and working at both ends of the trainline, at places in between, and at places within two hours in every other direction from Kings Cross, including Paris, Brussels, and Leeds.

This is a cluster.

Why clusters?

Economists and policy makers care about clusters like this one because it is out of this cluster that Britain wins Nobel Prizes, cures diseases, improves lives, attracts investment, pays great wages, and raises taxes that fund our public services. But interest in clusters is much broader and deeper than that.

I know it when I see it.

Clusters can be easy to see when they’re obvious, but they are very hard to define objectively. They have no neat boundary, either in terms of sector or geography, and they can change quickly.

If we had to give a name to this cluster containing AstraZeneca we’d probably start with “Pharmaceuticals in London and Cambridge”. But we’d quickly see this was insufficient.

Google’s presence in Kings Cross is largely focused on AI. AstraZeneca’s presence there is the commercial arm of a pharmaceutical company. These don’t feel like the same sector, and many people and most automated approaches to identifying a cluster wouldn’t see any connection between the two. Even if they did, they probably wouldn’t pull AstraZeneca’s Cambridge sites into the cluster.

But we know from Google DeepMind’s work on AlphaFold (2020) that the overlap between the two companies’ activities is strong. And we can see via Novo Nordisk’s recently opened (2024) AI Pharmaceuticals hub in Kings Cross that this, and the other activity in the area, attracts other companies and organisations doing similar things. And we can see from the two dates I’ve added just how recent this huge change to the nature of the cluster is.

So now we’ve expanded the sector of our cluster to AI from just pharmaceuticals, and we could easily expand its geography further too.

We could argue that, since Novo Nordisk has a second R&D hub in Oxford, the opening of its London office has brought Oxford into the cluster. Pretty soon we’d have a decent case for adding Novo Nordisk’s headquarters in Copenhagen plus AstraZeneca’s main sites in Gothenburg and Stockholm. And of course both companies are present in Boston in the USA.

We could go on forever, and that’s before we even get on to remote work.

But the comforting complex truth of saying that the whole world is a cluster of everything and everyone doesn’t add value. It doesn’t help us do most of the things that we are hoping to do better by understanding what clusters are and what they do.

Clusters are mostly made of people.

I don’t have a full or convincing explanation of why, so if you disagree I’m not going to change your mind, but I think people working and socialising in the same places at the same time matters to clusters. Remote work and online events let people join in, but on average I see the physically present, especially the young, improve faster.

Improving faster means getting to the top of any niche quicker. And getting into the top fifth of a niche, however narrow, is extremely valuable both in itself and as a sign that a person is capable of, and interested in, excellence no matter what the niche.

Proximity matters.

Clusters are people in proximity doing similar things. If we wanted to, we could write a mathematical equation for the strength and size of a cluster by counting the number of people in proximity doing similar things. We’d soon see that we needed to define and measure “proximity” and “doing similar things”, assign weights to each, and then set some threshold below which a person or an organisation in a place doing a thing was no longer part of the cluster.
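To make that concrete, here is a minimal sketch of what such an equation might look like in Python. Everything in it is an assumption for illustration, not a measured or agreed definition: the `Site` type, the exponential distance decay with a 50 km scale, the overlap measure of similarity, the equal weights, the 0.3 threshold, and the made-up example sites.

```python
from dataclasses import dataclass
from math import exp, hypot


@dataclass
class Site:
    name: str
    x_km: float
    y_km: float
    people: int
    activity: dict  # share of work by activity, summing to 1 (hypothetical)


def proximity(a, b, scale_km=50.0):
    """1.0 when co-located, decaying towards 0 as distance grows."""
    return exp(-hypot(a.x_km - b.x_km, a.y_km - b.y_km) / scale_km)


def similarity(a, b):
    """Overlap in activity shares: 0.0 (nothing shared) to 1.0 (identical)."""
    return sum(min(share, b.activity.get(k, 0.0)) for k, share in a.activity.items())


def link(a, b, w_prox=0.5, w_sim=0.5):
    """Weighted geometric mean, so a zero on either measure gives a zero link."""
    return proximity(a, b) ** w_prox * similarity(a, b) ** w_sim


def cluster_strength(anchor, sites, threshold=0.3):
    """Sum people at sites whose link to the anchor clears the threshold,
    counting each site's people in proportion to the strength of its link."""
    members = [s for s in sites if link(anchor, s) >= threshold]
    strength = sum(s.people * link(anchor, s) for s in members)
    return strength, [s.name for s in members]


# Made-up sites, purely to show the behaviour.
sites = [
    Site("anchor", 0, 0, 1000, {"pharma": 1.0}),
    Site("nearby_mixed", 50, 0, 400, {"pharma": 0.5, "ai": 0.5}),
    Site("nearby_unrelated", 5, 0, 2000, {"mining": 1.0}),
    Site("distant_similar", 1000, 0, 3000, {"pharma": 1.0}),
]
strength, members = cluster_strength(sites[0], sites)
print(members, round(strength))
```

With these numbers only the anchor and the nearby site doing partly similar work make the cut. Change the scale, the weights, or the threshold and the answer changes, which is the whole difficulty.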

Algorithms for doing this are unsurprisingly called clustering algorithms.
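As a rough illustration, and only one of many possible approaches, here is a sketch of a very simple clustering algorithm: single-linkage grouping, where two sites are linked if they are close enough and share enough activities, and a cluster is any chain of linked sites. The site names, coordinates, activity tags, and the `max_km` and `min_shared` thresholds are all hypothetical.

```python
from math import hypot


def cluster(sites, max_km, min_shared):
    """Group sites into clusters by single linkage.

    sites: list of (name, x_km, y_km, set_of_activity_tags).
    Two sites are linked when they are within max_km of each other and
    share at least min_shared activity tags. Clusters are the connected
    groups of linked sites.
    """
    parent = list(range(len(sites)))

    def find(i):
        # Follow parent pointers to the group's root, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (_, xi, yi, tags_i) in enumerate(sites):
        for j in range(i + 1, len(sites)):
            _, xj, yj, tags_j = sites[j]
            near = hypot(xi - xj, yi - yj) <= max_km
            similar = len(tags_i & tags_j) >= min_shared
            if near and similar:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, site in enumerate(sites):
        groups.setdefault(find(i), []).append(site[0])
    return sorted(sorted(names) for names in groups.values())


# Made-up sites, purely to show the behaviour.
sites = [
    ("pharma_hq", 0, 0, {"pharma", "biology"}),
    ("pharma_sales", 80, 0, {"pharma", "ai"}),
    ("ai_lab", 81, 1, {"ai", "biology"}),
    ("mine", 900, 900, {"mining"}),
]
wide = cluster(sites, max_km=100, min_shared=1)
narrow = cluster(sites, max_km=50, min_shared=1)
print(wide)
print(narrow)
```

With the wider distance threshold the three related sites form one cluster. With the narrower one the headquarters drops out and stands alone. Because links chain together, a loose enough threshold will eventually pull everything into a single cluster, which is the “we could go on forever” problem in code.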

LinkedIn.

The company with the data that would best let us measure clusters and set parameters well in clustering algorithms is LinkedIn.

Microsoft achieve the $26bn value they paid for LinkedIn in other ways, but I’d love to work with their data to extract the extra value that it has in measuring clusters.

People on LinkedIn diligently list their qualifications, their career trajectories, and their locations, usually both where they live and where they work. Enough users upload their CVs and receive endorsements of their skills and performance from clients and colleagues to enrich our understanding of their skills and interests. Companies and organisations list jobs on LinkedIn, people apply to those jobs, and LinkedIn knows which applicants were successful and which weren’t.

Within the parts of the economy where their product is well-used, LinkedIn have a pretty comprehensive dataset of which companies do similar things in nearby places and how ideas and skills flow across the world’s economy.

Coming back to Kings Cross, summary data of over 5000 former and current employees of DeepMind are free to view on LinkedIn. We can see where they live, where they went to university, what they studied, and what roles they do now. With access to LinkedIn’s data we could see even more: what they were doing before they joined DeepMind, how quickly they got promoted or how quickly they left, and where they went and what they did next. And for each previous employer, each future employer, each university, each school, and each person, we could do the same analysis.

But we don’t have access to LinkedIn data. Scraping it is against their terms and conditions and even where courts have ruled that scraping it is legal we can’t access much of the most valuable data since it is internal to LinkedIn.

Measurement by proxy.

Thankfully we have good proxies for the data LinkedIn has, and sometimes better data that LinkedIn doesn’t have.

We have the websites of companies, which tell us what they do and often how they do it, even if indirectly. We know what jobs these companies advertise, what pay they offer, and what skills they request. We know what universities exist in which places, how many students they train each year in which degrees, the job prospects of those students, and how much research the university produces in what fields. We know what events happen in which places on which topics, though since both Meetup and Eventbrite began limiting their APIs this has become much less complete. We have data from censuses and travel surveys on how far people travel for work of different types.

We have the tools needed to gain a decent understanding of clusters of economic activity in most of the developed world. And if we can do that we can start to provide value to the non-exhaustive list of users who want this data that I listed back in the “why clusters?” section.

Lines on maps and codes on spreadsheets.

Going from the complex truth that the whole world is a cluster of everything and everyone to something useful to people is hard.

People want to know what skills they need to get jobs in clusters. Educational establishments want to know what to teach and research to help clusters grow. Businesses within clusters want to hire people with relevant skills and experience even if they’re not quite sure what those are. But people’s likelihood of gaining a skill and joining a cluster varies according to many factors, including salary, willingness to relocate, commute times, and training times. And the needs of employers change quickly and are hard to define.

Local and national governments are constrained by borders, laws, and elections. Identifying clusters that lie beyond a politician’s area of influence can leave them powerless. Only identifying clusters in small parts of a politician’s territory can leave them at risk of being beaten by politicians who will find clusters everywhere, however tenuous the claim, or of backing so few winners that the economy stagnates. Trying to align the size of local governments with the typical size of clusters is one way to reduce this risk, and many countries have added metropolitan tiers to government in recent decades to do this, but this can only ever be done poorly since the size of clusters varies by sector and changes over time.

Another option is to set the thresholds of “proximity” and “doing similar things” that define whether a place is considered to be part of a cluster to more closely match the geographies that a government has to, or wants to, work in. Since there is no right answer for these parameters this is more reasonable than it sounds and this approach has many advantages. A large disadvantage is that it severely limits comparability: it’s hard to rank clusters by how strong they are if they are all defined differently.

And yet this type of ranking is exactly what governments backing future winners, businesses looking to relocate into clusters, and other businesses looking to invest in that growth ahead of time often want.

All of these considerations and many more make cluster definition and cluster policy hard. If it wasn’t for the stunning success of economic clusters like in Cambridge and at Kings Cross and the crucial role of government and business investment in that success, we probably wouldn’t bother. But with the rewards that are on offer for getting this right, we can probably justify the effort.
