Last modified: 25 May 2014

Building a really simple page-scraping Chrome extension.

and understanding how it works.

Want to parse the content of a website? More comfortable coding in JavaScript and displaying your results in HTML than using Scrapy at a Python command prompt? A Google Chrome extension might be perfect for you.

Sadly, the best guide to building a simple but functional page-scraping Chrome extension is quite complicated. So I’ve learned from it and written a much simpler Hello World Chrome extension for page scraping.

Download the source code and the packed extension and have a look; it's fewer than 40 lines of code. If you need help installing it, follow Google's instructions. For the important part, understanding how it works, I've drawn some pictures.

Get content from a page.

My example gets content from the currently loaded page and displays it in the Chrome extension's popup. Here the active tab is on Nokia's homepage, and that page's title is displayed in my extension's popup.

Bundle an extension.

There are five important parts to the extension. The first four are the logo, the popup page's HTML file, the popup page's JavaScript file, and the manifest.json file, which tells Chrome how to bundle these files together into an extension.
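As a rough sketch of what the manifest ties together, here is the kind of content manifest.json might hold, written as a JavaScript object and serialised to JSON. It assumes the Manifest V2 format. The name, version, logo.png filename and activeTab permission are my assumptions for illustration and are not necessarily what's in the download.

```javascript
// Hypothetical manifest contents, assuming Manifest V2.
// Every value below is illustrative, not copied from the download.
const manifest = {
  manifest_version: 2,
  name: 'Hello World page scraper', // assumed name
  version: '1.0',                   // assumed version
  browser_action: {
    default_icon: 'logo.png',       // the logo (filename assumed)
    default_popup: 'popup.html',    // the popup page's HTML file
  },
  background: {
    scripts: ['popup.js'],          // popup.js also runs as a background script
    persistent: true,
  },
  permissions: ['activeTab'],       // assumed: allows scripting the current tab
};

// manifest.json is just this object written out as JSON text.
const manifestJson = JSON.stringify(manifest, null, 2);
console.log(manifestJson);
```

The printed text is what would be saved as manifest.json alongside the other files.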

Inject the payload.

The fifth important part of the extension, payload.js, solves the cross-site scripting problem. An extension is effectively a little website, and for sensible security reasons scripts from one website can't easily access the content on another website. popup.js can access the content on popup.html and change it, but it's blocked from accessing the content of the currently loaded web page unless that page specifically allows it, which it almost never will.

Chrome has access to both pages, and you can tell it to inject and run the payload.js script in the current web page. Once injected, the payload.js script can access and change the content of the currently active tab and send messages back to the popup.js script using the Chrome runtime messaging service. Since we've set popup.js as a persistent background script in the extension manifest, it will keep listening for messages from payload.js until Chrome closes.
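A minimal sketch of that round trip, assuming Manifest V2's chrome.tabs.executeScript and the chrome.runtime messaging calls. In a real extension the two halves live in separate files; they're shown together here as plain functions that take the chrome object and the document as arguments. The function names and the pagetitle element id are hypothetical.

```javascript
// ---- payload.js side: runs inside the currently active tab ----

// Send the page's title back to the extension.
function sendTitle(api, doc) {
  api.runtime.sendMessage(doc.title);
}
// In payload.js itself the call would be: sendTitle(chrome, document);

// ---- popup.js side: runs inside the extension ----

// Listen for messages from payload.js and show them in the popup.
// 'pagetitle' is an assumed element id in popup.html.
function listenForTitle(api, doc) {
  api.runtime.onMessage.addListener(function (message) {
    const target = doc.getElementById('pagetitle');
    if (target) {
      target.textContent = message; // skipped if the page has no such element
    }
  });
}

// Ask Chrome to inject and run payload.js in the active tab.
// Passing null as the tab id means "the active tab of the current window".
function injectPayload(api) {
  api.tabs.executeScript(null, { file: 'payload.js' });
}
// In popup.js itself the calls would be:
//   window.addEventListener('load', function () {
//     listenForTitle(chrome, document);
//     injectPayload(chrome);
//   });
```

The listener is registered before the payload is injected so that the reply can't arrive before anything is listening for it.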

Add more features.

If it all works properly, your extension should display the current tab's title. Once you've seen how it works you can extend this Hello World extension however you like. The payload.js script can do anything it likes with the current web page, including navigating somewhere else, or clicking a link. The chrome runtime messaging service supports JSON objects so you can easily pass formatted data between your extension and the current page.
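For example, the payload could send back an object rather than a bare string. This sketch gathers the page title and the addresses of its links; the function names and the shape of the object are assumptions of mine, and the chrome object and document are passed in as arguments.

```javascript
// payload.js side (sketch): gather some data from the page into a plain object.
function collectPageData(doc) {
  const links = Array.from(doc.querySelectorAll('a'))
    .map(function (a) { return a.href; })
    .filter(function (href) { return Boolean(href); }); // drop anchors with no address
  return { title: doc.title, linkCount: links.length, links: links };
}

// Send the object to the rest of the extension.
// The message has to be JSON-serialisable, which a plain object of strings is.
function sendPageData(api, doc) {
  api.runtime.sendMessage(collectPageData(doc));
}
// In payload.js itself the call would be: sendPageData(chrome, document);

// popup.js side (sketch): turn the received object into a line of text.
function describePageData(data) {
  return data.title + ' (' + data.linkCount + ' links)';
}
```

Keeping the collecting and the sending in separate functions makes it easy to change what gets scraped without touching the messaging.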

Thanks for reading, and in case you missed the first download link,

Download the source code and the packed extension.
