How I built a Web Scraper to create a Bin Collection API

January 4th, 2019 - 3 min read

A few months back I wrote some code that would go to my local council's website, search for a street name, open the first result, and get the bin collections for that week. When I got it working, I was amazed at how easy it was: I could take content from another site and present it however I wanted.

I wrote this project with the end goal of turning it into voice skills for Alexa and Google Assistant, which I plan to do in a few weeks when I have some spare time.

(Screenshot: the bin-scraper PHP function)

Above is the PHP function I wrote to serve the API request. I'm just going to walk you through the code.

The first thing it does is set up three arrays: the binArray, the timeArray and the locationArray. These will be used later to store the data we capture during scraping.
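As a minimal sketch of that setup (the array names come from the post; the surrounding controller is assumed):

```php
<?php
// The three arrays described above. They start empty and are filled
// in as the scraper pulls content out of the council's page.
$binArray      = [];   // bin types, e.g. "Blue bin"
$timeArray     = [];   // collection days/times
$locationArray = [];   // street/location names
```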

The next two variables hold the two URL parameters required for the request, so an API request URL looks like this: https://bin-collections.herokuapp.com/api/bin?street=Calderwood%20Road&area=camglen

We then have four if statements. Each one checks whether the $area parameter matches one of the supported areas and, if it does, sets the base URL for the search results page on that local council's website.
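The area-to-URL step could be sketched like this. The real code uses four if statements; here I've written the equivalent as a small lookup function, and the council URLs and area names (other than camglen, which appears in the example request) are placeholders, not the real ones:

```php
<?php
// Map an `area` parameter to the base search URL for that council's
// site. The URLs below are made up for illustration.
function baseUrlForArea(string $area): ?string
{
    $bases = [
        'camglen'   => 'https://example-council.gov.uk/bins/search?street=',
        'otherarea' => 'https://another-council.gov.uk/bins/search?street=',
    ];
    return $bases[$area] ?? null;
}

// The two request parameters described above.
$street = $_GET['street'] ?? '';   // e.g. "Calderwood Road"
$area   = $_GET['area'] ?? '';     // e.g. "camglen"

$baseUrl = baseUrlForArea($area);  // null if the area isn't supported
```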

The next thing we do is set up Goutte, a PHP web-scraping library; I used a Laravel wrapper for it in this project. We send Goutte to the base URL we set in the previous step.

On that page we find the result links, take the first one, grab its URL, and store it in the $link variable.

Now we go to the street page we found in the search results.
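Under the hood, the "find the first result and follow it" step boils down to pulling the first matching link's href out of the results page. Here's an offline sketch using PHP's built-in DOMDocument rather than Goutte, with made-up markup (the real council page's selectors will differ):

```php
<?php
// Return the href of the first result link in a search-results page,
// or null if there isn't one. The `result` class is an assumption.
function firstResultHref(string $html): ?string
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html);            // suppress warnings from messy markup
    $xpath = new DOMXPath($doc);
    $link  = $xpath->query('//a[@class="result"]')->item(0);
    return $link instanceof DOMElement ? $link->getAttribute('href') : null;
}

// Illustrative results page, not the real council HTML.
$html = '<ul><li><a class="result" href="/streets/calderwood-road">'
      . 'Calderwood Road</a></li></ul>';
$link = firstResultHref($html);   // "/streets/calderwood-road"
```

With Goutte the same thing is done for you: you filter for the link, click it, and the client follows the href.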

In the next part of the code, we find specific elements in the page, extract their text, and store it in the arrays we created a minute ago.

So for example, we store the location in the locationArray and so on.
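That extraction step might look something like this. The class names and markup are assumptions for illustration, since the real selectors depend on the council's page:

```php
<?php
$binArray      = [];
$timeArray     = [];
$locationArray = [];

// Stand-in for the street page fetched by the scraper.
$doc = new DOMDocument();
@$doc->loadHTML('<div><h1 class="location">Calderwood Road</h1>'
    . '<span class="bin">Blue bin</span><span class="day">Tuesday</span></div>');
$xpath = new DOMXPath($doc);

// Pull the text out of each matching element and push it into the arrays.
foreach ($xpath->query('//*[@class="bin"]') as $node) {
    $binArray[] = trim($node->textContent);
}
foreach ($xpath->query('//*[@class="day"]') as $node) {
    $timeArray[] = trim($node->textContent);
}
foreach ($xpath->query('//*[@class="location"]') as $node) {
    $locationArray[] = trim($node->textContent);
}
```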

Then we send a JSON response containing the contents of each array.
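In the Laravel app that would be a `response()->json([...])` call; here's a plain-PHP sketch of the payload shape the frontend would receive (the key names are my guesses, not necessarily the real API's):

```php
<?php
// Example data as it might sit in the arrays after scraping.
$binArray      = ['Blue bin'];
$timeArray     = ['Tuesday'];
$locationArray = ['Calderwood Road'];

// Encode the three arrays as the JSON body of the API response.
$payload = json_encode([
    'location' => $locationArray,
    'bins'     => $binArray,
    'times'    => $timeArray,
]);
// {"location":["Calderwood Road"],"bins":["Blue bin"],"times":["Tuesday"]}
```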

On the frontend of my bin collection site, I'm using a Vue component that makes HTTP requests to the API I built.

I'm pretty proud of the whole thing. It could probably be simplified quite a bit, but at least it works.