Wednesday, March 9, 2016

Raspberry Pi Hadoop Cluster

I am currently learning Hadoop architecture, administration and the MapReduce programming model. I started by reading about Hadoop and taking free online courses, but something was missing: I wanted to try out what I had read and been taught. Some of the course exercises used a VM as a single-node Hadoop server, but a single node didn't make much sense to me, so I set up more VMs and configured them into a Hadoop cluster. Even then the experience was not very satisfying because it lacked the touch of real hardware. I wasn't really running a cluster, just a bunch of virtual machines connected together in a virtual network. So I thought, "How about I build a cluster of cheap computers?" In fact, that's what Hadoop was designed for: running on a cluster of commodity hardware. I googled around for cheap cluster computers and found a few blogs and videos by people who ran MPI and Hadoop on mini clusters of Raspberry Pi boards. So I decided to build my own mini cluster of Raspberry Pis. After all, the best way to learn new things is to get your hands dirty.

There were a couple of Raspberry Pi boards lying around my desk at home that I had used in earlier experiments. I needed more, so I ordered 3 more of these boards. The Raspberry Pi is a S$50 single-board computer with a 1 GHz dual-core ARMv7 CPU, 1 GB of RAM and a microSD slot for storage.

I also bought SD cards, USB power cables, an 8-port network switch and a high-current USB power supply/charger. Luckily I was able to find all of them locally in the neighbourhood shops, except for the Raspberry Pi boards, which I ordered online and received the next day.

I started thinking about how to clump these boards together in a rack-like structure where it would be easy to cable them up to a network switch. I found a few ideas online that used stand-off bolts, nuts and acrylic boards to stack them together. The problem was that I didn't have these materials, so I started walking around the house to come up with ideas and look for something usable. The mounting holes in the Raspberry Pi are 2 mm wide, so I looked for bolts and screws of this size but didn't find any. Then I went to the laundry area, where I found a wire clothes hanger. Its metal core is around 2 mm in diameter, covered with PVC plastic insulation around 1 mm thick. I grabbed it and decided to build a mini rack out of it. I bent it into a half-round loop and literally stitched the boards together.

The network switch I bought runs on a 5 V supply at 600 mA (about 3 W), so the USB power supply can power it as well.

I downloaded a Linux distro called Raspbian Jessie. It is a lightweight variant of Debian Linux intended for headless Raspberry Pi servers, a stripped-down version of Raspbian Wheezy without the UI. I then updated it with the latest libraries and installed the Java 8 JDK.

I followed the steps described in this blog, with some extra steps to configure a second network interface using a Wi-Fi dongle, and voila: I now have a Hadoop cluster running.
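For readers who are new to the MapReduce programming model mentioned at the start, here is a minimal sketch of the classic word count, the kind of first job one might try on a fresh cluster. It is plain Python that only imitates the map, shuffle and reduce phases on one machine; it does not use any Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, imitating what the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Sum all the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run the three phases in sequence on a list of input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, just to show the flow.
    sample = ["pi hadoop pi", "hadoop cluster"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps run on different nodes and the framework handles the shuffle over the network; the sketch only shows how the data flows between the phases.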

I later bought a set of stackable acrylic cases to make the rack more stable and better looking. I took the board out of the cheap network switch (bought online for $7 from China) and mounted it in one of the acrylic cases so that everything looks uniform. Now I am hadooping with this little beast. I later upgraded to Hadoop 2 and installed Apache Spark for future experiments and adventures.


  1. You are MacGyver with a laundry trick 😋! I hope you keep us updated with real-life home hardware experience, because so far doing a non-performing Hadoop hello world on a VM is frustrating! Congrats on this achievement!

  2. Hi, I am really happy to have found such a helpful and fascinating post, written in such a good manner. Thanks for sharing such an informative post.