Thursday, February 11, 2016

How to set up a Spark stand-alone cluster on Ubuntu

I have been working with Spark lately, which has resulted in this second post about it.

This post assumes that you know the fundamentals of Spark. If not, then maybe you should go here first.

There are several alternatives for setting up a Spark cluster. The most basic of these is stand-alone mode, which doesn't require any external cluster management tool. Generally, stand-alone mode is sufficient for a small cluster of up to 10 nodes.

Now, before you get bored, let's start with the cluster set-up:

For this example we will assume that we have three nodes with host-names: Node1, Node2 and Node3.

Sunday, January 31, 2016

How to install Apache Spark on Ubuntu 14.04

I have used Ubuntu 14.04 LTS for this tutorial. However, the following steps should also work with newer versions of Ubuntu and with other Debian-based Linux distros.

Before we begin with Spark, we need to install its dependencies.

Installing Java:

The following set of commands will install Java8 on your system. You can skip these steps if you already have Java8 installed. If you have an older version of Java installed, it is recommended to upgrade it to Java8.

$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer

The Webupd8 PPA also provides a package that sets the corresponding environment variables:

$ sudo apt-get install oracle-java8-set-default

To verify that Java8 is successfully installed, fire the following command:
$ java -version

The output should look like this (the exact build number may differ):

java version "1.8.0_66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)
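
If you would rather check the version from a script than read it by eye, here is a minimal sketch. The function name java_major is my own invention (it is not part of Java or Ubuntu), and it assumes the version line has the same shape as the output above.

```shell
# java_major TEXT: print the major Java version found in "java -version" style text.
# Hypothetical helper; assumes a line containing: version "x.y.z..."
java_major() {
  # Pull out the quoted version string, e.g. 1.8.0_66
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;  # old scheme: 1.8.0_66 means Java 8
    *)   printf '%s\n' "$ver" | cut -d. -f1 ;;  # newer scheme: the first field is the major version
  esac
}

# Usage (java -version writes to stderr, hence the 2>&1):
#   java_major "$(java -version 2>&1)"
```

For the output shown above, the first line is the one that matters, and the helper reduces it to just the major version number.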

You might also want to install Scala, which is generally the preferred language for Spark programming:

You need to download Scala from here and extract the files to some location, for example /usr/local/scala/. Alternatively, you can fire the following set of commands to achieve the same...

$ wget
$ sudo mkdir /usr/local/scala
$ sudo tar xvf scala-2.10.6.tgz -C /usr/local/scala/

Now, in order to make Scala reachable from any location on your file system, we need to set/modify some environment variables.

Go to your home folder using this command: $ cd ~
And open the .bashrc file in your favorite editor: $ vi .bashrc
Append the following lines at the end of the file:

export SCALA_HOME=/usr/local/scala/scala-2.10.6
export PATH=$PATH:$SCALA_HOME/bin

Execute the modified .bashrc file with this command in order to make the changes effective.
$ source .bashrc
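
If you end up running these steps more than once, the same export lines pile up in .bashrc. A small sketch to avoid that is below; append_once is a made-up helper name, not something bash or Scala ships with, so treat it as one possible way of doing it.

```shell
# append_once LINE FILE: add LINE to FILE only if that exact line is not already there.
# (append_once is a hypothetical helper, shown only as an illustration.)
append_once() {
  touch "$2"                                   # make sure the file exists
  grep -qxF -- "$1" "$2" || printf '%s\n' "$1" >> "$2"
}

# Usage:
#   append_once 'export SCALA_HOME=/usr/local/scala/scala-2.10.6' ~/.bashrc
```

grep -x matches whole lines only and -F treats the text literally, so the $ signs and slashes in an export line are not read as a pattern.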

To verify that Scala installed successfully, fire this command:
$ scala -version

It should return the following output:

Scala code runner version 2.10.6 -- Copyright 2002-2013, LAMP/EPFL

Note: We have used the 2.10 version of Scala because, in order to use the latest stable Scala version (2.11), we would need to manually build Spark from its source, which is quite time consuming. Moreover, Spark does not yet support its JDBC component for Scala 2.11. Reference:

So in case your requirements are such that you must use Scala 2.11, you can download the Spark source and build it by following the instructions given on this link.

Now we are set to install Spark.

Download Spark from this page:

From the package type drop-down, select the pre-built package matching your Hadoop version. Also, as mentioned in the note above, you always have the option to download the source from the same link and build Spark tailored to your needs.

Once the download is complete, you may extract the package in some appropriate location.
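
The extraction step can be sketched in the same way as the Scala one above. The helper name extract_pkg and the paths in the usage line are placeholders of mine; substitute the archive you actually downloaded and the location you prefer.

```shell
# extract_pkg ARCHIVE DEST: unpack a .tgz archive into DEST, creating DEST if needed.
# (extract_pkg is a hypothetical helper name used only for this illustration.)
extract_pkg() {
  mkdir -p "$2" && tar xzf "$1" -C "$2"
}

# Usage (placeholder paths, adjust to your download; add sudo for system locations):
#   extract_pkg ~/Downloads/<your-spark-package>.tgz ~/spark
```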

We are all set. Let's test Spark with an example script. Go to the bin directory under the extracted package and fire this command from the terminal.

$ ./run-example SparkPi 10

You should get output similar to the following (the digits vary from run to run):

Pi is roughly 3.14634
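
Why "roughly"? SparkPi estimates pi by random sampling: it throws random points at a square and counts how many land inside the inscribed circle, so the answer is only approximate. Here is the same idea as a plain shell sketch using awk, with no Spark involved. The function name estimate_pi and its fixed default seed are my own choices for the illustration.

```shell
# estimate_pi N [SEED]: Monte Carlo estimate of pi from N random points.
# Hypothetical helper; this mimics the idea behind SparkPi, not its actual code.
estimate_pi() {
  awk -v n="$1" -v seed="${2:-42}" 'BEGIN {
    srand(seed)
    inside = 0
    for (i = 0; i < n; i++) {
      x = 2 * rand() - 1            # random point in the square [-1, 1] x [-1, 1]
      y = 2 * rand() - 1
      if (x * x + y * y <= 1) inside++
    }
    # area of circle / area of square = pi / 4
    printf "%.5f\n", 4 * inside / n
  }'
}

# Usage:
#   estimate_pi 100000
```

More points give a better estimate. Spark's contribution is to spread that sampling across the cluster.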

Bingo!!! The next step to get started with Spark is here:

Queries, doubts, suggestions?? Comments are free ...:D

Sunday, October 18, 2015

DropCue: A new approach to manage your tasks

While surfing the web a few days ago, I came across a unique web app for productivity, Dropcue. It is yet another task management application, but it offers a unique way of doing the job. As their tagline says, "let's just drop a message...for everything!", you drop a message with some annotations and hashtags for your tasks, calendar and to-dos.

For example: "Take medicine @Everyday at evening remindbefore 15minutes . #health".

As you can see, in a single-line message we have mentioned the event time (@Everyday), the reminder (remindbefore 15minutes) and the task category (#health). Unlike conventional task applications such as Wunderlist, you don't need to navigate through several UI elements in order to set all of the above details.

It also allows you to create groups/teams and manage team tasks.

Ex: "Ok team , let's meet @Everyweek for code review. #codeReview."

It can also be integrated with Google Calendar for email/SMS notifications.

If you found the brief details above interesting, then you can check out this application at

Do share your opinions in the comments. It's free ... :D

Monday, September 7, 2015

GitLab: Free GitHub alternative for your private project

(Caution: Trust me, you might want to skip the first paragraph. It's full of emotional shit.)
It's been more than a year since I left this blog orphaned. However, I have been noticing some activity on this blog, and that makes me think that it definitely deserves some attention from its owner. So here I am, just out of my cave. Bear with me if this post doesn't turn out that informative (it's been a year and I am rusty enough to write some crap :P).

Coming to the main topic... This one's on the techie track. GitHub is one of the most popular places for open-source projects because of its powerful features for collaboration, code management and issue tracking. Although you can use GitHub for your private projects, it is not free. Fortunately, there is an alternative to GitHub for your private repos (GitLab), which is even better than GitHub. See why GitLab is better than GitHub.

You can set up GitLab on your private server within a few minutes, and you will be all set to start committing your code.

Step 1: Go to this link to download the GitLab community edition package.
Step 2: Choose your operating system from the drop-down:

And yep, as highlighted in the screenshot, it is also compatible with the RPi. (Cool huh B-))

Step 3: After that follow the installation instructions and we are all set.

I will update this post with some more details if needed.
Meanwhile, comments are most welcome for doubts, suggestions, etc.

Friday, July 25, 2014

Get a quick tour of MongoDB in few minutes

A few days ago I wrote about Mongo University in the article MongoUniversity: Learn MongoDB and get certified.

However, learning MongoDB through Mongo University is a time-consuming process and needs patience. So if you want a quick walkthrough of MongoDB, there is a better place for it: you will find an online MongoShell here. In the shell, type tutorial to start the interactive tutorial of MongoDB.

Follow the steps of the interactive tutorial, and within a few minutes you will have broken the ice with MongoDB.
