Hadoop Tutorials.CO.IN

Installing Apache Spark on Ubuntu

by Tanmay Deshpande

Getting Started with Apache Spark

In an earlier article we saw how to install Hadoop on Ubuntu; now it's time to learn how to install Apache Spark on Ubuntu.

The prerequisite for this tutorial is having Hadoop installed on your Ubuntu machine. If you haven't done that yet, you can refer to this article.

Download the latest version of Apache Spark from this link. You can select the pre-built Spark package compatible with the Hadoop version you have.
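If you prefer the command line, you can construct the download URL yourself. The version and build suffix below match the 1.1.0 / Hadoop 1 package used in this article, and older releases are served from the Apache archive rather than the mirror network; adjust both to whichever build you selected on the downloads page.

```shell
# Assumed version and build from this article; change to match your selection.
SPARK_VERSION="1.1.0"
SPARK_PKG="spark-${SPARK_VERSION}-bin-hadoop1.tgz"
# Older releases are kept in the Apache archive:
SPARK_URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_PKG}"
echo "${SPARK_URL}"
```

Pass the printed URL to wget (or open it in a browser) to fetch the tarball.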

Next, extract the gzipped tar file using the following command:

$tar -xzf spark-1.1.0-bin-hadoop1.tgz

This extracts the archive into a folder named spark-1.1.0-bin-hadoop1/. Since we will set SPARK_HOME to /usr/local/spark in a moment, move the folder there for simplicity:

$sudo mv spark-1.1.0-bin-hadoop1 /usr/local/spark

Now it's time to set the environment variables in the ~/.bashrc file, so that Spark is easy to run on every login.

Here we need to set SPARK_HOME=/usr/local/spark. Follow the steps below to append the Spark-related environment variables to the .bashrc file.

$vi ~/.bashrc

[Screenshot: Spark environment variables added to .bashrc]
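In text form, the lines to append to .bashrc look like this (a sketch assuming Spark was moved to /usr/local/spark, as set above):

```shell
# Spark environment variables for ~/.bashrc
# (assumes the extracted folder was moved to /usr/local/spark)
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
```

With $SPARK_HOME/bin on the PATH, commands such as spark-shell can be run from any directory.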

Save the .bashrc file, exit the terminal, and re-open it for the changes to take effect.

That's all; your Apache Spark installation is ready to use. Now let's quickly upload a file to HDFS and try running a WordCount program using the Spark Java API.

To upload a file to HDFS (make sure your Hadoop cluster is up and running):

$hadoop fs -mkdir /in
$hadoop fs -copyFromLocal LICENSE.txt /in

To run a WordCount program, execute the following commands:

$cd /usr/local/spark/bin
$./run-example JavaWordCount /in

This launches the example, and you will see the program print each word and its count on screen.

[Screenshot: WordCount output]

You can find the source code for this program over here.


