In an earlier article we saw how to install Hadoop on Ubuntu; now it's time to learn how to install Apache Spark on Ubuntu.
The prerequisite for this tutorial is to have Hadoop installed on your Ubuntu machine. If you haven't done that yet, you can refer to this article.
Download the latest version of Apache Spark from this link. You can select the pre-built Spark package compatible with the Hadoop version you have.
Next, extract the downloaded tar.gz file using the following command:
$tar -xzf spark-1.1.0-bin-hadoop1.tgz
This extracts the files into a folder named spark-1.1.0-bin-hadoop1/. For simplicity, you can rename the folder to spark and move it under /usr/local, which is where the commands later in this tutorial expect it:
$sudo mv spark-1.1.0-bin-hadoop1 /usr/local/spark
Now it's time to set the environment variables in the ~/.bashrc file so that Spark is available from every login shell.
Here we need to set SPARK_HOME and add Spark's bin directory to the PATH; append these Spark-related environment variables to the end of the file.
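A minimal sketch of the entries to append to ~/.bashrc, assuming Spark was moved to /usr/local/spark (adjust the path if your Spark folder lives elsewhere):

```shell
# Assumed install location; change if your Spark folder lives elsewhere
export SPARK_HOME=/usr/local/spark
# Put Spark's launch scripts (spark-shell, run-example, etc.) on the PATH
export PATH=$PATH:$SPARK_HOME/bin
```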
Save the .bashrc file, then exit the terminal and re-open it (or run source ~/.bashrc) for the changes to take effect.
That's all; your Apache Spark is ready to use. Now let's quickly upload a file to HDFS and try running a WordCount program using the Spark Java API.
To upload a file to HDFS (make sure your Hadoop cluster is up and running):
$hadoop fs -mkdir /in
$hadoop fs -copyFromLocal LICENSE.txt /in
To run the WordCount program, execute the following commands:
$cd /usr/local/spark/bin
$./run-example JavaWordCount /in
This starts the Spark Master and Worker daemons, and the program prints each word and its count on the screen.
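Conceptually, the JavaWordCount example just splits the input text into words and counts the occurrences of each. As a rough local illustration of the same logic using plain Unix tools (not Spark itself):

```shell
# Split a sample line into one word per line, then count occurrences of each.
# 'to' and 'be' each appear twice, so they get a count of 2.
printf 'to be or not to be\n' | tr ' ' '\n' | sort | uniq -c
```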
You can find the source code of this program over here.