Validating your virtual machine setup

After the vagrant up command has completed, you'll have a CentOS virtual machine with the following installed:

Hadoop HDFS
Hadoop YARN
Hive
Spark

Let's take a look at each one and validate that it's installed and set up as expected.

SSH into your virtual machine.

vagrant ssh
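Before running the individual checks, it can help to confirm that the command-line tools used in this guide are on the PATH. The following is a minimal sketch, not part of the provisioning scripts: `missing_tools` is a hypothetical helper name, and it assumes the tools are exposed under the same names used in the steps of this guide (`hdfs`, `hadoop`, `hive`, `spark-shell`).

```shell
# Hypothetical helper: print every given command name that is NOT found on the PATH.
# Prints nothing when all of the named commands are available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Usage inside the VM:
#   missing_tools hdfs hadoop hive spark-shell
```

If the call prints any names, those tools are either not installed or not on the PATH for the current user.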

Run an example MapReduce job.

# create a directory for the input file
hdfs dfs -mkdir wordcount-input

# generate sample input and write it to HDFS
echo "hello dear world hello" | hdfs dfs -put - wordcount-input/hello.txt

# run the MapReduce word count example
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop*example*.jar \
  wordcount wordcount-input wordcount-output

# validate the output of the job - you should see the following in the output:
#     dear   1
#     hello  2
#     world  1
hdfs dfs -cat wordcount-output/part*
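The manual comparison above can also be scripted. The sketch below assumes the job writes one tab-separated `word<TAB>count` pair per line (the same layout the Hive table in this guide reads); `check_wordcount` is a hypothetical helper name.

```shell
# Hypothetical helper: read word-count output (word<TAB>count) on stdin and
# compare it with the counts expected for the input "hello dear world hello".
check_wordcount() {
  expected="$(printf 'dear\t1\nhello\t2\nworld\t1')"
  # Sort so the comparison does not depend on the order of the part files.
  actual="$(LC_ALL=C sort)"
  if [ "$actual" = "$expected" ]; then
    echo "OK"
  else
    echo "MISMATCH"
    return 1
  fi
}

# Usage inside the VM, after the job has finished:
#   hdfs dfs -cat wordcount-output/part* | check_wordcount
```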

Launch the Hive shell.

hive

Create a table and run a query over it.

CREATE EXTERNAL TABLE wordcount (
    word STRING,
    count INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/vagrant/wordcount-output';

SELECT * FROM wordcount ORDER BY count;

Next, exit the Hive shell (for example with quit;) and launch the interactive Spark shell.

spark-shell --master yarn-client

Run word count in Spark.

// enter paste mode
:paste
sc.textFile("hdfs:///user/vagrant/wordcount-input/hello.txt")
   .flatMap(line => line.split(" "))
   .map(word => (word, 1))
   .reduceByKey(_ + _)
   .collect.foreach(println)

// press Ctrl-D to leave paste mode and run the code

sc.stop
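To cross-check the counts independently of the cluster, the same word count can be reproduced with standard Unix tools in an ordinary shell (outside the Spark shell). This is only a sketch: `local_wordcount` is a hypothetical helper name, and its tab-separated output matches the MapReduce layout rather than the `(word,count)` tuples that Spark prints.

```shell
# Hypothetical helper: count space-separated words read from stdin and
# print them as word<TAB>count, sorted by word.
local_wordcount() {
  tr -s ' ' '\n' | LC_ALL=C sort | uniq -c | awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Usage:
#   echo "hello dear world hello" | local_wordcount
```

For the sample input used in this guide, the counts should agree with the MapReduce, Hive, and Spark results: dear 1, hello 2, world 1.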
