`FsShell` is ready, and `-ls /` is executed automatically; the `SimpleWordCount` job is ready as well. A Single Node Cluster is needed, so start the cluster with `start-dfs.sh` and `start-yarn.sh`.
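As a rough sketch of how the automatic `-ls /` might be wired up (the bean, its name, and the injected `hadoopConfiguration` are illustrative, not necessarily how this example project does it):

```java
import org.apache.hadoop.conf.Configuration;
import org.springframework.boot.CommandLineRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.data.hadoop.fs.FsShell;

// Illustrative runner: lists the HDFS root on startup via Spring Hadoop's FsShell.
@Bean
CommandLineRunner lsRoot(Configuration hadoopConfiguration) {
    return args -> {
        FsShell shell = new FsShell(hadoopConfiguration);
        shell.ls("/").forEach(status -> System.out.println(status.getPath()));
    };
}
```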
The HDFS daemon listens on `127.0.0.1:9000` by default, so if you run this job from a remote host, the NameNode must also be bound to an address reachable from that network. Otherwise, `SimpleWordCount#getJob()` will return an empty `Job`, which inevitably ends in an exception.
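For context, here is a minimal sketch of such a job factory, assuming the classic MapReduce word count (everything beyond `SimpleWordCount#getJob()` itself is illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Illustrative factory: builds a word-count Job; job setup fails if the
// NameNode is unreachable (e.g. it only listens on 127.0.0.1:9000).
Job getJob(Configuration conf, String input, String output) throws Exception {
    Job job = Job.getInstance(conf, "simple-word-count");
    job.setJarByClass(SimpleWordCount.class);
    job.setMapperClass(TokenizerMapper.class);  // hypothetical mapper
    job.setReducerClass(IntSumReducer.class);   // hypothetical reducer
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(input));    // fully qualified HDFS URI
    FileOutputFormat.setOutputPath(job, new Path(output)); // must not exist yet
    return job;
}
```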
The following configuration is also required:
```yaml
spring:
  hadoop:
    config:
      fs.defaultFS: hdfs://${host.hdfs:localhost}:9000
    fs-uri: hdfs://${host.hdfs:localhost}:9000
    resource-manager-host: ${host.yarn:localhost}
    resource-manager-scheduler-host: ${host.yarn:localhost}
```
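Because the placeholders fall back to `localhost`, a remote cluster can be targeted by overriding `host.hdfs` and `host.yarn` on the command line (the hostname below is hypothetical):

```
java -jar spring-hadoop-example-1.0.0-SNAPSHOT.jar --host.hdfs=namenode.example.org --host.yarn=namenode.example.org
```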
The `inputPath` and `outputPath` of the `Job` must be HDFS URIs; relative or plain absolute paths will not work properly. The correct way is to specify fully qualified paths, for example:
```
java -jar spring-hadoop-example-1.0.0-SNAPSHOT.jar --swc.input=hdfs://localhost:9000/user/soiff/input --swc.output=hdfs://localhost:9000/user/soiff/output
```
Attention: `outputPath` must not already exist, otherwise the job will abort.
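A common way to avoid that abort, sketched here with the plain Hadoop `FileSystem` API (this helper is illustrative, not part of the example project), is to delete stale output before submitting:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative helper: removes a previous run's output directory so the
// job's output check does not abort the submission.
void clearOutput(Configuration conf, String outputPath) throws Exception {
    FileSystem fs = FileSystem.get(URI.create(outputPath), conf);
    Path out = new Path(outputPath);
    if (fs.exists(out)) {
        fs.delete(out, true); // recursive delete; use with care
    }
}
```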