hadoop streaming failed with error code 5 #227

Open
JohnnyxB opened this issue May 24, 2015 · 0 comments


I set up a multi-node Hadoop cluster on my two laptops and tested it successfully.
I then installed RHadoop on top of the Hadoop environment. All the necessary packages are installed and the path variables are set.

I then tried to run a word count example:

map <- function(k,lines) {
   words.list <- strsplit(lines, "\\s")
   words <- unlist(words.list)
   return(keyval(words, 1))
}

reduce <- function(word, counts) {
   keyval(word, sum(counts))
}

wordcount <- function(input, output = NULL) {
   mapreduce(input = input, output = output, input.format = "text", map = map, reduce = reduce)
}

hdfs.root <- "wordcount"
hdfs.data <- file.path(hdfs.root, "data")
hdfs.out <- file.path(hdfs.root, "out")
out <- wordcount(hdfs.data, hdfs.out)

I get the following error:

15/05/24 21:09:20 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/05/24 21:09:20 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/05/24 21:09:20 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
15/05/24 21:09:21 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/app/hadoop/tmp/mapred/staging/master91618435/.staging/job_local91618435_0001
15/05/24 21:09:21 ERROR streaming.StreamJob: Error Launching job : No such file or directory
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce,  : 
  hadoop streaming failed with error code 5
Called from: mapreduce(input = input, output = output, input.format = "text", 
    map = map, reduce = reduce)

Before running this, I created two HDFS directories, wordcount/data and wordcount/out, and uploaded some text to the first one from the command line.
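
For reference, the preparation step can be sketched from R roughly as follows. This is an illustration, not the exact commands that were typed: `local.file` is a hypothetical placeholder, the helper names are made up for this sketch, and it assumes the `HADOOP_CMD` environment variable (which rmr2 reads) points at the `hadoop` binary. By default the commands are only printed, not executed.

```r
# HDFS paths used by the word count example.
hdfs.root <- "wordcount"
hdfs.data <- file.path(hdfs.root, "data")
hdfs.out  <- file.path(hdfs.root, "out")

# Hypothetical local input file; replace with a real path.
local.file <- "some_text.txt"

# Build the argument vectors for the `hadoop fs` calls (illustrative helper).
hdfs_prep_commands <- function(data.dir, out.dir, local.file) {
  list(
    c("fs", "-mkdir", "-p", data.dir),      # create the input directory
    c("fs", "-mkdir", "-p", out.dir),       # create the output directory
    c("fs", "-put", local.file, data.dir),  # upload the text file
    c("fs", "-ls", data.dir)                # confirm the upload
  )
}

# Print each command; execute it only when run = TRUE (illustrative helper).
run_hdfs_commands <- function(cmds, run = FALSE) {
  # Assumption: HADOOP_CMD is set; otherwise fall back to "hadoop" on the PATH.
  hadoop <- Sys.getenv("HADOOP_CMD", unset = "hadoop")
  for (args in cmds) {
    cat(hadoop, args, "\n")
    if (run) system2(hadoop, args)
  }
  invisible(length(cmds))
}

cmds <- hdfs_prep_commands(hdfs.data, hdfs.out, local.file)
run_hdfs_commands(cmds)
```

Running the last line with `run = TRUE` as the user that owns the Hadoop installation would show whether the directories are visible to that user.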

A further point: I have two users on my computer, hduser and master. The first was created for the Hadoop installation. I suppose that when I open R/RStudio I run it as master, and because Hadoop was set up for hduser, there are permission issues that lead to this error. As the fourth line of the output shows, the system looks for master91618435, which, I suspect, should be hduser....
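
A small base-R sketch for checking this suspicion: it pulls the staging directory name out of the fourth log line, strips the trailing digits, and compares the result with the user the R session runs as. The assumption that the staging name is the submitting user followed by digits is a guess based on the log, not something confirmed by the Hadoop documentation here. It also prints the two environment variables that rmr2 reads, `HADOOP_CMD` and `HADOOP_STREAMING`, and whether the paths they hold exist.

```r
# Fourth line of the log output (whitespace normalised).
log.line <- paste(
  "15/05/24 21:09:21 INFO mapreduce.JobSubmitter: Cleaning up the staging area",
  "file:/app/hadoop/tmp/mapred/staging/master91618435/.staging/job_local91618435_0001"
)

# Extract the staging directory name and the job id from the path.
staging.name <- sub(".*/staging/([^/]+)/\\.staging/.*", "\\1", log.line)
job.id <- sub(".*/\\.staging/([^/]+)$", "\\1", log.line)

# Assumption: the staging name is the submitting user followed by digits.
staging.user <- sub("[0-9]+$", "", staging.name)

# User that this R session runs as.
r.user <- Sys.info()[["user"]]

cat("staging directory name:", staging.name, "\n")
cat("user part of that name:", staging.user, "\n")
cat("job id:", job.id, "\n")
cat("R session user:", r.user, "\n")

# Environment variables that rmr2 reads to locate Hadoop.
for (v in c("HADOOP_CMD", "HADOOP_STREAMING")) {
  path <- Sys.getenv(v)
  cat(v, "=", path, "| exists:", nzchar(path) && file.exists(path), "\n")
}
```

For the log above, the user part comes out as `master`, which is consistent with the job being submitted by the master account rather than hduser. The `job_local` prefix and the `file:/` scheme may also indicate that the job went to a local runner instead of the cluster, but that is not verified here.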

My question is, how can I get rid of this error?
