I am using Hadoop streaming with -io typedbytes and have set mapred.reduce.tasks=2, but the job produces only one output file. If I set mapred.reduce.tasks=0 instead, I get many output files. I am very confused.
So my question is:
How can I make mapred.reduce.tasks=num (num > 1) take effect when I use -io typedbytes in streaming?
PS: my mapper's output is (key: Python string, value: NumPy array).
And my .sh file:
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-1.2.1.jar \
  -D mapred.reduce.tasks=2 \
  -fs local \
  -jt local \
  -io typedbytes \
  -inputformat org.apache.hadoop.mapred.SequenceFileAsBinaryInputFormat \
  -input FFT_SequenceFile \
  -output pinvoutput \
  -mapper 'pinvmap.py' \
  -file pinvmap.py
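
To make the "one file vs. many files" observation easy to reproduce, here is a minimal sketch that lists the part files a run left in the output directory. It assumes the output is on the local filesystem (as with -fs local) and that output files use the usual part-NNNNN naming; the helper name list_part_files is hypothetical and not part of Hadoop.

```python
import os
import sys


def list_part_files(output_dir):
    """Return the sorted names of part-* files in a job output directory.

    Assumption: output files follow the part-NNNNN naming convention.
    Marker files such as _SUCCESS and hidden checksum files such as
    .part-00000.crc are skipped because they do not start with "part-".
    """
    return sorted(
        name for name in os.listdir(output_dir)
        if name.startswith("part-")
    )


if __name__ == "__main__":
    # Default to the -output directory used in the command above.
    out_dir = sys.argv[1] if len(sys.argv) > 1 else "pinvoutput"
    if os.path.isdir(out_dir):
        parts = list_part_files(out_dir)
        print("%d part file(s): %s" % (len(parts), ", ".join(parts)))
    else:
        print("output directory not found: %s" % out_dir)
```

Running it after each job (once with mapred.reduce.tasks=2, once with mapred.reduce.tasks=0) gives a direct count to compare: a map-only job writes one part file per map task, while a job with reducers writes one part file per reduce task that actually ran.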