This repository has been archived by the owner on Jan 15, 2022. It is now read-only.

jobFileProcessor.sh complains about missing arguments. #87

Open
hcoyote opened this issue May 16, 2014 · 4 comments

Comments


hcoyote commented May 16, 2014

Running from origin/master.

I have things patched up enough to get the jobFilePreprocessor.sh and jobFileLoader.sh connecting to our Hadoop environment. The last step in hraven-etl.sh invokes jobFileProcessor.sh, but this throws errors about missing arguments.

I poked around in the code and it's not really clear what these should be. machinetype looks like it should be set to "default" if not explicitly set, but the argument parser still makes it required. Additionally, I can't find much discussion of what's supposed to go in the cost properties file.

ERROR: Missing required options: z, m

usage: JobFileProcessor  [-b <batch-size>] -c <cluster> [-d] -m
       <machinetype> [-p <processFileSubstring>] [-r] [-t <thread-count>]
       -z <costfile>
 -b,--batchSize <batch-size>                        The number of files to
                                                    process in one batch.
                                                    Default 100
 -c,--cluster <cluster>                             cluster for which jobs
                                                    are processed
 -d,--debug                                         switch on DEBUG log
                                                    level
 -m,--machineType <machinetype>                     The type of machine
                                                    this job ran on
 -p,--processFileSubstring <processFileSubstring>   use only those process
                                                    records where the
                                                    process file path
                                                    contains the provided
                                                    string. Useful when
                                                    processing production
                                                    jobs in parallel to
                                                    historic loads.
 -r,--reprocess                                     Reprocess only those
                                                    records that have been
                                                    marked to be
                                                    reprocessed. Otherwise
                                                    process all rows
                                                    indicated in the
                                                    processing records,
                                                    but successfully
                                                    processed job files
                                                    are skipped.
 -t,--threads <thread-count>                        Number of parallel
                                                    threads to use to run
                                                    Hadoop jobs
                                                    simultaneously.
                                                    Default = 1
 -z,--costFile <costfile>                           The cost properties
                                                    file on local disk
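
For reference, a minimal sketch of an invocation that would satisfy the required options, assuming the class lives under com.twitter.hraven.etl; the jar path, cluster name, machine type, and cost file location are placeholders, not values confirmed anywhere in this thread:

    # Sketch only: the jar path, cluster name, and cost file path are placeholders,
    # and "default" for -m is a guess based on how machineType appears in the code.
    hadoop jar /path/to/hraven-etl.jar com.twitter.hraven.etl.JobFileProcessor \
        -c mycluster@dc1 \
        -m default \
        -z /etc/hraven/costFile.properties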
@vrushalic
Collaborator

Hi Travis,

Yes, I can see that jobFileProcessor.sh was not updated. Give me a few minutes to update it now.
I will add some more documentation to a sample cost file in the conf dir. The job cost will be stored as a column in HBase.

The cost properties file could even be an empty file, since the cost simply won't be calculated in that case.
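
So, as a stopgap until the sample file is in, creating an empty file and pointing -z at it should be enough; the path below is just an example:

    # Example path only; an empty cost properties file simply means
    # no cost is calculated for the jobs.
    touch /tmp/costFile.properties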

vrushalic pushed a commit that referenced this issue May 16, 2014
@vrushalic
Collaborator

Updated the script and added a sample file. Please give this a try and let me know.

angadsingh pushed a commit to angadsingh/hraven that referenced this issue May 20, 2014
# By Vrushali Channapattan
# Via Joep Rottinghuis (1) and Vrushali Channapattan (1)
* 'master' of https://github.com/twitter/hraven:
  Issue twitter#87 Updating jobFileProcessor.sh with latest arguments and adding a sample cost file
  Updating class names to reflect their intention better, adding some more tests and cleaning up documentation
  Updating formatting
  Modifying to include AppService, App and AppKey classes, also making a single api call for new jobs given a cluster and making user as a query param
  Updating to move get new jobs to job history service
  Updating some more comments
  Updating java docs
  Updating to remove capacity info, ensure APIs don't mix service class calls
  Updating to add final modifiers, removing abstract in interfaces, changing to return Object instead of long
  updating to enable different schedulers via factory and interface
  Issue twitter#82 Allowing for different schedulers to be supported, presently adding for fair scheduler and Updating other things as per review comments.
  minor formatting changes
  Issue twitter#82: Add a newJobs REST API, Issue twitter#81: Correct the timestamp being stored in appVersion table, Issue twitter#80: Have queue/pool name returned at flow level

Conflicts:
	bin/etl/jobFileProcessor.sh
	hraven-core/src/main/java/com/twitter/hraven/FlowKey.java
	hraven-core/src/main/java/com/twitter/hraven/datasource/JobHistoryService.java

hcoyote commented May 21, 2014

Thanks, I'll see if I can get it working tomorrow.

@vrushalic
Collaborator

Hi,
Did this work for you?

thanks
Vrushali
