This repository has been archived by the owner on Jan 15, 2022. It is now read-only.

Hadoop2.0 #18

Merged
merged 14 commits into from
Aug 27, 2013

Conversation

vrushalic
Collaborator

Enabling parsing of 2.0 job history files without referring to 2.0 packages/jars

@vrushalic
Collaborator Author

I have not yet added a mapping between the old 1.0 keys and the new 2.0 keys. For instance, 1.0 had "TOTAL_MAPS", which in 2.0 is now "totalMaps". Will add it if needed after discussion.
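Such a mapping could be sketched as a simple lookup table. This is a hypothetical sketch: only the "TOTAL_MAPS" → "totalMaps" pair comes from the comment above; the class name, method name, and second entry are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: translate Hadoop 1.0 history keys to their 2.0 names. */
public class JobHistoryKeyMapping {
  private static final Map<String, String> OLD_TO_NEW = new HashMap<String, String>();
  static {
    // "TOTAL_MAPS" -> "totalMaps" is the example from the discussion;
    // other entries (like this assumed one) would be filled in the same way.
    OLD_TO_NEW.put("TOTAL_MAPS", "totalMaps");
    OLD_TO_NEW.put("TOTAL_REDUCES", "totalReduces"); // assumed entry
  }

  /** Returns the 2.0 key for a 1.0 key, or the key unchanged if no mapping exists. */
  public static String toHadoop2Key(String oldKey) {
    String newKey = OLD_TO_NEW.get(oldKey);
    return (newKey != null) ? newKey : oldKey;
  }
}
```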

public class JobHistoryFileParserHadoop2 implements JobHistoryFileParser {

private JobKey jobKey;
@SuppressWarnings("unused")
Collaborator

This warning is actually telling us something. JobId is being set but is never read, which means it is really not being used. Should we remove this field? The actual information is captured in jobNumber.

Collaborator Author

Yes, it was used earlier in key generation but not anymore. I am removing it.

@ghost ghost assigned vrushalic Aug 6, 2013
/**
* populate the hash set for counter names
*/
counterNames.add(CounterTypes.mapCounters.toString());
Collaborator Author

Will iterate over the enum.values instead of adding individual ones
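The values() loop replacing the individual add() calls could look like this sketch. The nested enum here is a stand-in for the CounterTypes enum in the diff; the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: populate the counter-name set from the enum itself. */
public class CounterNameSet {
  // Stand-in for the CounterTypes enum referenced in the diff above.
  enum CounterTypes { mapCounters, reduceCounters, totalCounters }

  public static Set<String> buildCounterNames() {
    Set<String> counterNames = new HashSet<String>();
    // One loop replaces a separate add() call for each constant,
    // so new enum values are picked up automatically.
    for (CounterTypes c : CounterTypes.values()) {
      counterNames.add(c.toString());
    }
    return counterNames;
  }
}
```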

/** Column qualifier prefix to namespace total-specific counter data */
public static final String TOTAL_COUNTER_COLUMN_PREFIX = "gt";
public static final byte[] TOTAL_COUNTER_COLUMN_PREFIX_BYTES = Bytes
.toBytes(TOTAL_COUNTER_COLUMN_PREFIX);
Contributor

Looking at a sample file, I think "TOTAL_COUNTERS" is the same as what we're treating as just "COUNTERS" in Hadoop 1.0. So I don't think we need this extra prefix. Correct me if I'm wrong.

Collaborator Author

There are task-level counters, which are called "COUNTERS", and job-level ones, which are called "TOTAL_COUNTERS", "MAP_COUNTERS", and "REDUCE_COUNTERS".
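The prefix scheme these counter groups map to could be sketched as a small lookup. Only the "TOTAL_COUNTERS" → "gt" pair is from the diff above; the "gm" and "gr" prefixes and all names in this sketch are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: map job-level counter group names to
 * column-qualifier prefixes. Only "TOTAL_COUNTERS" -> "gt" is
 * from the diff; the other two prefixes are assumed.
 */
public class CounterColumnPrefixes {
  private static final Map<String, String> GROUP_TO_PREFIX = new HashMap<String, String>();
  static {
    GROUP_TO_PREFIX.put("TOTAL_COUNTERS", "gt");  // from the diff above
    GROUP_TO_PREFIX.put("MAP_COUNTERS", "gm");    // assumed
    GROUP_TO_PREFIX.put("REDUCE_COUNTERS", "gr"); // assumed
  }

  /** Returns the column prefix for a job-level counter group, or null if unknown. */
  public static String prefixFor(String counterGroup) {
    return GROUP_TO_PREFIX.get(counterGroup);
  }
}
```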

JSONArray fields = j1.getJSONArray(FIELDS);

for (int k = 0; k < fields.length(); k++) {
JSONObject allEvents = new JSONObject(fields.get(k).toString());
Contributor

Could fields.get(k) be null? If so, defensive coding would check fields.isNull(k) first.

Collaborator Author

Sounds good, I will add the null check. It is probably very unlikely that there are no fields in the job history file, since that would mean an empty job history file was created, which I don't think will happen unless there is a bug in the history file writing. But I agree on adding the null check in any case.
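The null-guarded loop could look like the following sketch. A plain Object array stands in for the org.json JSONArray so the sketch is self-contained; in the real code, fields.isNull(k) would play the role of the null check, and the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the null-guarded loop over history-file fields. */
public class FieldScan {
  /** Collects the string form of each non-null field, skipping nulls defensively. */
  public static List<String> parseFields(Object[] fields) {
    List<String> parsed = new ArrayList<String>();
    for (int k = 0; k < fields.length; k++) {
      if (fields[k] == null) {
        continue; // with org.json this would be: if (fields.isNull(k)) continue;
      }
      parsed.add(fields[k].toString());
    }
    return parsed;
  }
}
```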

@@ -0,0 +1,96 @@
/*
* Copyright 2012 Twitter, Inc. Licensed under the Apache License, Version 2.0 (the "License"); you
Contributor

Minor nit: for new files you probably want to use the current year.

Vrushali Channapattan added 2 commits August 14, 2013 15:11
… and year in copy right, also adding in some more keys I saw in 2.0.5 history files
…hadoop 2.0 cluster to process 2.0 job history files
// the first 10 bytes contain Avro-Json
return new String(contents, 0, HADOOP2_VERSION_LENGTH);
}
throw new IllegalArgumentException(" Unknown format of job history file: " + contents);
Contributor

This seems a little unexpected if it's possible we're reading a v1 file. Maybe we should return null in this case instead?

Collaborator Author

Hi Gary, (maybe I didn't follow completely) the sub-string for a 1.0 history file is "Meta VERS", since the first line in 1.0 is 'Meta VERSION="1" .;'. Hence this won't throw the exception. I have a unit test in TestJobHistoryFileParserFactory.java that confirms we get back a JobHistoryFileParserHadoop1 object when it's a 1.0 file.
Is that what you were referring to?

Contributor

I see. It still seems a little weird that the 2.0 case returns the version string while the 1.0 case returns a truncated string (not quite consistent with the method name). I would either rename it getVersion2StringFromFile, or change it to return a boolean and do the string comparison in this method -- hasVersion2String(). But I guess it works as is as well.

Collaborator Author

Yes, so I have changed it now. The function now checks for both versions' strings and returns accordingly.
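The two-version check could be sketched as follows. The "Meta VERS" prefix and the Avro-Json header come from the discussion above; the class name, method name, return values, and exact constants are assumptions.

```java
import java.nio.charset.Charset;

/** Hypothetical sketch: detect the history-file version from its leading bytes. */
public class HistoryFileVersionDetector {
  // 1.0 files start with 'Meta VERSION="1" .;' (per the discussion above).
  private static final String HADOOP1_PREFIX = "Meta VERS";
  // Per the code comment, the first bytes of a 2.0 file contain Avro-Json.
  private static final String HADOOP2_PREFIX = "Avro-Json";

  /** Returns "1.0" or "2.0", or throws for an unrecognized format. */
  public static String getVersion(byte[] contents) {
    int len = Math.min(contents.length, 10);
    String head = new String(contents, 0, len, Charset.forName("UTF-8"));
    if (head.startsWith(HADOOP1_PREFIX)) {
      return "1.0";
    }
    if (head.startsWith(HADOOP2_PREFIX)) {
      return "2.0";
    }
    throw new IllegalArgumentException("Unknown format of job history file");
  }
}
```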

@ghelmling
Contributor

Two minor comments to resolve: mixed static and instance context, and a wording fix in an exception message. Otherwise this looks good to merge to me.

@vrushalic
Collaborator Author

Thanks Gary! I will fix these and update the request today.

/**
* utility function for printing all Task puts
*/
public void printAllTaskPuts() {
Contributor

No need to duplicate this functionality between printAllJobPuts() and printAllTaskPuts(). You should just have a single method, printPuts(List puts), then pass either jobPuts or taskPuts as an argument.
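The consolidation could be sketched like this. List<?> is used so the sketch stays self-contained (a real version would take List<Put> from the HBase API); the class name and the returned count are assumptions for illustration.

```java
import java.util.List;

/** Hypothetical sketch: one shared print method for both job and task puts. */
public class PutPrinter {
  /**
   * Prints each put on its own line and returns how many were printed;
   * callers pass either jobPuts or taskPuts.
   */
  public static int printPuts(List<?> puts) {
    int count = 0;
    for (Object put : puts) {
      System.out.println(put);
      count++;
    }
    return count;
  }
}
```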

ghelmling added a commit that referenced this pull request Aug 27, 2013
Integrate Hadoop2.0 job history support.
@ghelmling ghelmling merged commit 448f9ee into twitter:master Aug 27, 2013