Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

appstatus file rename failed for spark rolling logs #318

Closed
shivprakashy opened this issue Dec 15, 2022 · 5 comments
Closed

appstatus file rename failed for spark rolling logs #318

shivprakashy opened this issue Dec 15, 2022 · 5 comments

Comments

@shivprakashy
Copy link
Contributor

shivprakashy commented Dec 15, 2022

I've encountered an exception when I enabled the spark monitoring configurations
spark.eventLog.rolling.enabled and spark.eventLog.rolling.maxFileSize for spark-submit

Stacktrace:

22/12/16 02:07:54 WARN COSAPIClient: file status spark-events/eventlog_v2_local-1671136623482/appstatus_local-1671136623482 returned 404
22/12/16 02:07:55 ERROR Utils: Uncaught exception in thread Thread-4
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
	at java.lang.String.substring(String.java:1931)
	at com.ibm.stocator.fs.cos.COSAPIClient.rename(COSAPIClient.java:1412)
	at com.ibm.stocator.fs.ObjectStoreFileSystem.rename(ObjectStoreFileSystem.java:264)
	at org.apache.spark.deploy.history.EventLogFileWriter.renameFile(EventLogFileWriters.scala:141)
	at org.apache.spark.deploy.history.RollingEventLogFilesWriter.stop(EventLogFileWriters.scala:359)
	at org.apache.spark.scheduler.EventLoggingListener.stop(EventLoggingListener.scala:283)
	at org.apache.spark.SparkContext.$anonfun$stop$17(SparkContext.scala:2115)
	at org.apache.spark.SparkContext.$anonfun$stop$17$adapted(SparkContext.scala:2115)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.SparkContext.$anonfun$stop$16(SparkContext.scala:2115)
	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1484)
	at org.apache.spark.SparkContext.stop(SparkContext.scala:2115)
	at org.apache.spark.api.java.JavaSparkContext.stop(JavaSparkContext.scala:550)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.lang.Thread.run(Thread.java:748)

My observations are as follows:
The spark-events log files and directories are being created successfully and the logs are also flushed as expected but when the spark job finishes the spark tries to rename file eg. appstatus_app-1670931151193.inprogress to appstatus_app-1670931151193 and it fails.

The following items may be addressed:

  1. The string out of range should be fixed so that the rename operation can continue on execution.
    Solution: appstatus file rename failed for spark rolling logs #318 (comment)

  2. There should be an appstatus named file without .inprogress in the same spark-events/eventlog_v2_local-1671136623482/ directory.
    Solution: appstatus file rename failed for spark rolling logs #318 (comment)

  3. The stocator library fails to read the rolling logs properly, thence Spark History Server don't show the applications as completed or incomplete. A workaround is using s3a configuration for Spark History Server.
    Solution: separate issue created for this.

The unit test cases are passing as earlier #318 (comment)

@shivprakashy
Copy link
Contributor Author

Attaching some screenshots for reference.

spark-submit --conf spark.eventLog.enabled=true --conf spark.eventLog.rolling.enabled=true --conf spark.eventLog.rolling.maxFileSize=10m spark-app.py`

created following directory and files in COS

Screenshot 2022-12-23 at 14 24 22

Screenshot 2022-12-23 at 14 24 36

Spark History console shows the following (the application in incomplete application)
Screenshot 2022-12-23 at 14 25 12

Note that this happened because COS couldn't rename the appstatus named file.

@shivprakashy
Copy link
Contributor Author

with the changes (in the linked PR) we have expected log files for Spark History Server (appestatus file is rename not breaking)

i.e appestats file is renamed

Screenshot 2022-12-23 at 17 21 15

@shivprakashy
Copy link
Contributor Author

Solution description:
The rename of appstatus file fails because the newSrcKey.substring() reach out of bound for pathToKey(src).length() + 1 for the below statement
String filename = newSrcKey.substring(pathToKey(src).length() + 1);

DEBUGGING: Below are the values assigned to variables while executing the above line, the length of string from pathToKey(src) and newSrcKey are equal so expectation of additional (+1) string is invalid

src ->cos://dev-shiv.devshivcos/spark-events/eventlog_v2_local-1671139972947/appstatus_local-1671139972947.inprogress
newSrcKey ->spark-events/eventlog_v2_local-1671139972947/appstatus_local-1671139972947.inprogress
pathToKey(src) ->spark-events/eventlog_v2_local-1671139972947/appstatus_local-1671139972947.inprogress

The variable filename is assigned with empty string ("") and that's okay.

@shivprakashy
Copy link
Contributor Author

Solution description:
After out of range exception handling the execution completed but it didn't yield the expected appstatus file in the spark-events/eventlog_v2_local-1671136623482/ directory and the problem was extra "/" in the final filename (newDstKey)

The extra "/" lead to create appstatus_local-1671139972947 named directory and copied the same appstatus_local-1671139972947.inprogress file inside it.

DEBUGGING: Below lines of code removes the trailing "/" from newDstKey if pathToKey(src) endsWith ".inprogress" (assuming .inprogress files will be created by Spark History Server rolling logs)

if (pathToKey(src).endsWith(".inprogress")) {
  newDstKey = newDstKey.substring(0, newDstKey.lastIndexOf("/"));
}

Before above lines
newDstKey spark-events/eventlog_v2_local-1671791765199/appstatus_local-1671791765199/

After above lines
newDstKey spark-events/eventlog_v2_local-1671791765199/appstatus_local-1671791765199

@shivprakashy
Copy link
Contributor Author

Screenshot 2022-12-23 at 23 12 40

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant