refactor tachyon-env.sh #206

rootfs · 2014-05-29T18:42:55Z

tachyon-env.sh serves two purposes: it sets the Java executable path and the Java command-line options. While we still need the first function, the static Java options need to move out of there.

We really need a key/value kind of configuration, either JSON pairs or something like Hadoop's core-site.xml, that contains the runtime configuration. This helps in the following areas:

- Construct the Configuration object from well-structured key/value pairs. We then don't have to conditionally parse system properties when constructing the underFilesystem.
- Change the configuration dynamically, something the static Java command-line options cannot do.
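As a rough illustration of the two points above, here is a minimal sketch in Python of a key/value configuration holder that is built from `key=value` lines and can be changed at runtime. The `Configuration` class, its methods, and the property names are all assumptions made up for this example; they are not Tachyon's actual API.

```python
class Configuration:
    """Hypothetical key/value configuration holder (illustrative only)."""

    def __init__(self, pairs=None):
        self._values = dict(pairs or {})

    @classmethod
    def from_lines(cls, lines):
        """Parse 'key=value' lines, skipping blank lines and '#' comments."""
        pairs = {}
        for raw in lines:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            key, sep, value = line.partition("=")
            if not sep:
                raise ValueError(f"missing '=' in line: {raw!r}")
            pairs[key.strip()] = value.strip()
        return cls(pairs)

    def get(self, key, default=None):
        """Look up a setting, falling back to a default when it is absent."""
        return self._values.get(key, default)

    def set(self, key, value):
        """Change a setting at runtime, which a static -D option cannot do."""
        self._values[key] = value


# Property names below are assumptions chosen for illustration.
conf = Configuration.from_lines([
    "# runtime configuration",
    "tachyon.underfs.address = hdfs://localhost:9000",
    "tachyon.worker.memory.size = 1GB",
])
conf.set("tachyon.worker.memory.size", "2GB")
```

With something like this, code that builds the under filesystem could read `conf.get(...)` instead of branching on which system properties happen to be set.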

timothysc · 2014-05-29T19:40:56Z

+1 tachyon.xml

It would clean up the long java-opts.

haoyuan · 2014-05-29T21:50:52Z

+1

@rootfs Do you mind creating a JIRA for this? Thanks.
https://spark-project.atlassian.net/browse/TACHYON/

timothysc · 2014-06-10T19:56:41Z

https://spark-project.atlassian.net/browse/TACHYON-93, but it requires updated credentials to edit.

AtlasPilotPuppy · 2014-06-25T21:18:13Z

+1 tachyon.json
JSON is much more readable, in my opinion.

childsb · 2014-07-11T19:08:57Z

+1 for XML

-1 for JSON which is difficult to deal with in Java, easy to malform, and very difficult to read (IMO). One wrong comma or colon and your document is unparsable, then good luck finding the offending character.

hsaputra · 2014-07-11T19:20:14Z

Just to clarify: you would like to move the Java options out of the tachyon-env.sh file?

The options are used directly by the .sh files, so it will be harder to read them from XML, in my opinion.

rootfs · 2014-07-11T19:26:02Z

@hsaputra we are working on relocating system properties into configuration files. Obviously some Java options (e.g., classpath) have to stay as system properties, but most tachyon.* properties can be safely relocated.

See this pull request for info
#213

hsaputra · 2014-07-11T19:48:31Z

@rootfs thanks for clarifying

haoyuan · 2014-07-12T21:03:05Z

What do you guys think about simple key/value pairs to configure the system? Like how Spark works: http://spark.apache.org/docs/latest/configuration.html . It is really simple.

The downside of XML is that it is too lengthy. E.g., configuring one parameter requires at least 4 lines, and imagine you may have 100 parameters in the future.
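To make the size difference concrete, here is a minimal Python sketch that flattens a Hadoop-style `<configuration>`/`<property>` document into one `key=value` line per setting. The property names and the `xml_to_pairs` helper are hypothetical, invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hadoop-style layout: each setting takes at least four lines.
# The property names are assumptions for illustration.
HADOOP_STYLE_XML = """
<configuration>
  <property>
    <name>tachyon.example.first</name>
    <value>1</value>
  </property>
  <property>
    <name>tachyon.example.second</name>
    <value>two</value>
  </property>
</configuration>
"""


def xml_to_pairs(text):
    """Collect <name>/<value> children of each <property> into a dict."""
    root = ET.fromstring(text.strip())
    pairs = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is None:
            continue  # skip malformed entries that have no <name>
        pairs[name.strip()] = (prop.findtext("value") or "").strip()
    return pairs


pairs = xml_to_pairs(HADOOP_STYLE_XML)
# The same settings as flat key/value lines: one line per setting.
flat_lines = [f"{key}={value}" for key, value in pairs.items()]
```

Here eight lines of `<property>` markup collapse into two flat lines, which is the trade-off in question; the flat form gives up the room XML leaves for nested or per-property metadata.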

aarondav · 2014-07-12T23:03:47Z

An alternative, JSON-ic solution is Typesafe's Config, which actually uses a "human-readable" variant of JSON called HOCON. This is strictly more general than simple key-value pairs, as it also supports that structure, and the Config library is written entirely in and for Java.

On the other hand, Spark opted not to use it due to its increased complexity (over simple Properties files) and the fact that Spark's config names are not properly hierarchical, and both of these are valid points for Tachyon as well.

hsaputra · 2014-07-13T05:36:04Z

More and more projects are using YAML[1] for configuration files. It is like JSON but easier to read.

[1] http://www.yaml.org/spec/1.2/spec.html

childsb · 2014-07-13T17:02:41Z

@haoyuan KVP is great for smaller or standalone projects. The technical problem here is a desire to participate in and provide a plugin model for tachyon. The things you want to consider for a plugin architecture are:

1. Plugins are anonymous. The host application is unaware of plugins and communicates via an established interface.
2. Plugins define and require their own properties without limitation. Plugins can change independently of the host app and still work with the host.
3. Details of the plugin should remain unknown to the host application, but still be configurable by the end user.
4. Configuration is standard across plugins (not necessarily the same config file, but we don't want to make end users edit KVP, JSON, YAML, and XML depending on the plugins, since there may be lots in a system).
5. Configuration is as simple for the end user as possible.
6. Plugin dependency + version support. This seems like overkill, but as tachyon evolves dependencies will arise (plugin x works with spark version y but not y+1).

Most projects that offer plugin support do so via XML configuration. The reason is that XML's flexibility allows simple configuration without the host application knowing anything about the plugins. KVP doesn't scale well in the plugin paradigm, as there's no hierarchy and any implied hierarchy convention imposes structure on plugin properties. This is solvable if you design the plugin system from the ground up, but if you are working with existing plugins it can be a big issue.

Even if host+plugins are authored from the ground up, KVP is limiting in the amount of config required for anonymous plugins to talk to a master application. While it's true that 4 lines of config are required for simple name=value properties in XML, the number of lines required to work around hierarchy with KVP exceeds that and is extremely difficult to read/use. If you attempt to define a KVP file that supports N plugins with Y configuration properties each, and a config loader that knows nothing about the plugins, you run into the problem quickly.

Another issue poorly handled by KVP is property collision. You can avoid this with a namespace in the property name, but this becomes awkward and can break goal #1. Properties get pretty long with a namespace too. Every sub-property for a plugin will need a namespace prefix, and with multiple versions of a plugin possibly defined, the prefix could include version strings, etc.
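A minimal Python sketch of the namespace workaround described above: a loader that knows nothing about individual plugins groups flat keys by a prefix convention. The `plugin.<name>.<property>` convention, the `group_by_plugin` helper, and the plugin and property names are all assumptions for illustration.

```python
from collections import defaultdict


def group_by_plugin(pairs, prefix="plugin."):
    """Group flat 'plugin.<name>.<property>' keys into one dict per plugin."""
    grouped = defaultdict(dict)
    for key, value in pairs.items():
        if not key.startswith(prefix):
            continue  # host-level setting, not a plugin property
        plugin, sep, prop = key[len(prefix):].partition(".")
        if not sep or not prop:
            raise ValueError(f"key lacks a property part: {key!r}")
        grouped[plugin][prop] = value
    return dict(grouped)


# Hypothetical flat configuration; every plugin key repeats its namespace.
flat = {
    "plugin.glusterfs.volume": "vol1",
    "plugin.glusterfs.mount": "/mnt/gfs",
    "plugin.s3.bucket": "data",
    "tachyon.home": "/opt/tachyon",
}
grouped = group_by_plugin(flat)
```

The convention works only as long as plugin names contain no dots. A version string such as `1.2` inside the prefix would be split at its first dot, so the loader would need extra escaping rules, which is the awkwardness being pointed out.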

If tachyon weren't going to allow under-filesystem plugins, KVP would be fine. @hsaputra If tachyon is defining its own plugin structure, YAML, JSON or another hierarchical standard would work well, but if we wish to work with existing apache.hadoop filesystem plugins (which gets us a whole lot of under-filesystem support for free), we'd end up translating from {yaml,JSON,KVP,WSDL}-->XML, and the translation is error-prone. Even if tachyon does define its own plugin structure, I would probably lobby for XML just because of its wider adoption, which means a lower barrier to entry for plugin developers.

I'm not proposing we use something as heavy as the Eclipse XML/plugin structure, but by using simple XML, tachyon will work with existing plugins, can make use of free/open APIs with no new build requirements, and will have plenty of room for expansion as the project grows. There's also lots of tooling to simplify XML config for the end user. In reality the end user shouldn't see too much of this config anyway.

Source: I'm an Eclipse Web Tools project contributor, and this is a hot topic there. I also worked on three large ground-up plugin applications, one using JSON, one using WSDL. The JSON project started with KVP until hitting these issues, then converted to JSON + XML: XML for the backend, JSON for the web front end.

rootfs · 2014-07-14T11:54:55Z

I see two sub-issues here. One is to integrate with other ecosystems such as Hadoop HDFS, and the other is to configure Tachyon itself. The latter seems to be getting the most attention.

Since Tachyon uses Hadoop HDFS or a compatible file system as its underlying storage, it is a great convenience to be able to load Hadoop's XML configuration directly. As @childsb pointed out, this will accelerate Tachyon's adoption. This can be done independently of configuring Tachyon's runtime options.

haoyuan · 2014-08-06T21:24:27Z

Tracking in JIRA: https://tachyon.atlassian.net/browse/TACHYON-8?jql=project%20%3D%20TACHYON

haoyuan closed this as completed Aug 6, 2014
