refactor tachyon-env.sh #206

Closed
rootfs opened this issue May 29, 2014 · 14 comments

Comments

@rootfs
Contributor

rootfs commented May 29, 2014

tachyon-env.sh serves two purposes: it sets the Java executable path and the Java command-line options. We still need the first function, but the static Java options need to move out of there.

We really need a key/value style of configuration, either JSON pairs or something like Hadoop's core-site.xml, that holds the runtime configuration. This helps in the following areas:

  1. Construct the Configuration object from well-structured key/value pairs. We then don't have to conditionally parse system properties when constructing the underFilesystem.

  2. Change configuration dynamically, something the static Java command-line options cannot do.
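
To make the two points above concrete, here is a minimal sketch of a configuration object built from key/value text that can be reloaded at runtime. The class name `KeyValueConfSketch`, its methods, and the property names are hypothetical and only for illustration; this is not Tachyon's actual Configuration class, and it uses nothing beyond `java.util.Properties`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch, not Tachyon's real configuration class.
class KeyValueConfSketch {
    // Replaced wholesale on every load so readers never see a half-updated set.
    private volatile Properties props = new Properties();

    // Parse "key=value" lines (java.util.Properties format) and swap them in.
    void load(String text) {
        Properties fresh = new Properties();
        try {
            fresh.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        props = fresh;
    }

    String get(String key, String defaultValue) {
        return props.getProperty(key, defaultValue);
    }

    int getInt(String key, int defaultValue) {
        String raw = props.getProperty(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        KeyValueConfSketch conf = new KeyValueConfSketch();
        // Property names below are illustrative assumptions.
        conf.load("tachyon.master.hostname=localhost\ntachyon.master.port=19998\n");
        System.out.println(conf.get("tachyon.master.hostname", "unset"));
        // Simulate a runtime change, which static JVM options cannot express.
        conf.load("tachyon.master.port=20000\n");
        System.out.println(conf.getInt("tachyon.master.port", -1));
    }
}
```

In a real deployment the text would presumably come from a file on disk rather than a string literal; the point is only that no conditional parsing of system properties is needed.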

@timothysc

+1 tachyon.xml

It would clean up the long java-opts.

@haoyuan
Contributor

haoyuan commented May 29, 2014

+1

@rootfs Do you mind creating a JIRA for this? Thanks.
https://spark-project.atlassian.net/browse/TACHYON/

@timothysc

https://spark-project.atlassian.net/browse/TACHYON-93, but it requires updated credz to edit.

@AtlasPilotPuppy

+1 tachyon.json
JSON is much more readable in my opinion.

@childsb
Contributor

childsb commented Jul 11, 2014

+1 for XML

-1 for JSON which is difficult to deal with in Java, easy to malform, and very difficult to read (IMO). One wrong comma or colon and your document is unparsable, then good luck finding the offending character.

@hsaputra
Contributor

Just want to clarify that you would like to move the Java options from the tachyon-env.sh file?

The options are used directly by the .sh files, so it will be harder to read them from XML, in my opinion.

@rootfs
Contributor Author

rootfs commented Jul 11, 2014

@hsaputra we are working on relocating system properties into configuration files. Obviously some Java options (e.g. classpath) have to stay as system properties, but most tachyon.* properties can be safely relocated.

See this pull request for info
#213

@hsaputra
Contributor

@rootfs thanks for clarifying

@haoyuan
Contributor

haoyuan commented Jul 12, 2014

What do you guys think about simple key/value pairs to configure the system? Like how Spark works: http://spark.apache.org/docs/latest/configuration.html . It is really simple.

The downside of XML is that it is too lengthy. E.g., configuring one parameter requires at least 4 lines, and imagine you may have 100 parameters in the future.

@aarondav
Contributor

An alternative, JSON-ic solution is Typesafe's Config, which actually uses a "human-readable" variant of JSON called HOCON. This is strictly more general than simple key-value pairs, as it also supports that structure, and the Config library is written entirely in and for Java.

On the other hand, Spark opted not to use it due to its increased complexity (over simple Properties files) and the fact that Spark's config names are not properly hierarchical, and both of these are valid points for Tachyon as well.

@hsaputra
Contributor

More and more projects are using YAML[1] for configuration files. It is like JSON but easier to read.

[1] http://www.yaml.org/spec/1.2/spec.html

@childsb
Contributor

childsb commented Jul 13, 2014

@haoyuan KVP is great for smaller or standalone projects. The technical problem here is a desire to participate in and provide a plugin model for Tachyon. The things you want to consider for a plugin architecture are:

  1. Plugins are anonymous. The host application is unaware of plugins and communicates via an established interface.

  2. Plugins define and require their own properties without limitation. Plugins can change independently of the host app and still work with the host.

  3. Details of the plugin should remain unknown to the host application, but still be configurable by the end user.

  4. Configuration is standard across plugins (not necessarily the same config file, but we don't want to make end users edit KVP, JSON, YAML, or XML depending on the plugin, since there may be lots of plugins in a system).

  5. Configuration is as simple for the end user as possible.

  6. Plugin dependency and version support. This seems like overkill, but as Tachyon evolves, dependencies will arise (plugin x works with Spark version y but not y+1).

Most projects that offer plugin support do so via XML configuration. The reason is that XML's flexibility allows simple configuration without the host application knowing anything about the plugins. KVP doesn't scale well in the plugin paradigm, as there's no hierarchy and any implied hierarchy convention imposes structure on plugin properties. This is solvable if designing the plugin system from the ground up, but when working with existing plugins it can be a big issue.

Even if host and plugins are authored from the ground up, KVP is limiting in the amount of config required for anonymous plugins to talk to a master application. While it's true that 4 lines of config are required for a simple name=value property in XML, the number of lines required to work around hierarchy with KVP exceeds that, and the result is extremely difficult to read and use. If you attempt to define a KVP file that supports N plugins with Y configuration properties each, plus a config loader that knows nothing about the plugins, you run into the problem quickly.

Another issue poorly handled by KVP is property collision. You can avoid this with a namespace in the property name, but this becomes awkward and can break goal #1. Properties get pretty long with a namespace too. Every sub-property for a plugin will need a namespace prefix, and with multiple versions of a plugin possibly defined, the prefix could include version strings etc.
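
As a rough illustration of the namespace workaround described above, the sketch below pulls one plugin's keys out of a flat key/value set by prefix. Everything here is hypothetical: the class `PluginNamespaceSketch`, the `subset` helper, and the key names (including the version segment) are made up to show how long the keys become, not taken from Tachyon.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical sketch of prefix-based namespacing in a flat KVP file.
class PluginNamespaceSketch {
    // Return every property under "<prefix>." with the prefix stripped,
    // so a plugin sees only its own keys and cannot collide with others.
    static Map<String, String> subset(Properties all, String prefix) {
        String dotted = prefix + ".";
        Map<String, String> out = new TreeMap<>();
        for (String name : all.stringPropertyNames()) {
            if (name.startsWith(dotted)) {
                out.put(name.substring(dotted.length()), all.getProperty(name));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Properties all = new Properties();
        // Illustrative keys: both plugins define "uri", so each needs a prefix,
        // and a versioned plugin needs an even longer one.
        all.setProperty("tachyon.underfs.pluginA.v1.uri", "a://host/path");
        all.setProperty("tachyon.underfs.pluginA.v1.timeout", "30");
        all.setProperty("tachyon.underfs.pluginB.uri", "b://other");
        System.out.println(subset(all, "tachyon.underfs.pluginA.v1"));
    }
}
```

Note that whoever calls `subset` has to know the prefix convention, which is the tension with goal #1: the loader is no longer fully unaware of how plugins name their properties.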

If Tachyon wasn't going to allow under-filesystem plugins, KVP would be fine. @hsaputra If Tachyon is defining its own plugin structure, YAML, JSON or another hierarchical standard would work well, but if we wish to work with existing apache.hadoop filesystem plugins (which gets a whole lot of under-filesystem support for free), we'd end up translating from {YAML,JSON,KVP,WSDL}-->XML, and the translation is error prone. Even if Tachyon does define its own plugin structure, I would probably lobby for XML just because of its wider adoption, which means a lower barrier to entry for plugin developers.

I'm not proposing we use something as heavy as the Eclipse XML/plugin structure, but by using simple XML, Tachyon will work with existing plugins, can make use of free/open APIs with no new build requirements, and will have infinite room for expansion as the project grows. And there's lots of tooling to simplify XML config for the end user. In reality the end user shouldn't see too much of this config anyway.

Source: I'm an Eclipse Web Tools project contributor and this is a hot topic there. I also worked on three large ground-up plugin applications, one using JSON and one using WSDL. The JSON project started with KVP until hitting these issues, then converted to JSON + XML: XML for the backend, JSON for the web front end.

@rootfs
Contributor Author

rootfs commented Jul 14, 2014

I see two sub-issues here. One is integrating with other ecosystems such as Hadoop HDFS, and the other is configuring Tachyon itself. The latter seems to be getting the most attention.

Since Tachyon uses Hadoop HDFS or a compatible filesystem as its underlying storage, it is a great convenience to be able to load Hadoop's XML configuration directly. As @childsb pointed out, this will accelerate Tachyon's adoption. This can be done independently of configuring Tachyon's runtime options.
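
For reference, reading a Hadoop-style XML file into key/value pairs needs nothing beyond the JDK's DOM parser. The sketch below is an assumption-laden illustration, not how Tachyon or Hadoop implement it (Hadoop ships its own Configuration class for this); the class name `HadoopXmlSketch` and its `parse` helper are hypothetical, and the sample value is made up. It also shows the roughly four lines per property mentioned earlier in the thread.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: flatten <property><name/><value/></property> entries.
class HadoopXmlSketch {
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            Map<String, String> out = new LinkedHashMap<>();
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                String name = childText(p, "name");
                String value = childText(p, "value");
                if (name != null && value != null) {
                    out.put(name, value);
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("unparsable configuration XML", e);
        }
    }

    // Text of the first <tag> element under parent, or null if absent.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String xml = "<configuration>\n"
                + "  <property>\n"
                + "    <name>fs.defaultFS</name>\n"
                + "    <value>hdfs://localhost:9000</value>\n"
                + "  </property>\n"
                + "</configuration>\n";
        System.out.println(parse(xml));
    }
}
```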

@haoyuan
Contributor

haoyuan commented Aug 6, 2014

@haoyuan haoyuan closed this as completed Aug 6, 2014