refactor tachyon-env.sh #206
Comments
+1 for tachyon.xml. It would clean up the long java-opts.
+1 @rootfs Do you mind creating a JIRA for this? Thanks.
https://spark-project.atlassian.net/browse/TACHYON-93, but it requires updated credentials to edit.
+1 tachyon.json
+1 for XML, -1 for JSON, which is difficult to deal with in Java, easy to malform, and very difficult to read (IMO). One wrong comma or colon and your document is unparsable, then good luck finding the offending character.
Just to clarify: you would like to move the Java options out of the tachyon-env.sh file? The options are used directly by the .sh files, so in my opinion it will be harder to read them from XML.
@rootfs thanks for clarifying
What do you guys think about simple key/value pairs to configure the system, like how Spark works: http://spark.apache.org/docs/latest/configuration.html ? It is really simple. The downside of XML is its length: configuring one parameter requires at least 4 lines, and imagine you may have 100 parameters in the future.
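To make the key/value option concrete, here is a minimal sketch using only `java.util.Properties` from the JDK. The property names are made up for illustration and are not actual Tachyon settings; the point is only that each parameter takes one line.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class KeyValueConfigSketch {
    // Parses "key=value" text, one parameter per line.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical keys, for illustration only.
        String conf = "tachyon.master.hostname=localhost\n"
                    + "tachyon.worker.memory.size=1GB\n";
        Properties props = parse(conf);
        System.out.println(props.getProperty("tachyon.master.hostname"));
        // A missing key falls back to the supplied default.
        System.out.println(props.getProperty("tachyon.worker.port", "unset"));
    }
}
```

In a real setup the text would come from a file rather than a string, but the parsing call is the same.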
An alternative, JSON-ic solution is Typesafe's Config, which actually uses a "human-readable" variant of JSON called HOCON. This is strictly more general than simple key-value pairs, as it also supports that structure, and the Config library is written entirely in and for Java. On the other hand, Spark opted not to use it due to its increased complexity (over simple Properties files) and the fact that Spark's config names are not properly hierarchical, and both of these are valid points for Tachyon as well.
More and more projects are using YAML[1] for configuration files. It is like JSON but easier to read.
@haoyuan KVP is great for smaller or standalone projects. The technical problem here is a desire to participate and provide a plugin model for tachyon. The things you want to consider for a plugin architecture are:
Most projects that offer plugin support do so via XML configuration. The reason is that XML's flexibility allows simple configuration without the host application knowing anything about the plugins. KVP doesn't scale well in the plugin paradigm, as there's no hierarchy and any implied hierarchy convention imposes structure on plugin properties. This is solvable if you design the plugin system from the ground up, but it can be a big issue when working with existing plugins. Even if host and plugins are authored from the ground up, KVP is limiting given the amount of config required for anonymous plugins to talk to a master application.
While it's true that 4 lines of config are required for a simple name=value property in XML, the number of lines required to work around the lack of hierarchy in KVP exceeds that, and the result is extremely difficult to read and use. If you attempt to define a KVP file that supports N plugins with Y configuration properties each, and a config loader that knows nothing about the plugins, you run into the problem quickly.
Another issue poorly handled by KVP is property collision. You can avoid this with a namespace in the property name, but this becomes awkward and can break goal #1. Properties get pretty long with a namespace, too: every sub-property for a plugin needs a namespace prefix, and with multiple versions of a plugin possibly defined, the prefix could include version strings, etc. If tachyon wasn't going to allow under-filesystem plugins, KVP would be fine.
@hsaputra If tachyon is defining its own plugin structure, YAML, JSON or another hierarchical standard would work well, but if we wish to work with existing apache.hadoop filesystem plugins (which gets us a whole lot of under-filesystem support for free), we'd end up translating from {YAML,JSON,KVP,WSDL}-->XML, and the translation is error-prone. Even if tachyon does define its own plugin structure, I would probably lobby for XML just because of its wider adoption, which means a lower barrier to entry for plugin developers.
I'm not proposing we use something as heavy as the Eclipse XML/plugin structure, but by using simple XML tachyon will work with existing plugins, can make use of free/open APIs with no new build requirements, and has infinite room for expansion as the project grows. And there's lots of tooling to simplify XML config for the end user. In reality the end user shouldn't see too much of this config anyway. Source: I'm an Eclipse Web Tools project contributor, and this is a hot topic there. I also worked on three large ground-up plugin applications, one using JSON and one using WSDL. The JSON project started with KVP until hitting these issues, then converted to JSON + XML: XML for the backend, JSON for the web front end.
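To illustrate the namespace point, here is a rough sketch of what a host-side loader ends up doing with flat key/value pairs. The `plugin.<name>.<key>` convention and the plugin names are hypothetical, invented for this example; nothing in Tachyon is assumed to work this way. Note that the loader can only group properties because every plugin is forced to follow the host's prefix convention, which is exactly the imposed structure described above.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

final class PluginPrefixSketch {
    // Hypothetical convention: plugin.<pluginName>.<subKey>=value
    static final String PREFIX = "plugin.";

    // Groups flat properties by plugin name; keys outside the convention are ignored.
    static Map<String, Map<String, String>> groupByPlugin(Properties flat) {
        Map<String, Map<String, String>> byPlugin = new TreeMap<>();
        for (String key : flat.stringPropertyNames()) {
            if (!key.startsWith(PREFIX)) {
                continue;
            }
            String rest = key.substring(PREFIX.length());
            int dot = rest.indexOf('.');
            if (dot <= 0 || dot == rest.length() - 1) {
                continue; // no plugin name or no sub-key
            }
            String plugin = rest.substring(0, dot);
            String subKey = rest.substring(dot + 1);
            byPlugin.computeIfAbsent(plugin, k -> new TreeMap<>())
                    .put(subKey, flat.getProperty(key));
        }
        return byPlugin;
    }

    public static void main(String[] args) {
        Properties flat = new Properties();
        // Every sub-property has to repeat the full namespace prefix.
        flat.setProperty("plugin.glusterfs.volume", "vol0");
        flat.setProperty("plugin.glusterfs.mount.dir", "/mnt/gluster");
        flat.setProperty("plugin.s3.bucket", "example-bucket");
        System.out.println(groupByPlugin(flat));
    }
}
```

An existing plugin that expects its own unprefixed property names would need a translation step on top of this.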
I see two sub-issues here. One is integrating with other ecosystems such as Hadoop HDFS, and the other is configuring Tachyon itself. The latter seems to be getting the most attention. Since Tachyon uses Hadoop HDFS or a compatible system as underlying storage, it is a great convenience to be able to load Hadoop's XML configuration directly. As @childsb pointed out, this will accelerate Tachyon's adoption. This can be done independently of configuring Tachyon's runtime options.
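As a sketch of what reading a Hadoop-style file involves, the following uses only the JDK's DOM parser to turn `<property><name>…</name><value>…</value></property>` entries into a map. The class and method names are invented for this example; a real integration would more likely hand the file to Hadoop's own `org.apache.hadoop.conf.Configuration` class than parse it by hand.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class HadoopXmlSketch {
    // Reads every <property> element and collects its name/value pair.
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> out = new LinkedHashMap<>();
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                String name = childText(prop, "name");
                String value = childText(prop, "value");
                if (name != null && value != null) {
                    out.put(name, value);
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("malformed configuration XML", e);
        }
    }

    // Returns the trimmed text of the first matching child element, or null if absent.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Example value only; the address is a placeholder.
        String xml = "<configuration>"
                   + "<property><name>fs.defaultFS</name>"
                   + "<value>hdfs://localhost:9000</value></property>"
                   + "</configuration>";
        System.out.println(parse(xml).get("fs.defaultFS"));
    }
}
```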
tachyon-env.sh serves a dual purpose: it sets the java executable path and the java command-line options. While we still need its first function, the static java options need to get out of there.
We really need a key/value kind of configuration, either JSON pairs or something like Hadoop's core-site.xml, that contains the runtime configuration. This helps in the following areas:
Construct a Configuration object from well-structured key/value pairs. We thus don't have to conditionally parse system properties when constructing underFilesystem.
Change configuration dynamically, something the static java command-line options cannot do.
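A minimal sketch of the two points above, assuming a hypothetical `RuntimeConfSketch` class (not an existing Tachyon type) and made-up keys: values are loaded from key/value pairs, a `-D` system property is consulted only as a fallback inside one lookup method, and a value can be changed while the process is running.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class RuntimeConfSketch {
    private final Map<String, String> values = new ConcurrentHashMap<>();

    RuntimeConfSketch(Map<String, String> initial) {
        values.putAll(initial);
    }

    // Lookup order: explicit value, then a -D system property, then the default.
    String get(String key, String defaultValue) {
        String v = values.get(key);
        if (v == null) {
            v = System.getProperty(key);
        }
        return v != null ? v : defaultValue;
    }

    // Changes a value at runtime, which static command-line options cannot do.
    void set(String key, String value) {
        values.put(key, value);
    }

    public static void main(String[] args) {
        Map<String, String> initial = new HashMap<>();
        // Hypothetical key and value, for illustration only.
        initial.put("tachyon.underfs.address", "hdfs://localhost:9000");
        RuntimeConfSketch conf = new RuntimeConfSketch(initial);
        System.out.println(conf.get("tachyon.underfs.address", "unset"));
        conf.set("tachyon.underfs.address", "hdfs://otherhost:9000");
        System.out.println(conf.get("tachyon.underfs.address", "unset"));
    }
}
```

Callers such as the under-filesystem constructor would then take the configuration object as a parameter instead of reading system properties themselves.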