Project Goals
andychu edited this page Jan 20, 2017 · 39 revisions
I was asked what the project goals were, so I dumped the 10,000 foot view on this page. It looks like 5-10 years of work, so it will have to be pruned.
For a more immediate view of the project goals, see the Oil blog.
Design a modern Unix shell that can do everything bash/zsh/etc. can do, and more:
- System Administration
- Building Linux distributions (e.g. Arch Linux uses bash for PKGBUILD).
- Startup scripts
- Configure and build scripts. Reproducible and distributed builds.
- Distributed Computing
- Building containers
- Specifying remote jobs
- Feedback and Monitoring: performance measurement, security testing.
- Data Science / Scientific Computing
- Heterogeneous "big data" and small data pipelines. The language should scale down as well as scale up, i.e. low startup latency for small jobs.
- Incorporate features of "workflow languages" and systems in the MapReduce family.
- Concise data cleaning, transformation, and metrics. Non-goal: mathematical modeling. That should be left to specialized languages like R, Julia, and Matlab.
- Reproducible Research.
- Interactive Computing
- A general purpose REPL (terminal and probably a Jupyter kernel).
- Document Publishing
- http://oilshell.org/ and many programming books are built and orchestrated with shell scripts / Makefiles
- Easy upgrade path from bash, the most popular shell in the world.
- To do this, I've written a very compatible bash parser, which will allow automatic conversion of bash (osh) to oil. The oil language can therefore have a different syntax while keeping a superset of bash semantics.
- Consistent syntax.
- POSIX sh and bash have evolved many quirks.
- Fix sh and bash semantics to be more developer-friendly (in a backward compatible way).
- Proper Arrays
- Strict mode for developer productivity (enhanced set -o errexit, nounset, pipefail)
- Enhance the shell language; treat it as a real programming language.
- Fill in obvious gaps, like abspath, etc.
- Compound data structures
- Example: Completion functions in bash have a bad API involving globals and are difficult to write. It should feel more like writing completion functions in Python or JavaScript.
- Selected influences: Python, R, Ruby, Perl 6, Lua (API), ML, C and C++, PowerShell.
- Reduce language cacophony in shell programming by reimplementing tools closely related to the shell.
- Example: combine shell, awk, and make.
- Also combine tools like find (which has its own expression parser and starts processes), and xargs/GNU parallel, which both start processes. GNU parallel is actually mentioned in the bash manual.
- Richer constructs for concurrency and parallelism.
- Folding in make -j and xargs -P goes a long way.
- Allow secure programs to be written.
- In emitting strings: escaping
- In reading strings: error checking should be easy, better control over "read" delimiters, etc.
- Fix issues with globs and flags, i.e. untrusted file system and untrusted variables
- C and C++ bindings
- Provide access to advanced Linux kernel features such as namespaces, cgroups, seccomp, tracing, and /proc (but remain portable to other Unices).
- It should be possible to write a busybox in oil.
- Should be the best language for writing quick command line tools.
- In particular, replace the getopt interface in bash with something much better.
- Expand the range of things that can be done with the "polyglot" model.
- Coprocesses
- Built-in serialization formats like CSV, JSON, maybe HTML
- Maybe some binary formats as libraries
- Imperative on the scale of code, but declarative/functional/concurrent on the scale of architecture, not unlike sh itself.
- Proper error messages like Clang/Swift. Static Parsing.
- Provide end-to-end tracing and profiling tools (e.g. for pipelines that run for hours)
- Library-based design like LLVM. Example: the same parser is used in batch mode as well as completion mode, which is not true of all shell implementations. The parser can be used for auto-formatting and linting, which is also not true of other implementations.
- Few dependencies so it can be used in bootstrapping Unix systems and clusters. (e.g. distributed as a C++ file and optional oil source.)
- Much of oil should be written in oil (which means the VM needs to be fast enough for this).
- Expose our toolkit for little languages -- lexing, parsing, AST representation, etc. So that other languages can be built in the same way.
- Metaprogramming with ASTs as first class data structures.
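To make two of the goals above concrete, here is a minimal sketch of what "strict mode" and defensive glob handling look like in today's shells. It uses only the existing errexit, nounset, and pipefail options named in the list. The list_files helper is hypothetical, written for illustration, and is not part of bash or oil.

```shell
#!/usr/bin/env bash
# "Strict mode" built from existing shell options.
set -o errexit   # exit as soon as a command fails
set -o nounset   # expanding an unset variable is an error

# pipefail is not available in every sh implementation, so probe for it
# in a subshell before enabling it.
if (set -o pipefail) 2>/dev/null; then
  set -o pipefail  # a pipeline fails if any command in it fails
fi

# list_files DIR (hypothetical helper)
# Prints the name of each non-hidden entry in DIR, one per line.
# It tolerates names that contain spaces or that look like flags.
list_files() {
  local dir=$1
  local f
  # Prefixing the glob with "$dir"/ means no expanded word starts with '-',
  # and quoting "$f" prevents word splitting on spaces.
  for f in "$dir"/*; do
    # If nothing matches, the pattern is left unexpanded; skip it.
    [ -e "$f" ] || continue
    # printf with an explicit format never treats the data as an option.
    printf '%s\n' "${f##*/}"
  done
}
```

Calling list_files on a directory that contains files named -rf and "a b" prints both names unchanged. The amount of care needed for such a small task is the kind of quirk the goals above aim to remove.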