Portable object-oriented shell scripting
Define classes and instantiate objects in shell with the usual shell functions syntax.
Compatible with most Bourne-like shells, such as bash/dash/ksh/osh/zsh.
Basic demo of how clash works here
What you would expect from basic object-oriented programming is supported:
- classes
- objects
- attributes
- methods
- constructor
- destructor
- generated getters
- generated setters
Not supported:
- inheritance
Most modern high-level shell constructs, such as local variables, arrays and dictionaries, are not defined by POSIX and usually differ in behavior and syntax from one shell to another.
Portability is at the core of this project: non-POSIX features are kept to a minimum, which makes most of the code highly portable across many different shells.
Shell | Version(s) tested | Support |
---|---|---|
bash | 3.0 / 4.4 / 5.0 | Full |
busybox | 1.28 (ash w/ bash compat) | Full |
dash | 0.5.10.2 | Full |
FreeBSD sh | FreeBSD 12.1 release | Full |
ksh | AT&T 93u+ (2012) | clash 100%, lib 50% |
mksh | R57 | Full |
oksh | 5.2.14 | Full |
osh | 0.6.0 | Full |
yash | 2.48 | Full |
zsh | 5.6.2 | Full |
# source the class definition
. ./clash
# Create a class
class Car \
  brand \
  speed \
  _start \
  __init__
# Methods are defined externally and are registered without the class name
# (see _start above)
Car_start() {
  printf 'Starting %s: ' "$self"
  if [ "$speed" -gt 100 ]; then
    echo 'Vroooooooom'
  else
    echo 'Vroom'
  fi
}
# If declared, the constructor is called when creating the object
Car__init__() {
  echo "New $1 added to the garage"
  "$self"_brand_is "$1"
  "$self"_speed_is "$2"
}
# Create a new car object named 'cool_bolid', Car__init__ function is called
Car cool_bolid Ferrari 200
# Create another one, Car__init__ function is called
Car family_truck Toyota 90
New Ferrari added to the garage
New Toyota added to the garage
# Call the _start method
cool_bolid_start
family_truck_start
Starting cool_bolid: Vroooooooom
Starting family_truck: Vroom
# check the family truck speed (generated getter)
echo "Family truck speed: $(family_truck_speed)"
# Modify the speed (generated setters are suffixed by _is)
echo 'Upgrading truck engine...'
family_truck_speed_is 120
# Start it again
family_truck_start
Family truck speed: 90
Upgrading truck engine...
Starting family_truck: Vroooooooom
Simple implementations of List and Vector are available so far (Dict is mostly a proof of concept); they can be found in the lib directory.
Following up on our example:
# Load the list module (containing the List class)
. ./lib/list
# create a list of my cars
List my_cars cool_bolid family_truck
# Create a new one
Car minivan Toyota 80
# And add it to the list
my_cars_append minivan
# Print the elements
printf 'My cars: '
my_cars_print
New Toyota added to the garage
My cars: cool_bolid family_truck minivan
# Let's say we want the list of our Toyota cars, let's create a helper function
is_toyota() {
  # $1 is the car object, return 0 if the brand is Toyota, else 1
  [ "$("$1"_brand)" = Toyota ]
}
# And use the List filter method with our newly created helper function
my_cars_filter my_toyota_cars is_toyota
# Check the result
printf 'My Toyota cars: '
my_toyota_cars_print
My Toyota cars: family_truck minivan
Note: this code can be found in the examples directory.
Gathering and organizing data can be painful and dirty in shell, especially when avoiding non-portable features like arrays/maps. Object-oriented programming is one solution to those issues, but it does not exist natively in shell.
The goal of this project is to provide a simple solution to the missing object paradigm while being as portable as possible (working with dash/bash/ksh/zsh/busybox ash); the only "non POSIX" feature used is local (more info in the notes section).
Most importantly, the syntax to create and use classes and objects is exactly the same as regular shell syntax. That is because all clash does is generate functions: almost no parsing is done, and the grammar isn't altered using aliases.
Two functions are generated per attribute: a getter with the attribute value hardcoded, and a setter which, when called, redefines the getter with a new hardcoded value.
Methods are just generated functions that wrap a local declaration of all the attributes and a call to the initially declared function, so that it has access to all the attributes in its frame.
All these methods and attributes are generated using eval and playing with quotes (a lot of quotes) to preserve IFS characters.
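To make the getter/setter mechanism more concrete, here is a minimal standalone sketch of the idea. It is not clash's actual code: the helper names quote, set_getter and define_attr are made up for this illustration, and it assumes sed is available.

```sh
#!/bin/sh
# Sketch only: quote, set_getter and define_attr are hypothetical names,
# not functions provided by clash.

# Wrap a string in single quotes so that it survives one round of eval;
# every embedded ' becomes '\''.
quote() {
  printf '%s\n' "$1" | sed "s/'/'\\\\''/g; 1s/^/'/; \$s/\$/'/"
}

# (Re)define the getter <obj>_<attr> as a function printing a hardcoded value.
# $1: object name, $2: attribute name, $3: value
set_getter() {
  eval "$1_$2() { printf '%s\n' $(quote "$3"); }"
}

# Define the getter and the setter <obj>_<attr>_is for one attribute.
# The setter does nothing more than regenerate the getter.
define_attr() {
  set_getter "$1" "$2" "$3"
  eval "$1_$2_is() { set_getter $1 $2 \"\$1\"; }"
}

define_attr truck brand Toyota
define_attr truck speed 90

truck_speed_is 120
truck_brand_is "It's a  'quoted' name"

truck_speed
truck_brand
```

The quoting helper is what keeps spaces and single quotes in the value intact through eval; without it the second setter call above would break the generated function.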
The question of why this is done in shell rather than in another language comes up really often. I've found that the Oil Shell main FAQ covers this topic thoroughly:
- I don't understand. Why not use a different a programming language?
- Shouldn't we discourage people from writing shell scripts?
- Shouldn't scripts over 100 lines be rewritten in Python or Ruby?
Be sure to check out Oil in more detail if you can, it's a very impressive project.
Performance is pretty bad! But this depends heavily on which shell you are using and on the number and size of the objects you are creating. Creating a dozen objects with a few methods and attributes won't be noticeable. However, creating hundreds or thousands of objects will lead to (very) slow execution and a high load.
Considering how clash works this is to be expected: for instance, a single method call needs as many command substitutions as the object has attributes:
truck_start() {
  # Populate all object attributes for the actual function call
  local self=truck
  local speed=$(truck_speed)
  local brand=$(truck_brand)
  Car_start "$@" # Car_start function must be created by the user
}
Here we have two attributes, speed and brand, so usually at least two new processes are spawned just to prepare the local variables used in the method. This is where some shells shine more than others: ksh93, for instance, does not necessarily spawn a new process for command substitution.
For example let's consider the following:
say_hello() {
  echo 'Hello!'
}
for i in 1 2 3 4 5; do
  result=$(say_hello)
done
Is there a need to create a new process to capture say_hello output? No, not necessarily, and echo usually is a builtin, which means everything happening here can be strictly done by the shell. Now let's see what the most common shells do:
bash/mksh/zsh/dash/busybox/osh/yash
sh$ strace -cfe process bash say_hello.sh
strace: Process 27040 attached
strace: Process 27041 attached
strace: Process 27042 attached
strace: Process 27043 attached
strace: Process 27044 attached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
48.70 0.000337 33 10 5 wait4
28.76 0.000199 199 1 execve
21.97 0.000152 30 5 clone
0.58 0.000004 2 2 1 arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00 0.000692 18 6 total
A total of 5 new processes (excluding the initial execve), all created by clone calls. ksh93, on the other hand:
sh$ strace -cfe process ksh say_hello.sh
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
97.10 0.000536 536 1 execve
2.90 0.000016 8 2 1 arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00 0.000552 3 1 total
0 clone calls: ksh93 executes that script in a single process. Fascinating, right? You might think that it simply notices that result is never used, or that if some variables were created inside that subshell it would need to fork to be able to throw away the environment once it's done. But no, even in all those cases ksh93 does not fork.
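Whether or not a shell forks, the observable behavior has to stay the same: assignments made inside a command substitution must not leak into the calling shell. The short sketch below (the function and variable names are only illustrative) shows the behavior every shell has to preserve, with or without a new process:

```sh
#!/bin/sh
# Illustrative names only: counter and bump are not part of clash.
counter=0

bump() {
  counter=$((counter + 1))
  echo "counter inside: $counter"
}

# The function runs in a subshell environment, so the increment is
# visible in the captured output but discarded afterwards.
result=$(bump)

echo "$result"
echo "counter outside: $counter"
```

This prints "counter inside: 1" followed by "counter outside: 0", which is exactly the environment "throw away" that a forking shell gets for free and that a non-forking one has to emulate.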
Now let's take an example with clash to show how big of a difference this can make:
sh$ shuf --input-range 1-30 > shuffled-numbers.txt # use a fixed random sequence
And here is sort.sh
. ./clash
. ./lib/list
List foo $(cat shuffled-numbers.txt)
foo_sort
foo_print
This creates a list foo of 30 shuffled elements, sorts it and prints it on the standard output.
(Please note that the results may vary a little due to the shuffling part of this script, the version of the shells used, your machine etc. This is in no way a serious benchmark)
sh$ strace -cfe process ksh sort.sh
[...]
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
73.24 0.000594 54 11 9 execve
17.39 0.000141 23 6 4 wait4
7.64 0.000062 31 2 clone
2 clone calls for ksh93, that's not too bad, especially since one of them is caused by the need to execute cat, which isn't a builtin. The failed execve calls are mostly due to some compatibility checks when sourcing clash; we can ignore them.
Let's also check the running time, which we'll compare with the other shells:
sh$ time ksh sort.sh
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
real 0m0.081s
user 0m0.074s
sys 0m0.007s
Now let's try it with the other shells:
sh$ strace -cfe process bash sort.sh
[...]
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
72.67 0.085384 50 1686 843 wait4
27.11 0.031848 37 843 clone
0.22 0.000255 127 2 execve
Around 800 clone calls. Yes, that's right: creating, sorting and printing a list of 30 elements spawns no fewer than 800 processes with bash/zsh/dash/busybox.
And looking at bash running time:
sh$ time bash sort.sh
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
real 0m0.396s
user 0m0.293s
sys 0m0.119s
Roughly 5 times slower.
Don't see mksh mentioned? Well, it is a bit special: it turns out that in mksh printf is not a builtin, and clash relies heavily on printf, which explains the following results:
sh$ strace -cfe process mksh sort.sh
[...]
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
93.57 0.829082 212 3908 1954 wait4
3.83 0.033943 17 1954 clone
2.43 0.021552 19 1112 execve
Almost 2000 clone calls and over 1100 execve calls, this is wild.
sh$ time mksh sort.sh
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
real 0m5.224s
user 0m2.015s
sys 0m3.282s
5 seconds to sort a list of 30 elements, now you know what to use to train your machine learning models.
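To check in advance whether a given shell will pay this execve cost for printf (or any other command), one rough approach is to look at what `command -v` reports. The helper name kind_of below is hypothetical, and the classification is only a heuristic: anything that does not resolve to a path (builtins, but also functions, aliases and keywords) is reported as internal.

```sh
#!/bin/sh
# kind_of is a hypothetical helper, not part of clash.
# It classifies a command name based on the output of `command -v`:
#   - a path starting with /  -> external program (needs fork + execve)
#   - anything else non-empty -> handled inside the shell
#   - empty                   -> not found
kind_of() {
  case $(command -v "$1" 2>/dev/null) in
    '') echo 'not found' ;;
    /*) echo 'external' ;;
    *)  echo 'internal' ;;
  esac
}

for cmd in printf echo cat; do
  printf '%s: %s\n' "$cmd" "$(kind_of "$cmd")"
done
```

Running this with different shells gives a quick hint about which commands are worth avoiding in hot paths for that shell.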
Want more? Let's try with 1000 elements instead of 30, see how it goes with bash:
sh$ strace -cf bash sort.sh
[...]
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
66.95 33.078252 367 90108 45054 wait4
27.64 13.652942 303 45054 clone
1.89 0.932990 1 535447 rt_sigprocmask
[...]
sh$ time bash sort.sh
[...]
real 1m8.619s
user 0m17.712s
sys 0m53.141s
More than a minute to sort 1000 elements. Please also note that of the 90108 wait4 syscalls in total, half resulted in errors (ECHILD, No child processes; I haven't investigated why). The same goes for zsh, although it definitely runs a bit faster.
Interestingly, dash shows a negligible number of wait4 errors, and takes roughly a third of the time to run on average:
sh$ strace -cf dash sort.sh
[...]
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
45.31 4.239508 94 45056 2 wait4
39.09 3.657587 81 45054 clone
5.53 0.516913 5 97154 read
[...]
On the other hand busybox is the complete opposite, with 90% of its wait4 calls failing.
All this to say that you can't expect good performance once you start using a lot of objects: that's the way shell command substitution works, and clash's design definitely doesn't help with that.
Even ksh93, which avoids spawning new processes, struggles to keep its lead in terms of speed once more than a few thousand objects are in use.
Why not use plain variables instead of generated functions? This is a valid point: one of the first things that comes to mind is why generate a function to set the attribute value (obj_attr_is foo) when you could just do obj_attr=foo.
Well, obj_attr=foo works fine, but what about objname=obj; ${objname}_attr=foo? It doesn't, unless you cheat with assignment builtins such as declare/typeset/local or use eval, which is going to either reduce the portability of your script or make it really hard to read. This ${obj}_attr pattern is something you encounter all the time: once you pass a variable name as a parameter of a function, how do you directly use its original global variable name? You can't.
This is especially true for object-oriented programming: how do you make use of $self? You can't use a full object name in class methods either, or else it would only work for a specific instance, which defeats the whole point of classes being a generic model.
Let's take an example:
. ./clash
class Car \
  running \
  _start
Car_start() {
  "$self"_running=true
}
Car truck
truck_start
When calling that method we get:
truck_running=true: command not found
Because the shell didn't recognize an assignment (due to the variable expansion in the lvalue), it deduced that we were trying to run a command called truck_running=true, which of course does not exist.
Now we could trick the shell into deferring the assignment until after the evaluation of $self by using an assignment builtin like declare/typeset/export:
declare -g "$self"_running=true # -g to set it in the global scope
It works!
Though the main issue here is portability: busybox and dash do not support declare and typeset. On the other hand readonly, export and local have a very specific purpose which doesn't fit here.
So let's just use the almighty eval!
eval "$self"_running=true
That's even easier to read than declare -g.
But now what if we want to assign a value passed as a parameter to the method?
class Car \
  running \
  song_playing \
  _start
Car_start() {
  eval "$self"_running=true
  eval "$self"_song_playing="$1"
}
Car truck
truck_start 'Another One Bites the Dust'
You should see something like: One: command not found. This is due to how eval works. Simply put, eval just adds another round of expansion, meaning you can take your line, expand the variables and then run it as if it was written in your script as is. So basically the eval line is replaced by:
truck_song_playing=Another One Bites the Dust
And this is not valid due to that space after Another: the shell understands it as "pass truck_song_playing=Another as an environment variable to the One command", which does not exist. We could add some quotes, like "'$1'", but what if the string passed contains a single quote as well? This would break. Instead we should find a way to defer the expansion of $1 until after the expansion of $self.
So basically we want this to expand first to:
truck_song_playing=$1
Simple enough: single quote your variable so that it only gets expanded after eval's first round of expansion:
eval "$self"_song_playing='$1'
Fantastic! Though you might see where this is going: this is way more complicated than it should be, and to avoid repeated complexity we usually create functions, which is the whole point of clash's <obj_attr>_is method.
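As an illustration of hiding that eval quoting behind a function, here is a small sketch based on plain global variables. The names set_attr and get_attr are hypothetical and are not clash's generated functions; the sketch also assumes that the object and attribute names are valid shell identifiers, since they end up inside eval unquoted.

```sh
#!/bin/sh
# set_attr and get_attr are hypothetical helpers, not part of clash.
# The object and attribute names MUST be valid identifiers: they are
# expanded before eval runs, so anything else would be executed as code.

# set_attr OBJ ATTR VALUE: assign VALUE to the global variable OBJ_ATTR.
# The value itself is only expanded by eval (note the escaped \$3), so
# spaces and quotes in it are preserved.
set_attr() {
  eval "$1_$2=\$3"
}

# get_attr OBJ ATTR: print the value of the global variable OBJ_ATTR.
get_attr() {
  eval "printf '%s\n' \"\$$1_$2\""
}

self=truck
set_attr "$self" running true
set_attr "$self" song_playing 'Another One Bites the Dust'

get_attr "$self" running
get_attr "$self" song_playing
```

The caller never has to think about which expansion happens in which round; that knowledge lives in one place.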
Using variables is not all bad though: if used internally it would definitely improve a few things, including speeding up methods by avoiding one command substitution per attribute when setting up the method context. This strategy is available in clash, but disabled by default since the performance improvement was not as meaningful as expected (to use it, set CLASH_HYBRID=true in the shell environment).
Also, one thing to note is that using variables would mean bypassing the getattr and setattr hooks when using direct access, which would defeat their purpose.
A zero command substitution approach is possible, but it would require introducing even more abstractions to make it work. Here's a proof of concept of the idea:
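For illustration, a method context populated from variables rather than from getter calls could look roughly like the sketch below. This is not how clash implements CLASH_HYBRID; Car_attrs and Car_call are invented names, attributes are assumed to be stored in global variables named <obj>_<attr>, and local is assumed to be supported.

```sh
#!/bin/sh
# Sketch under assumptions: attributes live in global variables named
# <obj>_<attr>; Car_attrs and Car_call are hypothetical, not clash's API.
Car_attrs='brand speed'

Car_start() {
  printf 'Starting %s: ' "$self"
  if [ "$speed" -gt 100 ]; then
    echo 'Vroooooooom'
  else
    echo 'Vroom'
  fi
}

# Car_call OBJ METHOD [ARGS...]: set up self and one local per attribute
# with plain assignments (no command substitution), then call the method.
Car_call() {
  local self="$1" method="$2" brand speed attr
  shift 2
  for attr in $Car_attrs; do
    eval "$attr=\$${self}_$attr"
  done
  "Car$method" "$@"
}

truck_brand=Toyota
truck_speed=90
Car_call truck _start

truck_speed=120
Car_call truck _start
```

Setting up the context here costs one eval per attribute but no subshell, which is where the potential speed-up comes from.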
https://github.com/lhoursquentin/clash-encore/blob/master/POC.sh
Nowadays (2020), the Bourne shell and /bin/sh are usually two different things. On most systems sh is just a symlink to another modern shell, usually bash, dash, mksh or busybox:
sh$ ls -l /bin/sh
lrwxrwxrwx. 1 root root 4 Aug 5 03:21 /bin/sh -> bash
So basically in this case running a script with sh is exactly the same as running bash --posix. That doesn't mean it only uses POSIX-defined features, but that it respects POSIX: you can still use bash arrays in POSIX mode, for instance, even though they do not appear once in the specification.
So does clash work with the Bourne shell? No: the original Bourne shell is not POSIX compliant, and it didn't even support functions.
clash should be able to work with any POSIX compliant shell supporting local assignments, and if it doesn't, feel free to open an issue.
- The local builtin (typeset for ksh93) is not specified by POSIX, but all modern POSIX shells support it (see compatibility table)
- This project was initially inspired by the amazing bash-oo-framework (https://github.com/niieani/bash-oo-framework), a huge project aiming to provide modern features to bash.
- This whole project is mostly a proof of concept that went too far. Building this was both extremely exciting (being able to see that even strict POSIX shells could do anything with a few layers of abstraction) and very disappointing (incredibly bad performance, making this only usable for a very limited set of use cases and overall a waste of computing resources). I'm not actively working on this project anymore, but issues, questions & discussions are of course welcome.