I've implemented a simple SAT solver in Python, Mojo and C++, then compared their runtime performance in the style of The Great Computer Language Shootout.
SAT represents the Boolean satisfiability problem, which is one of the canonical computational problems known to be NP-complete.
DIMACS is the standard input file format used at the SAT competition and most SAT solvers support this format.
Example (from pigeonhole2.cnf)
p cnf 6 9
1 2 0
3 4 0
5 6 0
-1 -3 0
-1 -5 0
-3 -5 0
-2 -4 0
-2 -6 0
-4 -6 0
Although a SAT solver can be very complicated, this exercise is to compare the compiler optimizations and the solver code is intentionally kept very simple. I started with a Python version and then ported it into Mojo and C++.
Version | Source File |
---|---|
Python version | sat-solver-python.py |
Mojo version | sat-solver-port.mojo |
C++ version | sat-solver-cpp.cpp |
The Python version is 118 LoC. The Mojo version is 128 LoC with some more type declarations. The C++ version came in at 142 LoC.
In terms of algorithm and data structure, they are comparable. The Python version mainly uses list
, the Mojo version uses DynamicVector
and the C++ version uses vector
. SAT solvers are memory access intensive using those data structures. So, their internal implementation will have a great impact on the result.
A sample of DIMACS inputs are generated using the gen-pigeonhole-cnf.mojo
script. The script generates SAT formulas in DIMACS format based on the pigeonhole principle with a given number of holes as its argument. All generated formulas are unsatisfiable, forcing simple SAT solvers to explore the entire search space.
Usage:
mojo run gen-pigeonhole-cnf.mojo <holes>
Run make
to build Mojo version and C++ version binaries and benchmarks (pigeonhole2.cnf through pigeonhole8.cnf).
Run python3 run-benchmark.py
to execute the whole experiment. The script runs 4 different solver configurations over the generated benchmark input files.
Configuration | Command Line |
---|---|
Python version | python3 sat-solver-python.py |
Mojo version (JIT) | mojo run sat-solver-port.mojo |
Mojo version (precompiled) | ./sat-solver-port |
C++ version (precompiled) | ./sat-solver-cpp |
Mojo version was compiled with the default optimization enabled (without any options). The C++ version was compiled with clang
and -O1
optimization option. (But, any higher optimization option didn't seem to make much difference.)
Here're the runtime measurements in seconds:
python3 sat-solver-python.py | mojo run sat-solver-port.mojo | ./sat-solver-port | ./sat-solver-cpp | |
---|---|---|---|---|
pigeonhole2.cnf | 0.0249 | 0.0895 | 0.0037 | 0.0032 |
pigeonhole3.cnf | 0.0250 | 0.0900 | 0.0042 | 0.0033 |
pigeonhole4.cnf | 0.0361 | 0.0916 | 0.0050 | 0.0036 |
pigeonhole5.cnf | 0.2142 | 0.1123 | 0.0243 | 0.0057 |
pigeonhole6.cnf | 3.2518 | 0.4473 | 0.3361 | 0.0342 |
pigeonhole7.cnf | 58.0513 | 6.3182 | 5.8706 | 0.5018 |
pigeonhole8.cnf | 1114.3073 | 117.5253 | 112.2467 | 9.3582 |
(Measurements were recorded on a MacBook Pro with M1 Pro and 16GB RAM running macOS Sonoma 14.2.1)
Here're the speed-ups over the original Python runtime.
mojo run sat-solver-port.mojo | ./sat-solver-port | ./sat-solver-cpp | |
---|---|---|---|
pigeonhole2.cnf | 0.28 | 6.67 | 7.73 |
pigeonhole3.cnf | 0.28 | 5.97 | 7.45 |
pigeonhole4.cnf | 0.39 | 7.18 | 10.13 |
pigeonhole5.cnf | 1.91 | 8.80 | 37.67 |
pigeonhole6.cnf | 7.27 | 9.68 | 95.13 |
pigeonhole7.cnf | 9.19 | 9.89 | 115.68 |
pigeonhole8.cnf | 9.48 | 9.93 | 119.07 |
The Mojo version has over 9x speedup over the larger inputs. The difference between JIT and precompile was not significant and that's understandable since the size of code is very small.
Interestingly, Python is indeed lighter at loading, recording 0.025 seconds compared to Mojo's 0.0895 seconds for pigeonhole2.cnf
and pigeonhole3.cnf
, which are so small that the most of the runtime must be just loading/compiling. In any case, their loading/compile time is quite small. And Mojo's faster runtime quickly overtake Python version over the larger inputs.
The C++ version turned out to be much faster than the Mojo version (over 95x~119x speedup over the baseline Python version). That suggests Mojo still has a long way to go in terms of code generation.
The ported Mojo version couldn't use for-each loop directly over the content of DynamicVector
. Instead, I had to use an index to access the vector indirectly. It seems to be the main bottleneck.
def is_consistent_clause(borrowed that: Self, clause: Clause) -> Bool:
for idx_lit in range(len(clause)):
lit = clause[idx_lit]
...
I implemented another version using the (unsafe) Pointer
type (in sat-solver-unsafe.mojo) and the Mojo's performance matched C++'s.
Mojo is aiming to be a superset of Python. Even at an early version (0.6.1), it was straightforward to port the Python version to Mojo. But, there were several issues as indicated in the code comments. Here is the summary of my expereince.
I think these are minor compatibility issues that can be easily fixed as Mojo matures.
- No list comprehension: Use the
DynamicVector
type and loops to initialize. class
definition: Usestruct
definition.- Type casing like
int(x)
orstr(y)
: Use Mojo types and library functions. - No
TextIOWrapper
type: UseFileHandle
. DynamicVector
instance couldn't be returned in a tuple.sys.argv
is a function, not a list: Changeargv[1]
toargv()[1]
.- Closures can't capture by reference, yet: Pass references explictly to the closure.
Also, I experienced a few unexpected issues that might be just bugs in the current version of Mojo (0.6.1).
The variable scope of local variables in def
functions are not hoisted to the top of the function.
- Symptom: Local variables that are assigned under a
then
-clause wasn't accessible after the if-statement. - Workaround: Those variables need to be explicitly declared outside of the if-statement.
- I filed modularml/mojo#1574 for this, but it seems to be intentional deviation from Python.
Even if all of those compiler and compatibility issues are resolved, I don't think some of those borrowed
/inout
annotations on arguments could be elided altogether. That means not all Python code can be ported without a change. I expect many Python code will need some additional annotations in their Mojo port, even if some of that could be automated.
As an alternative, Mojo could feature an "auto" ownership for def
functions, instead of owned
default, allowing the compiler to decide owned
/borrowed
/inout
based on the usage of the arguments within the function definition.
The issue will be more pronounced with closures, because Python closures are expected to capture local variables by reference. The owned
default ownership is more likely to break closures, even after Mojo implementing capture declarations in closures.