-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Benchmark for simple programs #520
Comments
But we can do better --- the modulo operation |
Here is a C implementation: #include <stdint.h>
#include <stdio.h>
int32_t is_prime(int32_t n) {
int32_t factors = 0;
for (int32_t i=2; i < n; i++) {
if (n % i == 0) factors += 1;
}
return factors == 0;
}
int main() {
int32_t limit = 20000;
int32_t total = 0;
for (int32_t i=2; i < limit; i++) {
if (is_prime(i)) total += 1;
}
printf("%d\n", total);
} On my computer: $ clang -O3 a.c
$ time ./a.out
2262
./a.out 0.13s user 0.00s system 98% cpu 0.137 total It's slower than LPython, so perhaps something is not quite right yet. |
Benchmark using Intel i5: Linux - Ubuntu 22CPython 3.10.4: $ time python examples/expr2.py
2262
python examples/expr2.py 21.42s user 0.02s system 99% cpu 21.457 total LPython with $ lpython --fast examples/expr2.py
$ time ./a.out
2262
./a.out 0.57s user 0.00s system 99% cpu 0.573 total About LPython without $ lpython examples/expr2.py
$ time ./a.out
2262
./a.out 3.61s user 0.00s system 99% cpu 3.606 total About Clang: $ clang -O3 -march=native a.c
$ time ./a.out
2262
./a.out 0.92s user 0.00s system 99% cpu 0.929 total |
Ok, so LPython is faster than Clang for you as well. I think the C code is probably not optimal, or somehow our hand written modulo is faster than the one Clang generates. We should look into the llvm to understand better. But overall not a bad start. |
Here are the results on my machine (Apple MAC M1): % time PYTHONPATH=~/repos/lpython/src/runtime/ltypes python expr.py
2262
PYTHONPATH=~/repos/lpython/src/runtime/ltypes python expr.py 11.28s user 0.04s system 99% cpu 11.332 total % lpython --fast expr.py
% time ./a.out
2262
./a.out 0.20s user 0.01s system 97% cpu 0.216 total % clang -O3 expr.c -o. expr.out
% time ./expr.out
2262
./expr.out 0.16s user 0.00s system 98% cpu 0.169 total |
Here is the quick sort benchmark: from ltypes import i32, f32
from numpy import empty
import random
def _swap(arr: f32[500000], i: i32, j: i32):
tmp: f32
tmp = arr[i]
arr[i] = arr[j]
arr[j] = tmp
def Partition(arr: f32[500000], low: i32, high: i32) -> i32:
i: i32
i = low - 1
pivot: f32
pivot = arr[high]
j: i32
for j in range(low, high):
if (arr[j] <= pivot):
i += 1
_swap(arr, i, j)
_swap(arr, high, i+1)
return i+1
def QuickSort(arr: f32[500000], low: i32, high: i32):
if (low < high):
pivot: i32
pivot = Partition(arr, low, high)
QuickSort(arr, low, pivot-1)
QuickSort(arr, pivot+1, high)
def main():
arr: f32[500000]
arr = empty(500000)
i: i32
for i in range(500000):
arr[i] = random.random()
QuickSort(arr, 0, 500000-1)
print("Array sorted using Quick Sort: ")
for i in range(500000):
print(arr[i])
main() Equivalent C code: #include<stdio.h>
#include <stdlib.h>
#include <time.h>
void swap (float* a, float* b)
{
float tmp = *a;
*a = *b;
*b = tmp;
}
int Partition(float arr[], int low, int high){
int i = low - 1;
float pivot = arr[high];
for(int j = low; j < high; j++) {
if(arr[j] <= pivot) {
i++;
swap(&arr[i], &arr[j]);
}
}
swap(&arr[high], &arr[i+1]);
return i+1;
}
void QuickSort(float arr[], int low, int high){
if(low < high) {
int pivot = Partition(arr, low, high);
QuickSort(arr, low, pivot-1);
QuickSort(arr, pivot+1, high);
}
}
int main() {
srand(time(0));
int size = 500000;
float arr[size];
for (int i = 0; i < size; i++)
arr[i] = (rand() / (double) RAND_MAX);
QuickSort(arr, 0, size-1);
printf("Array sorted using Quick Sort: ");
for (int i = 0; i < size; i++)
printf("%f ", arr[i]);
printf("\n");
return 0;
} CPython: $ time python examples/expr2.py > tmp
python examples/expr2.py > tmp 10.00s user 0.60s system 108% cpu 9.729 total LPython: $ lpython examples/expr2.py; time ./a.out > tmp
./a.out > tmp 0.86s user 2.10s system 99% cpu 2.961 total LPython with $ lpython --fast examples/expr2.py
Internal Compiler Error: Unhandled exception
Traceback (most recent call last):
Binary file "/home/thirumalai/Open_Source/lpython/src/bin/lpython", in _start()
File "./csu/../csu/libc-start.c", line 392, in __libc_start_main_impl()
File "./csu/../sysdeps/nptl/libc_start_call_main.h", line 58, in __libc_start_call_main()
File "/home/thirumalai/Open_Source/lpython/src/bin/lpython.cpp", line 879, in ??
err = compile_python_to_object_file(arg_file, tmp_o, runtime_library_dir, compiler_options, time_report);
File "/home/thirumalai/Open_Source/lpython/src/bin/lpython.cpp", line 373, in ??
res = fe.get_llvm3(*asr, diagnostics);
File "/home/thirumalai/Open_Source/lpython/src/lpython/python_evaluator.cpp", line 57, in LFortran::PythonCompiler::get_llvm3(LFortran::ASR::TranslationUnit_t&, LFortran::diag::Diagnostics&)
run_fn);
File "/home/thirumalai/Open_Source/lpython/src/libasr/codegen/asr_to_llvm.cpp", line 4762, in LFortran::asr_to_llvm(LFortran::ASR::TranslationUnit_t&, LFortran::diag::Diagnostics&, llvm::LLVMContext&, Allocator&, LFortran::Platform, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
pass_inline_function_calls(al, asr, rl_path);
File "/home/thirumalai/Open_Source/lpython/src/libasr/pass/inline_function_calls.cpp", line 394, in LFortran::pass_inline_function_calls(Allocator&, LFortran::ASR::TranslationUnit_t&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
LFORTRAN_ASSERT(asr_verify(unit));
File "/home/thirumalai/Open_Source/lpython/src/libasr/asr_verify.cpp", line 533, in LFortran::asr_verify(LFortran::ASR::TranslationUnit_t const&, bool)
v.visit_TranslationUnit(unit);
File "/home/thirumalai/Open_Source/lpython/src/libasr/asr_verify.cpp", line 111, in LFortran::ASR::VerifyVisitor::visit_TranslationUnit(LFortran::ASR::TranslationUnit_t const&)
this->visit_symbol(*a.second);
File "/home/thirumalai/Open_Source/lpython/src/libasr/../libasr/asr.h", line 3365, in LFortran::ASR::BaseVisitor<LFortran::ASR::VerifyVisitor>::visit_symbol(LFortran::ASR::symbol_t const&)
void visit_symbol(const symbol_t &b) { visit_symbol_t(b, self()); }
File "/home/thirumalai/Open_Source/lpython/src/libasr/../libasr/asr.h", line 3166, in ??
case symbolType::Function: { v.visit_Function((const Function_t &)x); return; }
File "/home/thirumalai/Open_Source/lpython/src/libasr/asr_verify.cpp", line 280, in LFortran::ASR::VerifyVisitor::visit_Function(LFortran::ASR::Function_t const&)
visit_stmt(*x.m_body[i]);
File "/home/thirumalai/Open_Source/lpython/src/libasr/../libasr/asr.h", line 3379, in LFortran::ASR::BaseVisitor<LFortran::ASR::VerifyVisitor>::visit_stmt(LFortran::ASR::stmt_t const&)
void visit_stmt(const stmt_t &b) { visit_stmt_t(b, self()); }
File "/home/thirumalai/Open_Source/lpython/src/libasr/../libasr/asr.h", line 3185, in ??
case stmtType::Assignment: { v.visit_Assignment((const Assignment_t &)x); return; }
File "/home/thirumalai/Open_Source/lpython/src/libasr/../libasr/asr.h", line 3615, in LFortran::ASR::BaseWalkVisitor<LFortran::ASR::VerifyVisitor>::visit_Assignment(LFortran::ASR::Assignment_t const&)
self().visit_expr(*x.m_value);
File "/home/thirumalai/Open_Source/lpython/src/libasr/../libasr/asr.h", line 3421, in LFortran::ASR::BaseVisitor<LFortran::ASR::VerifyVisitor>::visit_expr(LFortran::ASR::expr_t const&)
void visit_expr(const expr_t &b) { visit_expr_t(b, self()); }
File "/home/thirumalai/Open_Source/lpython/src/libasr/../libasr/asr.h", line 3268, in ??
case exprType::ArrayRef: { v.visit_ArrayRef((const ArrayRef_t &)x); return; }
File "/home/thirumalai/Open_Source/lpython/src/libasr/asr_verify.cpp", line 365, in LFortran::ASR::VerifyVisitor::visit_ArrayRef(LFortran::ASR::ArrayRef_t const&)
visit_array_index(x.m_args[i]);
File "/home/thirumalai/Open_Source/lpython/src/libasr/../libasr/asr.h", line 4342, in LFortran::ASR::BaseWalkVisitor<LFortran::ASR::VerifyVisitor>::visit_array_index(LFortran::ASR::array_index_t const&)
self().visit_expr(*x.m_right);
File "/home/thirumalai/Open_Source/lpython/src/libasr/../libasr/asr.h", line 3421, in LFortran::ASR::BaseVisitor<LFortran::ASR::VerifyVisitor>::visit_expr(LFortran::ASR::expr_t const&)
void visit_expr(const expr_t &b) { visit_expr_t(b, self()); }
File "/home/thirumalai/Open_Source/lpython/src/libasr/../libasr/asr.h", line 3232, in ??
case exprType::BinOp: { v.visit_BinOp((const BinOp_t &)x); return; }
File "/home/thirumalai/Open_Source/lpython/src/libasr/../libasr/asr.h", line 3898, in LFortran::ASR::BaseWalkVisitor<LFortran::ASR::VerifyVisitor>::visit_BinOp(LFortran::ASR::BinOp_t const&)
self().visit_expr(*x.m_left);
File "/home/thirumalai/Open_Source/lpython/src/libasr/../libasr/asr.h", line 3421, in LFortran::ASR::BaseVisitor<LFortran::ASR::VerifyVisitor>::visit_expr(LFortran::ASR::expr_t const&)
void visit_expr(const expr_t &b) { visit_expr_t(b, self()); }
File "/home/thirumalai/Open_Source/lpython/src/libasr/../libasr/asr.h", line 3267, in ??
case exprType::Var: { v.visit_Var((const Var_t &)x); return; }
File "/home/thirumalai/Open_Source/lpython/src/libasr/asr_verify.cpp", line 357, in LFortran::ASR::VerifyVisitor::visit_Var(LFortran::ASR::Var_t const&)
require(symtab_in_scope(current_symtab, x.m_v),
LFortranException: ASR verify failed: Var::m_v `high_Partition` cannot point outside of its symbol table Clang: $ clang -O3 -march=native quickSort.c; time ./a.out > tmp
./a.out > tmp 0.31s user 0.04s system 99% cpu 0.348 total |
Perfect! Can you comment out the |
Yup, thanks for that. It resolves the above issue, but I get core dumped. $ lpython --fast examples/expr2.py; time ./a.out > tmp
[2] 330898 segmentation fault (core dumped) ./a.out > tmp
./a.out > tmp 0.78s user 1.91s system 93% cpu 2.894 total $ lpython examples/expr2.py; time ./a.out > tmp
./a.out > tmp 0.87s user 1.85s system 99% cpu 2.724 total |
That's another bug that we have to investigate and fix. You can comment all the optimizations for |
Core dumped is caused by |
Perfect. What is the performance of the |
Benchmark:
CPython 3.10.2 speed on Apple M1 Max:
LPython speed:
So about 55x speedup.
The text was updated successfully, but these errors were encountered: