Benchmark
Import
from Benchmark import Benchmark
General Usage
Loop through each number up to n
and calculate its value in the Fibonacci sequence:
alias n = 35
Define the recursive version first:
fn fib(n: Int) -> Int:
    if n <= 1:
        return n
    else:
        return fib(n-1) + fib(n-2)
To benchmark it, create a nested fn
that takes no arguments and returns nothing, then pass it in as a parameter:
fn bench():
    fn closure():
        for i in range(n):
            _ = fib(i)
    let nanoseconds = Benchmark().run[closure]()
    print("Nanoseconds:", nanoseconds)
    print("Seconds:", Float64(nanoseconds) / 1e9)
bench()
Nanoseconds: 50322420
Seconds: 0.05032242
Define the iterative version for comparison:
fn fib_iterative(n: Int) -> Int:
    var count = 0
    var n1 = 0
    var n2 = 1
    while count < n:
        let nth = n1 + n2
        n1 = n2
        n2 = nth
        count += 1
    return n1
fn bench_iterative():
    fn iterative_closure():
        for i in range(n):
            _ = fib_iterative(i)
    let iterative = Benchmark().run[iterative_closure]()
    print("Nanoseconds iterative:", iterative)
bench_iterative()
Nanoseconds iterative: 0
Notice that the compiler has optimized everything away. LLVM can replace an iterative loop with a constant value when all the inputs are known at compile time (constant folding), or remove the computation entirely if the result is never used (dead code elimination); either could be occurring here.
There is a lot going on under the hood, so you should always test your assumptions with benchmarks, especially if you're adding complexity because you think it will improve performance; often it won't.
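One way to check whether real work is actually being measured is to make the result observable, for example by accumulating it and printing it from the closure. A minimal sketch reusing fib_iterative and n from above (bench_iterative_kept is a hypothetical name); note that the print inside the timed closure adds I/O overhead, so treat the number as a rough guard against dead code elimination rather than a precise measurement:

```mojo
fn bench_iterative_kept():
    fn closure():
        var total = 0
        for i in range(n):
            total += fib_iterative(i)
        # Printing the accumulated result makes it observable,
        # so the compiler can't eliminate the loop as dead code.
        print("total:", total)
    let kept = Benchmark().run[closure]()
    print("Nanoseconds iterative (kept):", kept)
bench_iterative_kept()
```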
Max iters
Set a maximum of 5 iterations and a 1s max total duration:
from Time import sleep
fn bench_args():
    fn sleeper():
        print("sleeping 300,000ns")
        sleep(3e-4)
    print("0 warmup iters, 5 max iters, 0ns min time, 1_000_000_000ns max time")
    let nanoseconds = Benchmark(0, 5, 0, 1_000_000_000).run[sleeper]()
    print("average time", nanoseconds)
bench_args()
0 warmup iters, 5 max iters, 0ns min time, 1_000_000_000ns max time
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
average time 363769
Note there is some extra logic inside Benchmark
to help improve accuracy, which is why it actually ran 6 iterations here.
Max Duration
Limit the max running time, so it will never run over 1,000,000ns (0.001 seconds) and will not hit the max iters of 5:
fn bench_args_2():
    fn sleeper():
        print("sleeping 300,000ns")
        sleep(3e-4)
    print("\n0 warmup iters, 5 max iters, 0 min time, 1_000_000ns max time")
    let nanoseconds = Benchmark(0, 5, 0, 1_000_000).run[sleeper]()
    print("average time", nanoseconds)
bench_args_2()
0 warmup iters, 5 max iters, 0 min time, 1_000_000ns max time
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
average time 364582
Min Duration
Try with a minimum of 1,500,000 nanoseconds, so it ignores the max iterations of 2 and keeps running until the minimum total time has elapsed:
fn bench_args():
    fn sleeper():
        print("sleeping 300,000ns")
        sleep(3e-4)
    let nanoseconds = Benchmark(0, 2, 1_500_000, 1_000_000_000).run[sleeper]()
    print("average time", nanoseconds)
bench_args()
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
average time 366545
Warmup
You should always include some warmup iterations; there is extra logic inside Benchmark for more accurate results, so it won't run exactly the number of iterations you specify:
fn bench_args():
    fn sleeper():
        print("sleeping 300,000ns")
        sleep(3e-4)
    let nanoseconds = Benchmark(1, 2, 0, 1_000_000_000).run[sleeper]()
    print("average time", nanoseconds)
bench_args()
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
average time 364094