Benchmark

Import

from Benchmark import Benchmark

General Usage

Loop through each number up to n and calculate the corresponding value in the Fibonacci sequence:

alias n = 35 

Define the recursive version first:

fn fib(n: Int) -> Int:
    if n <= 1:
        return n
    else:
        return fib(n-1) + fib(n-2)

To benchmark it, create a nested fn that takes no arguments and doesn't return anything, then pass it in as a parameter:

fn bench():
    fn closure():
        for i in range(n):
            _ = fib(i)

    let nanoseconds = Benchmark().run[closure]()
    print("Nanoseconds:", nanoseconds)
    print("Seconds:", Float64(nanoseconds) / 1e9)

bench()
Nanoseconds: 50322420
Seconds: 0.05032242
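
Since run returns the average time for one invocation of the closure, and the closure makes n calls to fib, you can divide to get a rough per-call figure. A quick sketch reusing the definitions above:

fn bench_per_call():
    fn closure():
        for i in range(n):
            _ = fib(i)

    let nanoseconds = Benchmark().run[closure]()
    # run returns the mean time for one run of the closure, which
    # performs n calls to fib, so divide for a rough per-call cost
    print("Average ns per fib call:", Float64(nanoseconds) / Float64(n))

bench_per_call()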

Define an iterative version for comparison:

fn fib_iterative(n: Int) -> Int:
    var count = 0
    var n1 = 0
    var n2 = 1

    while count < n:
        let nth = n1 + n2
        n1 = n2
        n2 = nth
        count += 1
    return n1

fn bench_iterative():
    fn iterative_closure():
        for i in range(n):
            _ = fib_iterative(i)

    let iterative = Benchmark().run[iterative_closure]()
    print("Nanoseconds iterative:", iterative)

bench_iterative()
Nanoseconds iterative: 0

Notice that the compiler has optimized everything away. LLVM can reduce an iterative loop to a constant value if all the inputs are known at compile time through constant folding, or eliminate the work entirely if the result is never used through dead code elimination; both could be occurring here.
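
To sanity-check this, you can make the result observable so the compiler can't prove the work is unused. A rough sketch, assuming the nested fn can capture and update a local var the same way the closures above capture n:

fn bench_iterative_used():
    var total = 0

    fn iterative_closure():
        for i in range(n):
            total += fib_iterative(i)

    let iterative = Benchmark().run[iterative_closure]()
    # total is read after the benchmark, so the calls have an observable
    # effect and dead code elimination can't simply discard them
    print("Total:", total)
    print("Nanoseconds iterative:", iterative)

bench_iterative_used()

Even then, because n is a compile-time alias, constant folding may still shrink the loop, so treat a suspiciously fast result with skepticism.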

There is a lot going on under the hood, so you should always test your assumptions with benchmarks, especially if you're adding complexity because you think it will improve performance, which often isn't the case.

Max Iterations

Set the max iterations to 5 and a 1s max total duration:

from Time import sleep

fn bench_args():
    fn sleeper():
        print("sleeping 300,000ns")
        sleep(3e-4)
    
    print("0 warmup iters, 4 max iters, 0ns min time, 1_000_000_000ns max time")
    let nanoseconds = Benchmark(0, 5, 0, 1_000_000_000).run[sleeper]()
    print("average time", nanoseconds)

bench_args()
0 warmup iters, 5 max iters, 0ns min time, 1_000_000_000ns max time
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
average time 363769

Note that there is some extra logic inside Benchmark to help improve accuracy, so here it actually runs 6 iterations.

Max Duration

Limit the max running time so it never runs over 0.001 seconds (1,000,000ns) and doesn't hit the max iters of 5; at roughly 300,000ns per run, only 3 runs fit within that budget:

fn bench_args_2():
    fn sleeper():
        print("sleeping 300,000ns")
        sleep(3e-4)
    
    print("\n0 warmup iters, 5 max iters, 0 min time, 1_000_000ns max time")
    let nanoseconds = Benchmark(0, 5, 0, 1_000_000).run[sleeper]()
    print("average time", nanoseconds)

bench_args_2()
0 warmup iters, 5 max iters, 0 min time, 1_000_000ns max time
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
average time 364582

Min Duration

Try with a minimum of 1.5 million nanoseconds, so it ignores the max iterations of 2; at 300,000ns per run, that requires at least 5 normal runs:

fn bench_args_3():
    fn sleeper():
        print("sleeping 300,000ns")
        sleep(3e-4)

    let nanoseconds = Benchmark(0, 2, 1_500_000, 1_000_000_000).run[sleeper]()
    print("average time", nanoseconds)

bench_args_3()
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
average time 366545

Warmup

You should always have some warmup iterations. There is some extra logic for more accurate results, so it won't run exactly the number of iterations you specify:

fn bench_args_4():
    fn sleeper():
        print("sleeping 300,000ns")
        sleep(3e-4)

    let nanoseconds = Benchmark(1, 2, 0, 1_000_000_000).run[sleeper]()
    print("average time", nanoseconds)

bench_args_4()
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
sleeping 300,000ns
average time 364094