Buffer

The buffer doesn't own the underlying memory; it's a view over data that is owned by another object.

from Buffer import Buffer
from DType import DType
from Pointer import DTypePointer

Allocate 8 uint8 values and pass the resulting pointer into the buffer:

let p = DTypePointer[DType.uint8].alloc(8)
let x = Buffer[8, DType.uint8](p)

zero

Zero all the values to make sure no garbage data is used:

x.zero()
print(x.simd_load[8](0))

[0, 0, 0, 0, 0, 0, 0, 0]

Get Item and Set Item

Loop through and set each item:

for i in range(len(x)):
    x[i] = i

print(x.simd_load[8](0))

[0, 1, 2, 3, 4, 5, 6, 7]

Copy Init

Copy the buffer x to y, change the dynamic size of y to 4, and multiply those 4 values by 10:

var y = x
y.dynamic_size = 4

for i in range(y.dynamic_size):
    y[i] *= 10

Now print the values from the original buffer x, to show they point to the same data:

print(x.simd_load[8](0))

[0, 10, 20, 30, 4, 5, 6, 7]
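
The shared-storage behaviour above can be sketched outside Mojo as well. The following is only a Python analogy, not the Buffer API: it assumes a `bytearray` standing in for the memory owner and `memoryview` objects standing in for the two buffers.

```python
# Python analogy (not Mojo): a memoryview is a non-owning view, much like Buffer.
data = bytearray(range(8))   # the object that actually owns the memory
x = memoryview(data)         # view over all 8 bytes
y = x[:4]                    # second view over the first 4 bytes, no copy is made

for i in range(len(y)):
    y[i] *= 10               # writes go through to the shared storage

print(list(x))  # [0, 10, 20, 30, 4, 5, 6, 7]
```

Because neither view owns the data, a write through `y` is visible through `x`, which mirrors what happens with the copied Buffer.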

simd_store

Use Single Instruction, Multiple Data (SIMD) to operate on 32 bits of data (four uint8 values) at the same time:

let first_half = x.simd_load[4](0) * 2
let second_half = x.simd_load[4](4) * 10

x.simd_store(0, first_half)
x.simd_store(4, second_half)

print(x.simd_load[8](0))

[0, 20, 40, 60, 40, 50, 60, 70]
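
To make the lane-wise arithmetic explicit, here is a rough emulation in plain Python. It is a sketch only: the `simd_load` and `simd_store` helpers below are hypothetical stand-ins written for this illustration, they run one element at a time rather than using real SIMD instructions, and they are not the Mojo methods of the same name.

```python
# Hypothetical helpers that mimic 4-wide loads and stores on a byte buffer.
def simd_load(buf, width, offset):
    """Return `width` consecutive values starting at `offset`."""
    return list(buf[offset:offset + width])

def simd_store(buf, offset, lanes):
    """Write all lanes back into the buffer starting at `offset`."""
    buf[offset:offset + len(lanes)] = bytes(lanes)

x = bytearray([0, 10, 20, 30, 4, 5, 6, 7])

# One "instruction" multiplies every lane of the loaded vector.
first_half = [v * 2 for v in simd_load(x, 4, 0)]
second_half = [v * 10 for v in simd_load(x, 4, 4)]

simd_store(x, 0, first_half)
simd_store(x, 4, second_half)

print(list(x))  # [0, 20, 40, 60, 40, 50, 60, 70]
```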

simd_nt_store

nt stands for non-temporal.

A non-temporal store skips the cache for memory that isn't going to be accessed again soon, so writing a large amount of data doesn't fill up the cache and evict something else that would benefit from quick access:

x.simd_nt_store(0, second_half)
print(x.simd_load[8](0))

[40, 50, 60, 70, 40, 50, 60, 70]

simd_fill

Fill the buffer with the value passed as the argument, storing it in chunks of the width passed as the parameter:

x.simd_fill[8](10)
print(x.simd_load[8](0))

[10, 10, 10, 10, 10, 10, 10, 10]

stack_allocation

Returns a buffer whose data is allocated on the stack:

x.stack_allocation()
print(x.simd_load[8](0))

[10, 10, 10, 10, 10, 10, 10, 10]

bytecount

Return the total number of bytes in the buffer:

print(x.bytecount())
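
As a sanity check on what a byte count should be, the arithmetic is simply the element count times the element size. The sketch below uses Python's `ctypes` for the element sizes; the `bytecount` function here is a hypothetical helper for illustration, not the Mojo method.

```python
import ctypes

def bytecount(n_elements, ctype):
    """Total bytes = number of elements * size of one element."""
    return n_elements * ctypes.sizeof(ctype)

print(bytecount(8, ctypes.c_uint8))   # 8 elements of 1 byte each
print(bytecount(8, ctypes.c_uint32))  # 8 elements of 4 bytes each
```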

aligned_simd_store

Some registers work better with specific alignments, e.g. AVX-512 performs better with 64-byte alignment, so you might want padding for a type like UInt32:

x.aligned_simd_store[8, 8](0, 5)

aligned_simd_load

As with aligned_simd_store, some registers work better with specific alignments, e.g. AVX-512 performs better with 64-byte alignment:

print(x.aligned_simd_load[8, 8](0))

[5, 5, 5, 5, 5, 5, 5, 5]
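
The padding mentioned above comes down to rounding the element count up so the data fills whole aligned chunks. A minimal Python sketch of that arithmetic follows; `padded_count` is a hypothetical helper, and it assumes the alignment in bytes is a multiple of the element size.

```python
def padded_count(n_elements, element_size, alignment):
    """Smallest count >= n_elements whose byte size is a multiple of `alignment`.

    Assumes `alignment` is a multiple of `element_size` (both in bytes).
    """
    per_chunk = alignment // element_size   # elements that fit in one aligned chunk
    chunks = -(-n_elements // per_chunk)    # ceiling division
    return chunks * per_chunk

# 4-byte elements (such as UInt32) with 64-byte alignment: 16 elements per chunk.
print(padded_count(10, 4, 64))  # 16
print(padded_count(17, 4, 64))  # 32
```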

aligned_stack_allocation

Allocate on the stack with a given alignment, for data that needs extra padding:

x.aligned_stack_allocation[8]()

prefetch

Specifies how soon the data will be accessed again and how it will be used, to optimize for the cache:

from Intrinsics import PrefetchOptions
x.prefetch[PrefetchOptions().for_read().high_locality()](0)