Performance tuning

When compiled with -prod, V's generated C code usually performs well. However, in specialized scenarios, additional compiler flags and attributes can further optimize the executable for performance, memory usage, or size.

[!NOTE] These are rarely needed and should not be used unless you have profiled your code and seen that they bring significant benefits. To cite GCC's documentation: "Programmers are notoriously bad at predicting how their programs actually perform".

| Tuning Operation | Benefits | Drawbacks |
|---|---|---|
| `@[inline]` | Performance | Increased executable size |
| `@[direct_array_access]` | Performance | Safety risks |
| `@[packed]` | Memory usage | Potential performance loss |
| `@[minify]` | Performance, Memory usage | May break binary serialization/reflection |
| `_likely_/_unlikely_` | Performance | Risk of negative performance impact |
| `-fast-math` | Performance | Risk of incorrect mathematical operations results |
| `-d no_segfault_handler` | Compile time, Size | Loss of segfault trace |
| `-cflags -march=native` | Performance | Risk of reduced CPU compatibility |
| `-compress` | Size | Harder to debug, extra dependency `upx` |
| `PGO` | Performance, Size | Usage complexity |

Tuning operations details

`@[inline]`

You can tag functions with @[inline] so that the C compiler will try to inline them. In some cases this benefits performance, but it may increase the size of your executable.
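As a minimal sketch of the attribute in use: `square` and `sum_of_squares` are hypothetical names chosen for illustration, and whether the call is actually inlined is still up to the C compiler.

```v
// `square` is a hypothetical helper, small enough that inlining is plausible.
@[inline]
fn square(x int) int {
	return x * x
}

// Hypothetical hot loop: each call to `square` is a candidate for inlining.
fn sum_of_squares(n int) int {
	mut total := 0
	for i in 0 .. n {
		total += square(i)
	}
	return total
}

println(sum_of_squares(4)) // 0 + 1 + 4 + 9 = 14
```

The attribute changes only how the call is compiled, not what the function returns, so it is worth comparing profiles with and without it.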

When to Use

Functions that are called frequently in performance-critical loops.

When to Avoid

Large functions, as inlining them might cause code bloat and actually decrease performance.
Large functions in if expressions, as inlining them may have a negative impact on the instruction cache.

`@[direct_array_access]`

In functions tagged with @[direct_array_access], the compiler translates array operations directly into C array operations, omitting bounds checking. This may save a lot of time in a function that iterates over an array, but at the cost of making the function unsafe, unless the bounds are checked by the user.
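A small sketch of a function where skipping bounds checks is arguably safe, because the loop index never leaves the valid range. The name `sum_unchecked` is hypothetical.

```v
// `sum_unchecked` is a hypothetical example function.
@[direct_array_access]
fn sum_unchecked(values []int) int {
	mut total := 0
	// `i` stays within 0 .. values.len, so every index is valid by construction.
	for i in 0 .. values.len {
		total += values[i]
	}
	return total
}

println(sum_unchecked([1, 2, 3, 4])) // 10
```

If the index came from user input instead of the loop range, the attribute would remove the only check guarding against an out-of-bounds read.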

When to Use

In tight loops that access array elements, where bounds have been manually verified or you are sure that the access index will be valid.

When to Avoid

Everywhere else.

`@[packed]`

The @[packed] attribute can be applied to a structure to create an unaligned memory layout, which decreases the overall memory footprint of the structure. Using the @[packed] attribute may negatively impact performance or even be prohibited on certain CPU architectures.
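The effect on memory footprint can be observed with `sizeof`. This is only a sketch: the struct names `Padded` and `Packed` are hypothetical, and the exact sizes depend on the target and the C compiler.

```v
// Hypothetical struct with the default layout: the C compiler
// usually inserts padding after `flag` so that `value` is aligned.
struct Padded {
	flag  u8
	value u32
}

// Same fields, but with padding removed.
@[packed]
struct Packed {
	flag  u8
	value u32
}

println(sizeof(Padded)) // commonly 8 on mainstream targets
println(sizeof(Packed)) // commonly 5, since no padding is inserted
```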

When to Use

When memory usage is more critical than performance, e.g., in embedded systems.

When to Avoid

On CPU architectures that do not support unaligned memory access or when high-speed memory access is needed.

`@[aligned]`

The @[aligned] attribute can be applied to a structure or union to specify a minimum alignment (in bytes) for variables of that type. With @[aligned] you can only increase the default alignment; use @[packed] if you want to decrease it. The alignment of any struct or union should be at least a multiple of the lowest common multiple of the alignments of all of its members.

Example:

// Each u16 in the `data` field below, takes 2 bytes, and we have 3 of them = 6 bytes.
// The smallest power of 2, bigger than 6 is 8, i.e. with `@[aligned]`, the alignment
// for the entire struct U16s, will be 8:
@[aligned]
struct U16s {
	data [3]u16
}

When to Use

Only if instances of your types will be used in performance-critical sections, or with specialised machine instructions that require a specific alignment to work.

When to Avoid

On CPU architectures that do not support unaligned memory access. If you are not working on performance-critical algorithms, you do not really need it, since the proper minimum alignment is CPU-specific and the compiler will usually choose a good default for you.

[!NOTE] You can leave out the alignment factor, i.e. use just @[aligned], in which case the compiler will align a type to the maximum useful alignment for the target machine you are compiling for, i.e. the largest alignment that is ever used for any data type on that machine. Doing this can often make copy operations more efficient, because the compiler can choose whichever instructions copy the biggest chunks of memory when copying to or from variables of the types you have aligned this way.

`@[minify]`

The @[minify] attribute can be added to a struct, allowing the compiler to reorder the fields in a way that minimizes internal gaps while maintaining alignment. Using the @[minify] attribute may cause issues with binary serialization or reflection. Be mindful of these potential side effects when using this attribute.
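The following sketch compares a struct with and without the attribute. The names `Loose` and `Tight` are hypothetical, and it is an assumption that reordering shrinks this particular layout on your target; the printed sizes are not guaranteed.

```v
// Hypothetical struct: declared order leaves gaps around the u64 field
// on typical 64-bit targets.
struct Loose {
	a u8
	b u64
	c u8
}

// Same fields, but the compiler is allowed to reorder them.
@[minify]
struct Tight {
	a u8
	b u64
	c u8
}

// Exact values depend on the target; the minified struct
// should not be larger than the unmodified one.
println(sizeof(Loose))
println(sizeof(Tight))
```

Because the in-memory field order may no longer match the declaration order, code that dumps such a struct byte-for-byte should not rely on the declared layout.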

When to Use

When you want to minimize memory usage and you're not using binary serialization or reflection.

When to Avoid

When using binary serialization or reflection, as it may cause unexpected behavior.

`_likely_/_unlikely_`

if _likely_(bool expression) { hints to the C compiler that the passed boolean expression is very likely to be true, so it can generate assembly code with less chance of branch misprediction. In the JS backend, it does nothing.

if _unlikely_(bool expression) { is similar to _likely_(x), but it hints that the boolean expression is highly improbable. In the JS backend, it does nothing.
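A short sketch of the hint on a rarely taken error path. `checked_div` is a hypothetical helper, and the assumption that a zero divisor is rare is specific to this example.

```v
// `checked_div` is a hypothetical helper; a zero divisor is assumed to be rare.
fn checked_div(a int, b int) int {
	if _unlikely_(b == 0) {
		// Cold path: returns 0 instead of dividing by zero.
		return 0
	}
	return a / b
}

println(checked_div(10, 2)) // 5
println(checked_div(10, 0)) // 0
```

The hint affects only code generation; the function returns the same values with or without it.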

When to Use

In conditional statements where one branch is clearly more frequently executed than the other.

When to Avoid

When the prediction can be wrong, as it might cause a performance penalty due to branch misprediction.

When to Use

For production builds where you want to reduce the executable size and improve runtime performance.

When to Avoid

Where it doesn't work for you.

`-fast-math`

This flag enables optimizations that disregard strict compliance with the IEEE standard for floating-point arithmetic. While this could lead to faster code, it may produce incorrect or less accurate mathematical results.

The full spectrum of math operations that -fast-math affects can be found here.

When to Use

In applications where performance is more critical than precision, like certain graphics rendering tasks.

When to Avoid

In applications requiring strict mathematical accuracy, such as scientific simulations or financial calculations.

`-d no_segfault_handler`

Using this flag omits the segfault handler, reducing the executable size and potentially improving compile time. However, in the case of a segmentation fault, the output will not contain stack trace information, making debugging more challenging.

When to Use

In small, well-tested utilities where a stack trace is not essential for debugging.

When to Avoid

In large-scale, complex applications where robust debugging is required.

`-cflags -march=native`

This flag directs the C compiler to generate instructions optimized for the host CPU. This can improve performance but will produce an executable incompatible with other/older CPUs.

When to Use

When the software is intended to run only on the build machine or in a controlled environment with identical hardware.

When to Avoid

When distributing the software to users with potentially older CPUs.

`-compress`

This flag executes upx to compress the resultant executable, reducing its size by around 50%-70%. The executable will be uncompressed at runtime, so it will take a bit more time to start. It will also take extra RAM initially, as the compressed version of the app will be loaded into memory, and then expanded to another chunk of memory. Debugging such an application can be a bit harder, if you do not account for it. Some antivirus programs also use heuristics, that trigger more often for compressed applications.

When to Use

For really tiny environments, where the size of the executable on the file system or during deployment is important (Docker containers, rescue disks, etc.).

When to Avoid

When you need to debug the application.
When the app's startup time is extremely important (where 1-2ms can be meaningful for you).
When you cannot afford to allocate more memory during application startup.
When you are deploying an app to users with antivirus software that could misidentify your app as malicious, just because it decompresses its code at runtime.

PGO (Profile-Guided Optimization)

PGO allows the compiler to optimize code based on its behavior during sample runs. This can improve performance and reduce the size of the output executable, but it adds complexity to the build process.

When to Use

For performance-critical applications where the added build complexity is justifiable.

When to Avoid

For small, short-lived, or rapidly-changing projects where the added build complexity isn't justified.

PGO with Clang

This is an example bash script you can use to optimize your CLI V program without user interactions. In most cases, you will need to change this script to make it suitable for your particular program.

#!/usr/bin/env bash

# Get the full path to the current directory
CUR_DIR=$(pwd)

# Remove existing PGO data
rm -f *.profraw
rm -f default.profdata

# Initial build with PGO instrumentation
v -cc clang -prod -cflags -fprofile-generate -o pgo_gen .

# Run the instrumented executable 10 times
for i in {1..10}; do
    ./pgo_gen
done

# Merge the collected data
llvm-profdata merge -o default.profdata *.profraw

# Compile the optimized version using the PGO data
v -cc clang -prod -cflags "-fprofile-use=${CUR_DIR}/default.profdata" -o optimized_program .

# Remove PGO data and instrumented executable
rm *.profraw
rm pgo_gen
