
libCat Arithmetic Wrappers
November 25, 2022
1,887 words

 For very nearly a year now, I've been exploring new ways that I can use C++ through libCat. This article doesn't discuss what libCat does, which is a lot. I recently held my second lecture on libCat covering some of that. tl;dw it's a non-POSIX runtime for Linux using modern C++ to provide faster compile times than an STL, safer APIs than POSIX or libC, and more explicit (and potentially performant) memory and error handling, among many other features. It is so unusual that it is fundamentally incompatible with the STL or libC (there is no malloc() or pthreads), so it presents an opportunity to rethink everything.

 One of the first ideas I had was type-safe arithmetic wrappers, which have been explored in C++ before, but mine would be used all the way down to Linux syscalls and Vulkan Loader calls. Many ideas in libCat are not strictly new, but the existing libraries for arithmetic, allocation, error handling, etc. are hard to use when they are merely layered on top of decades' worth of C and C++ code. Also, while other type-safe arithmetic libraries exist, such as type-safe and stronk, both very cool, none that I have found so far does everything I want, in the way I want to do it.

 Explicit arithmetic overflow semantics have been explored to some extent in C++. For the unaware, when an integer overflows or underflows, the behavior to expect isn't obvious. There are broadly four different semantics you might want (sketched in code after this list):

  • Undefined overflow, such as in Carbon or Zig, which allows a compiler to assume overflow does not exist, and make aggressive optimizations.

  • Wrapping overflow, such as in unsigned integers in C. This is the most obvious behavior for unsigned or two's complement binary arithmetic.

  • Saturating overflow, such as the +| operator in Zig. Overflow saturates to the upper or lower bounds that an integer can represent. This was proposed for C++ in p0543.

  • Trapping overflow, such as in Rust (unless compiled for releases, in which case it wraps). This causes a program to crash or otherwise error when arithmetic overflow occurs. You can get this in C++ using -ftrapv, or with ubsan (including unsigned integers using -fsanitize=unsigned-integer-overflow).
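
 To make those four behaviors concrete, here is a minimal sketch (assuming GCC or Clang for the builtins) of wrapping, saturating, and trapping addition written by hand; plain signed + is the undefined-overflow case:

#include <climits>
#include <cstdio>

// Wrapping: route through unsigned, which wraps by definition.
int add_wrap(int a, int b) {
    return static_cast<int>(static_cast<unsigned>(a) + static_cast<unsigned>(b));
}

// Saturating: clamp to INT_MAX or INT_MIN when the intrinsic reports overflow.
int add_sat(int a, int b) {
    int sum;
    if (__builtin_add_overflow(a, b, &sum)) {
        return b > 0 ? INT_MAX : INT_MIN;
    }
    return sum;
}

// Trapping: crash on overflow, roughly what -ftrapv or UBSan do to a plain `a + b`.
int add_trap(int a, int b) {
    int sum;
    if (__builtin_add_overflow(a, b, &sum)) {
        __builtin_trap();
    }
    return sum;
}

int main() {
    std::printf("%d\n", add_wrap(INT_MAX, 1)); // INT_MIN
    std::printf("%d\n", add_sat(INT_MAX, 1));  // INT_MAX
    // add_trap(INT_MAX, 1);                   // would abort the program
}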

 p0543 suggested adding free functions for saturating arithmetic. That paper targets both C and C++, and argues that type-safe arithmetic seems "slightly over-engineered." More recently, the original author of type-safe argues that additional arithmetic types are "strictly more powerful". It is easy to imagine a library implementing arithmetic types in terms of free functions, providing both APIs where they are useful.

 I, for one, am dissatisfied with both solutions. In the Zig language, arithmetic and bitwise operators have suffixes for their overflow semantics: 255 + 1 has undefined behavior, but 255 +% 1 has wrapping semantics, and 255 +| 1 has saturating semantics. I love this syntax (Swift uses &+ for wrapping addition, which is also nice). It seems unlikely to me that C++ will have such syntax any time soon. The only fairly recent mention of this that I have found is in p0907, which defined signed integers as two's complement in C++20 and mentioned explicit overflow operators as "out of scope" for that paper.

 The most convenient syntax I can think of for this in C++ is member-access. Imagine this:

// 4 byte signed integer holding 2'147'483'647
int4 num = int4_max;

// Add 100 with saturating overflow semantics.
(num.sat + 100) == int4_max;

// Add 100 with wrapping overflow semantics.
(num.wrap + 100) == int4_min + 99;

// Add 100 with undefined overflow semantics.
num + 100;

 This is the syntax that is available right now in libCat. You might notice int4 instead of int32_t or i32 or something familiar. Personally, when writing SIMD code in libCat, I found mixing bits and lanes very confusing, and since bytes are the quantity I almost always care about, my arithmetic types are named by byte count: int4, uint8, float4, int4x8 (a SIMD vector), etc. I'm not going to discuss the SIMD API in this article.

 There are also free functions and types with explicit overflow. Types with explicit overflow also support the member-access overflow syntax, so you might write:

// 4 byte signed integer holding 2'147'483'647
wrap_int4 num = int4_max;

// Add 100 with wrapping overflow semantics.
(num + 100) == int4_min + 99;

// Add 100 with saturating overflow semantics.
(num.sat + 100) == int4_max;

 These types are all trivial classes, and they are usable as non-type template parameters. Like signed integers, my unsigned integers have undefined overflow by default. You can also access the underlying C type through a .raw member on each of these. The members are all in a union, so they share the same storage and are perfectly interchangeable.
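
 Here is a hypothetical sketch of how that union-based member access can be structured (illustrative names only, not libCat's actual implementation; the undefined-behavior caveat is discussed next):

#include <climits>
#include <cstdio>

// Each "view" shares storage with the plain value but applies different
// overflow semantics in its operators.
struct saturating_view {
    int raw;
    int operator+(int rhs) const {
        int sum;
        if (__builtin_add_overflow(raw, rhs, &sum)) {
            return rhs > 0 ? INT_MAX : INT_MIN;
        }
        return sum;
    }
};

struct wrapping_view {
    int raw;
    int operator+(int rhs) const {
        // Unsigned arithmetic wraps; converting back to int is modular in C++20.
        return static_cast<int>(static_cast<unsigned>(raw) +
                                static_cast<unsigned>(rhs));
    }
};

struct int4_sketch {
    union {
        int raw;              // `num + 100` uses this, with undefined overflow.
        saturating_view sat;  // `num.sat + 100` saturates.
        wrapping_view wrap;   // `num.wrap + 100` wraps.
    };
    int operator+(int rhs) const { return raw + rhs; }
};

int main() {
    int4_sketch num;
    num.raw = INT_MAX;
    // Reading an inactive union member is type punning; GCC permits this
    // outside of constexpr, as the next paragraph explains.
    std::printf("%d\n", num.sat + 100);   // prints INT_MAX
    std::printf("%d\n", num.wrap + 100);  // prints INT_MIN + 99
}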

 For reasons unknown to me, p0593 declared that all members of a union shall not be implicit lifetime types, so this is undefined behavior. However, GCC explicitly permits type-punning through unions outside of a constexpr context, and given the free functions, explicit overflow types, and static_casts, it is technically possible to control overflow without my member-access syntax; still, I will be using it in future application-level code. There are many other reasons why GCC currently seems to be the only serviceable compiler for libCat, although I believe Clang also permits this particular design.

 In my research, I have continually encountered this article about branchless saturating arithmetic. I attempted to implement this in libCat using inline assembly, but I got stuck on its sat_adds64b() function, because that involves an 8-byte immediate value. As far as I can tell, GAS and GCC inline assembly cannot use immediate values larger than 4 bytes, so this simply wasn't possible. I'm not torn up about it, though, because while examining my code in Compiler Explorer, I became skeptical that this extremely popular article is offering the best advice. To be fair, it does not claim to teach a perfect implementation.

 GCC and Clang have several intrinsics for detecting arithmetic overflow. It turns out that they optimize very nicely when used for saturating arithmetic! Now, I am not an x86 wizard, but take a look for yourself: godbolt.com

 I have a feeling that the GCC-optimized code in the second listing is superior.

int sat_mul(int, int):
        mov    edi,         eax
        xor    esi,         edi
        shr    $0x1f,       edi
        add    $0x7fffffff, edi
        imul   esi
        cmovc  edi,         eax
        retq

int gcc_sat_mul(int, int):
        imul    edi, esi
        mov     eax, 2147483647
        cmovno  eax, edi
        ret

 I have not checked exhaustively, but many of the signed arithmetic routines seem to compare similarly. Also interesting: GCC could not constant-fold the constexpr call to std::numeric_limits::max(). I had to add an [[unlikely]] attribute to make it branchless, unless I made a consteval call for the maximum value instead. In libCat, the analogous function is consteval because, by my convention, anything that can be is. Clang behaves the opposite way: it optimizes the code correctly on its own, and [[unlikely]] ironically makes it not branchless! Portable performance seems tricky. Using a ternary operator instead of an if statement appears to have no effect on any of this under -O0 or -O3.
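
 For reference, here is a sketch of the kind of source that produces the gcc_sat_mul listing above (my reconstruction, not the exact code from the Compiler Explorer link):

#include <limits>

// Saturating multiply via the overflow intrinsic; GCC folds the check into
// imul + cmovno. The [[unlikely]] attribute is what coaxed GCC into the
// branchless form described above.
int gcc_sat_mul(int lhs, int rhs) {
    int product;
    if (__builtin_mul_overflow(lhs, rhs, &product)) [[unlikely]] {
        return std::numeric_limits<int>::max();
    }
    return product;
}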

 One C++ library that provides arithmetic wrappers and free functions for saturating arithmetic is saturating. It works like this: functions.hpp.

 Whoa.

 I don't want to critique another cat's templates. Some things are sacred, man! But I'm having a hard time comprehending this. One interesting feature is that it supports saturating floating-point arithmetic. That concept sounds like nonsense, but here it's a bit of a misnomer. This library actually supports automatically normalizing floating-point arithmetic, i.e., after every operation, the value is clamped to the range of -1.0 to 1.0. This seems similar to HLSL's snorm and unorm floating-point qualifiers. libCat currently does not have such a data type.
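
 As an illustration of that normalizing idea (a sketch of the concept only, not that library's API):

#include <cstdio>

// A float wrapper that clamps back into [-1.0, 1.0] after every operation,
// in the spirit of HLSL's snorm qualifier. Purely illustrative.
struct snorm_float {
    float value;

    static float clamp(float x) {
        return x > 1.0f ? 1.0f : (x < -1.0f ? -1.0f : x);
    }

    snorm_float operator+(snorm_float rhs) const {
        return {clamp(value + rhs.value)};
    }
    snorm_float operator*(snorm_float rhs) const {
        return {clamp(value * rhs.value)};
    }
};

int main() {
    snorm_float a{0.75f};
    snorm_float b{0.5f};
    std::printf("%f\n", (a + b).value);  // 1.0, clamped down from 1.25
}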

 x86-64 also has instructions for saturating SIMD arithmetic, but only for small integer lanes (8-bit and 16-bit saturating adds and subtracts).

 Anyways, a caveat to __builtin_add_overflow() is that its two arguments must be the same type, and they must be primitive integral types. So libCat's type-safe arithmetic must be unwrapped before being passed into the intrinsic, and the arguments must be cast to the same type. This is straightforward to implement in a library so long as you understand std::conditional-based metaprogramming, type deduction, type erasure, concepts, and atomic constraints. Purely trivial stuff, am I right?

// Add two integers with saturating overflow.
template <is_raw_integral T>
[[nodiscard, gnu::optimize(2)]] constexpr auto sat_add(T lhs, T rhs) -> T {
    T sum;
    bool overflow = __builtin_add_overflow(lhs, rhs, &sum);
    if (overflow) {
        return Limits<T>::max();
    }
    return sum;
}

// Erase redundant type information for `sat_add()`.
template <is_integral T, is_integral U>
    requires(is_safe_comparison<T, U>)
[[nodiscard]] constexpr auto sat_add(T lhs, U rhs)
    -> detail::PromotedArithmetic<T, U> {
    // Saturating arithmetic inputs must be raw and cast to the same size.
    using Type = ToRawArithmetic<Larger<T, U>>;
    return sat_add(static_cast<Type>(lhs), static_cast<Type>(rhs));
}

 I also have type-safe intptr and uintptr wrappers. These are kind of like intptr_t in libC, except that the wrapper remembers which pointer type it came from, and it can only cast back to that original type, not to any other sort of pointer. This prevents a kind of undefined behavior that is very easy to write in C, and I've been bitten by weird optimizations around reinterpret_cast before, myself.
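
 Here is a hypothetical sketch of that idea (illustrative names, not libCat's real intptr API):

#include <cstdint>

// A pointer-sized integer that remembers which pointer type it came from and
// only converts back to that type. Names here are made up for illustration.
template <typename T>
struct typed_intptr {
    std::intptr_t raw;

    explicit typed_intptr(T* pointer)
        : raw(reinterpret_cast<std::intptr_t>(pointer)) {}

    // The only way back out is the original pointer type; casting to any
    // other pointer type simply isn't expressible through this interface.
    auto to_pointer() const -> T* {
        return reinterpret_cast<T*>(raw);
    }
};

int main() {
    int value = 4;
    typed_intptr<int> address{&value};
    int* p = address.to_pointer();
    return (*p == 4) ? 0 : 1;
}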

 These are considered arithmetic types for the purposes of libCat concept programming, but they have a pawful of limitations compared to normal arithmetic; for one, they do not support the member-access overflow syntax. Implementing that would be too complicated for a feature I do not believe I will ever use, but once GCC supports "deducing this", it should be very simple to strip away around 200 lines from my intptr implementation and get that overflow syntax almost for free.

 Part of what I love about concepts is that they make expressing generic constraints very elegant. libCat does not have separate wrapper types for floating-point and integral values, as many other libraries do. They are mostly similar types; floats just have some extra constraints.

 To define a constraint on safe comparison or conversion between two arithmetic types, I check that the left-hand type is the same size as or larger than its operand, that they have the same signedness (i.e. no adding unsigned values to signed integers or vice versa), and the same float-ness (no adding ints to floats or vice versa). For that, I have:

template <is_arithmetic From, is_arithmetic To>
inline constexpr bool is_safe_comparison =                                
    // `ToRawArithmetic` is needed here to prevent a recursive constraint    
    // in evaluating the `<` operator.                                    
    ((is_signed<ToRawArithmetic<From>> == is_signed<ToRawArithmetic<To>>)) &&   
    (is_floating_point<From> == is_floating_point<To>);                   
                                                                          
// Any arithmetic can safely convert to a type larger than itself, but not   
// smaller than itself.                                                   
template <is_arithmetic From, is_arithmetic To>                           
inline constexpr bool is_safe_conversion = (sizeof(From) <= sizeof(To)) &&
                                           (is_safe_comparison<From, To>);
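
 To illustrate how these traits are meant to behave, here are some hypothetical checks (assuming libCat's headers are available, and that int8 is the 8-byte signed counterpart of int4):

// Not from libCat's test suite; int4 is 4-byte signed, uint8 is 8-byte
// unsigned, and float4 is 4-byte floating point, as described earlier.
static_assert(is_safe_comparison<int4, int8>);     // same signedness, both integral
static_assert(!is_safe_comparison<int4, uint8>);   // mixed signedness
static_assert(!is_safe_comparison<int4, float4>);  // mixed float-ness
static_assert(is_safe_conversion<int4, int8>);     // widening 4 -> 8 bytes
static_assert(!is_safe_conversion<int8, int4>);    // narrowing 8 -> 4 bytes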

 Expressing type promotion between integers and floats simultaneously isn't much more complicated. Adding an int4 to an intptr should produce an intptr, since that type holds the most information. Otherwise, I just select the larger type, defaulting to the left-hand side when the sizes are equal. No other considerations are necessary, because the previously mentioned constraints rule out the other possibilities.

template <typename T, typename U>
consteval auto promoted_arithmetic() {
    if constexpr (is_arithmetic_ptr<T>) {
        return T{};
    } else if constexpr (is_arithmetic_ptr<U>) {
        return U{};
    } else {
        if constexpr (sizeof(T) >= sizeof(U)) {
            return Arithmetic<ToRawArithmetic<T>, OverflowPolicies::undefined>{};
        } else {
            return Arithmetic<ToRawArithmetic<U>, OverflowPolicies::undefined>{};
        }
    }
}

template <typename T, typename U>
    requires((is_signed<T> == is_signed<U>) && (is_floating_point<T> ==
                                                is_floating_point<U>))
using PromotedArithmetic = decltype(promoted_arithmetic<T, U>());

 One cool feature in GCC and Clang is that you can get explicit-width primitive types without libC. Instead of std::int32_t, there is an automatically compiler-defined macro __INT32_TYPE__, which removes all guesswork. This type is set by a compiler, not by a library, as Celestia intended. I use these macros to define the underlying types of all my arithmetic aliases, such as:

using int4 = Arithmetic<__INT32_TYPE__, OverflowPolicies::undefined>;
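
 As a quick sanity check (my own sketch, not libCat code), these macros expand to real types whose widths can be asserted without including any libC header:

// The compiler itself defines these macros, so no <cstdint> is required.
static_assert(sizeof(__INT32_TYPE__) == 4);
static_assert(sizeof(__UINT64_TYPE__) == 8);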

 There are many other templates involved, but those are the most interesting ones, in my opinion.

 So yeah, those are my ideas for safe and ergonomic arithmetic in C++. I'm not married to anything in libCat, and welcome any feedback, especially comparisons to prior art that I haven't encountered yet.