From 00f4ead2203600e43ec5ae756d93c2042afe96dd Mon Sep 17 00:00:00 2001
From: Adesh Gupta
Date: Mon, 11 May 2026 16:03:14 +0530
Subject: [PATCH] <__msvc_int128.hpp>: use __umulh on ARM64/ARM64EC (#6184)

Adds an ARM64/ARM64EC fast path to _Base128::_UMul128 that uses the
__umulh intrinsic for the high 64 bits and a plain 64-bit multiply for
the low 64 bits, in place of the Knuth base-2^32 fallback.

Microbench on Snapdragon X Elite (5M random uint64 pairs * 5 reps):
  Knuth fallback : ~82 ms (~3.27 ns/op)
  __umulh path   : ~27 ms (~1.08 ns/op)
  Speedup        : ~3.03x

Disassembly collapses from ~30 ops (incl. the /GS cookie push) to
4 ops (umulh / mul / str / ret).

_STL_128_INTRINSICS is intentionally not enabled for ARM64; that macro
also gates _addcarry_u64, _subborrow_u64, __shiftleft128,
__shiftright128, and _udiv128/_div128, which have no direct
single-instruction ARM64 equivalents and are out of scope for this
change. Per the issue author, x64 is intentionally not modified --
_umul128 remains preferable there.
---
 stl/inc/__msvc_int128.hpp | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/stl/inc/__msvc_int128.hpp b/stl/inc/__msvc_int128.hpp
index e7d9650ff8..99c78af752 100644
--- a/stl/inc/__msvc_int128.hpp
+++ b/stl/inc/__msvc_int128.hpp
@@ -158,7 +158,12 @@ struct alignas(16) _Base128 {
         if (!_Is_constant_evaluated()) {
             return _umul128(_Left, _Right, &_High_result);
         }
-#endif // _STL_128_INTRINSICS
+#elif (defined(_M_ARM64) || defined(_M_ARM64EC)) && !defined(_M_CEE_PURE)
+        if (!_Is_constant_evaluated()) {
+            _High_result = __umulh(_Left, _Right);
+            return _Left * _Right;
+        }
+#endif // ^^^ (defined(_M_ARM64) || defined(_M_ARM64EC)) && !defined(_M_CEE_PURE) ^^^
         const uint32_t __u[2] = {
             static_cast<uint32_t>(_Left),