src_float_to_short_array: Use division instead of shift #85
No, it won't fix the issue, and it is slower due to the integer division (and harder to understand than the solution proposed in #84 (comment)). The PowerPC problem is a broken … An example:
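For reference, here is a minimal sketch of how the two ways of dropping the low 16 bits differ, assuming a GCC or Clang build on a two's-complement target. The helper names shift_down, divide_down and compare_shift_and_division are hypothetical and not taken from the PR. Right-shifting a negative signed value is implementation-defined in ISO C (GCC and Clang shift arithmetically, which rounds toward negative infinity), while integer division truncates toward zero, so the results can differ for negative inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helpers for illustration only; not code from the PR. */

/* Right shift by 16. For negative operands this is implementation-defined
 * in ISO C; GCC and Clang shift arithmetically (round toward -infinity). */
long shift_down(long v)
{
    return v >> 16;
}

/* Division by 65536 truncates toward zero (guaranteed since C99). */
long divide_down(long v)
{
    return v / 65536;
}

/* Prints both results for a few sample values and checks that the two
 * agree for every non-negative input. */
void compare_shift_and_division(void)
{
    const long samples[] = { 65536L, 65535L, 1L, 0L, -1L, -65535L, -65536L, -65537L };
    size_t n = sizeof samples / sizeof samples[0];

    for (size_t i = 0; i < n; i++) {
        long s = shift_down(samples[i]);
        long d = divide_down(samples[i]);

        if (samples[i] >= 0)
            assert(s == d); /* non-negative: both just drop the low 16 bits */
        printf("%7ld: >> 16 = %2ld, / 65536 = %2ld\n", samples[i], s, d);
    }
}
```

On a GCC or Clang build, the two variants agree for all non-negative samples but not for -1 (shift gives -1, division gives 0) or -65537 (shift gives -2, division gives -1).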
On modern CPUs with deep caches, branches are usually considered more expensive than division. If you can prove to me that my solution using division is slower than your solution, I will take your solution.
If PowerPC has a broken … Meanwhile, I am still interested in whether this fixes @janstary's issue.
Haven't we already proved that the output of …
Speculative execution mitigates some of that. AND:
Proof is simple: long is 64 bits wide on most 64-bit systems: https://godbolt.org/z/HMv8iu
I also tested on an ARM machine (current OpenBSD/armv7). With the patch, it fails as …
That's not proof, that's opinion. Proof requires a proper benchmark.
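As a starting point for such a measurement, here is a minimal micro-benchmark sketch that times only the shift-versus-division step; it does not measure the clipping branches. It assumes a hosted C99 environment, and the names sum_shift, sum_divide, fill_samples and run_benchmark are hypothetical. Compilers typically lower a signed division by a constant power of two to a shift plus a short fix-up for negative values rather than a hardware divide, so any gap may be small and will depend on compiler, flags and CPU. A proper benchmark would also repeat runs and inspect the generated assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical micro-benchmark sketch: all names and sizes are illustrative. */
#define BENCH_LEN 1000000u

/* Drops the low 16 bits with a right shift (arithmetic on GCC/Clang). */
static long long sum_shift(const int32_t *v, size_t n)
{
    long long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += v[i] >> 16;
    return acc;
}

/* Drops the low 16 bits with a signed division (truncates toward zero). */
static long long sum_divide(const int32_t *v, size_t n)
{
    long long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += v[i] / 65536;
    return acc;
}

/* Fills v with reproducible pseudo-random values covering the int32_t range. */
static void fill_samples(int32_t *v, size_t n)
{
    uint32_t state = 12345u;

    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        state = state * 1664525u + 1013904223u; /* classic 32-bit LCG */
        v[i] = (int32_t) ((int64_t) state - 2147483648LL);
    }
}

/* Times one variant in CPU seconds; the checksum is handed back and printed
 * later, which helps keep the compiler from discarding the loop. */
static double time_variant(long long (*fn)(const int32_t *, size_t),
                           const int32_t *v, size_t n, long long *checksum)
{
    clock_t start = clock();

    *checksum = fn(v, n);
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}

/* Runs both variants once over the same data and prints the timings. */
void run_benchmark(void)
{
    int32_t *v = malloc(BENCH_LEN * sizeof *v);
    long long cs_shift, cs_div;
    double t_shift, t_div;

    if (v == NULL)
        return;
    fill_samples(v, BENCH_LEN);
    t_shift = time_variant(sum_shift, v, BENCH_LEN, &cs_shift);
    t_div = time_variant(sum_divide, v, BENCH_LEN, &cs_div);
    printf("shift:    %.6f s (checksum %lld)\n", t_shift, cs_shift);
    printf("division: %.6f s (checksum %lld)\n", t_div, cs_div);
    free(v);
}
```

The two checksums are expected to differ, because the shift and the division round negative values differently.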
You do realize that for CPUs where it works (e.g. at least x86 and x86_64), the "clipping optimization" results in zero branches, don't you?
No, you are missing my point. For … Now examine what the clipping optimization is and when it is enabled:
My claim: on most architectures, the clipping optimization is disabled. Quick check: do you have a "common" 64-bit Linux on an x86_64 architecture? Then please run … Proof for "most architectures":
Summary:
1b3d5b1 to c10c819
@janstary I wonder if this fixes your issue on PowerPC.