I tried this code:
#![no_std]

fn edit(d: &mut [u8]) {
    d[40] = 1;
    d[80] = 1;
    d[120] = 1;
    d[160] = 1;
}

// Returning the array is fine
pub fn array_good() -> [u8; 200] {
    let mut d = [0u8; 200];
    edit(&mut d);
    d
}

// Setting the bytes within the option is also fine
pub fn option_good() -> Option<[u8; 200]> {
    let mut o = Some([0u8; 200]);
    if let Some(ref mut d) = o {
        edit(d);
    }
    o
}

// When returning an initialized array in an Option,
// the initialization gets split into multiple separate memset/memclr calls,
// just to optimize away a few redundant byte clears.
pub fn option_bad() -> Option<[u8; 200]> {
    Some(array_good())
}
On godbolt: https://godbolt.org/z/e8z3ornos
I expected to see this happen: In all three cases I expect roughly similar code with a single call to memclr followed by four stores.
Instead, this happened: The third implementation (option_bad) splits the initialization into five separate calls to memclr. This is unnecessary and inefficient.
example::array_good:
push {r4, r6, r7, lr}
add r7, sp, #8
movs r1, #200
mov r4, r0
bl __aeabi_memclr
movs r0, #1
strb.w r0, [r4, #160]
strb.w r0, [r4, #120]
strb.w r0, [r4, #80]
strb.w r0, [r4, #40]
pop {r4, r6, r7, pc}
example::option_good:
push {r4, r6, r7, lr}
add r7, sp, #8
mov r4, r0
adds r0, #1
movs r1, #200
bl __aeabi_memclr
movs r0, #1
strb.w r0, [r4, #161]
strb.w r0, [r4, #121]
strb.w r0, [r4, #81]
strb.w r0, [r4, #41]
strb r0, [r4]
pop {r4, r6, r7, pc}
example::option_bad:
push {r4, r6, r7, lr}
add r7, sp, #8
mov r4, r0
adds r0, #1
movs r1, #40
bl __aeabi_memclr
add.w r0, r4, #42
movs r1, #39
bl __aeabi_memclr
add.w r0, r4, #82
movs r1, #39
bl __aeabi_memclr
add.w r0, r4, #122
movs r1, #39
bl __aeabi_memclr
add.w r0, r4, #162
movs r1, #39
bl __aeabi_memclr
movs r0, #1
strb.w r0, [r4, #161]
strb.w r0, [r4, #121]
strb.w r0, [r4, #81]
strb.w r0, [r4, #41]
strb r0, [r4]
pop {r4, r6, r7, pc}
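For reference, the three functions are intended to produce the same 200 bytes, so the difference above is purely one of code generation. A minimal sketch of a check for that, assuming a hosted build (so `#![no_std]` is dropped here); `all_equivalent` is a hypothetical helper that is not part of the original reproduction:

```rust
// Same helper as in the report: sets four marker bytes.
fn edit(d: &mut [u8]) {
    d[40] = 1;
    d[80] = 1;
    d[120] = 1;
    d[160] = 1;
}

pub fn array_good() -> [u8; 200] {
    let mut d = [0u8; 200];
    edit(&mut d);
    d
}

pub fn option_good() -> Option<[u8; 200]> {
    let mut o = Some([0u8; 200]);
    if let Some(ref mut d) = o {
        edit(d);
    }
    o
}

pub fn option_bad() -> Option<[u8; 200]> {
    Some(array_good())
}

/// Hypothetical helper (not in the report): returns true when all three
/// functions yield the same array contents.
pub fn all_equivalent() -> bool {
    let a = array_good();
    option_good() == Some(a) && option_bad() == Some(a)
}
```

This only confirms that the results match; it says nothing about how many memclr calls are emitted, which still has to be read from the assembly.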
The behavior is not target specific: it is the same on x86. For larger arrays and initialized segments, it will also call memset.
-C opt-level=1 doesn't inline and thus suppresses the issue.
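To look at the memset case mentioned above, a variant along these lines could be pasted into the same godbolt setup. This is only a sketch: the array length, fill byte, offsets, and function names are all assumptions chosen for illustration, and the resulting codegen has not been verified here.

```rust
// Hypothetical sizes and values, not taken from the report.
const LEN: usize = 2000;
const FILL: u8 = 0xAA; // non-zero fill, so a plain memclr is not applicable

// Sets four marker bytes, spaced out like in the original `edit`.
fn edit_large(d: &mut [u8]) {
    d[400] = 1;
    d[800] = 1;
    d[1200] = 1;
    d[1600] = 1;
}

// Counterpart of `array_good` with a larger, non-zero-initialized array.
pub fn large_array() -> [u8; LEN] {
    let mut d = [FILL; LEN];
    edit_large(&mut d);
    d
}

// Counterpart of `option_bad`: wraps the returned array in an Option.
pub fn large_option() -> Option<[u8; LEN]> {
    Some(large_array())
}
```

Comparing the output of `large_array` and `large_option` at the default optimization level should show whether the fill is split in the same way.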
Meta
rustc --version --verbose:
rustc 1.53.0-nightly (b84932674 2021-04-21)
This behavior is present between 1.45.0 and current nightly. Before 1.45.0 all three implementations split the initialization.
I noticed this when looking at #83022 (comment)