perf: optimize left function by eliminating double chars() iteration #19571

viirya · 2025-12-30T22:41:45Z

For negative n values, the function was calling string.chars() twice:

1. Once to count total characters
2. Again to take the prefix

This optimization collects chars into a reusable buffer once per row for the negative n case, eliminating the redundant iteration.
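As a rough illustration of the change described above, the sketch below contrasts the two-pass approach with a single pass into a reusable buffer. The function names `left_negative_two_pass` and `left_negative_buffered` are hypothetical and exist only for this example. It is not the code in `left.rs`, which operates on Arrow arrays. The `n != i64::MIN` guard follows the reviewer's suggestion later in this thread.

```rust
/// Two-pass version: walks `s.chars()` once to count and once to take.
/// (Hypothetical helper for illustration; expects n < 0.)
fn left_negative_two_pass(s: &str, n: i64) -> String {
    let len = s.chars().count() as i64; // first iteration
    if n != i64::MIN && n.abs() < len {
        // second iteration: keep everything except the last |n| chars
        s.chars().take((len + n) as usize).collect()
    } else {
        String::new()
    }
}

/// Single-pass version: decodes the chars once into a caller-owned buffer
/// that is cleared and reused for every row.
/// (Hypothetical helper for illustration; expects n < 0.)
fn left_negative_buffered(s: &str, n: i64, chars_buf: &mut Vec<char>) -> String {
    chars_buf.clear(); // keeps the allocation from earlier rows
    chars_buf.extend(s.chars()); // the only iteration over the string
    let len = chars_buf.len() as i64;
    if n != i64::MIN && n.abs() < len {
        chars_buf[..(len + n) as usize].iter().collect()
    } else {
        String::new()
    }
}

fn main() {
    // One buffer shared across all "rows".
    let mut buf: Vec<char> = Vec::new();
    for (s, n) in [("abcde", -2), ("héllo", -1), ("abc", -5)] {
        let two_pass = left_negative_two_pass(s, n);
        let buffered = left_negative_buffered(s, n, &mut buf);
        assert_eq!(two_pass, buffered);
    }
}
```

The buffer is passed in by the caller so its allocation can be amortized over all rows of a batch instead of being recreated for each string.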

Benchmark results (negative n, which triggers the optimization):

- size=1024: 71.323 µs → 52.760 µs (26.0% faster)
- size=4096: 289.62 µs → 212.23 µs (26.7% faster)

Benchmark results (positive n, minimal overhead):

- size=1024: 24.465 µs → 24.691 µs (0.9% slower)
- size=4096: 96.129 µs → 97.078 µs (1.0% slower)

The roughly 26% speedup for negative n far outweighs the roughly 1% slowdown for positive n.

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

adriangb · 2025-12-31T03:06:13Z

Can you make a PR with just the benchmarks (which is easier to merge) so we can then compare main vs. changed?

viirya · 2026-01-02T07:40:04Z

Opened #19600.

martin-g · 2026-01-02T10:47:39Z

datafusion/functions/src/unicode/left.rs

+                    chars_buf.extend(string.chars());
+                    let len = chars_buf.len() as i64;
+
                    Some(if n.abs() < len {


Minor improvement: calling .abs() on i64::MIN overflows (a panic in dev builds, wrapping in release builds).
i64::MIN is a corner case, so I guess no one will complain if it is not supported here.
I also tried if n + len > 0, but that was 3% slower on my machine.

Suggested change

-                    Some(if n.abs() < len {
+                    Some(if n != i64::MIN && n.abs() < len {
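To make the i64::MIN corner case concrete, here is a small sketch of the comparison with and without the guard. The helpers `keep_prefix_guarded` and `keep_prefix_wrapping` are hypothetical names for this example. The second one uses `wrapping_abs` to emulate what an unguarded `.abs()` computes when overflow checks are disabled, without depending on the build profile.

```rust
/// Guarded comparison, as in the suggested change. Short-circuiting means
/// `.abs()` is never evaluated for i64::MIN, so it cannot overflow.
/// (Hypothetical helper for illustration.)
fn keep_prefix_guarded(n: i64, len: i64) -> bool {
    n != i64::MIN && n.abs() < len
}

/// Emulates the unguarded comparison in a build without overflow checks:
/// `wrapping_abs` returns i64::MIN for i64::MIN, which is still negative.
/// (Hypothetical helper for illustration.)
fn keep_prefix_wrapping(n: i64, len: i64) -> bool {
    n.wrapping_abs() < len
}

fn main() {
    // |i64::MIN| does not fit in an i64, so the checked variant reports None.
    assert_eq!(i64::MIN.checked_abs(), None);
    assert_eq!(i64::MIN.wrapping_abs(), i64::MIN);

    // Without the guard, the wrapped value compares as smaller than any length.
    assert!(keep_prefix_wrapping(i64::MIN, 5));
    // With the guard, i64::MIN takes the "empty result" branch.
    assert!(!keep_prefix_guarded(i64::MIN, 5));

    // Ordinary negative values are unaffected by the guard.
    assert!(keep_prefix_guarded(-2, 5));
    assert!(!keep_prefix_guarded(-5, 5));
}
```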


github-actions bot added the functions Changes to functions implementation label Dec 30, 2025

viirya force-pushed the left_optimize branch from 413e195 to 08f3cdb Compare December 31, 2025 00:32

viirya requested a review from andygrove December 31, 2025 00:50

martin-g reviewed Jan 2, 2026

viirya force-pushed the left_optimize branch from 08f3cdb to 46833cb Compare January 2, 2026 18:23
