Make str.format more Pythonic#3412
Make str.format more Pythonic#3412chrisnovakovic wants to merge 2 commits intothought-machine:masterfrom
str.format more Pythonic#3412Conversation
`str.format` has a number of surprising behaviours that aren't Pythonic:
* Numerical replacement fields use a different syntax to named
replacement fields (i.e. `${0}`).
* Automatic and manual replacement field names can be used within the
same format string.
* Replacement field names are passed through if the corresponding
keyword argument is undefined.
* Format strings are allowed to contain unbalanced delimiters.
Some of these were introduced in thought-machine#3146, along with some regressions that
quietly broke substitution and delimiter escaping (see thought-machine#3356).
Make Please's `str.format` implementation more Pythonic by aligning it
with [Bazel's](https://bazel.build/rules/lib/core/string#format):
* Permit the use of by-order, by-position or by-name replacement field
referencing, but do not allow them to be combined in the same format
string.
* Parse all occurrences of `{` and `}` in the format string as if they
are delimiters, except for those escaped via `{{` and `}}` - do not
pass them through to the return value, and raise errors if they appear
inappropriately.
* Perform more robust checks on replacement field references.
The new error messages are intentionally reminiscent of those returned
by Python's `str.format`, and are identical in some cases.
The correctness and robustness of the new implementation comes at the
cost of performance - it is hard to compare them fairly against the
degenerate case in `src/parse/asp/builtins_bench_test.go`, but in the
typical case, the new implementation is about 75% slower and performs
four times more allocations:
```
cpu: Intel(R) Xeon(R) E-2246G CPU @ 3.60GHz
BenchmarkStrFormat
BenchmarkStrFormat-12 5117605 257.0 ns/op 80 B/op 2 allocs/op
BenchmarkStrFormatBig
BenchmarkStrFormatBig-12 512914 2474 ns/op 7568 B/op 3 allocs/op
PASS
cpu: Intel(R) Xeon(R) E-2246G CPU @ 3.60GHz
BenchmarkStrFormat
BenchmarkStrFormat-12 3261982 454.1 ns/op 176 B/op 8 allocs/op
BenchmarkStrFormatBig
BenchmarkStrFormatBig-12 358680 9921 ns/op 7712 B/op 12 allocs/op
PASS
```
| self = self[start+2:] | ||
| continue | ||
| } | ||
| // We now know we're about to begin parsing a field name, which means this character must be the "{" delimiter. |
There was a problem hiding this comment.
Is "field name" the right word here (given it may be {} or {1})? Looks like"replacement field" is the terminology used by python
| return newPyInt(strings.LastIndex(string(self), string(needle))) | ||
| } | ||
|
|
||
| // strFormat implements the str.format function. It interpolates a format string using the arguments passed to |
There was a problem hiding this comment.
Just to check, is this used both for f-strings and explicit calls to .format()?
| buf.WriteString(args[arg].String()) | ||
| arg++ | ||
| } else if val, present := s.locals[key]; present { | ||
| // Extract the replacement field's name, excluding the delimiters. |
There was a problem hiding this comment.
| // Extract the replacement field's name, excluding the delimiters. | |
| // Extract the replacement field's name or index if present, excluding the delimiters. |
| // be more of them than there are positional arguments (although there may ultimately be fewer). Internally, the | ||
| // first positional argument is the format string - the positional arguments from the caller's perspective are | ||
| // shifted one to the right in args. | ||
| s.Assert(fieldName == "", "cannot switch from automatic field numbering to manual field specification") |
There was a problem hiding this comment.
I am concerned that this error message will be very unclear to somebody who is unfamiliar with how this is implemented - I think a better phrasing would be something like "cannot mix replacement field specification types in one format string", and possibly include examples of the mismatched replacement fields
| case strFormatByName: | ||
| // With named replacement fields, output the string value of the keyword argument with the given name. | ||
| s.Assert(fieldName != "", "must use named replacement fields with keyword arguments") | ||
| val, exists := s.locals[fieldName] |
There was a problem hiding this comment.
are keyword arguments implemented as locals?
| s, err := parseString(fmt.Sprintf(`ret = "%s".format(%s)`, test.FormatStr, test.Args)) | ||
| if test.Error == "" { | ||
| assert.NotNil(t, s) | ||
| assert.NoError(t, err) |
There was a problem hiding this comment.
nit: I think we should assert.NoError first
| "Dangling opening delimiter at start of string": { | ||
| FormatStr: `{{} {} {}`, | ||
| Args: `"one", "two", "three"`, | ||
| Error: "single '}' encountered in format string", |
There was a problem hiding this comment.
Can we include the position in this error?
| // Find the corresponding "}" delimiter... | ||
| end := strings.IndexByte(self[start+1:], '}') | ||
| // ...and if there isn't one, this must be a malformed format string. | ||
| s.Assert(end != -1, "single '{' encountered in format string") |
There was a problem hiding this comment.
| s.Assert(end != -1, "single '{' encountered in format string") | |
| s.Assert(end != -1, "unmatched and unescaped '{' encountered in format string") |
| continue | ||
| } | ||
| // We now know we're about to begin parsing a field name, which means this character must be the "{" delimiter. | ||
| s.Assert(self[start] == '{', "single '}' encountered in format string") |
There was a problem hiding this comment.
| s.Assert(self[start] == '{', "single '}' encountered in format string") | |
| s.Assert(self[start] == '{', "unmatched and unescaped '}' encountered in format string") |
| "Numerical field name": { | ||
| FormatStr: `1={one} 2={2} 0={zero}`, | ||
| Args: `zero="a", one="b", two="c"`, | ||
| Error: "unspecified keyword argument '2'", |
There was a problem hiding this comment.
It'd be good to special-case this to get the more meaningful error about mixing format types
str.formathas a number of surprising behaviours that aren't Pythonic:${0}).Some of these were introduced in #3146, along with some regressions that quietly broke substitution and delimiter escaping (see #3356).
Make Please's
str.formatimplementation more Pythonic by aligning it with Bazel's:{and}in the format string as if they are delimiters, except for those escaped via{{and}}- do not pass them through to the return value, and raise errors if they appear inappropriately.The new error messages are intentionally reminiscent of those returned by Python's
str.format, and are identical in some cases.The correctness and robustness of the new implementation comes at the cost of performance - it is hard to compare them fairly against the degenerate case in
src/parse/asp/builtins_bench_test.go, but in the typical case, the new implementation is about 75% slower and performs four times more allocations: