Fix Uppercase/Lowercase intrinsic types to apply Unicode special case mappings#3930

Draft
Copilot wants to merge 5 commits into main from copilot/fix-uppercase-lowercase-mapping

Contributor

Copilot AI commented May 16, 2026

Go's strings.ToUpper/strings.ToLower use simple Unicode case mapping (1 rune → 1 rune), while JavaScript's toUpperCase()/toLowerCase() use full case mapping from Unicode SpecialCasing.txt (1 char → potentially multiple chars). This caused intrinsic string types to produce wrong results for ~103 code points.

type T = Uppercase<"ß">;      // tsgo: "ß" (wrong), tsc: "SS" (correct)
type U = Lowercase<"İ">;      // tsgo: "i" (wrong), tsc: "i̇" (correct)
type V = Uppercase<"straße">; // tsgo: "STRAßE" (wrong), tsc: "STRASSE" (correct)
  • Added internal/checker/stringcase.go with toUpperCase/toLowerCase and first-rune variants using golang.org/x/text/cases with the language.Und locale, which provides full Unicode case mapping matching JavaScript's behavior and stays up to date automatically via the x/text module
  • ASCII strings use a fast path via strings.ToUpper/strings.ToLower directly, avoiding the x/text/cases overhead for the common case
  • The x/text/cases casers are lazily initialized via sync.OnceValue so Unicode tables are only loaded when non-ASCII input is actually encountered
  • Updated applyStringMapping to use these instead of strings.ToUpper/strings.ToLower
  • Added compiler test covering ß, İ, ligatures, Capitalize, Uncapitalize, and mixed strings
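The ASCII fast path and lazy initialization described in the list above could look roughly like the sketch below. It is not the PR's code: to stay within the standard library, the x/text caser is replaced by a hypothetical two-entry table (`upperExpansions`), and the helper names `isASCII` and `toUpperCase` are assumptions chosen for illustration. `sync.OnceValue` requires Go 1.21 or later.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
	"unicode"
	"unicode/utf8"
)

// isASCII reports whether s contains only bytes below utf8.RuneSelf (0x80).
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] >= utf8.RuneSelf {
			return false
		}
	}
	return true
}

// upperExpansions is a HYPOTHETICAL stand-in for the x/text caser. It is
// built at most once, on the first call, which mirrors deferring the cost
// of loading Unicode tables until non-ASCII input is actually seen.
var upperExpansions = sync.OnceValue(func() map[rune]string {
	return map[rune]string{
		'\u00DF': "SS", // ß
		'\uFB01': "FI", // ligature fi
	}
})

// toUpperCase uses strings.ToUpper for ASCII-only input and falls back to
// the lazily built table (plus simple mapping) for everything else.
func toUpperCase(s string) string {
	if isASCII(s) {
		return strings.ToUpper(s) // fast path: no table lookup
	}
	table := upperExpansions()
	var b strings.Builder
	for _, r := range s {
		if up, ok := table[r]; ok {
			b.WriteString(up)
		} else {
			b.WriteRune(unicode.ToUpper(r))
		}
	}
	return b.String()
}

func main() {
	fmt.Println(toUpperCase("hello"))  // HELLO
	fmt.Println(toUpperCase("straße")) // STRASSE
}
```

The stand-in table covers only two code points; the PR delegates the non-ASCII branch to `golang.org/x/text/cases` instead of maintaining such a table.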

Copilot AI and others added 2 commits May 16, 2026 20:20
… mappings

Implement JavaScript-compatible full Unicode case mapping for intrinsic
string types (Uppercase, Lowercase, Capitalize, Uncapitalize). Go's
strings.ToUpper/ToLower use simple case mapping (1:1), while JavaScript's
toUpperCase()/toLowerCase() use full Unicode case mapping from
SpecialCasing.txt where a single character can map to multiple characters.

Key examples fixed:
- Uppercase<"ß"> now correctly resolves to "SS" (not "ß")
- Lowercase<"İ"> now correctly resolves to "i̇" (not "i")
- Uppercase<"ﬁ"> now correctly resolves to "FI" (not "ﬁ")

Fixes #3489

Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/7ac37be7-07a1-49c6-be87-36aa6ffa208f

Co-authored-by: jakebailey <5341706+jakebailey@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix Unicode mapping for Uppercase/Lowercase intrinsic types" to "Fix Uppercase/Lowercase intrinsic types to apply Unicode special case mappings" on May 16, 2026
Copilot AI requested a review from jakebailey May 16, 2026 20:32
Comment thread internal/checker/stringcase.go Outdated
Comment on lines +92 to +95
// upperSpecialCasings contains unconditional special case mappings for toUpperCase
// from Unicode SpecialCasing.txt. These are cases where a single code point maps
// to multiple code points when uppercased, matching JavaScript's behavior.
var upperSpecialCasings = map[rune]string{
Member

@copilot Where did you get this, and how can we be sure to keep this up to date? Is this not something that is exposed from the unicode package?

Contributor Author

The original table was manually extracted from Unicode's SpecialCasing.txt. You're right that maintaining it by hand is fragile. golang.org/x/text/cases (already a project dependency) handles this — cases.Upper(language.Und) / cases.Lower(language.Und) performs full Unicode case mapping matching JS behavior, and stays current via the x/text module. Refactored in e3ee094.

…-rolled table

Replace the manually maintained Unicode SpecialCasing.txt table with
golang.org/x/text/cases package (already a project dependency), which
provides full Unicode case mapping that matches JavaScript's behavior
and stays up to date automatically via the x/text module.

Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/47f91f8b-f932-4212-8285-794b2ee100e8

Co-authored-by: jakebailey <5341706+jakebailey@users.noreply.github.com>
Copilot AI requested a review from jakebailey May 16, 2026 21:04
Comment thread internal/checker/stringcase.go Outdated
Comment on lines +19 to +20
upperCaser = cases.Upper(language.Und)
lowerCaser = cases.Lower(language.Und)
Member

@copilot We should avoid this as much as possible; special case ASCII so we aren't always going through this, and make loading these lazy via sync.OnceValue.

Contributor Author

Added ASCII fast paths for all four functions — ASCII-only strings use strings.ToUpper/strings.ToLower directly (or inline byte manipulation for first-rune variants). The cases.Upper/cases.Lower casers are now initialized lazily via sync.OnceValue so the x/text tables are only loaded if non-ASCII input is actually encountered. Done in 53de6b0.
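For the first-rune variants (Capitalize and Uncapitalize), the "inline byte manipulation" mentioned above might be sketched as follows. The function names `capitalizeASCII` and `uncapitalizeASCII` are hypothetical, and checking only the first byte (rather than the whole string) is an assumption of this sketch. Each function reports `false` when the first byte is not ASCII, so the caller can fall back to the full Unicode mapper.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// capitalizeASCII uppercases the first byte of s when it is ASCII.
// ok is false when s starts with a non-ASCII byte, signalling that the
// caller should use the full Unicode case mapper instead.
func capitalizeASCII(s string) (result string, ok bool) {
	if s == "" {
		return s, true
	}
	c := s[0]
	if c >= utf8.RuneSelf {
		return "", false
	}
	if 'a' <= c && c <= 'z' {
		return string(c-('a'-'A')) + s[1:], true
	}
	return s, true
}

// uncapitalizeASCII is the lowercase counterpart of capitalizeASCII.
func uncapitalizeASCII(s string) (result string, ok bool) {
	if s == "" {
		return s, true
	}
	c := s[0]
	if c >= utf8.RuneSelf {
		return "", false
	}
	if 'A' <= c && c <= 'Z' {
		return string(c+('a'-'A')) + s[1:], true
	}
	return s, true
}

func main() {
	fmt.Println(capitalizeASCII("hello"))   // Hello true
	fmt.Println(uncapitalizeASCII("Hello")) // hello true
	_, ok := capitalizeASCII("ßeta")
	fmt.Println(ok) // false
}
```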

ASCII strings use strings.ToUpper/ToLower directly, avoiding x/text/cases
overhead. The x/text casers are initialized lazily via sync.OnceValue so
tables are only loaded when non-ASCII input is encountered.

Agent-Logs-Url: https://github.com/microsoft/typescript-go/sessions/76d200c6-8d8e-4f2c-875d-f8712343859f

Co-authored-by: jakebailey <5341706+jakebailey@users.noreply.github.com>
Copilot AI requested a review from jakebailey May 16, 2026 21:27
Development

Successfully merging this pull request may close these issues.

tsgo's Uppercase/Lowercase intrinsic types don't apply Unicode special case mappings

2 participants