Filter out invisible unicode characters from text segments by JiuqingSong · Pull Request #3344 · microsoft/roosterjs

JiuqingSong · 2026-05-19T20:27:42Z

Summary

Strip invisible Unicode tag characters (U+E0000–U+EFFFF) inside createText so they cannot survive paste/DOM-to-model conversion. These characters are used to hide instructions/text inside HTML (see https://embracethered.com/blog/posts/2024/hiding-and-finding-text-with-unicode-tags/) and otherwise leak into the model as normal text.
Meaningful invisible characters that fall outside that range (e.g. ZWSP U+200B, ZWJ U+200D, RLO U+202E, PDF U+202C) are preserved.
Unit tests in creatorsTest.ts cover mixed/boundary/only-invisible inputs and confirm meaningful invisible chars are untouched. An end-to-end test in endToEndTest.ts verifies a full DOM → Model → DOM/text round-trip strips only the tag range.
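
As a rough illustration of the behavior described above, here is a minimal sketch of the stripping step. The actual definition of INVISIBLE_UNICODE_REGEX is not shown in this PR excerpt, so the pattern below is an assumption: it matches the U+E0000–U+EFFFF range through its UTF-16 surrogate pairs (high surrogate U+DB40–U+DB7F followed by any low surrogate), which avoids depending on the `u` regex flag.

```typescript
// Assumed pattern (not taken from the PR): code points U+E0000–U+EFFFF are
// encoded in UTF-16 as a high surrogate in U+DB40–U+DB7F followed by a low
// surrogate in U+DC00–U+DFFF. With the `u` flag the same range could be
// written as /[\u{E0000}-\u{EFFFF}]/gu.
const INVISIBLE_UNICODE_REGEX = /[\uDB40-\uDB7F][\uDC00-\uDFFF]/g;

function stripInvisibleUnicode(value: string): string {
    return value.replace(INVISIBLE_UNICODE_REGEX, '');
}

// "Hi" encoded as tag characters U+E0048 and U+E0069 (as surrogate pairs),
// hidden between two visible words.
const hidden = String.fromCharCode(0xdb40, 0xdc48, 0xdb40, 0xdc69);
const input = 'Hello' + hidden + ' world\u200B!';

const output = stripInvisibleUnicode(input);

// The tag characters are gone; the ZWSP (U+200B) is still there.
console.log(output === 'Hello world\u200B!'); // true
```

Characters outside the range, such as ZWSP, ZWJ, RLO, PDF, and ordinary emoji, use different code units and pass through unchanged.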

Test plan

yarn test:fast --testPathPattern=creatorsTest
yarn test:fast --testPathPattern=endToEndTest

🤖 Generated with Claude Code

BryanValverdeU · 2026-05-20T21:09:43Z

+ * @returns The string with invisible unicode characters removed
+ */
+function stripInvisibleUnicode(value: string): string {
+    return value.replace(INVISIBLE_UNICODE_REGEX, '');


I remember this was originally limited to content inserted as the editor's initial content and to links. Do we have any perf concerns with applying the regex to all created text on every call?

JiuqingSong · 2026-05-20

This is a valid concern.

I investigated the original security issue and realized that the attack can come from any kind of source, as long as the content ends up in the editor. So both manual operations (creating a new editor, pasting) and third-party code (calling formatContentModel()) can trigger it. Of course, the manual operations are easier to carry out.

What do you think? Should we limit the check to manual operations only? I'm open to any suggestions.

@romanisa fyi.
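
One way to put numbers on the perf question is a small timing harness like the sketch below. Everything in it is hypothetical and not part of the PR: the helper names (stripTagRange, measureStrip), the assumed regex, and the sample sizes. Results will vary by JavaScript engine and input, so treat it as a starting point for measurement rather than evidence either way.

```typescript
// Assumed pattern for U+E0000–U+EFFFF, expressed as UTF-16 surrogate pairs.
const TAG_RANGE_REGEX = /[\uDB40-\uDB7F][\uDC00-\uDFFF]/g;

// Hypothetical stand-in for the PR's stripInvisibleUnicode helper.
function stripTagRange(value: string): string {
    return value.replace(TAG_RANGE_REGEX, '');
}

interface StripMeasurement {
    elapsedMs: number; // wall-clock time for all iterations
    totalLength: number; // sum of output lengths, so the work is observable
}

// Runs the strip function `iterations` times over the same sample and
// reports the elapsed time in milliseconds.
function measureStrip(sample: string, iterations: number): StripMeasurement {
    let totalLength = 0;
    const start = Date.now();
    for (let i = 0; i < iterations; i++) {
        totalLength += stripTagRange(sample).length;
    }
    return { elapsedMs: Date.now() - start, totalLength };
}

// Example: a typical short text segment with no hidden characters,
// which is the common case the review comment is concerned about.
const cleanSegment = 'The quick brown fox jumps over the lazy dog.';
const result = measureStrip(cleanSegment, 100000);
console.log('100000 calls took ' + result.elapsedMs + ' ms');
```

Comparing this against the same loop without the replace call would show how much overhead the regex adds per created text segment.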

Filter out invisible unicode characters

ac0f0c4

BryanValverdeU reviewed May 20, 2026

JiuqingSong wants to merge 1 commit into master from u/jisong/filterinvisibleunicode
