Filter out invisible unicode characters from text segments#3344
Open
JiuqingSong wants to merge 1 commit into
Open
Filter out invisible unicode characters from text segments#3344JiuqingSong wants to merge 1 commit into
JiuqingSong wants to merge 1 commit into
Conversation
| * @returns The string with invisible unicode characters removed | ||
| */ | ||
| function stripInvisibleUnicode(value: string): string { | ||
| return value.replace(INVISIBLE_UNICODE_REGEX, ''); |
Contributor
There was a problem hiding this comment.
I remember this was originally limited to the content was inserted as initial content in the editor and for Links. Do we have any perf concerns with applying the regex to all created text on every call?
Collaborator
Author
There was a problem hiding this comment.
This is a valid concern.
I investigated the original security issue, and realize that the attack can happen from any kind of source, as long as the content is put into editor. So manual operations (new editor, paste), or 3rd party code (call formatContentModel() can both trigger the result. Of cause the manual operation is easier to do.
What do you think? Should we limit the check to manual operation only? I'm open to any suggestion.
@romanisa fyi.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
createTextso they cannot survive paste/DOM-to-model conversion. These characters are used to hide instructions/text inside HTML (see https://embracethered.com/blog/posts/2024/hiding-and-finding-text-with-unicode-tags/) and otherwise leak into the model as normal text.U+200B, ZWJU+200D, RLOU+202E, PDFU+202C) are preserved.creatorsTest.tscover mixed/boundary/only-invisible inputs and confirm meaningful invisible chars are untouched. An end-to-end test inendToEndTest.tsverifies a full DOM → Model → DOM/text round-trip strips only the tag range.Test plan
yarn test:fast --testPathPattern=creatorsTestyarn test:fast --testPathPattern=endToEndTest🤖 Generated with Claude Code