Incorrect handling of double-width characters #14

@ulidtko

Take this TSV:

date        |P東       |東 score  |P南       |南 score  |P西       |西 score  |P北       |北 score  |comment
2015-04-04  john    35100       bob     32100       mary    12000       katy    20800
2015-04-04  mary    33500       bob     49500       katy    21600       john    -4600

It looks aligned in ST with elastic tabstops, complete with column headers. In any other text viewer (less, or the Markdown view above), however, the column headers are not aligned, because an extra space is inserted between the double-width characters 東南西北 and the following tab separator.

For clarity, I'll visualize the whitespace characters involved:

date······↦   |P東····↦   |東·score↦   |P南····↦   |南·score↦   |P西····↦   |西·score↦   |P北····↦   |北·score↦   |comment
2015-04-04↦   john···↦   35100···↦   bob····↦   32100···↦   mary···↦   12000···↦   katy···↦   20800
2015-04-04↦   mary···↦   33500···↦   bob····↦   49500···↦   katy···↦   21600···↦   john···↦   -4600

In a fixed-width environment such as a terminal (e.g. less), the string |P東 takes 4 character cells to render, even though it is a 3-character string: |, P, 東. This is exactly the width of the john and mary cells. But, and this is the bug, john and mary have 3 U+0020 spaces after them, while |P東 has 4. This is what breaks alignment in monospace viewers that are not aware of elastic tabstops.

Conceptually, this is easily fixed by using the "em width" instead of the plain character count when computing the number of spaces that the plugin inserts for compatibility alignment. The em width of a character C is 1 where unicodedata.east_asian_width(C)=='Na' and 2 where unicodedata.east_asian_width(C)=='W'.
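A minimal sketch of that idea, not the plugin's actual code. The names display_width and pad_cell are hypothetical, and the sketch additionally counts the 'F' (fullwidth) class as 2 columns, which goes slightly beyond the 'Na'/'W' cases named above. It ignores combining and zero-width characters.

```python
import unicodedata


def display_width(text):
    """Return the number of monospace columns that `text` occupies."""
    width = 0
    for ch in text:
        # 'W' (wide) and 'F' (fullwidth) characters take two columns;
        # everything else is assumed to take one.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


def pad_cell(text, column_width):
    """Pad `text` with spaces up to `column_width` display columns.

    Hypothetical helper: uses display width rather than len(text),
    so double-width characters receive fewer padding spaces.
    """
    return text + " " * max(0, column_width - display_width(text))
```

With this, len("|P東") is 3 but display_width("|P東") is 4, the same as display_width("john"), so pad_cell gives both cells the same number of trailing spaces for a given column width.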

Whew. I do realize that this report is futile, but still, it's here for the record.
