Take this TSV:
date |P東 |東 score |P南 |南 score |P西 |西 score |P北 |北 score |comment
2015-04-04 john 35100 bob 32100 mary 12000 katy 20800
2015-04-04 mary 33500 bob 49500 katy 21600 john -4600
It looks aligned in ST+elastic tabstops, complete with column headers. But in any other text viewer (less, or the Markdown rendering above) the column headers are not aligned, because an extra space is inserted between the double-width characters 東南西北 and the tab separator that follows.
For clarity, I'll visualize the whitespace characters involved:
date······↦ |P東····↦ |東·score↦ |P南····↦ |南·score↦ |P西····↦ |西·score↦ |P北····↦ |北·score↦ |comment
2015-04-04↦ john···↦ 35100···↦ bob····↦ 32100···↦ mary···↦ 12000···↦ katy···↦ 20800
2015-04-04↦ mary···↦ 33500···↦ bob····↦ 49500···↦ katy···↦ 21600···↦ john···↦ -4600
In a fixed-width environment such as a terminal (e.g. less), the string |P東 takes 4 character cells to render, even though it is a 3-character string: |, P, 東. This is exactly the width of the john and mary cells. But, and this is the bug, john and mary have 3 U+0020 spaces after them, while |P東 has 4. This is what breaks alignment in monospace viewers that are not elastic-tabstop-aware.
Conceptually, this is easily fixed by using "em width" (1 or 2 for a character C where unicodedata.east_asian_width(C) is 'Na' or 'W', respectively) instead of the plain character count when computing the number of spaces that the plugin inserts for compatibility alignment.
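To make the idea concrete, here is a minimal sketch of padding by display width rather than by character count. It is not the plugin's actual code: the names `cell_width` and `pad` are hypothetical, and treating 'F' (fullwidth) the same as 'W' is my own assumption on top of the rule described above.

```python
import unicodedata


def cell_width(text):
    """Return the number of terminal cells `text` is expected to occupy.

    Assumption: characters classified 'W' (wide) or 'F' (fullwidth) take
    two cells and everything else takes one. Combining marks and other
    zero-width characters are not handled in this sketch.
    """
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def pad(text, column_width):
    """Right-pad `text` with spaces up to `column_width` cells (hypothetical helper)."""
    return text + " " * (column_width - cell_width(text))


# The three cells of the second column from the sample above.
cells = ["|P東", "john", "mary"]

# Widest cell plus three spaces of padding, as in the visualized sample.
column_width = max(cell_width(c) for c in cells) + 3

for c in cells:
    print(repr(pad(c, column_width)), "chars:", len(c), "cells:", cell_width(c))
```

With this width function, |P東 counts as 4 cells, the same as john and mary, so all three cells receive 3 trailing spaces instead of |P東 receiving 4.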
Whew. I do realize that this report is futile, but still, it's here for the record.