Table column width miscalculated when value contains "accent" Unicode character #1599

Open
taurit opened this issue Aug 6, 2024 · 3 comments
Labels: bug (Something isn't working), needs triage

taurit commented Aug 6, 2024

Information

  • OS: Windows
  • Version: 0.49.1
  • Terminal: Windows Terminal

Describe the bug
When I display strings containing Unicode accent characters in a table:

  • the width of the table seems to be miscalculated
  • whitespace seems to be rendered after the accented characters

I tried troubleshooting (e.g. I went through Best Practices -> Configuring the Windows Terminal For Unicode and Emoji Support), but it made no difference. Is this a limitation of the terminal, a bug, or am I misusing the library in a way that a configuration change could fix?

To Reproduce

using Spectre.Console;

class Program
{
    static void Main(string[] args)
    {
        var table = new Table();
        table.Border = TableBorder.Rounded;

        table.AddColumn("Field");
        table.AddColumn("Value");

        table.AddRow("Row 1", "1");
        table.AddRow("Row 2", "2 ąśłćż");
        table.AddRow("Row 3", "3 єшерти");
        table.AddRow("Row 4", "4 áb́ćd́"); // \u0301 is the issue here

        AnsiConsole.Write(table);
    }
}

Expected behavior
A table is rendered with the same column width in all rows.

Screenshots
[image]

Additional context

  • Windows Terminal uses "Cascadia Code" font, as described in Best Practices
  • "Use Unicode UTF-8 for worldwide language support" checkbox is selected in my OS:
    [image]

Thanks for your work and a great library! :)

Please upvote 👍 this issue if you are interested in it.

taurit added the bug and needs triage labels on Aug 6, 2024
github-project-automation bot moved this to Todo 🕑 in Spectre Console on Aug 6, 2024

patriksvensson (Contributor) commented

This seems to be a bug in the wcwidth library. I will look into it.

elgonzo commented Aug 7, 2024

> there seems to be a whitespace rendered after the characters

The whitespace "holes" in the 4 á b́ ć d́ output look very much just like ye olde Windows Console behavior.

Your "4 áb́ćd́" string uses combining marks. A combining mark occupies its own character cell in the console output buffer, independently of the character it combines with, hence the empty space you see. This cannot be solved by moving the cursor one position to the left after outputting a combining mark to reclaim the empty cell: the combining marks (the diacriticals) would then be overwritten in the console output buffer and would no longer be visible. (Have been there, done that...)
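
To make the mismatch concrete, here is a minimal sketch (not from the thread) that compares the number of UTF-16 code units in such a string with the number of text elements a reader perceives. It only illustrates the counting problem. It does not reproduce the width logic of Spectre.Console, wcwidth, or the Windows console, and the variable names are chosen for illustration.

```csharp
using System;
using System.Globalization;

// "áb́ćd́" spelled out as base letters, each followed by U+0301 (combining acute accent).
string value = "a\u0301b\u0301c\u0301d\u0301";

// Every base letter and every combining mark is a separate UTF-16 code unit here.
int utf16Units = value.Length;

// A text element groups a base character with the combining marks that follow it.
int textElements = new StringInfo(value).LengthInTextElements;

Console.WriteLine($"UTF-16 code units: {utf16Units}");
Console.WriteLine($"Text elements:     {textElements}");
```

A console that gives every code unit its own cell would need 8 cells for this string, while a width calculation based on visible glyphs expects 4. That difference is the kind of gap that shows up as "holes" and a shifted table border.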

For many scenarios, string.Normalize() can be used to convert a combining mark and its preceding character into a single character. However, this is not 100% bullet-proof and might still leave you with combining marks (and thus with "holes" in the console output), because single (pre-composed) Unicode characters do not exist for every possible combination of combining mark and preceding character (as is the case with b́, i.e. b followed by U+0301, for example).
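
A small sketch of that limitation, assuming the default Unicode normalization data that ships with .NET: Form C composes á and ć into single characters, but b and d have no pre-composed form with an acute accent, so their combining marks survive. CountCombiningMarks is a hypothetical helper written for this example, not a library function.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical helper: counts characters in the NonSpacingMark category (combining marks).
static int CountCombiningMarks(string s) =>
    s.Count(c => CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark);

// "áb́ćd́" spelled out as base letters, each followed by U+0301 (combining acute accent).
string decomposed = "a\u0301b\u0301c\u0301d\u0301";

// Form C merges a base letter and its mark only where a pre-composed character exists.
string composed = decomposed.Normalize(NormalizationForm.FormC);

Console.WriteLine($"before: {decomposed.Length} chars, {CountCombiningMarks(decomposed)} combining marks");
Console.WriteLine($"after:  {composed.Length} chars, {CountCombiningMarks(composed)} combining marks");
```

The marks that remain after normalization are the ones that can still produce empty cells in the console output.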

taurit commented Aug 9, 2024

@elgonzo Thank you for the explanation!

It indeed looks like an issue with specific terminals like cmd.exe rather than the library. I think there is nothing more to do on the side of Spectre.Console.

As an additional test, I pasted a simple echo "áb́ćd́éf́" into Windows Terminal and saw the same erroneous whitespace.

Workaround

I'll paste the workaround I ended up with in case someone with a similar problem finds this thread.

  1. First, I use string.Normalize() to combine each base character and its accent mark into a single pre-composed character where one exists.
  2. Then, I remove any accent marks that remain in the string. I lose some accent marks in the console output, but that is an acceptable tradeoff for me to keep the output readable.

Screenshot

[image: workaround run in Windows Terminal]

Code

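using System;
using System.Globalization;
using System.Linq;
using Spectre.Console;

// Variant 1: no normalization
// (accent marks were the only characters problematic for Windows Terminal that I found)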
Console.WriteLine("Strings without normalization:");
var table = new Table();
table.AddColumns("Field", "Value");
table.AddRow("Row 1", "ABCDEF");
table.AddRow("Row 2", "ąęúłśż");
table.AddRow("Row 3", "áéúíüñ");
table.AddRow("Row 4", "абвцде");
table.AddRow("Row 5", "а\u0301б\u0301в\u0301ц\u0301д\u0301е\u0301");
table.AddRow("Row 6", "a\u0301b\u0301c\u0301d\u0301e\u0301f\u0301");
table.AddRow("Row 7", "👍👎👌👏👋👊");
table.AddRow("Row 8", "你好嗎?我很");
table.AddRow("Row 9", "🇵🇱🇧🇷🇨🇦🇺🇸🇬🇧🇦🇺");
table.AddRow("Row 10", "أبجد ه");
AnsiConsole.Write(table);


// Variant 2 (Normalize() without arguments uses NormalizationForm.FormC)
Console.WriteLine("Normalized with `string.Normalize(NormalizationForm.FormC)`:");
var table2 = new Table();
table2.AddColumns("Field", "Value");
table2.AddRow("Row 1", "ABCDEF".Normalize());
table2.AddRow("Row 5", "а\u0301б\u0301в\u0301ц\u0301д\u0301е\u0301".Normalize());
table2.AddRow("Row 6", "a\u0301b\u0301c\u0301d\u0301e\u0301f\u0301".Normalize());
AnsiConsole.Write(table2);

// Variant 3
Console.WriteLine("Normalized with `string.Normalize(NormalizationForm.FormC)`, remaining accent characters removed:");
// Drops every character in the NonSpacingMark category, i.e. the combining marks left after normalization.
string RemoveAccentMarks(string input) =>
    string.Concat(input.Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark));
var table3 = new Table();
table3.AddColumns("Field", "Value");
table3.AddRow("Row 1", "ABCDEF".Normalize());
table3.AddRow("Row 5", RemoveAccentMarks("а\u0301б\u0301в\u0301ц\u0301д\u0301е\u0301".Normalize()));
table3.AddRow("Row 6", RemoveAccentMarks("a\u0301b\u0301c\u0301d\u0301e\u0301f\u0301".Normalize()));
AnsiConsole.Write(table3);

Thanks again for the support!
