Fixes #283 #284

RichardSteele · 2023-11-11T13:44:50Z

Prevents the loss of a space if the previous word is 8-bit but the current one is not. This mirrors the existing handling of the opposite case, where the current word is 8-bit but the previous one is not.

vincent-richard · 2023-11-14T22:26:20Z

Hello!

Unfortunately, the patch you proposed introduces a regression and breaks the "parser_textTest" unit test in 3 places:

test: textTest::testNewFromString (F) line: 182
test: textTest::testBugFix20110511 (F) line: 593
test: textTest::testInternationalizedEmail_whitespace (F) line: 711

RichardSteele · 2023-11-17T08:11:54Z

Right, I forgot about the tests.

I think this bug can't be resolved without breaking other parts. As per RFC 2047, spaces between adjacent encoded words are only separators and are not meant to be displayed. A space between an encoded word and regular ASCII text, however, is not just a separator; it is also meant to be displayed.

The problem is that vmime::text::createFromString() doesn't know whether the text will be forcefully encoded later on, which creates encoded words even for ASCII text, as in a mailbox field. Likewise, when a text is stringified, it's unclear whether it was created manually or by createFromString(). Handling the bug at that point could end up adding redundant spaces.

As RFC2047 says:

Use of 'encoded-word's to represent strings of purely ASCII characters is allowed, but discouraged.

In conclusion, it might be better to put a warning in the comments of vmime::text.
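The RFC 2047 whitespace rule mentioned in this comment can be illustrated with a small decoder. This is a minimal sketch and not VMime code: the names `decodeQ`, `parseEncodedWord` and `decodeHeader` are made up for illustration, only the Q encoding is handled, and the charset label is ignored (the decoded bytes are returned unchanged).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Decode the payload of a Q-encoded word: '_' stands for a space and
// "=XX" for the byte with hexadecimal value XX.
std::string decodeQ(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '_') {
            out += ' ';
        } else if (in[i] == '=' && i + 2 < in.size() &&
                   std::isxdigit(static_cast<unsigned char>(in[i + 1])) &&
                   std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
            out += static_cast<char>(std::stoi(in.substr(i + 1, 2), nullptr, 16));
            i += 2;
        } else {
            out += in[i];
        }
    }
    return out;
}

// Try to parse `token` as "=?charset?Q?payload?=". On success the decoded
// bytes are stored in `decoded`. Hypothetical helper, Q encoding only.
bool parseEncodedWord(const std::string& token, std::string& decoded) {
    const std::size_t n = token.size();
    if (n < 8 || token.compare(0, 2, "=?") != 0 || token.compare(n - 2, 2, "?=") != 0)
        return false;
    const std::size_t q1 = token.find('?', 2);  // '?' that ends the charset
    if (q1 == std::string::npos || q1 + 2 >= n - 2)
        return false;
    if ((token[q1 + 1] != 'Q' && token[q1 + 1] != 'q') || token[q1 + 2] != '?')
        return false;
    decoded = decodeQ(token.substr(q1 + 3, n - 2 - (q1 + 3)));
    return true;
}

// Decode a header value token by token. Whitespace between two adjacent
// encoded words is dropped; whitespace next to plain text is kept.
std::string decodeHeader(const std::string& header) {
    std::string out;
    bool prevEncoded = false;
    std::size_t i = 0;
    while (i < header.size()) {
        // Whitespace run in front of the next token.
        std::size_t j = i;
        while (j < header.size() && std::isspace(static_cast<unsigned char>(header[j]))) ++j;
        const std::string gap = header.substr(i, j - i);
        // The token itself.
        std::size_t k = j;
        while (k < header.size() && !std::isspace(static_cast<unsigned char>(header[k]))) ++k;
        const std::string token = header.substr(j, k - j);
        i = k;
        if (token.empty()) {  // only trailing whitespace was left
            out += gap;
            break;
        }
        std::string decoded;
        const bool encoded = parseEncodedWord(token, decoded);
        if (!(encoded && prevEncoded)) out += gap;  // separator-only gaps vanish
        out += encoded ? decoded : token;
        prevEncoded = encoded;
    }
    return out;
}
```

Fed the three encoded words from the reported header (`=?us-ascii?Q?Test_?= =?utf-8?Q?M=C3=BCnchen?= =?us-ascii?Q?West?=`), `decodeHeader` returns `Test MünchenWest`: the gap between the second and third encoded word is treated as a separator and disappears. With `=?utf-8?Q?M=C3=BCnchen?= West`, where the last word is plain ASCII, the space survives.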

```
mailbox(text("Test München West", charsets::UTF_8), "[email protected]").generate();
```

produces

```
=?us-ascii?Q?Test_?= =?utf-8?Q?M=C3=BCnchen?= =?us-ascii?Q?West?= <[email protected]>
```

The first space between ``Test`` and ``München`` is encoded as an underscore along with the first word: ``Test_``. The second space between ``München`` and ``West`` is encoded with neither of the two words and thus lost. Decoding the text results in ``Test MünchenWest`` instead of ``Test München West``.

This is caused by how ``vmime::text::createFromString()`` handles transitions between 7-bit and 8-bit words: If an 8-bit word follows a 7-bit word, a space is appended to the previous word. The opposite case of a 7-bit word following an 8-bit word *misses* this behaviour.

When one fixes this problem, a follow-up issue appears: ``text::createFromString("a b\xFFc d")`` tokenizes the input into ``m_words={word("a "), word("b\xFFc ", utf8), word("d")}``. This "right-side alignment" nature of the whitespace is a problem for ``word::generate()``: As per RFC 2047, spaces between adjacent encoded words are just separators but not meant to be displayed. A space between an encoded word and a regular ASCII text is not just a separator but also meant to be displayed.

When ``word::generate()`` outputs the b-word, it would have to strip one space, but only when there is a transition from encoded-word to unencoded word. ``word::generate()`` does not know whether d will be encoded or unencoded.

The idea now is that we could change the tokenization of ``text::createFromString`` such that whitespace is at the *start* of words rather than at the end. With that, ``word::generate()`` need not know anything about the next word, but rather only the *previous* one.

Thus, in this patch,

1. The tokenization of ``text::createFromString`` is changed to left-align spaces and the function is fixed to account for the missing space on transition.
2. ``word::generate`` learns how to steal a space character.
3. Testcases are adjusted to account for the shifted position of the space.

Fixes: #283, #284

Co-authored-by: Vincent Richard <[email protected]>
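The difference between right-aligned and left-aligned whitespace can be made concrete with a toy tokenizer. This is a simplified sketch under stated assumptions, not the code from the patch: `Word`, `has8bit` and `tokenize` are hypothetical names, only single space characters are treated as separators, and consecutive pieces of the same class (7-bit or 8-bit) are merged into one word.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for vmime::word: the text plus a flag telling
// whether it contains bytes outside the 7-bit range.
struct Word {
    std::string text;
    bool is8bit;
};

bool has8bit(const std::string& s) {
    for (unsigned char c : s)
        if (c >= 0x80) return true;
    return false;
}

// Split `in` at spaces and start a new word at every 7-bit/8-bit
// transition. The space at a transition is attached to the end of the
// previous word (leftAlign == false) or to the start of the next word
// (leftAlign == true). In both modes no space is dropped.
std::vector<Word> tokenize(const std::string& in, bool leftAlign) {
    std::vector<Word> words;
    if (in.empty()) return words;
    std::size_t pos = 0;
    while (pos <= in.size()) {
        std::size_t end = in.find(' ', pos);
        if (end == std::string::npos) end = in.size();
        const std::string piece = in.substr(pos, end - pos);
        const bool eightBit = has8bit(piece);
        if (words.empty()) {
            words.push_back({piece, eightBit});
        } else if (words.back().is8bit == eightBit) {
            words.back().text += ' ' + piece;          // same class: keep merging
        } else if (leftAlign) {
            words.push_back({' ' + piece, eightBit});  // space starts the new word
        } else {
            words.back().text += ' ';                  // space ends the previous word
            words.push_back({piece, eightBit});
        }
        pos = end + 1;
    }
    return words;
}
```

For the input from the commit message, written as `"a b\xFF" "c d"` so that the hex escape does not swallow the following `c`, the right-aligned mode yields `"a "`, `"b\xFFc "`, `"d"` and the left-aligned mode yields `"a"`, `" b\xFFc"`, `" d"`. In the left-aligned form every word carries the space that precedes it, so a generator only has to look at the previous word to decide whether that space is a mere separator or has to be displayed.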

Prevents the loss of a space if the previous word is 8-bit but the current one is not. (133ca32)

RichardSteele mentioned this pull request Nov 12, 2023

vmime::text::createFromString() drops spaces if an 8-bit word is followed by a 7-bit word #283

Closed

jengelh mentioned this pull request Apr 26, 2024

vmime: prevent loss of a space during text::createFromString #306

Merged

vincent-richard closed this May 21, 2024
