treat Chinese punctuation marks as word separator #10853

Open
zhangzq opened this issue May 6, 2024 · 2 comments
Labels
enhancement (a request to enhance doxygen, not a bug)

Comments

@zhangzq

zhangzq commented May 6, 2024

Describe the bug
Chinese punctuation marks are not recognized as word separators.

def fun1(one, *two):
    pass

## 同 #fun1。
# 同 #fun1 。   
# same as #fun1.
def fun2(one, *, two):
    pass

The first #fun1 is not recognized. I can add a space before 。, but then a space also shows up before 。 in the generated output, which looks ugly.

Screenshots

(screenshot attached)

To Reproduce

As above, with the default Doxyfile.

Expected behavior

Treat #fun1。 like #fun1.

Not only 。, but all Chinese punctuation marks, like ，：（）？！“”, etc.
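For illustration, a minimal sketch (plain Python, not doxygen's actual parser) of what treating these marks as word separators would mean: if they are part of the separator set, a reference like #fun1 is still isolated even when 。 follows it directly.

import re

# Hypothetical separator set for this sketch only: whitespace, ASCII sentence
# punctuation, and the Chinese marks listed above.
CJK_PUNCT = "。，：（）？！“”"
WORD_SEP = re.compile(r"[\s.,:()?!" + re.escape(CJK_PUNCT) + r"]+")

print(WORD_SEP.split("同 #fun1。"))      # ['同', '#fun1', ''] -> #fun1 becomes its own word
print(WORD_SEP.split("same as #fun1."))  # ['same', 'as', '#fun1', '']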

Version
1.11.0 (696ee5ef4954fb8a098c028ff7df0ddc72dfe83d*)

@albert-github added the enhancement (a request to enhance doxygen, not a bug) label on May 6, 2024
@albert-github
Collaborator

This is quite a difficult but interesting problem, especially to implement for people who don't speak the languages with different punctuation rules.

Some thoughts / questions:

This means that in a lot of places the different punctuation systems would have to be known, as currently the entire range \x80-\xFF is usually included in e.g. the lexer rules (and in other places). As a small example:

ID        [$a-z_A-Z\x80-\xFF][$a-z_A-Z0-9\x80-\xFF]*

meaning that for an ID (e.g. a variable or function name, but not limited to that) the ASCII characters $, _, a-z, A-Z and all non-ASCII characters can be used as the first character, and for the following characters digits are allowed as well, besides the characters just mentioned.
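As a quick check (plain Python, independent of doxygen): every byte of the UTF-8 encoding of these Chinese punctuation marks lies in the \x80-\xFF range that the ID rule above accepts, which would explain why the lexer currently keeps them glued to the preceding word.

# Print the UTF-8 bytes of some Chinese punctuation marks; each byte is >= 0x80.
for ch in "。，、：（）？！":
    print(ch, [hex(b) for b in ch.encode("utf-8")])
# e.g. 。 -> ['0xe3', '0x80', '0x82']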

As cutting / pasting of non-ASCII characters is quite tricky:

  • Can you please attach a small, self-contained example (source + configuration file in a compressed tar or zip file!) that allows us to reproduce the problem? Please don't add external links as they might not be persistent (references to GitHub repositories are also considered non-persistent).

The page https://en.wikipedia.org/wiki/Chinese_punctuation probably already gives a good overview.

  • Can you confirm this?

On the mentioned Wikipedia page I see the enumeration comma ( 、 ). In the function trWriteList (translator_cn.h) I see that the "normal" comma , is used. Wouldn't it be better to use the enumeration comma there?
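Just to make the difference concrete (a sketch, not the actual trWriteList code in translator_cn.h):

items = ["ClassA", "ClassB", "ClassC"]
print(", ".join(items))   # current style with the normal comma: ClassA, ClassB, ClassC
print("、".join(items))   # with the enumeration comma instead:  ClassA、ClassB、ClassC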

In the function DefinitionImpl::_setBriefDescription (definition.cpp) we see that the function needsPunctuation() is used, and for Chinese the returned value is false, so no automatic (Chinese) full stop is added.

  • Should, for Chinese, a Chinese full stop also be added when the sentence does not end with one of the Chinese punctuation characters equivalent to the characters mentioned there?
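A minimal sketch of that idea (assumed logic, not the actual _setBriefDescription code), where the set of sentence-ending characters is simply extended with the Chinese equivalents:

# Assumed set of sentence-ending marks; the real list would come from the
# translator for the configured language.
SENTENCE_END = (".", "!", "?", "。", "！", "？")

def add_full_stop(brief: str, full_stop: str = "。") -> str:
    # Append the language's full stop only when the brief description does
    # not already end in a sentence-ending mark.
    brief = brief.rstrip()
    if brief and not brief.endswith(SENTENCE_END):
        brief += full_stop
    return brief

print(add_full_stop("同 fun1"))    # -> 同 fun1。
print(add_full_stop("同 fun1。"))  # unchanged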

@zhangzq
Author

zhangzq commented May 7, 2024

The wiki page is useful; the punctuation marks mentioned there should be treated as word stops. These characters can't be part of variable/function names.

I think variable/function/etc. names shouldn't contain non-ASCII characters. I know some languages support non-ASCII names, but nobody uses them in serious programming.
