Allow specifying charset and/or improve charset detection #79

Open
dhouck opened this issue Mar 24, 2023 · 0 comments

Currently the only way to specify the charset is inside the document itself (with a BOM or <meta charset=). If the charset is known by the caller but not declared in the document, there is no way to pass it in.

Additionally, charset detection does not always work well, even with Heuristics.ALL. In particular, it fails to recognize UTF-8, at least when the first non-ASCII byte appears late in the document. The WHATWG spec recommends, in a non-normative note, that systems be able to recognize UTF-8 even if they aren't good at detecting other charsets:

The UTF-8 encoding has a highly detectable bit pattern. Files from the local file system that contain bytes with values greater than 0x7F which match the UTF-8 pattern are very likely to be UTF-8, while documents with byte sequences that do not match it are very likely not. When a user agent can examine the whole file, rather than just the preamble, detecting for UTF-8 specifically can be especially effective. [PPUTF8] [UTF8DET]
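
As a rough illustration of the kind of check that note describes, here is a minimal Python sketch. It is not this library's code, and the function name looks_like_utf8 is hypothetical. It assumes the whole file is available as bytes, and it treats pure-ASCII input as giving no evidence either way.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data contains non-ASCII bytes
    and the entire input is well-formed UTF-8."""
    if data.isascii():
        # No bytes above 0x7F, so the UTF-8 pattern cannot be confirmed.
        return False
    try:
        # Strict decoding rejects any byte sequence that is not valid UTF-8.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# The position of the first non-ASCII byte does not matter,
# because the whole input is examined rather than a prefix.
late = b"a" * 5000 + "±".encode("utf-8")
print(looks_like_utf8(late))
print(looks_like_utf8("±".encode("windows-1252")))
```

Because the check scans the full input, a non-ASCII character late in the document is handled the same way as one near the start.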

(This reproduces with multiple test documents. The smallest is below; another one produced a warning message that the UTF-8 character was invalid in Windows-1252, meaning the detector fell back to the default, which was a particularly bad guess.)

<!DOCTYPE html>
<html lang="en">
    <head>
        <link rel="stylesheet" href="https://fred-wang.github.io/mathml.css/mathml.css">
        <title>Circle equation</title>
        <!-- <meta charset="utf-8" /> -->
    </head>
    <body>
        <p>
            The equation
            <math display=inline>
                <mi>y</mi><mo>=</mo><mo>±</mo>
                <msqrt>
                    <msup><mi>r</mi><mn>2</mn></msup>
                    <mo>-</mo>
                    <msup><mi>x</mi><mn>2</mn></msup>
                </msqrt>
            </math>
            produces a circle with radius <math display=inline><mi>r</mi></math>:
            </p>
        <svg width="10em" height="10em" viewBox="0 0 100 100">
            <desc>A circle</desc>
            <circle cx="50" cy="50" r="40" fill="none" stroke="blue" stroke-width="1" />
        </svg>
    </body>
</html>