Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: reduce the default allowed tags #350

Open
vanillajonathan opened this issue Jun 7, 2022 · 20 comments
Open

Proposal: reduce the default allowed tags #350

vanillajonathan opened this issue Jun 7, 2022 · 20 comments

Comments

@vanillajonathan
Copy link
Contributor

Remove all legacy/deprecated tags such as acronym, big, center, dir, keygen, menuitem and strike.

Remove non-semantic presentational HTML4 tags such as b, i, u and tt.

Remove tags dealing with forms such as button, datalist, fieldset, form, input, keygen, label, legend, optgroup, option, output and select. Remove UI tags such as meter, progress.

Remove the main tag just as the html and body tags aren't allowed.

@mganss
Copy link
Owner

mganss commented Jun 9, 2022

I'd rather not mess with the default set for backward compatibility reasons. But perhaps it would be a good idea to provide static readymade sets for different use cases like https://github.com/rgrove/sanitize#configuration (restricted, basic, relaxed). What do you think?

@vanillajonathan
Copy link
Contributor Author

It could affect backward compatibility so it would have to be done for a major version, such as HtmlSanitizer v8.0. But I do think it should eventually be done, because in my opinion the defaults are too liberal to the extent that it would be silly to use the defaults.

Static predefined sets could be an approach, another approach would be just to offer predefined sets as suggestions in the wiki. Either way, I do feel the defaults ought to be changed for an upcoming major version where breaking changes are to be expected.

@mganss
Copy link
Owner

mganss commented Jun 10, 2022

I agree in that we could change the defaults and up the major version. I'm unsure though what exactly we should use as guidance when selecting the default set. The original motivation was to eliminate only those elements that carry XSS potential. For example, https://github.com/rgrove/sanitize allows b and i even in the most restricted configuration.

@vanillajonathan
Copy link
Contributor Author

Right now it is possible to do some undesirable things such as using display: none or visibility: hidden to hide spam, set the min-width, width, min-height and height to excessive values like 9000px, etc.

I think HtmlSanatizer is commonly used for comments, forum posts, blog posts and to author content in a CMS.

I am not even sure there ought to be a default set at all. If a default set should exist, I would think it should be heavily restricted to perhaps just semantic elements for user-submitted comment; this would include strong and em but not h1 .. h6.

@tiesont
Copy link

tiesont commented Jun 11, 2022

I think any decision about changing the default needs to be based on deciding what is in scope for HtmlSanitizer and what isn't. Arbitrarily deciding what constitutes valid HTML feels out of scope, to me; as @mganss says, the main motivator for HtmlSanitizer is, well, sanitizing HTML.

I do like the idea of having static predefined sets that can be passed to the various options.

@vanillajonathan
Copy link
Contributor Author

Good point @tiesont!

Maybe move the default to a static class called Presets and change the example in the documentation to:

var sanitizer = new HtmlSanitizer();
sanitizer.AllowedTags = Presets.AllXssSafeTags;
sanitizer.AllowedAttributes = Presets.AllXssSafeAttributes;
sanitizer.AllowedCssProperties = Presets.AllXssSafeCssProperties;
sanitizer.AllowedAtRules = Presets.AllXssSafeAtRules;
sanitizer.AllowedSchemes = Presets.AllXssSafeSchemes;
sanitizer.UriAttributes = Presets.AllXssSafeUriAttributes;
var html = "<p>Hello world!</p>";
var sanitized = sanitizer.Sanitize(html, "https://www.example.com");

Though this would kind of break the expectation that a class when newed up should be fully instantiated and ready to be used. But maybe it is fully initialized since the Sanitize method can be called, although it will sanitize everything since nothing would be allowed. To prevent this, the constructor could be changed to require the parameters but maybe that would be awkward too.

@tiesont
Copy link

tiesont commented Jun 11, 2022

You'd probably want to follow the pattern used by a lot of .NET Core utilities - rather than pass in multiple parameters, pass in a configuration/options object (maybe even allow a factory method). The default constructor would just use the existing defaults.

@vanillajonathan
Copy link
Contributor Author

So something like:

var options = new HtmlSanitizerOptions
{
    AllowedTags = Presets.AllXssSafeTags;
    AllowedAttributes = Presets.AllXssSafeAttributes;
    AllowedCssProperties = Presets.AllXssSafeCssProperties;
    AllowedAtRules = Presets.AllXssSafeAtRules;
    AllowedSchemes = Presets.AllXssSafeSchemes;
    UriAttributes = Presets.AllXssSafeUriAttributes;
};

var sanitizer = new HtmlSanitizer(options);

@tiesont
Copy link

tiesont commented Jun 11, 2022

Exactly.

@vanillajonathan
Copy link
Contributor Author

@mganss Thoughts?

@mganss
Copy link
Owner

mganss commented Jun 14, 2022

@vanillajonathan I like the idea. Would you like to implement this?

Also I think it would be nice if you had readymade instances of the HtmlSanitizerOptions class, for example

var s = new HtmlSanitizer(Presets.Default);
var s2 = new HtmlSanitizer(Presets.Relaxed);
var s3 = new HtmlSanitizer(Presets.Strict);

@vanillajonathan
Copy link
Contributor Author

vanillajonathan commented Jun 14, 2022

Then you could pass in the presets to the HtmlSanitizerOptions instead.

var options = new HtmlSanitizerOptions(Presets.Strict);
var sanitizer = new HtmlSanitizer(options);

This is the way used by JsonSerializerOptions.

See #359. You can merge it already, it is not a breaking change, it just adds a constructor while leaving the existing constructor in place. The existing constructor could be decorated with the Obsolete attribute.

@tiesont
Copy link

tiesont commented Jul 18, 2022

Would it make any sense to also provide a constructor which accepts a Func<HtmlSanitizerOptions>?

You might use it like so:

var sanitizer = new HtmlSanitizer(() => {
    var options = new HtmlSanitizerOptions(Presets.Strict);

    // do some other stuff with options as necessary
    // --> assuming actions that can't be called with initializer syntax

    return options;
});

This particular example doesn't really have any benefit over just instantiating the options object and then passing it as a parameter, but I'd assume it would allow a factory delegate of some sort to be passed, too?

@vanillajonathan
Copy link
Contributor Author

Sounds reasonable. It would be an Action, so it would be Action<HtmlSanitizerOptions>.

I've been hesitating on the presets. It would be singular not plural because plural are used for bitflags enum while singular for normal enums. Preset is a bit ambiguous, so I was thinking HtmlSanitizerPreset or HtmlSanitizerOptionsPreset but that is a bit long though.

Right now we only have the default preset so the enum would just have one variant. It could be called Default but if the ambition is to move towards a more reduced defaults then maybe it ought to be named Relaxed or Loose or something instead.

With pull request #370 the constructor taking arguments is removed and a new parameterless constructor is introduced which configures the HtmlSanitizer with the current defaults that is already used today.

If taking in an enum for presets as input parameter then the class would grow with a new if condition for every variant of the enum. Another approach would to use a factory to keep all that if logic within a factory class. The factory could have take in an enum, or instead of an enum it could just have methods instead. Example HtmlSanitizerOptionsFactory.CreateRelaxed().

@tiesont
Copy link

tiesont commented Jul 18, 2022

@vanillajonathan I think I'm a little confused - what are you referring to, with respect to enums?

@vanillajonathan
Copy link
Contributor Author

In the mentioned Presets.Strict, here Presets is a enum, and Strict is a enum member.

The Microsoft .NET Framework Design Guidelines guidelines state:

✔️ DO use a singular type name for an enumeration unless its values are bit fields.
✔️ DO use a plural type name for an enumeration with bit fields as values, also called flags enum.

Source: https://docs.microsoft.com/en-us/dotnet/standard/design-guidelines/names-of-classes-structs-and-interfaces#naming-enumerations

So this enum would be named something like Preset (singular), not Presets (plural).

Currently HtmlSanitizer doesn't have any presets, only a default configuration, so it would look something like:

public enum Preset
{
    Default
}

But I am not sure we even should offer any enum member called Default. It is rather unclear what "default" would actually mean, so maybe it should be something like:

public enum Preset
{
    Relaxed
}

If the HtmlSanitizer or the HtmlSanitizerOptions class would take in an constructor that accepts a enum preset parameter, then it would look something like:

public class HtmlSanitizerOptions
{
    public HtmlSanitizerOptions { }

    public HtmlSanitizerOptions(Preset preset)
    {
        if (preset == Preset.Relaxed)
        {
            AllowedTags = HtmlSanitizerDefaults.DefaultAllowedTags;
            AllowedAttributes = HtmlSanitizerDefaults.DefaultAllowedAttributes;
            AllowedCssProperties = HtmlSanitizerDefaults.DefaultAllowedCssProperties;
            AllowedAtRules = HtmlSanitizerDefaults.DefaultAllowedAtRules;
            AllowedSchemes = HtmlSanitizerDefaults.DefaultAllowedSchemes;
            UriAttributes = HtmlSanitizerDefaults.DefaultUriAttributes;
        }
    }
}

If more enum members were added to the Preset enum, more else if (preset == Preset.Strict) etc conditions would have to be added to the constructor, and the constructor could get rather long.

Another approach could be a factory, such as:

public static class HtmlSanitizerOptionsFactory
{
    public static HtmlSanitizerOptions CreateRelaxed()
    {
        return new HtmlSanitizerOptions
        {
            AllowedTags = HtmlSanitizerDefaults.DefaultAllowedTags;
            AllowedAttributes = HtmlSanitizerDefaults.DefaultAllowedAttributes;
            AllowedCssProperties = HtmlSanitizerDefaults.DefaultAllowedCssProperties;
            AllowedAtRules = HtmlSanitizerDefaults.DefaultAllowedAtRules;
            AllowedSchemes = HtmlSanitizerDefaults.DefaultAllowedSchemes;
            UriAttributes = HtmlSanitizerDefaults.DefaultUriAttributes;
        }
    }
}

@tiesont
Copy link

tiesont commented Jul 18, 2022

@vanillajonathan Ah, the enum is for the options class. That makes sense. I was trying to figure out where the HtmlSanitizer class would use an enum.

I'd think your enums, a switch, and your factory proposal should keep the constructor relatively concise.

@mganss
Copy link
Owner

mganss commented Jul 18, 2022

@tiesont Re the Action parameter, I'm not sure I understand the use case. I know this pattern from IServiceCollection where you configure a DI container with an Action which gets called at a later time when the container is built. But where's the need to postpone execution of the Action in the scope of HtmlSanitizer? Or is there another reason I'm missing?

@vanillajonathan I favor the second approach for the reasons you have mentioned. Why not have static members like so?

public class HtmlSanitizerOptions
{
    public static readonly HtmlSanitizerOptions Relaxed = CreateRelaxed();

    private static HtmlSanitizerOptions CreateRelaxed()
    {
        // ...
    }
}

@vanillajonathan
Copy link
Contributor Author

Well the enum could be on the HtmlSanitizer class, but I was thinking of having on the HtmlSanitizerOptions class because what is what is done by JsonSerializerOptions.

But I was thinking of ditching the enum altogether and instead using a separate factory class with a factory method for each preset. This would avoid a long method with many if/else statements or switch.

@mganss Yeah static members on the HtmlSanitizerOptions is an alternative too, that would be factory methods on the options class instead of having a separate factory class. The advantage of having the factory method on the options class is that it is arguably more discoverable than a separate factory class, and arguably more cohesive and local. An argument for a separate factory class would be that it separates things. I like your idea though.

@mganss
Copy link
Owner

mganss commented Jul 18, 2022

Perhaps the presets should be ImmutableHashSets. It feels wrong that the caller could change the static defaults.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants