Add a space where two tags met #49

Open
danschumann opened this issue Apr 2, 2015 · 17 comments

Comments

@danschumann

I'm wondering if there is a way to create a space wherever tags were.

Picture this:

<div>Some sentence.</div><div>Some other Sentence</div>

It converts to Some sentence.Some other Sentence when I run

text = sanitizeHtml(text, {allowedTags: [], allowedAttributes: {}});

Is there an option to add whitespace so the output reads better, e.g. Some sentence. Some other Sentence?

@dgrad

dgrad commented Nov 27, 2015

I think to do this properly you'll need a list of block tags (or a list of inline tags). You want to add a space wherever a block tag ends (actually it should probably be a newline character and let the browser convert it to space), but not where an inline tag ends (e.g. <div>foo</div><div>bar</div> should convert to foo bar, but <span>foo</span><span>bar</span> should convert to foobar).
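A minimal sketch of that idea, for illustration only. It is not part of sanitize-html: the helper name stripTagsWithBreaks and the BLOCK_TAGS list are assumptions, and the regex is a naive tag matcher rather than a real HTML parser. A newline is emitted at each block tag boundary, then whitespace is collapsed the way a browser would when rendering.

```javascript
// Hypothetical block-tag list; incomplete and only for illustration.
const BLOCK_TAGS = new Set([
  'address', 'article', 'aside', 'blockquote', 'dd', 'div', 'dl', 'dt',
  'fieldset', 'figcaption', 'figure', 'footer', 'form',
  'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'header', 'hr', 'li', 'main',
  'nav', 'ol', 'p', 'pre', 'section', 'table', 'ul'
]);

// Hypothetical helper: removes every tag, emitting a newline where a
// block-level tag opened or closed and nothing for any other tag.
function stripTagsWithBreaks(html, blockTags = BLOCK_TAGS) {
  // Naive matcher: breaks on attribute values that contain ">".
  const text = html.replace(/<\/?([a-zA-Z][a-zA-Z0-9-]*)[^>]*>/g, (match, name) =>
    blockTags.has(name.toLowerCase()) ? '\n' : ''
  );
  // Collapse whitespace runs to one space, as a browser would when rendering.
  return text.replace(/\s+/g, ' ').trim();
}

console.log(stripTagsWithBreaks('<div>foo</div><div>bar</div>'));     // foo bar
console.log(stripTagsWithBreaks('<span>foo</span><span>bar</span>')); // foobar
```

The block-tag set is a parameter with a default, so a caller could swap in a different list.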

@boutell
Member

boutell commented Dec 1, 2015

I agree, and there should be an option to override that list.

I'd take a pull request for this one.

On Fri, Nov 27, 2015 at 12:42 PM, Daniel Grad wrote:

I think to do this properly you'll need a list of block tags (or a list of inline tags). You want to add a space wherever a block tag ends (actually it should probably be a newline character and let the browser convert it to space), but not where an inline tag ends (e.g. <div>foo</div><div>bar</div> should convert to foo bar, but <span>foo</span><span>bar</span> should convert to foobar).



@SystemDisc

Until this gets implemented, a messy hack would be:

text = text.replace(/>/g, '> ');
text = sanitizeHtml(text, {allowedTags:[]});
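Two caveats with the hack above: it pads every tag, inline ones included, and it can leave doubled or trailing spaces. Below is a rough sketch of a cleanup step. To keep the snippet runnable without dependencies, naiveStripTags is a hypothetical stand-in for the sanitizeHtml call and is not a safe sanitizer; stripWithSpaces is likewise an illustrative name.

```javascript
// Stand-in for sanitizeHtml(text, {allowedTags: []}). This is an assumption
// made so the sketch is self-contained; do NOT use it to sanitize real input.
function naiveStripTags(html) {
  return html.replace(/<[^>]*>/g, '');
}

// Applies the hack, strips the tags, then tidies up the extra whitespace.
function stripWithSpaces(html) {
  const padded = html.replace(/>/g, '> ');      // the hack: a space after every ">"
  const stripped = naiveStripTags(padded);
  return stripped.replace(/\s+/g, ' ').trim();  // collapse doubled spaces, trim the ends
}

console.log(stripWithSpaces('<div>Some sentence.</div><div>Some other Sentence</div>'));
// Some sentence. Some other Sentence

console.log(stripWithSpaces('<span>foo</span><span>bar</span>'));
// foo bar  (inline tags get a space too, which may not be wanted)
```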

@rafacustodio

up!!

Is this still on?

@boutell
Member

boutell commented Nov 6, 2017

@r-custodio As mentioned, I'd take a PR for this. Unfortunately as maintainer I can't necessarily implement every feature.

@greghub

greghub commented Nov 18, 2020

@abea this is labeled as seeking contributions but closed. Is it still something you'd accept a PR for?

@abea
Contributor

abea commented Nov 18, 2020

@greghub Sure. It had been sitting idle for years, so there didn't seem much reason to keep it open. I'll reopen it if you want to work on it for 2.x and let the stalebot close it if nothing happens.

@abea abea reopened this Nov 18, 2020
@stale

stale bot commented Jan 17, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jan 17, 2021
@stale stale bot closed this as completed Jan 31, 2021
@rusakovic

Until this gets implemented, a messy hack would be:

text = text.replace(/>/g, '> ');
text = sanitizeHtml(text, {allowedTags:[]});

it's 2023 now. thank you for your solution =)

@boutell
Member

boutell commented Sep 5, 2023

@rusakovic Contributions are welcome!

@adorum

adorum commented Mar 26, 2024

it's 2024 now. :)

@boutell
Member

boutell commented Mar 26, 2024

Yes, it's 2024 now, and as always, community contributions are welcome. 😄 This isn't a feature that matters for our use cases, although I appreciate it would be nice for developers reading the resulting markup.

@boutell
Member

boutell commented Mar 26, 2024

Reopening for potential community PRs.

@boutell boutell reopened this Mar 26, 2024
@stale stale bot removed the stale label Mar 26, 2024
@SystemDisc

Is anyone aware of a reliable way to get a list of block-tags? If so, this shouldn't be terribly difficult to implement, right? I'm not sure.

@BoDonkey
Contributor

Somewhat dubious source, but ChatGPT says:
Sure! Here is a list of HTML block-level tags:

  1. <address>
  2. <article>
  3. <aside>
  4. <blockquote>
  5. <canvas>
  6. <dd>
  7. <div>
  8. <dl>
  9. <dt>
  10. <fieldset>
  11. <figcaption>
  12. <figure>
  13. <footer>
  14. <form>
  15. <h1> to <h6>
  16. <header>
  17. <hr>
  18. <li>
  19. <main>
  20. <nav>
  21. <ol>
  22. <p>
  23. <pre>
  24. <section>
  25. <table>
  26. <ul>

These tags are generally used to structure the main content of an HTML document.

@abea
Contributor

abea commented Jul 23, 2024

It might be easier to exclude inline elements. There are fewer(?) and they're generally easier to identify. The list of phrasing content elements minus inputs, media, br, and super randos (e.g., ruby) looks to be a pretty good start.
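A rough sketch of the inverse approach, assuming a hand-picked inline list: any tag that is not in the list counts as a boundary and becomes a space. The INLINE_TAGS set and the helper name stripTagsInlineAware are hypothetical, and the regex is a naive tag matcher rather than a real parser.

```javascript
// Hypothetical inline list: phrasing elements minus inputs, media, br and
// the rarer ones. Anything NOT listed here is treated as a boundary.
const INLINE_TAGS = new Set([
  'a', 'abbr', 'b', 'bdi', 'bdo', 'cite', 'code', 'data', 'dfn', 'em', 'i',
  'kbd', 'mark', 'q', 's', 'samp', 'small', 'span', 'strong', 'sub', 'sup',
  'time', 'u', 'var', 'wbr'
]);

// Hypothetical helper: inline tags vanish, every other tag becomes a space.
function stripTagsInlineAware(html, inlineTags = INLINE_TAGS) {
  const text = html.replace(/<\/?([a-zA-Z][a-zA-Z0-9-]*)[^>]*>/g, (match, name) =>
    inlineTags.has(name.toLowerCase()) ? '' : ' '
  );
  return text.replace(/\s+/g, ' ').trim();
}

console.log(stripTagsInlineAware('<p>foo <em>bar</em>baz</p><ul><li>one</li><li>two</li></ul>'));
// foo barbaz one two

console.log(stripTagsInlineAware('a<br>b'));
// a b
```

One trade-off of this design: unknown or custom tags are treated as boundaries, whereas browsers render unknown elements inline by default.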

@SystemDisc

SystemDisc commented Jul 23, 2024

This is what I got from ChatGPT:

Here is a comprehensive list of block-level HTML tags/elements available in the latest implementation(s) of HTML, including both standard and experimental/non-standard elements:

Standard Block-Level Elements

  1. <address> - Represents contact information for the author/owner of a document.
  2. <article> - Represents a self-contained composition in a document.
  3. <aside> - Represents a section of content indirectly related to the content around it.
  4. <blockquote> - Represents a section that is quoted from another source.
  5. <canvas> - Used for rendering graphics via scripting (usually JavaScript).
  6. <dd> - Describes a term in a description list.
  7. <div> - A generic container for flow content.
  8. <dl> - Represents a description list.
  9. <dt> - Represents a term/name in a description list.
  10. <fieldset> - Groups related elements in a form.
  11. <figcaption> - Represents a caption or legend for the content of its parent <figure>.
  12. <figure> - Represents self-contained content, like illustrations, diagrams, photos, etc.
  13. <footer> - Represents a footer for its nearest sectioning content or sectioning root element.
  14. <form> - Represents a document section containing interactive controls for submitting information.
  15. <h1> to <h6> - Represent headings, with <h1> being the highest (or most important) level and <h6> the lowest.
  16. <header> - Represents introductory content, typically a group of introductory or navigational aids.
  17. <hgroup> - Groups a set of <h1>-<h6> elements when a heading has multiple levels.
  18. <hr> - Represents a thematic break between paragraph-level elements.
  19. <li> - Represents a list item in an ordered or unordered list.
  20. <main> - Represents the main content of the <body>.
  21. <nav> - Represents a section of a page that links to other pages or to parts within the page.
  22. <ol> - Represents an ordered list of items.
  23. <p> - Represents a paragraph.
  24. <pre> - Represents preformatted text.
  25. <section> - Represents a standalone section — which doesn't have a more specific semantic element to represent it.
  26. <table> - Represents tabular data.
  27. <ul> - Represents an unordered list of items.

Deprecated Block-Level Elements

  1. <center> - Used to center-align text (use CSS instead).
  2. <dir> - Used for directory lists (use <ul> instead).
  3. <menu> - Initially intended for context menus, lists of commands, or form controls. It has been repurposed and is less common in modern usage for block-level content.

Non-Standard/Experimental Block-Level Elements

  1. <dialog> - Represents a dialog box or other interactive component, such as an inspector or window.
  2. <main> - Although now part of the HTML5 specification, it was experimental before being standardized. It represents the dominant content of the <body> of a document.
  3. <details> - Represents a disclosure widget from which the user can obtain additional information or controls.
  4. <summary> - Represents a summary, caption, or legend for a <details> element's disclosure box.

These elements are considered block-level because they typically start on a new line and take up the full width available (unless otherwise styled with CSS).

For the most up-to-date list, always refer to the latest HTML specification and browser documentation, as new elements and updates can be introduced.


Here is a comprehensive list of inline HTML tags/elements available in the latest implementation(s) of HTML, including both standard and experimental/non-standard elements:

Standard Inline Elements

  1. <a> - Defines a hyperlink.
  2. <abbr> - Represents an abbreviation or acronym.
  3. <b> - Represents a span of text stylistically different from normal text, without conveying any extra importance or emphasis.
  4. <bdi> - Isolates a span of text that might be formatted in a different direction from other text outside it.
  5. <bdo> - Overrides the current text direction.
  6. <br> - Produces a line break in text.
  7. <cite> - Represents the title of a work.
  8. <code> - Displays a fragment of computer code.
  9. <data> - Links a given content with a machine-readable translation.
  10. <dfn> - Indicates the term being defined within the context of a definition phrase or sentence.
  11. <em> - Marks text that has stress emphasis.
  12. <i> - Represents a span of text in an alternate voice or mood, or otherwise offset from the normal prose in a manner indicating a different quality of text.
  13. <img> - Embeds an image into the document.
  14. <input> - Allows the user to enter data.
  15. <kbd> - Represents user input from a keyboard, voice input, or any other text entry device.
  16. <label> - Represents a caption for an item in a user interface.
  17. <mark> - Represents text that has been highlighted for reference or notation purposes.
  18. <meter> - Represents either a scalar value within a known range or a fractional value.
  19. <noscript> - Defines a section of text to be displayed if a script type on the page is unsupported or if scripting is currently turned off in the browser.
  20. <object> - Represents an external resource, which can be treated as an image, a nested browsing context, or a resource to be handled by a plugin.
  21. <output> - Represents the result of a calculation or user action.
  22. <picture> - Contains zero or more <source> elements and one <img> element to offer alternative versions of an image for different display/device scenarios.
  23. <progress> - Represents the completion progress of a task.
  24. <q> - Indicates that the enclosed text is a short inline quotation.
  25. <s> - Represents text that is no longer accurate or relevant.
  26. <samp> - Represents sample output from a program or computing system.
  27. <script> - Contains scripting statements, or points to an external script file through the src attribute.
  28. <select> - Represents a control that provides a menu of options.
  29. <small> - Makes the text font size one size smaller (for example, from large to medium, or from small to x-small).
  30. <span> - Generic inline container for phrasing content, which does not inherently represent anything.
  31. <strong> - Indicates that its contents have strong importance, seriousness, or urgency.
  32. <sub> - Specifies inline text which should be displayed as subscript.
  33. <sup> - Specifies inline text which should be displayed as superscript.
  34. <template> - Holds client-side content that will not be rendered when the page loads but can be instantiated later using JavaScript.
  35. <textarea> - Represents a multi-line plain-text editing control.
  36. <time> - Represents either a time on a 24-hour clock or a precise date in the Gregorian calendar.
  37. <u> - Represents a span of inline text which should be rendered in a way that indicates that it has a non-textual annotation.
  38. <var> - Represents the name of a variable in a mathematical expression or a programming context.
  39. <wbr> - Represents a word break opportunity.

Deprecated Inline Elements

  1. <acronym> - Represents an acronym; use <abbr> instead.
  2. <big> - Makes the text font size one size larger.
  3. <tt> - Represents text in a fixed-pitch font; use CSS instead.
  4. <font> - Defines font, color, and size for text; use CSS instead.

Non-Standard/Experimental Inline Elements

  1. <slot> - Part of the Web Components technology suite, it is a placeholder inside a web component that you can fill with your own markup, similar to a content placeholder in other templating systems.

These elements are considered inline because they do not start on a new line and only take up as much width as necessary. For the most accurate and up-to-date list, always refer to the latest HTML specification and browser documentation.


In HTML, custom or undefined tags are treated as inline elements by default. This means that if you define a custom tag that is not recognized by the HTML specification, it will behave like an inline element unless you explicitly style it using CSS.

For example, if you create a custom tag <my-custom-element>, it will be treated as an inline element:

<my-custom-element>This is a custom element.</my-custom-element>

To change its behavior to a block-level element, you need to use CSS:

my-custom-element {
    display: block;
}

This CSS rule will make the custom element behave as a block-level element:

<my-custom-element>This is a custom element.</my-custom-element>

With the CSS applied, <my-custom-element> will now start on a new line and take up the full width available, like standard block-level elements.
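Since a custom element's display depends on CSS that a text converter cannot see, one way to handle it is to let the caller extend the block list. The sketch below assumes that design: createTextExtractor, blockTags, and extraBlockTags are hypothetical names and not existing sanitize-html options, and the default list is deliberately short.

```javascript
// Hypothetical defaults; a real implementation would need a fuller list.
const DEFAULT_BLOCK_TAGS = ['address', 'article', 'aside', 'blockquote', 'div', 'li', 'p', 'section'];

// Returns a text extractor. `blockTags` replaces the default list and
// `extraBlockTags` adds to it; both option names are assumptions.
function createTextExtractor({ blockTags = DEFAULT_BLOCK_TAGS, extraBlockTags = [] } = {}) {
  const boundaries = new Set([...blockTags, ...extraBlockTags].map((tag) => tag.toLowerCase()));
  return function extractText(html) {
    // Naive tag matcher; the name pattern allows hyphens for custom elements.
    const text = html.replace(/<\/?([a-zA-Z][a-zA-Z0-9-]*)[^>]*>/g, (match, name) =>
      boundaries.has(name.toLowerCase()) ? ' ' : ''
    );
    return text.replace(/\s+/g, ' ').trim();
  };
}

const html = '<my-custom-element>one</my-custom-element><my-custom-element>two</my-custom-element>';

// Unknown tags are inline by default, so nothing separates the two words.
const defaultResult = createTextExtractor()(html);
console.log(defaultResult); // onetwo

// The caller styles the element as a block, so it is declared as one here.
const customResult = createTextExtractor({ extraBlockTags: ['my-custom-element'] })(html);
console.log(customResult); // one two
```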


10 participants