Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicate headings = duplicate IDs #58

Open
infinitnet opened this issue Apr 1, 2024 · 6 comments
Open

Duplicate headings = duplicate IDs #58

infinitnet opened this issue Apr 1, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@infinitnet
Copy link

The title sort of explains the issue already. If we have an outline like:

H2: PHP
H3: Examples
H2: JavaScript
H3: Examples

then the two H3s would have the same ID and effectively break the functionality of the TOC. This is something I'd consider a critical issue that actually blocks me from using the plugin because having duplicate headings is a relatively common situation and not a rare exception. The obvious solution would be to append -N if there are duplicate headings, eg. #example-1 and #example-2 in the case of above outline.

@mtoensing
Copy link
Owner

mtoensing commented Apr 1, 2024

I understand and know this bug. If someone comes up with an easy (!) solution I am glad to merge a PR. But normally this should not happen because the Table of Conents is not well-structured. I understand your example where you have a recurring heading for different sections. But from my point of view the TOC should be something like.

H2: PHP
H3: Examples PHP
H2: JavaScript
H3: Examples JavaScript

This may sound like an excuse and in a way it is honestly. However SimpleTOCs inner workings are relatively straightforward: The function that generates the ids is simple and robust because it does not care what happened before and after. And you would have to "sync" the -1, -2, -3 from each loop for headings and toc with each other. Since you can remove a heading level from the TOC this needs to be reflected in the other loop for the existing headings in the post.

simpletoc_sanitize_string()

It does not keep track what it generated for the TOC and the heading ids. So, a simple solution is needed to fix this.

@mtoensing mtoensing added the bug Something isn't working label Apr 1, 2024
@infinitnet
Copy link
Author

Yes, having duplicate headings is indeed not optimal, but it's still something that's rather common and I'm just taking a "user perspective" here: If someones wants to replace another TOC plugin with yours on several sites with several hundreds of pages each where many of them may or may not have duplicate headings, this could quickly become a reason to not use the plugin because it would just take too much time to find and change all these pages.

How about something like this (just as a concept)?

function simpletoc_add_ids_to_content($content) {
  $dom = new DOMDocument();
  $dom->loadHTML($content);
  $xpath = new DOMXPath($dom);
  $heading_counts = [];
  
  // Loop through all heading tags
  foreach ($xpath->query('//h1|//h2|//h3|//h4|//h5|//h6') as $tag) {
    $heading_text = trim(strip_tags($tag->nodeValue));
    
    // Check if the tag already has an id attribute
    if ($tag->hasAttribute('id')) {
      continue;
    }
    
    // Generate the anchor using simpletoc_sanitize_string
    $anchor = simpletoc_sanitize_string($heading_text);
    
    // Check if we've seen this anchor before
    if (isset($heading_counts[$anchor])) {
      // Append numeric suffix if duplicate
      $heading_counts[$anchor]++;
      $anchor .= '-' . $heading_counts[$anchor]; 
    } else {
      // First occurrence, no suffix needed
      $heading_counts[$anchor] = 1;
    }
    
    // Add ID attribute
    $tag->setAttribute('id', $anchor);
  }
  return $dom->saveHTML();
}

function generate_toc($headings, $attributes) {
  // ... (existing code)
  
  foreach ($headings as $line => $headline) {
    // ... (existing code)
    
    $title = trim(strip_tags($headline));
    $anchor = simpletoc_sanitize_string($title);
    
    // Check if we've seen this anchor before
    if (isset($heading_counts[$anchor])) {
      // Append numeric suffix if duplicate
      $anchor .= '-' . $heading_counts[$anchor]; 
    }
    
    // ... (existing code)
  }
  
  // ... (existing code)
}

@infinitnet
Copy link
Author

Or much simpler...:

$heading_ids = [];

function simpletoc_sanitize_string($string) {
  global $heading_ids;

  // Remove punctuation, non-breaking spaces, umlauts, and accents
  $sanitized_string = remove_accents(str_replace(" ", " ", preg_replace("/\p{P}/u", "", $string)));

  // Replace whitespace and other characters with dashes
  $sanitized_string = sanitize_title_with_dashes($sanitized_string);

  // Check if the sanitized string already exists in the $heading_ids array
  if (isset($heading_ids[$sanitized_string])) {
    // Increment the count and append the numeric suffix
    $heading_ids[$sanitized_string]++;
    $sanitized_string .= '-' . $heading_ids[$sanitized_string];
  } else {
    // Add the sanitized string to the $heading_ids array with a count of 1
    $heading_ids[$sanitized_string] = 1;
  }

  return $sanitized_string;
}

@mtoensing
Copy link
Owner

// Add the sanitized string to the $heading_ids array with a count of 1

You can't use the counter inside the function because it will be called in two different places: Heading ids and generation of the TOC. And the TOC can be limited to only H2s. So the counter might fail. Be since I am looking into #57 because 'someone' came up with a brilliant idea today ;-) I would be glad if you provide a pull request.

@infinitnet
Copy link
Author

I'd love to contribute with a PR, but unfortunately my coding skills are way too basic to be able to; they're mostly a mix of being able to describe what I want and running detailed instructions through prompt chains using different LLMs and iterating and adjusting the instructions until I get functional code :)) I was looking for something to replace LuckyWP TOC with and yours was the only plugin I found that would actually fit all requirements except for the duplicate headings part.

function simpletoc_sanitize_string($string, $heading_ids = []) {
  // Remove punctuation, non-breaking spaces, umlauts, and accents
  $sanitized_string = remove_accents(str_replace(" ", " ", preg_replace("/\p{P}/u", "", $string)));

  // Replace whitespace and other characters with dashes
  $sanitized_string = sanitize_title_with_dashes($sanitized_string);

  // Check if the sanitized string already exists in the $heading_ids array
  if (isset($heading_ids[$sanitized_string])) {
    // Increment the count and append the numeric suffix
    $heading_ids[$sanitized_string]++;
    $sanitized_string .= '-' . $heading_ids[$sanitized_string];
  } else {
    // Add the sanitized string to the $heading_ids array with a count of 1
    $heading_ids[$sanitized_string] = 1;
  }

  return [$sanitized_string, $heading_ids];
}

function simpletoc_add_ids_to_content($content) {
  $heading_ids = [];

  // Use the existing code to parse the content and extract headings
  // ...

  foreach ($headings as $heading) {
    $sanitized_string = simpletoc_sanitize_string($heading, $heading_ids);
    $heading_ids = $sanitized_string[1];
    $anchor = $sanitized_string[0];

    // Add the ID attribute to the heading
    // ...
  }

  // ...
}

function generate_toc($headings, $attributes) {
  $heading_ids = [];

  // ...

  foreach ($headings as $line => $headline) {
    // ...

    $title = trim(strip_tags($headline));
    $sanitized_string = simpletoc_sanitize_string($title, $heading_ids);
    $heading_ids = $sanitized_string[1];
    $anchor = $sanitized_string[0];

    // ...
  }

  // ...
}

@mtoensing
Copy link
Owner

This is much more complicated because the function simpletoc_add_ids_to_content runs many times on the same content with different scopes. This was necessary to catch all headings even on pages that are not using simpletoc as a block. So it it with the current setup not easy to add custom ids that match the ids in the SimpleTOC itself. It works because the function creates the same output without "counting"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants