Allow site configuration to not index tag pages #10835
The concept of tags is plugin-specific, so it can only be a plugin config, not a core site config. The sitemap plugin does not need to be modified: it already ignores all pages having a `noindex` robots meta directive.
Is this really that different from regular blog pagination, or blog author pages, for example?

In the future, we may allow you to provide an MDX file holding a valuable tag description. Similarly for blog authors. For now, it's just a string description, but we'd like to allow MDX, so that at least the first paginated page could contain meaningfully unique content.
Afaik we are using canonical URLs and structured data, so search engines know that this is not to be considered duplicate content.
Can you provide evidence? Show your SEO score before/after the change, or share a resource from an authority such as Google explaining why this is a bad practice that decreases the score.

Note that docs tags barely create duplicate content, because they only render title + description: https://docusaurus.io/tests/docs/tags. Unlike blog posts, which present an excerpt that you can truncate using a `<!-- truncate -->` marker.

As far as I understand, you only use docs tags. I'm not sure that creating such an index with relatively small excerpts is going to be considered "duplicate content". Maybe the problem is that you are not using descriptions. I find it surprising that you are presenting here an advanced SEO topic, while you are not even using the most basic SEO metadata correctly on your canonical pages 😅
> Specific pages receive more external backlinks from external domains, and also from internal pages, since paginated pages only receive links from previous pages while all paginated pages will link to the actual canonical pages.

I doubt this is the case, so please provide evidence, or an authoritative link explaining this behavior.
I don't understand what you mean here 😅
How am I supposed to see this in the screenshots above? Please point it out. We don't have such a problem on other websites I manage.

I'd be happy to improve SEO for Docusaurus. This includes providing APIs to alter the SEO behavior, and/or providing more sensible defaults. However, I do not take this lightly. Changing the SEO profile of thousands of existing Docusaurus websites is risky and could backfire. That's why I'm going to push back a lot and ask you to back your claims better. We need to run experiments and measure the SEO impact before/after, to see if a change is worth generalizing or worth implementing as an opt-in feature.

I'm also usually asking other community members who have expertise in or care about SEO to confirm such a change is welcome, such as @jdevalk or @johnnyreilly. Afaik @johnnyreilly uses tags, regularly monitors his website's SEO while working with an SEO agency, and has not reported this problem.
Hi, thank you for your detailed response and for diving deeper into this topic.
Good point. E-E-A-T (Experience, Expertise, Authoritativeness, Trustworthiness) signals matter for user trust, and thereby for SEO.

New idea for evaluation: use `ItemList` structured data to help search engines understand how category pages with child items work. See source code:

```ts
import {
useCurrentSidebarCategory,
filterDocCardListItems,
findFirstSidebarItemLink,
} from '@docusaurus/plugin-content-docs/client';
import { PageType } from '@site/src/components/PageType';
import { toAbsoluteUrl } from '@site/src/components/Utilities/ToAbsoluteUrl';
// Read by:
// - Google : https://developers.google.com/search/docs/appearance/structured-data/carousel
// However, according to the URL Inspection Tool (tested Jan 12, 2025), the data was not read.
// According to the docs, it only results in rich results for specific types of pages, but
// we still use it to "hint" Google how the page works.
export function getChildrenItemListStructuredData(pageType: PageType): ItemList | null {
if (pageType !== 'collection' && pageType !== 'category') {
return null;
}
return {
'@context': 'https://schema.org',
'@type': 'ItemList',
itemListElement: collectAllChildrenUrls().map((href, idx): ListItem => ({
'@type': 'ListItem',
'position': idx + 1,
url: toAbsoluteUrl(href),
})),
}
}
// Note: this calls the useCurrentSidebarCategory() hook, so it must only run
// during a React render (e.g. from a page component).
function collectAllChildrenUrls(): string[] {
  const category = useCurrentSidebarCategory();
  const filteredItems = filterDocCardListItems(category.items);
  // Type guard so TypeScript narrows (string | undefined)[] down to string[].
  const urls = filteredItems
    .map(findFirstSidebarItemLink)
    .filter((href): href is string => href !== undefined);
  return urls;
}
interface ListItem {
readonly '@type': 'ListItem';
readonly 'position': number;
readonly url: string;
}
interface ItemList {
// Format:
// - https://schema.org/ItemList
// - https://developers.google.com/search/docs/appearance/structured-data/carousel#summary
readonly '@context': 'https://schema.org';
readonly '@type': 'ItemList';
readonly itemListElement: readonly ListItem[];
}
```

Adding this structured data to tag/author pages could be a less disruptive approach, considering your point about changing the SEO profiles of existing sites.
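For completeness, here is a minimal sketch of how the function's output could be injected into a page's head. The `ChildrenItemListHead` component name and the file paths are my assumptions, not part of the original code; it reuses `PageType` and the function above:

```tsx
// src/components/ChildrenItemListHead.tsx (hypothetical location)
import React, {type ReactNode} from 'react';
import Head from '@docusaurus/Head';
// Assumed to be the file containing getChildrenItemListStructuredData above.
import {getChildrenItemListStructuredData} from '@site/src/components/StructuredData';
import {PageType} from '@site/src/components/PageType';

// Serializes the ItemList as JSON-LD so crawlers can discover it in <head>.
export function ChildrenItemListHead({pageType}: {pageType: PageType}): ReactNode {
  // Safe to call here: the underlying sidebar hook runs during this component's render.
  const structuredData = getChildrenItemListStructuredData(pageType);
  if (structuredData === null) {
    return null;
  }
  return (
    <Head>
      <script type="application/ld+json">{JSON.stringify(structuredData)}</script>
    </Head>
  );
}
```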
Confirmed this works, thank you for the suggestion.

About nofollow / noindex on tag anchors: after reading more on this, I think we may still want to add them so as not to consume the crawl budget. If a page has a `noindex` directive, crawlers still need to fetch it to discover that directive, so the budget is spent anyway unless the links pointing to it are marked `nofollow`.
This can be double-edged.
I was wrong to sound certain here. Because Google's ranking and indexing system is a complex, ever-changing "black box" (due to non-linear AI analysis), any data we show would have limited long-term value. That's why I quoted the Google guidelines: they are the only source of truth we have, as opposed to speculation. And we know for sure that tag pages have zero "meaningfully unique content", as you put it.

Regarding tag page indexing: I observed tag pages being indexed and ranked higher than the main content pages.
You're correct that I only use docs, due to the technical nature of the content. While I could add manual descriptions, my website in question pulls docs from an external source (privacy.sexy) that updates frequently, making separate description maintenance challenging.

P.S.: The privacy.sexy community recommended Docusaurus, and I'm increasingly appreciating its clean architecture and documentation.

My workaround to deindex tag pages: it may be helpful for others, so here is how I noindex tag pages to resolve the issue in this thread:
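(Assumed file locations, inferred from the import paths below and Docusaurus swizzle-wrapping conventions: the shared component lives at `src/components/NoIndexMetadata.tsx`, and the wrappers at `src/theme/DocTagDocListPage/index.tsx` and `src/theme/DocTagsListPage/index.tsx`.)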
```tsx
import Head from '@docusaurus/Head';
import React, {type ReactNode} from 'react';
// To solve SEO issues, see: https://github.com/facebook/docusaurus/issues/10835
export function NoIndexMetadata(): ReactNode {
return (
<Head>
<meta name="robots" content="noindex" />
{/*
No need for nofollow:
- Google has confirmed that noindex pages don't pass link signals anyway
- Adding nofollow wouldn't change anything in terms of link equity
- The internal navigation links are legitimate site structure
*/}
</Head>
);
}
```
```tsx
// Wrapped to solve SEO issues, see: https://github.com/facebook/docusaurus/issues/10835
// Root component of the "docs containing tag X" page.
import React, {type ReactNode} from 'react';
import DocTagDocListPage from '@theme-original/DocTagDocListPage';
import type DocTagDocListPageType from '@theme/DocTagDocListPage';
import type {WrapperProps} from '@docusaurus/types';
import { NoIndexMetadata } from '@site/src/components/NoIndexMetadata';
type Props = WrapperProps<typeof DocTagDocListPageType>;
export default function DocTagDocListPageWrapper(props: Props): ReactNode {
return (
<>
<NoIndexMetadata />
<DocTagDocListPage {...props} />
</>
);
}
```
```tsx
// Wrapped to solve SEO issues, see: https://github.com/facebook/docusaurus/issues/10835
// Root component of the tags list page
import React, {type ReactNode} from 'react';
import DocTagsListPage from '@theme-original/DocTagsListPage';
import type DocTagsListPageType from '@theme/DocTagsListPage';
import type {WrapperProps} from '@docusaurus/types';
import { NoIndexMetadata } from '@site/src/components/NoIndexMetadata';
type Props = WrapperProps<typeof DocTagsListPageType>;
export default function DocTagsListPageWrapper(props: Props): ReactNode {
return (
<>
<NoIndexMetadata />
<DocTagsListPage {...props} />
</>
);
}
```
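Blog tag pages have the same problem; here is a sketch of the analogous wrapper for the blog side, assuming the same `NoIndexMetadata` component (`BlogTagsPostsPage` could be wrapped identically):

```tsx
// src/theme/BlogTagsListPage/index.tsx (assumed swizzle-wrap location)
// Root component of the blog "all tags" page.
import React, {type ReactNode} from 'react';
import BlogTagsListPage from '@theme-original/BlogTagsListPage';
import type BlogTagsListPageType from '@theme/BlogTagsListPage';
import type {WrapperProps} from '@docusaurus/types';
import { NoIndexMetadata } from '@site/src/components/NoIndexMetadata';

type Props = WrapperProps<typeof BlogTagsListPageType>;

export default function BlogTagsListPageWrapper(props: Props): ReactNode {
  return (
    <>
      <NoIndexMetadata />
      <BlogTagsListPage {...props} />
    </>
  );
}
```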
Thanks for the feedback.

I think we are only using structured data for blog paginated pages; we don't for blog author/tag pages, and we don't either for docs tags (which I haven't seen used that often in practice) or category index pages. That may explain why some of your docs tag pages are ranking higher than the actual docs pages.

It's reasonable to add more structured data than we have today, and see what happens to our own website's SEO, plus that of those willing to adopt this in canary.
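For example, the docs tags list page could advertise its tags in the same `ItemList` format. A sketch, assuming `DocTagsListPage` receives a `tags` prop whose entries carry a `permalink`, and reusing the `toAbsoluteUrl` helper from the earlier snippet (shown standalone; in practice it could be combined with the noindex wrapper above):

```tsx
// src/theme/DocTagsListPage/index.tsx (sketch)
import React, {type ReactNode} from 'react';
import Head from '@docusaurus/Head';
import DocTagsListPage from '@theme-original/DocTagsListPage';
import type DocTagsListPageType from '@theme/DocTagsListPage';
import type {WrapperProps} from '@docusaurus/types';
import {toAbsoluteUrl} from '@site/src/components/Utilities/ToAbsoluteUrl';

type Props = WrapperProps<typeof DocTagsListPageType>;

export default function DocTagsListPageWrapper(props: Props): ReactNode {
  // Assumption: props.tags is the list of tag entries rendered by the page.
  const structuredData = {
    '@context': 'https://schema.org',
    '@type': 'ItemList',
    itemListElement: props.tags.map((tag, idx) => ({
      '@type': 'ListItem',
      position: idx + 1,
      url: toAbsoluteUrl(tag.permalink),
    })),
  };
  return (
    <>
      <Head>
        <script type="application/ld+json">{JSON.stringify(structuredData)}</script>
      </Head>
      <DocTagsListPage {...props} />
    </>
  );
}
```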
Have you read the Contributing Guidelines on issues?
Description
Solution
Proposed API:

User experience: add a new flag to `DocusaurusConfig` in the `docusaurus.config` file, such as `deindexTags: true`.

Proposed changes:

- `<a>` elements will have `rel="noindex nofollow"` attributes on tag list pages. Modify `Tag` to do the check from siteConfig.
- Add `<meta name="robots" content="noindex, nofollow">` to tag list pages. Modify `DocTagsListPage` and `BlogTagsListPage` to add the `Head` with the noindex meta.
- Add `<meta name="robots" content="noindex, nofollow">` to per-tag pages. Modify `DocTagDocListPage` and `BlogTagsPostsPage` to add the `Head` with the noindex meta.
- Exclude `/tags**` pages from the sitemap via `sitemap.ignorePatterns` with the pattern `${tagsBasePath}**` (see the config sketch below).
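For reference, the sitemap-exclusion part can already be done today through the classic preset's sitemap options. A sketch, assuming the default `/docs/tags` and `/blog/tags` tag base paths:

```ts
// docusaurus.config.ts (excerpt)
import type {Config} from '@docusaurus/types';

const config: Config = {
  title: 'My Site', // hypothetical site values
  url: 'https://example.com',
  baseUrl: '/',
  presets: [
    [
      'classic',
      {
        sitemap: {
          // Keep tag pages out of the sitemap.
          ignorePatterns: ['/docs/tags/**', '/blog/tags/**'],
        },
      },
    ],
  ],
};

export default config;
```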
.Motivation
Why
Tag pages are thin/low quality, creating duplicated content.
This leads to search engines scoring the website lower, or indexing tag pages before the specific pages.
Google says 1:
I have solved this through wrapping/swizzling list pages and tag components, plus a custom `sitemap.ignorePatterns` rule in the config file, but it's a lot of workaround; a best practice like this would be appreciated if it came as a default.

Background
This has led to issues with all search engines for privacylearn.com, a pre-launch site presenting open-source scripts.
Siteliner analysis: (screenshot)
Google indexing status: (screenshot)
Affected engines include Google, Yandex, and Bing; thousands of my pages got de-indexed over time, and tag pages took priority over the proper pages.
API design
No response
Have you tried building it?
No response
Self-service