Information on an Ongoing Issue with Google and Duplicate Content

There is an ongoing SEO issue affecting some sites that rely on organic search results from Google for traffic. This issue is not limited to NetSuite web stores and appears to have arisen following changes in Google's indexing behavior.

Updates:

  • February 27, 2019 — added an FAQ section
  • March 4, 2019 — added a link to a patch
  • March 8, 2019 — clarified some language, following a conversation with a Google representative

Issue Summary

For NetSuite web stores, the issue primarily affects product detail pages and category pages. Google is asserting that a number of pages on a site contain substantially similar content, and is incorrectly flagging them as 'duplicates'. Google does not like duplicate pages: it will de-index pages that it thinks are copies of an original. There are legitimate cases where multiple versions of a page are generated on an ecommerce site and should be ignored (for example, page URLs that have item option parameters in them), but many of the duplicate content messages from Google are not instances of these sorts of cases. Instead, Google is saying that pages for different products are duplicates of each other.

In anticipation of possible duplicate content errors, we provide search engines with a list that asserts the 'canonical' pages on a site. These are the master pages that we say are unique and should be indexed. When Google finds a page that is a duplicate of a canonical page, the idea is that Google will prefer the canonical page and ignore the duplicate. However, part of the issue is that Google is taking a site's list of canonical pages and saying that some of them are duplicates of each other. While we know that these are not duplicates of each other, we do not know why Google thinks they are.
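To check what a given page is asserting, you can inspect its markup. The following is a minimal sketch, assuming the page declares its canonical URL with a `<link rel="canonical">` element in its HTML (one common mechanism; your site may also rely on a sitemap). The helper names and the example.com URL are hypothetical and only for illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def declared_canonical(html):
    """Return the first declared canonical URL, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals[0] if finder.canonicals else None


# Hypothetical markup for a product page; the URL is made up.
sample_page = """
<html><head>
<link rel="canonical" href="https://www.example.com/blue-widget">
</head><body>Blue Widget</body></html>
"""

print(declared_canonical(sample_page))
```

Comparing this value against the canonical that Google Search Console reports for the same URL shows where Google has overridden the site's assertion.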

Issue Discovery

A number of customers started noticing a rise in the number of duplicate content warnings reported to them by Google Search Console. These warnings started occurring around November 2018, although a Google representative has suggested the errors could have been ongoing long before that, and that they only became visible in the search console at that point.

Site administrators who track their search performance using Google Search Console have noticed an increase in product detail pages being excluded from Google's index, with Google flagging them with the following types:

  1. Duplicate, Google chose different canonical than user
  2. Duplicate, submitted URL not selected as canonical

The effect of both of these types of exclusion is that legitimate product detail pages are being excluded from Google's search results. This means that merchants are missing out on potential visitors and orders because shoppers are not seeing listings for their products in Google.

An image of a graph from the Google search console.

The above image shows a steady increase in the number of pages that Google reports as duplicate, where the submitted URL was not selected as canonical.

What We Know (And What We Don't)

PDPs are naturally quite similar in that they share common site elements (eg, header, footer, navigation), and often also PDP-specific elements such as size guides, buying advice, manufacturer information, etc. Googlebot is (usually) smart enough to recognise each page's unique content and therefore not flag the pages as duplicates. However, Googlebot may be deciding that, because of these shared elements, the pages are too similar, and flagging them as duplicates. Adding more unique content to a PDP or moving generic content out of these pages could improve things. This advice also applies to category pages.
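To get a rough feel for how much shared content can dominate two pages, you can compare their visible text. This is a minimal sketch using word shingles and Jaccard similarity, a standard text-overlap measure. It is not Google's duplicate-detection method, which we do not know; the function names and the sample text are hypothetical.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=3):
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical visible text from two product pages: a short unique
# description followed by the same shared boilerplate.
shared = "Free returns within 30 days. See our sizing guide for advice. " * 5
page_a = "Blue cotton t-shirt with crew neck. " + shared
page_b = "Red leather hiking boots with steel toe. " + shared

print(round(jaccard_similarity(page_a, page_b), 2))
```

A high score between pages for clearly different products suggests that the generic blocks outweigh the unique content, which makes those blocks candidates for moving elsewhere.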

Around the same time that users reported this issue, Google announced that users would notice an increase in the number of pages flagged as duplicate, where Google chose a different canonical from the one the user asserted. Why? In short, Google was changing to what they call 'mobile friendly' rendering. This would, they said, only affect the reporting of the site, rather than reflect a change in the site. However, users are noticing a change: their PDPs and category pages are getting flagged as duplicates when they are not. It appears that Googlebot is now rendering pages as a normal user's browser would, rather than relying on the content provided to it by the SEO page generator.

One possible scenario we found during investigation is where Googlebot sees neither the pre-rendered content nor the live content generated by the single-page application (SPA). We found that this was because Googlebot was triggering the JavaScript (which is typically not run by crawlers) that first deletes the static HTML pre-rendered by the SEO page generator, but it was not running the JavaScript that generates the new content. In other words, the pre-rendered content was being removed, but the SPA content was not replacing it.

We have issued a patch that changes this behavior for the Googlebot user agent, telling it not to delete the generated content.
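The decision the patch makes can be sketched as follows. This is an illustration of the logic only, written in Python for brevity; it is not the patch itself, which lives in the site's JavaScript, and the function name is hypothetical.

```python
import re

# Googlebot identifies itself with "Googlebot" in its user agent string.
GOOGLEBOT = re.compile(r"googlebot", re.IGNORECASE)


def should_remove_prerendered_html(user_agent):
    """Decide whether the app may delete the static, pre-rendered HTML.

    Hypothetical helper: regular browsers return True (the SPA replaces
    the static markup), Googlebot returns False so the pre-rendered
    content stays in place even if later scripts never run.
    """
    return GOOGLEBOT.search(user_agent or "") is None


browser = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/72.0 Safari/537.36"
crawler = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

print(should_remove_prerendered_html(browser))  # True
print(should_remove_prerendered_html(crawler))  # False
```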

However, this does not appear to correct the underlying issue, and there is still a question unanswered: is unreliable JS execution a significant part of Google's duplicate-detection strategy?

Following a conversation with a Google representative, we are now confident that Google's testing tool for mobile-friendliness is not the same as how Google's indexing tool works. Therefore, success or failure in the mobile friendliness tool is not necessarily indicative of success or failure in their indexing tool.

Is This Only a SuiteCommerce Problem?

We have identified that this issue is affecting all kinds of ecommerce sites. It affects some sites on the NetSuite platform running SuiteCommerce, SuiteCommerce Advanced and Site Builder, as well as some sites outside of our platform running other software such as Shopify.

For example, in a recent Google Webmaster Central office hours segment (32m, 17s), a very similar issue was raised. Google's response has been that it's something they consider strange and that they are investigating, but there's no transparency on this, so we don't know what's happening or what they know.

I have added links to other people's reports of similar issues in a section below.

What Can I Do?

Unfortunately, there are no conclusive steps to remedy this situation at this time. But there are small steps you can take to improve things:

  1. Check to see if it's affecting you — use Google search console performance and look under Index > Coverage > Excluded; if you are affected, raise a support case and ask to have it attached to NetSuite issue 524693
  2. Give us access to your Google search console, even if you're not affected (this will give us more information) — add seo[at]netsuite[dot]com to your user list
  3. Apply the patch mentioned above to ensure that unreliable JavaScript execution cannot result in empty page content
  4. Review the content of your product detail pages for excessively similar content. This might be informational areas of your PDPs that are universal across your inventory (such as returns information, sizing advice, manufacturer information, etc). Consider moving them to separate pages off of the PDPs, or having them load dynamically in modals

We will update this page with additional information when we have it.

FAQ

Didn't the issues start roughly at the same time that sites were moved to the new SEO page generator?

Some sites were moved to the new SEO page generator at this time, but most were not. We have found evidence of sites that were affected by this issue at a time when they were running the old page generator, and were unchanged when they migrated to the new one.

My site was moved to the new on-site search engine (Elastic) at roughly the same time. Is this a cause of the issue?

There is no evidence for this. On-site search relates to the indexing and searching of content on the NetSuite platform — it is not related to the indexing and searching of content by third-party search engines such as Google.

Additional Links

The following links provide additional context about canonicals and Google search console:

These links are reports from webmasters describing the same or similar problems, or discussions about the problem: