
Information on an Ongoing Issue with Google and Duplicate Content

There is an ongoing SEO issue affecting some sites that rely on organic search results from Google for traffic. This issue is not limited to NetSuite web stores and appears to have arisen following changes in Google's indexing behavior.

Updates:

  • February 27, 2019 — added an FAQ section
  • March 4, 2019 — added a link to a patch
  • March 8, 2019 — clarified some language, following a conversation with a Google representative
  • March 22, 2019 — completely re-written for clarity
  • April 15, 2019 — added an update
  • April 26, 2019 — new information from Google

April 26, 2019 Update

Google have released new information relevant to this issue that I want to share.

Firstly, they now state publicly that they are aware of the issue we describe and that they are working on their end to resolve it.

We hope that when they mark this issue as resolved, all of our customers who have been affected by this issue will see improvements.

Secondly, we are still working on addressing this issue on our side, but another issue with Google is hampering us. Specifically, Google have announced that there was an issue with their indexing process which prevented updates. A knock-on effect of this is that Google Search Console has not had access to new data since April 8, so we have limited evidence to confirm whether the changes we have made have had an impact.

Finally, unrelated to the updates from Google, I have updated the section on how to tell if you're affected by the issue. I only listed one possible error message when, in reality, there are two (it's just that the second requires a bit more nuance).

April 15, 2019 Update

We have been working directly with the customers most affected by this issue, as well as with CAP partners who are working with affected customers of their own. In short, our investigations suggest that the most likely cause is Googlebot not rendering pages correctly. We are testing a code change on a couple of sites and they have shown improvements in behavior, so we are going to try it on more sites to see if we are right. If it shows improvement there too, we can then roll it out more widely.

We want to be clear that these code changes should not be considered a 'bug fix' or a 'patch' in the normal sense: we are modifying the code because of the unannounced changes that Google appears to have made to its crawling/indexing behavior.

We are also aware that some site operators are frustrated by what they perceive as slowness in resolving this issue. Please keep in mind that updates to Google's index can take a long time, so it can be difficult to determine within a short period whether particular changes are working.

We are mindful that when we released a previous patch, some people assumed it resolved the issue entirely (despite us making it clear that it does not), so we are cautious about releasing additional patches until we are highly confident that they resolve the issue.

In other words, we ask for your continued patience with this matter and we hope to have a resolution soon.

Everything below this line is previous content.

Issue Summary

Google is alerting customers through Google Search Console that some of their pages are 'duplicates' of other pages on their sites, and is therefore excluding those pages from its search index. This is a problem because the pages that Google has flagged as duplicates are, in fact, unique pages with substantially different content. In other words, Google is incorrectly flagging these pages as duplicates and is wrongly leaving them out of its index.

This issue primarily affects product list (category) pages and product detail pages. Product detail pages for completely different products are being flagged as duplicates of each other when they are not.

How to Find Out If You Are Affected

Open your site in Google Search Console.

  1. Click on Index > Coverage in the left navigation
  2. When the page loads, click on the Excluded button
  3. In the Details section, look for a large number next to these messages:
    • Duplicate, submitted URL not selected as canonical
    • Duplicate, Google chose different canonical than user
  4. Inspect individual URLs by hovering over them and clicking the magnifying glass next to them
  5. In the Indexing section, you should see the User-declared canonical URL of the page and then the Google-selected canonical URL; if these are for completely unrelated/different pages (e.g. unique products), then you may be affected by this issue

An important thing to note when inspecting reported URLs is that legitimate duplicates are frequently reported in these groups. Furthermore, depending on how your site is configured, you may see varying numbers of each type. In other words, simply seeing these messages (or high numbers of them) is not necessarily evidence of the issue: you need to examine the individual URLs.
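If you have noted down a batch of inspected URLs, a small script can shortlist the ones whose user-declared and Google-selected canonicals point at different pages. This is a minimal TypeScript sketch: the `InspectedUrl` record shape and the normalization rules are assumptions for illustration, not a Search Console export format. Because legitimate duplicates also appear in these groups, the output is only a list of candidates that still need a manual look.

```typescript
// Hypothetical record for one URL inspected in Google Search Console.
// Search Console does not export data in this exact shape; fill it in by hand
// or from your own notes.
interface InspectedUrl {
  url: string;
  userDeclaredCanonical: string;
  googleSelectedCanonical: string;
}

// Normalize a URL for comparison. The WHATWG URL parser already lowercases
// the scheme and host; we also drop the fragment and any trailing slash.
// Query strings are kept because they can distinguish pages (an assumption).
function normalize(raw: string): string {
  const u = new URL(raw);
  const path = u.pathname.replace(/\/+$/, "") || "/";
  return `${u.protocol}//${u.host}${path}${u.search}`;
}

// Return the rows whose two canonicals still differ after normalization.
// These are candidates only: legitimate duplicates will show up here too.
function canonicalMismatches(rows: InspectedUrl[]): InspectedUrl[] {
  return rows.filter(
    (r) => normalize(r.userDeclaredCanonical) !== normalize(r.googleSelectedCanonical),
  );
}

// Example with made-up URLs.
const rows: InspectedUrl[] = [
  {
    url: "https://shop.example.com/red-widget",
    userDeclaredCanonical: "https://shop.example.com/red-widget",
    googleSelectedCanonical: "https://shop.example.com/blue-gadget",
  },
  {
    url: "https://shop.example.com/red-widget?color=red",
    userDeclaredCanonical: "https://shop.example.com/red-widget/",
    googleSelectedCanonical: "https://SHOP.example.com/red-widget",
  },
];
console.log(canonicalMismatches(rows).map((r) => r.url));
// → [ 'https://shop.example.com/red-widget' ]
```

The second row is not flagged because its two canonicals differ only in host casing and a trailing slash, which normally refer to the same page.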

We would also like to point out that while we have received a number of cases from customers about this issue, it does not seem to be affecting only NetSuite web stores. We have also found mentions of the same issue from people outside the platform. As such, we are not convinced that this is solely a NetSuite issue.

Secondary Effects

Some customers affected by this issue are also reporting a loss in organic traffic and sales. This is particularly problematic if your site relies on shoppers finding it through keyword searches.

Areas of Investigation

In short, this issue is ongoing because we have not yet got to the bottom of what is causing it.

The strongest indication at the moment is that this is a rendering problem. Our working theory is that Googlebot is bypassing the SEO page generator and trying to render the pages as a normal user's browser would. While Googlebot appears to be capable of running the JavaScript on our sites, we do not think it runs it reliably. In other words, we think Googlebot is either not rendering pages at all or not rendering them correctly.

What could cause this rendering problem is not immediately apparent.

Emptying the Pre-Rendered Content but not Rendering the SPA

One area we looked at was how Googlebot handles loading and then discarding the pre-rendered content from the page generator before attempting to load the content from the single-page application (SPA).

We did identify an issue with how Googlebot was doing this, for which we issued a patch. In short, when a user receives pre-rendered content and then attempts to render the SPA, we tell it to discard the pre-rendered content. When the SPA content has finished processing, it will fill the gap left by the pre-rendered content. We identified some cases where Googlebot was emptying the main div on the site, and then failing to render the SPA content. This meant that Googlebot saw an empty page.

The patch corrects this behavior by telling Googlebot not to discard the pre-rendered content. It does not fix the underlying issue, but should still be applied to correct this specific issue.
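As an illustration of the idea behind the patch (not the patch itself), the sketch below keeps the pre-rendered content when the client is Googlebot and, for other clients, only replaces it once the SPA has actually produced content. The `Container` interface, the `renderSpa` callback and the user-agent test are hypothetical stand-ins, and detecting Googlebot by a substring of its user agent is an assumption. Rendering first and swapping afterwards is a variation on the discard-then-fill order described above, chosen so that a failed render can never leave the main div empty.

```typescript
// Hypothetical stand-in for the site's main content div.
interface Container {
  innerHTML: string;
}

// Assumption: Googlebot includes "Googlebot" in its user-agent string.
function isGooglebot(userAgent: string): boolean {
  return /googlebot/i.test(userAgent);
}

// Returns true if the pre-rendered content was replaced by SPA content.
function swapInSpaContent(
  container: Container,
  userAgent: string,
  renderSpa: () => string, // hypothetical: returns the SPA's HTML or throws
): boolean {
  // Keep the pre-rendered content for Googlebot, as the patch does.
  if (isGooglebot(userAgent)) {
    return false;
  }
  let html = "";
  try {
    html = renderSpa();
  } catch {
    return false; // rendering failed: leave the pre-rendered content in place
  }
  if (html.trim() === "") {
    return false; // nothing rendered: never leave the container empty
  }
  container.innerHTML = html;
  return true;
}

// Usage: a normal browser whose SPA render fails keeps the pre-rendered page.
const main: Container = { innerHTML: "<h1>Red Widget</h1>" };
const browserUa = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";
swapInSpaContent(main, browserUa, () => {
  throw new Error("items API timed out");
});
console.log(main.innerHTML); // <h1>Red Widget</h1>
```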

Failure to Get All Resources

Some testing has shown that Googlebot is struggling to reliably get all of the resources a page requires, such as a response from the items API. When this happens, Googlebot could be shown an error page, but one that is served through the single-page app. This means it receives an OK (200) status code rather than a Page Not Found (404) error or some sort of server error (5xx). In these cases, Googlebot would see these error pages as identical to one another, even though they were served for different URLs.

We are still investigating this.
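Where the server or page generator controls the response, the usual remedy for this kind of "soft" error is to make the HTTP status reflect what actually happened. A minimal sketch, assuming a hypothetical `ItemLookup` result type; none of these names come from SuiteCommerce.

```typescript
// Hypothetical outcome of looking up an item before rendering a PDP.
type ItemLookup =
  | { kind: "found" }
  | { kind: "notFound" }
  | { kind: "upstreamError" }; // e.g. the items API timed out or returned 5xx

// Map the lookup outcome to the HTTP status the response should carry,
// instead of always answering 200 with an error page in the body.
function statusFor(lookup: ItemLookup): number {
  switch (lookup.kind) {
    case "found":
      return 200; // OK
    case "notFound":
      return 404; // Not Found: the item does not exist
    case "upstreamError":
      return 503; // Service Unavailable: a temporary failure
    default: {
      // Compile-time exhaustiveness check: fails to build if a case is added.
      const unreachable: never = lookup;
      throw new Error(`unhandled lookup: ${JSON.stringify(unreachable)}`);
    }
  }
}

console.log(statusFor({ kind: "notFound" })); // 404
```

When the error is only discovered client-side inside the single-page app, the status code can no longer be changed. For pages that genuinely do not exist, Google's JavaScript SEO guidance suggests either redirecting to a URL that the server answers with a 404, or adding a `noindex` robots meta tag to the error page.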

Structured Data Confusion

Web store elements are built using Schema.org structured data where appropriate. This markup makes it easier for search engines to identify the important data on a page. For product detail pages (PDPs), this includes things like the item's name, price and description.
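To make this concrete, here is a minimal sketch of Schema.org Product markup for a single PDP. The source does not say which syntax (microdata, RDFa or JSON-LD) the web store emits; JSON-LD is used here only because it is easy to show in isolation, and the `Item` fields are hypothetical. Every related item rendered with its own Product markup adds another set of product data like this to the page.

```typescript
// Hypothetical shape of the item data available on a PDP.
interface Item {
  name: string;
  description: string;
  sku: string;
  price: number;
  currency: string; // ISO 4217 code, e.g. "USD"
  inStock: boolean;
}

// Build Schema.org Product markup (as a JSON-LD object) for one item.
function productJsonLd(item: Item): Record<string, unknown> {
  return {
    "@context": "https://schema.org",
    "@type": "Product",
    name: item.name,
    description: item.description,
    sku: item.sku,
    offers: {
      "@type": "Offer",
      price: item.price.toFixed(2),
      priceCurrency: item.currency,
      availability: item.inStock
        ? "https://schema.org/InStock"
        : "https://schema.org/OutOfStock",
    },
  };
}

const markup = productJsonLd({
  name: "Red Widget",
  description: "A widget, in red.",
  sku: "RW-001",
  price: 19.5,
  currency: "USD",
  inStock: true,
});
// On a page this would be serialized into <script type="application/ld+json">.
console.log(JSON.stringify(markup, null, 2));
```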

When related items are included on a PDP, it may be possible that Googlebot sees the list of products on the page and considers the page materially identical to another page it has visited that contains the same list of products. If so, this could be contributing to the issue.

Again, we are still investigating this.

FAQ

Is This Only Affecting NetSuite Web Stores?

We have identified that this issue is affecting all kinds of ecommerce sites. It affects some sites on the NetSuite platform running SuiteCommerce, SuiteCommerce Advanced and Site Builder, as well as some sites outside of our platform running other software such as Shopify.

For example, in a recent Google Webmaster Central office hours segment (32m, 17s), a very similar issue was raised. Google's response has been that it's something they consider strange and that they are investigating, but there's no transparency on this, so we don't know what's happening or what they know.

I have listed some links to other people reporting similar issues in a section below.

What Can I Do?

Unfortunately, there are no conclusive steps to remedy this situation at this time, but there are small steps you can take to improve things:

  1. Check to see if it's affecting you: in Google Search Console, look under Index > Coverage > Excluded; if you are affected, raise a support case and ask to have it attached to NetSuite issue 524693
  2. Give us access to your Google Search Console, even if you're not affected (this will give us more information): add seo[at]netsuite[dot]com to your user list
  3. Apply the patch mentioned above to ensure that unreliable JavaScript execution cannot result in empty page content
  4. Review your product detail pages for excessively similar content, such as informational areas that are the same across your whole inventory (returns information, sizing advice, manufacturer information, etc.), and consider moving them to separate pages or loading them dynamically in modals
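For step 4, one rough way to spot PDPs that share a lot of boilerplate is to compare their visible text. The sketch below uses word shingles and Jaccard similarity, a common near-duplicate measure. Google does not publish how it decides that pages are duplicates, so treat this only as a self-audit heuristic; the shingle size of 3 and the sample texts are assumptions.

```typescript
// Split text into lowercase word shingles (runs of k consecutive words).
function shingles(text: string, k = 3): Set<string> {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  const out = new Set<string>();
  for (let i = 0; i + k <= words.length; i++) {
    out.add(words.slice(i, i + k).join(" "));
  }
  return out;
}

// Jaccard similarity: size of the intersection divided by size of the union,
// from 0 (nothing shared) to 1 (identical shingle sets).
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  a.forEach((s) => {
    if (b.has(s)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Two made-up PDP texts that differ only in the product description.
const pdpA = "Free returns within 30 days. Sizing: see our size guide. Red cotton T-shirt with crew neck.";
const pdpB = "Free returns within 30 days. Sizing: see our size guide. Blue linen shirt with button-down collar.";
console.log(jaccard(shingles(pdpA), shingles(pdpB)).toFixed(2)); // 0.36
```

Here 8 of the 22 distinct shingles are shared, all coming from the returns and sizing text. On real pages, a high score driven mostly by that kind of universal information is the signal step 4 is describing.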

Additionally, if you have a commercial agreement with Google, we encourage you to raise your concerns with them directly, passing on this blog post and examples of pages on your site that have incorrectly been flagged as duplicates.

My Site Was Migrated to the New Page Generator the Same Time the Issue Started. Is it Related?

Some sites were moved to the new SEO page generator around this time, but most were not. We have found evidence of sites that were affected by this issue while they were still running the old page generator, and that remained affected after they migrated to the new one.

My Site Was Migrated to the On-Site Search Service the Same Time the Issue Started. Is it Related?

There is no evidence for this. On-site search relates to the indexing and searching of content on the NetSuite platform — it is not related to the indexing and searching of content by third-party search engines such as Google.

Additional Links

The following links provide additional context about canonicals and Google Search Console:

These links are reports from webmasters describing the same or similar problems, or discussions about the problem: