Post Featured Image

Take a Look at Our New SEO Page Generator (Prerender)

This post is a follow-up to a webinar we held on SEO and the page generator. The changes described here are to the core code, and therefore affect all SuiteCommerce sites, regardless of the version of SCA they are running.

EARLY UPGRADES — rollout of the new SEO page generator, described below, is beginning. If you want your site to be among the first migrated, please contact support.

The SEO page generator is functionality built into the backend of SuiteCommerce sites. It solves a particular problem faced when search engines visit your site: they may not execute some or all of your JavaScript like a normal shopper's web browser would.

SuiteCommerce sites are built using single page application (SPA) architecture, which means that a single HTML page is loaded the first time it is requested, and then it is dynamically updated later without having to reload the page. While it has a lot of benefits, it is problematic because of its heavy reliance on JavaScript.

So, while we're confident most shoppers will get the experience and data they need when they visit one of our sites, we're not sure that search engine crawlers will. If a search engine doesn't get the whole picture, this could negatively affect a site's search engine ranking, and that's not something anyone wants.

The solution is what we call an SEO page generator. It is an application that runs in the background, on the server, that has the capability to fully execute JavaScript and return the generated HTML.

In other words, the page generator will detect a search engine and, rather than serve it the files necessary to run the SPA itself, will serve it a full HTML page of content. This also works for human visitors who have disabled JavaScript in their browsers.

Now, that's all well and good — but what's changed?

Well, while the solution's conception is fine, we were aware of a number of limitations that were causing issues for some customers. Thus, we've decided to swap out the application that produces the HTML for a newer and better one.

Goodbye, V8 and Envjs; hello, Prerender.

Problem Statement

Before we talk more about Prerender, let's take a look at two of the biggest issues that we were facing with the existing setup: poor performance and memory inefficiency.

Rendering Time (and Timeouts)

Simply put, the old system was slow.

As pages grow more complex, with more JavaScript and HTML to render, the generator takes longer to work. This has two potential implications:

  1. Search engines may penalize you if a page takes what they consider too long to load
  2. Your page may take so long to load that the request times out and an incomplete page is served

This effectively put a limit on the complexity of your web pages. In some cases, sites had to compromise on the functionality they implemented, just so that they would output decent HTML.

Out of Memory

In a similar vein, there were issues when a page would grow so large that the page generator could not build the whole page.

Again, this put a limit on your site, but this time on the size of your web pages. If you wanted to generate a lot of HTML, use a lot of scripts, handle a lot of data, etc., then you would frequently find yourself running out of memory.

Prerender

Our solution to these woes is software called Prerender.

Prerender is a more advanced, more modern page generator. It is open source and super fast, and is the ideal replacement for our old offering.

A good thing about our implementation is that it is a direct swap of the technology behind the scenes. You don't need to upgrade your bundles or add new code to take advantage of it. In fact, a few of our customers have been testing it and we're starting to get an idea of the benefits:

  • Pages are rendered about twice as fast
  • Time to first byte is faster
  • Memory use is more efficient
  • More pages are covered

Our old method was to run a standard V8 JS engine with Envjs for the DOM APIs. This really wasn't cutting it. In comparison, Prerender is a complete, proper solution for what we're trying to achieve, with full support for ECMAScript 5 and, for you massive nerds out there, the DOM4 API.

We achieved this by moving from what you might call a 'virtual browser' implemented in JavaScript to a proper headless browser service. This means that you can and should treat it like you would any other browser as part of your testing. The API support is what you would expect and the overhead is the same as a normal desktop browser's: it is a genuine browser. So, test it like you would Internet Explorer, Chrome, Firefox, Safari, etc.

Testing and Debugging

Before we look at what we've improved, let's go over some basics for testing and debugging.

If you're already familiar with this, then you'll be pleased to know that you can still trigger the page generator's output by attaching ?seodebug=T to the end of your page's URL. We also strongly recommend attaching a unique URL parameter as well, as this ensures that a freshly generated, uncached version is supplied.

NOTE — until this functionality is rolled to all sites, you may have to attach seoprerender=T to the end of the URL as well. This forces the application to use the Prerender engine, rather than the old one.

As things haven't changed too much, you can refer to an article we wrote a while ago about coding and debugging with the SEO page generator (note, however, that some of the information around error messaging and debug output has changed). We also have three important documents for you to examine:

  1. SEO Page Generator Best Practices
  2. SEO Page Generator Performance Statistics
  3. Troubleshooting Your Website

Hiding or Showing Content to Search Engines

Within SuiteCommerce code, you can specify whether to render a block of JavaScript only to customers or only to search engines by making use of a value attached to the SC global variable:

SC.isPageGenerator()

When this evaluates to true, it means we are not dealing with a regular, JavaScript-enabled shopper: we are most likely dealing with a search engine crawler. When that happens, we can exclude content from the generated page.

The classic example is not showing links to the quick view modals on items in product lists. But there are smarter things advanced users can do; for example, if you would normally make a call to load an external script or service (e.g., for a livechat program) then you can (and should) exclude this from the page generator.

For testing the page generator output, you can wrap some code in a conditional that evaluates this and see what's returned.
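As a minimal sketch of that pattern, shopper-only features can be gated behind the check described above. SC.isPageGenerator() comes from the article; the function and feature names below are purely illustrative, not SuiteCommerce APIs.

```javascript
// Illustrative sketch: gate shopper-only extras behind SC.isPageGenerator().
// The helper name and feature list are hypothetical, not SuiteCommerce APIs.
function initShopperExtras(SC) {
    if (SC.isPageGenerator()) {
        // Most likely a search engine crawler: skip quick view links,
        // external scripts (e.g. a livechat widget), and similar extras
        return [];
    }
    // Regular, JavaScript-enabled shopper: load everything
    return ['livechat', 'quickview'];
}
```

For testing, you can pass in a stub object whose isPageGenerator() returns true or false and compare the two results.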

Improved Debugging

One of the new changes is improved output from the page generator when you do log data. Let's take a look at some of the changes.

After attaching the URL parameter and a random string, take a look at this log output:

[02:48:47.622] [    +2 ms ] Requested URL with SEO generator relevant params: https://www.example.com/?seodebug=T
[02:48:47.622] [    +0 ms ] Source URL: https://www.example.com/DEMO/shopping.ssp?seodebug=T
[02:48:47.622] [    +0 ms ] Rewrite Path: /s.nl?sitepath=/DEMO/shopping.ssp
[02:48:47.773] [  +151 ms ] Generated the frame page for the requested URL
[02:48:49.701] [ +1928 ms ] Got a response from Prerender
[02:48:49.702] [    +1 ms ] Memory usage:      17.671875MB
[02:48:49.703] [    +1 ms ] CPU usage:         0.290000s
[02:48:49.703] [    +0 ms ] Sub request total: 1.201000s
[02:48:49.703] [    +0 ms ] Details of 8 sub requests:
               GET https://www.example.com/c.12345/DEMO/shopping.environment.ssp?lang=en_US&cur=USD&X-SC-Touchpoint=shopping&t=1518557601992 [status 200]
               Requested at 2018-02-16T10:48:48.086Z and responded by 2018-02-16T10:48:48.173Z (which took 87ms)
               GET https://www.example.com/c.12345/DEMO/languages/shopping_en_US.js?t=1518557601992 [status 200]
               Requested at 2018-02-16T10:48:48.086Z and responded by 2018-02-16T10:48:48.287Z (which took 201ms)
               GET https://www.example.com/c.12345/DEMO/javascript/shopping.js?t=1518557601992 [status 200]
               Requested at 2018-02-16T10:48:48.087Z and responded by 2018-02-16T10:48:48.288Z (which took 201ms)
               GET https://www.example.com/cms/2/assets/js/postframe.js [status 200]
               Requested at 2018-02-16T10:48:48.087Z and responded by 2018-02-16T10:48:48.285Z (which took 198ms)
               GET https://www.example.com/cms/2/cms.js [status 200]
               Requested at 2018-02-16T10:48:48.088Z and responded by 2018-02-16T10:48:48.286Z (which took 198ms)
               GET https://www.example.com/api/cms/session/domain [status 200]
               Requested at 2018-02-16T10:48:48.472Z and responded by 2018-02-16T10:48:48.686Z (which took 214ms)
               GET https://www.example.com/api/cms/versions?site_id=2&c=12345 [status 200]
               Requested at 2018-02-16T10:48:48.688Z and responded by 2018-02-16T10:48:48.803Z (which took 115ms)
               GET https://www.example.com/api/cms/pages/contents?c=12345&n=2&page_type=home-page&path=%2F&version_id=312&site_id=2&c=12345 [status 200]
               Requested at 2018-02-16T10:48:48.913Z and responded by 2018-02-16T10:48:49.071Z (which took 158ms)


[02:48:49.704] [    +1 ms ] *** All requested URLs with headers (begin) 
[02:48:49.704] [    +0 ms ] Header count: 9
[02:48:49.704] [    +0 ms ]

Let's take a look at some of these sections.

Timings and Timeout

Firstly, this line:

[02:48:49.701] [ +1928 ms ] Got a response from Prerender

The timing in square brackets indicates how long it took for Prerender to render the page (i.e., 1.928 seconds). This can be a useful diagnostic. For example, if your page is timing out, then you might see this error:

[09:17:25.701] [ +22065 ms ] SeoGenerator:prerender:Error in SEO Page Generation. The SEO page rendered for the URL https://www.example.com/?preview=22847 &seodebug=T&seonojscache=T&seoprerender=T can be incomplete.

Here you can see that the page took over 22 seconds to render, and thus timed out.

One thing to note about our implementation of Prerender (compared to V8) is that the timeout limit has been lowered. Previously it was 30 seconds, and there are a few reasons for dropping it. Part of the reason is that we expect Prerender to generate pages much faster, so if your page is taking about 20 seconds to render, then you've got serious problems and you need to re-evaluate your page.

There's another reason, which relates to HTTPS. HTTPS connections will time out after waiting for 30 seconds, so we need to make sure that a response is given within that time (even if it's a bad one).

Sub-Requests

Now look at this line (and the indented lines below it):

[02:48:49.703] [    +0 ms ] Details of 8 sub requests:

Sub-requests are all the additional HTTP calls made during the page load. You can see the sorts of things that are called: the JavaScript for the SMTs, the environment SSP file, etc, as well as what their statuses were and how long they took.

If one of them fails, you'll get something like this:

GET https://www.example.com/shopping.environment.ssp?lang=en_US&t=1516304321167 [status 502]
Requested at 2018-02-08T14:03:41.693Z and responded by 2018-02-08T14:03:41.784Z (which took 91ms)

Note that the status is a 502, which is the 'bad gateway' error (but it could easily be a 404, for example).

Also note that, as of writing, only GET requests are supported by Prerender. In the vast majority of cases this won't affect you, but in testing we found one site that would perform a POST on page load, which returned some data that they used on the page. That's not a terrible thing to do in itself, but Prerender couldn't perform the POST and so the data was not being called in (and therefore not being included).
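If your page depends on such a call, one workaround is to route the page generator to a GET-based fallback. This is only a sketch: fetchJson and the endpoint path are hypothetical names, not SuiteCommerce APIs; only the SC.isPageGenerator() check is from the article.

```javascript
// Sketch: Prerender only replays GET requests, so a POST made on page load
// never completes for crawlers. Here crawlers are routed to a GET endpoint.
// fetchJson and '/api/page-data' are hypothetical, not SuiteCommerce APIs.
function loadPageData(SC, fetchJson) {
    if (SC.isPageGenerator()) {
        // Crawler path: a plain GET that Prerender can execute
        return fetchJson('GET', '/api/page-data');
    }
    // Shopper path: the original POST-based call
    return fetchJson('POST', '/api/page-data');
}
```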

Re-Rendering

Keep in mind that the Prerender page delivered by the server is re-generated in the browser, and that the page you see in the browser is not what crawlers and search engines will see. If you want to see it (and you should, for testing) then you will need to disable JavaScript in your browser and reload.

Depending on your browser, you may be able to disable JavaScript quickly for certain sites. For example, Chrome lets you quickly add sites on which JavaScript should be disabled by going to chrome://settings/content/javascript.

Alternatively, you can use CLI commands like wget or curl. For example:

curl -O "https://www.example.com/product123?seodebug=T&preview=23456"

This will fetch the JavaScript-disabled HTML for your product detail page and then output it into a text file in the directory you're currently working in. You can then examine the raw HTML — and what messages are being returned by the debug tool.

The important thing is that you compare it to the JavaScript-enabled version: has all the important content rendered? Can you navigate around the site? While it might not be transactional, it should be navigable.

Cache Busting

When pages are requested from the page generator, they may be served from the cache. If this is the case, then the output from the debugger will be the cached output — in other words, you will be served data from the first time this URL was requested, and not necessarily the latest. What we need to do is force what is called a 'cache miss'; i.e., get the page directly from the server, bypassing the cache entirely.

As mentioned earlier, you can get around this by appending some unique parameters to the end of your request URL. Just make sure you don't accidentally append a NetSuite-reserved parameter — go with something like preview=<someRandomNumber>.
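A small helper can build such a URL for you. This is our own sketch: buildDebugUrl is not a NetSuite API; the seodebug and preview parameters are the ones discussed above, with a timestamp standing in for the random number.

```javascript
// Sketch: build a debug URL with a unique cache-busting value, per the
// advice above. buildDebugUrl is our own helper name, not a NetSuite API.
function buildDebugUrl(pageUrl) {
    const url = new URL(pageUrl);
    url.searchParams.set('seodebug', 'T');
    // A unique value forces a cache miss on every request
    url.searchParams.set('preview', String(Date.now()));
    return url.toString();
}
```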

Disabling the Page Generator

One final thing: if you wish — and you're sure — we can disable the page generator on a per-site basis.

Generally speaking, we advise against this, but there are some specific circumstances where customers may wish to disable the page generator from firing:

  • You do not want your site indexed by search engines
  • You do not care about your search engine rankings
  • You have access restrictions on your site, such as a password-protected site
  • Your site is used for something like intracompany procurement

In other words, disabling the page generator will negatively affect your search engine rankings, but may provide some small performance benefits if your pages are complex or otherwise taking a long time to load.

If you are certain you want to disable the page generator, contact support.

Summary

OK, so let's review some of the changes that are coming in with the Prerender implementation:

  • No more (hopefully) 'out of memory' errors
  • Improved rendering time
  • Improved time to first byte
  • Improved memory efficiency (so pages can be larger and more complex)
  • Improved page coverage
  • Improved HTML and JavaScript support
  • Improved logging
  • Seamless upgrade — no need to update bundles or code

I think that's a pretty solid list for a new feature.

In addition to what was said above, here are some other small best practices our developers want to share:

  • Ensure that any “server polling”-type behavior is hidden from Prerender
  • Be aware that asynchronous XHR is supported, and server requests may complete out of order
  • console.log() is supported, and is visible in the output enabled by ?seodebug=T
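The first bullet can be sketched like so, assuming polling is done with a plain timer. maybeStartPolling is our own hypothetical name; only the SC.isPageGenerator() check is from the article.

```javascript
// Sketch of the first best practice: don't start recurring "server polling"
// loops when the page generator is rendering, since Prerender would
// otherwise keep issuing background requests. maybeStartPolling and pollFn
// are hypothetical names, not SuiteCommerce APIs.
function maybeStartPolling(SC, pollFn, intervalMs) {
    if (SC.isPageGenerator()) {
        // Crawler render: no timer, no background requests
        return null;
    }
    return setInterval(pollFn, intervalMs);
}
```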

Generally speaking, if your site is working fine then you won't notice anything different in your day-to-day operation. The changes are there to aid search engine crawling, as well as debugging issues that arise while crawling. Still, it's a good opportunity to dust off your testing cap, click around your site with different browsers (including the page generator), and see what doesn't work.