
Find Out What Artificial Intelligence and Machine Learning Could Mean for SuiteCommerce

This blog post deals with potential features and functionality that are developing in the worlds of web development, artificial intelligence and ecommerce. It is not necessarily illustrative of work we, NetSuite, are doing, nor is it a commitment to do any of it.

I want to make clear that these are just things we are thinking about — things we are exploring. In particular, I want to talk about the possible features that could result in a successful union between SuiteCommerce and technologies that commonly fall under the 'artificial intelligence' umbrella.

On a personal level, I am inherently skeptical about features that are commonly touted as being 'powered' by AI. I think there's a high bar that needs to be cleared before something can be classified as AI, but I am willing to accept a broader definition of what AI is, in order to talk about it.

What makes this discussion relevant to this blog is how we might apply principles and concepts from the AI world to the world of ecommerce, specifically SuiteCommerce. When conducting research on the applications of AI on the web in general, there are a number of examples already available in the wild; applying them to a web store means going beyond simply copying-and-pasting them into the store. We need to look at how they could fit, and whether they are desirable and useful. In other words, it's easy to be swept up by an 'AI revolution' without stopping to evaluate whether these features are more than just novelties.

A secondary, but crucial, concept under that umbrella is machine learning. AI could be described as dealing with virtual agents that can interpret data and make intelligent (and correct) decisions about how to act on that data in a variety of contexts. Machine learning has a narrower focus: how systems or applications might gather and process data to 'learn' on their own, without supervision.

Needless to say, there is a lot of overlap between the two, which is why you will frequently hear people just call it 'AI' when they might not mean AI specifically. In short, I would probably use the term 'machine learning' in most scenarios, reserving 'AI' for when we're actually talking about something that performs the actions of a person.

With that in mind, let's summarize some applications and then spend some time looking at each of them in turn:

  1. AI chatbots that use natural language processing (NLP)
  2. Speech recognition for hands-free and eyes-free interactions
  3. Search result relevance improvements
  4. Web store experience personalization

All of these are somewhat frontend-inclined features; ie, things that would be immediately obvious to the shopper. There are a number of applications for this technology that could go on behind the scenes — these are outside the scope of this post, but I will quickly mention some at the end.

Anyway, let's dive into the most obvious example first: chatbots.

Chatbots

If you're old enough, you may remember Clippy, Microsoft's intelligent assistant that appeared in their Office suite and provided help to users. It wasn't a chatbot (you couldn't have a conversation with it) but it would certainly use natural language to ask questions about what you wanted to do, and offer help on achieving your goals.

More modern applications of virtual assistants appear frequently on websites as front-line support staff. You've probably seen them when you visit a website: they'll prompt you to interact with them in a pop-up, and if you do, they'll frequently tell you the sorts of things they're able to do. In some cases they will mandate the types of interactions you can do (eg provide buttons to press) but others are more sophisticated and use natural language processing. So, what are these two flavors of chatbot?

Support Assistant

From an ecommerce perspective, they can be used to provide slick, new interfaces to your site's features.

Indeed, one of the key benefits of using bots as front-line staff is that they are immediately available, 24/7. They can give instant responses at any time on a range of topics, from answering FAQs to giving personalized responses, specific to the customer.

For example, your chatbot could be programmed to check a customer's order status after they arrive, and see if there are any outstanding orders. If one of your pre-programmed questions is, "Where's my order?" then your chatbot could provide a copy of your stock response (eg, "Average delivery time is 2-4 days after placing an order") as well as a personalized update about their last order (eg, "Your order #12345 should arrive tomorrow before 1pm"). All of this information is already available on your website, but the chatbot is programmed to get it and provide it in a way that a human would.
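To make that concrete, a handler for the "Where's my order?" question might look something like the sketch below. Everything in it is hypothetical: getLatestOrder() and the order fields are stand-ins for whatever order-lookup API your chatbot actually has access to.

// Hypothetical sketch: getLatestOrder() and the order fields stand in for
// whatever order-lookup API the chatbot actually has access to
function handleWhereIsMyOrder (customer)
{
  // The stock answer every shopper gets
  var response = ['Average delivery time is 2-4 days after placing an order.'];

  return getLatestOrder(customer.id).then(function (order)
  {
    if (order && order.status !== 'Delivered')
    {
      // Personalize the reply with the shopper's own open order
      response.push('Your order #' + order.number + ' should arrive ' + order.estimatedDelivery + '.');
    }

    return response.join(' ');
  });
}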

In addition to 'read-only' responses, you could also empower your bots to make changes either to the shopper's record, or to one of their orders, at the shopper's request. If the shopper selects the option to discuss their recent order, you could include an option to change the shipping method to a more expedited courier service, for example.

Just keep in mind that the purpose of this bot is to provide a different interface for the features already available on your site. In other words, just because you could, for example, offer the ability to cancel an order through a chatbot doesn't mean you should, when you don't offer it through the normal self-service account features. In this scenario, you may prefer to insist that they talk to one of your human agents. The shopper, however, will still have successfully made it through 'first-line' support.

In summary, these bots are great for providing new interfaces for existing site functionality, as well as providing a level of personality to the experience. They also have the benefit of being available all the time, which is handy for weekends, evenings and busy times.

Personal Shoppers

In a nutshell, personal shopper bots help guide shoppers from first introductions, to browsing a store's inventory, through to making a purchase.

For retailers and food delivery services, this has become a particularly exciting area to watch for new developments. Over the past few years, start-ups have sprung up to try and corner the market and businesses are keen to invest. And there are notable examples alive in the wild right now that you can experiment with.

Writing for the Washington Post, Sarah Halzack provides an excellent look at the state of the art in April, 2016. There are two great examples she cites: the Kik bot H&M use which can suggest outfits to you, and the 'TacoBot' Taco Bell made available in Slack (an example of which is shown below, courtesy of Taco Bell).

You can see in the above interaction how the user did not even need to visit Taco Bell's site or app to place an order. This certainly helps build more channels of revenue for them, while also providing convenience for office workers who don't want to leave the context of their favored office productivity tool.

Similarly, H&M built their bot on a platform that was not their own site or app: Kik. Indeed, one of the key features they developed was a recommendation service where they would ask you basic questions about who you are and what your preferences are, and then show you collections of outfits based on the responses. Do you identify as a man or a woman? Do you like relaxed outfits? Do you like these shoes? You get the idea; after a while, it should have good confidence about whether you're going to like some of its outfits, and will begin to offer you some. From there, it has the ability to ask your sizes, add the items to the cart, and begin the checkout process.

While I can't seem to find evidence of whether it is actually the case, there is at least the possibility that this could then be used to provide feedback to the system that powers it to improve the recommendations it makes to other shoppers. For example, if shoppers routinely dislike a particular outfit choice, then its rank in the list of recommended outfits could be lowered, with alternatives (ie ones more likely to lead to conversions) being offered instead. This would be a good application of machine learning in commerce (a self-improving recommendation engine) that can be added as a feature to AI (the chatbot).

To put it another way, when it makes an outfit recommendation to a shopper, what percentage of users convert? Once we have that number, we can match up the success rate against the 'profile' of the people involved. We can then not only get statistics about which outfits are the best performers, but also about whom those outfits performed best with. This is important for us, but it's also important to the bot because it will help improve its recommendations. This correction/reinforcement mechanism is very important for machine learning.
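As a rough illustration of the bookkeeping involved, you could imagine aggregating recommendation outcomes per outfit and per shopper profile, something like the sketch below (all names are made up):

// Hypothetical sketch: record how often each (outfit, shopper profile) pair
// leads to a conversion, so poor performers can be demoted over time
var stats = {};

function recordOutcome (outfitId, profileKey, converted)
{
  var key = outfitId + '|' + profileKey;
  stats[key] = stats[key] || {shown: 0, converted: 0};
  stats[key].shown++;

  if (converted)
  {
    stats[key].converted++;
  }
}

function conversionRate (outfitId, profileKey)
{
  var entry = stats[outfitId + '|' + profileKey];
  return entry && entry.shown > 0 ? entry.converted / entry.shown : 0;
}

// eg recordOutcome('outfit-42', 'woman,relaxed', true)
// A recommender could then demote outfits whose rate for a given profile
// falls below some threshold, and promote alternatives instead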

Natural Language Processing

Away from simple prompts and pre-programmed possibilities is the world of natural language processing and understanding — this is where a virtual agent learns how to interpret any sentence that a human user gives them and provide a response.

NLP is much harder to get right because, while there are plenty of people working on it (and producing commercially available resources), the trick is teaching the AI how to interpret what the person asks for in the context of your web store.

In other words, it's relatively easy for a computer to understand literal commands, but getting them to understand conversational, vague or colloquial language is much harder. This is why a lot of chatbots use buttons or require users to use exact phrases. For example, if a chatbot were to ask a shopper if they wanted to proceed in a process, the chatbot might only understand the words 'yes' or 'no' — but a shopper might reasonably respond with words like 'sure', 'nah', 'OK', or something noncommittal like 'I dunno'. Natural language conversations are one of the chief goals of AI.
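A common workaround is to normalize the most likely variants before falling back to buttons or a clarifying question; a crude sketch:

// Crude sketch: map conversational replies onto a yes/no answer; anything
// unrecognized becomes 'unknown' so the bot can fall back to buttons
function normalizeYesNo (reply)
{
  var text = reply.trim().toLowerCase();

  if (['yes', 'yeah', 'yep', 'sure', 'ok', 'okay'].indexOf(text) !== -1)
  {
    return 'yes';
  }

  if (['no', 'nah', 'nope'].indexOf(text) !== -1)
  {
    return 'no';
  }

  return 'unknown'; // eg "I dunno"
}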

If we were to see NLP on a SuiteCommerce site, the trick would be about getting the bot to understand our site. In other words there are two parts:

  1. Understanding what the shopper has said
  2. Translating that into an action on the site

For example, if the shopper is asked how they can be helped today and they respond with something like, "I need new shoes for a wedding", the bot should send them to the category page for dress shoes. It has to understand what the shopper wants in the context of the web store. It also needs to 'know' when an interaction falls outside of its context, eg, when TacoBot was asked whether the chicken or the egg came first, it recognized this as 'chit chat' rather than, say, a request to add egg to a taco. In other words, it may be great if your bot can understand that a shopper wants to buy smart shoes for a wedding, but if you're a B2B car part seller, your bot needs to know how to tell someone they're in the wrong place.
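In code terms, you can think of this as two functions: one (the hard, NLP part) that turns a sentence into an intent, and one that maps that intent onto something your web store can actually do. The sketch below is purely illustrative; extractIntent(), showOrderStatus() and reply() are stand-ins.

// Purely illustrative: extractIntent() stands in for whatever NLP service
// you use; the second step is the part that has to know about *your* store
function routeUtterance (utterance)
{
  var intent = extractIntent(utterance); // eg {type: 'browse', category: 'dress-shoes'}

  switch (intent.type)
  {
    case 'browse':
      return Backbone.history.navigate('/' + intent.category, {trigger: true});
    case 'order-status':
      return showOrderStatus(intent.orderNumber);
    case 'chit-chat':
      return reply('Ha! I am better with car parts than riddles.');
    default:
      return reply('Sorry, I can only help with things on this store.');
  }
}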

Speech Recognition

Anyone with a smartphone will probably have, at least once, interacted with it using their voice. A lot of people use their device's voice assistant quite regularly, in fact. Beyond Siri, Cortana, and Google's unnamed assistant, Amazon has also made significant waves with their Alexa voice assistant, which comes in standalone units for people's homes. People, I guess, are using chatbots already — asking them random factual questions, asking for directions, adding items to their next grocery shop, etc.

While these are chatbots too, they all have significant features that put them a cut above the ones we just described:

  1. Speech recognition — they understand the sounds you're making and translate them into words
  2. Hands-free interaction — they can be activated and used without having to touch them (ie they are always listening, at least in a 'dormant' capacity)
  3. Eyes-free interaction — they can be interacted with without having to see a screen or some other display (ie they use speech synthesis to 'speak' back to you)

The capacity to understand human speech and then respond using it is incredibly valuable. Indeed, such is their power that we routinely interact with our virtual assistants as if they were human.

From an ecommerce perspective, there are perhaps two avenues available to us:

  1. Enhance a site's chatbot so that it goes beyond text-to-text interactions, using speech recognition and synthesis to further humanize the interaction
  2. Make a site compatible with hands-free and eyes-free browsing, so that it is possible for a device's virtual assistant to complete the checkout process on behalf of the shopper

I am not sure how feasible the second option is from a SuiteCommerce point of view. Anyone who's used Alexa to purchase something from Amazon knows that it is a seamless process. However, keep in mind that Amazon develops both the system/APIs and the device: they can connect the two together relatively trivially. Outside of this ecosystem, there is no open standard for hands-free/eyes-free interactions of this kind. While NetSuite and SuiteCommerce have APIs, SuiteScript and other means for, say, browsing items and placing orders, considerable work would need to be done to create comparable experiences.

However, device manufacturers can open their devices up to developers for their own applications. For example, Alexa has a mature developer kit, which lets you program custom skills. Could Alexa be programmed to enable interactions with your web store system? Probably. The issue, as I said, is the lack of a single standard that applies across all virtual assistant devices; without one, you have to integrate with each device separately. This may not happen, especially if you consider the fractured reality of OS and browser market shares. It may be a while before your customers are able to ask any assistant for the status of their order with your store; it could be sooner if you're willing to limit yourself to, say, just Alexa.
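To give a flavor of the integration work, here is a very rough sketch of an Alexa custom skill handler using the ask-sdk-core Node library. The OrderStatusIntent and lookupOrderStatus() are hypothetical; the real effort would be in wiring that lookup up to NetSuite's APIs and in account linking.

// Very rough sketch of an Alexa custom skill handler (ask-sdk-core);
// 'OrderStatusIntent' and lookupOrderStatus() are hypothetical stand-ins
const Alexa = require('ask-sdk-core');

const OrderStatusIntentHandler = {
  canHandle (handlerInput)
  {
    return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
      && Alexa.getIntentName(handlerInput.requestEnvelope) === 'OrderStatusIntent';
  },
  async handle (handlerInput)
  {
    // This is where you would call out to your web store's own APIs
    const status = await lookupOrderStatus(handlerInput.requestEnvelope.session.user.userId);

    return handlerInput.responseBuilder
      .speak('Your latest order is ' + status + '.')
      .getResponse();
  }
};

exports.handler = Alexa.SkillBuilders.custom()
  .addRequestHandlers(OrderStatusIntentHandler)
  .lambda();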

What may be available far sooner is the first option; ie, adding in the ability to interact with a web store using voice commands through a web browser. This is because there are open standards for web browsers to support speech recognition and speech synthesis. It's called the Web Speech API and it covers both; you can read the Mozilla docs on it, or the W3C specification, both of which contain simple examples that you can tinker with. I've also had a go at creating something quick for my site.

Integrating Speech Recognition into a Web Store

I have created a proof-of-concept extension that illustrates, at least, that the API works and that it is possible to perform actions using only one's voice. I spent about two days working on this (so the code is pretty hacky) but take a look (and a listen 🔊) at this video:

I created a button that, when pressed, uses my device's microphone to listen to what I want it to do. My device then uses its inbuilt speech recognition software to work out what I said. It returns a string of what it thinks I said, along with a confidence rating between 0 and 1. In my POC, I have scripted basic code that reads the first words of that string to see if it matches one of my keywords; if it does, I pass it to a handler for that keyword. This is very crude.

However, if someone were to spend more than a couple of days on this (ie months or maybe years) then you could certainly build something far more sophisticated and robust.

If you're interested in taking a look at the code, then you can see it here: SCEXT-VOICE.zip. Some things to keep in mind, if you try to use it yourself:

  1. This is very experimental
  2. It currently only works in Google Chrome (despite being an open standard, only Google has implemented it)
  3. The Chrome security team considers microphone usage a 'powerful feature', so its policy requires that it only be used on secure domains; therefore your site will need to have a secure shopping domain and a secure local server

Anyway, I'm going to talk through some of the module.

Speech Recognition Basics

The button is generated using a view and a template; the majority of the work happens in the view file. At its base, it looks like this:

define('SteveGoldberg.Voice.Voice.View'
, [
    'Backbone'
  , 'voice.tpl'
  ]
, function
  (
    Backbone
  , voice_tpl
  )
{
  'use strict';

  return Backbone.View.extend({
    template: voice_tpl

  , events: {
      'click #voice-button': 'startRecognition'
    }

  , initialize: function (options)
    {
      this.container = options.container;
      this.PLP = window.PLP = this.container.getComponent('PLP');
      this.Environment = this.container.getComponent('Environment');

      var that = this;

      console.log('Voice view loaded');
      var SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

      this.recognition = new SpeechRecognition();
      this.recognition.lang = 'en-GB';
      this.recognition.interimResults = false;

      this.recognition.onresult = function (event)
      {
        that.handleSuccess(event);
      }

      this.recognition.onspeechend = function () {
        this.stop();
      }

      this.recognition.onnomatch = function (event) {
        console.error('Error recognizing speech');
      }

      this.recognition.onerror = function (event) {
        console.error('Error', event.error);
      }
    }

  , startRecognition: function ()
    {
      this.recognition.start();
      console.log('Ready to receive a command.');
    }

  , handleSuccess: function (event)
    {
      var last = event.results.length - 1;
      var command = event.results[last][0].transcript;
      var that = this;

      console.log('Command: ' + command)
      console.log('Confidence: ' + event.results[0][0].confidence);
    }
  })
})

We start by invoking the SpeechRecognition object and constructing a new instance of it. Note that we set a couple of properties: one to determine the speaker's language (mine is British English) and the other to disable interim results, which basically means that we want it to process the speech only once the user has stopped speaking (interim results are better for longer sentences, conversations, note-taking, etc).

Then we set handlers for the events that could happen when a user uses speech recognition: when there's a result (ie the software understands and is able to return a string), when the user stops talking, when the software fails to understand, and what happens when there is an error (eg there is no permission granted to use the microphone).

The success handler is super basic (at this stage) as it essentially just logs what it thinks the user has said to the console, along with its confidence. If you watched the video of me above, then you'll notice that there was one time where I didn't enunciate clearly enough and it didn't understand what I said. One of the cool things you can do is effectively test your elocution by saying a word or sentence in different ways and seeing what word it returns and the confidence of that word.

We can improve confidence by using the grammar property. Essentially, this enables us as developers to specify particular words or phrases that we want the speech recognition software to listen out for. In other words, if the software can't decide between particular words, it will check whether one of them is similar to a word we have told it to expect, thus boosting its confidence. For example, if we're expecting the user to specify a color, we can provide it with an array of colors; then, when a shopper says something that sounds like "red", the software will understand it as the color "red" rather than its homophone "read".
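A minimal example of what that might look like in the view's initialize(), following the pattern in the Mozilla docs (Chrome currently only exposes the webkit-prefixed constructor):

// Tell the recognizer which words to favor, using a JSGF grammar string
var SpeechGrammarList = window.SpeechGrammarList || window.webkitSpeechGrammarList;

var colors = ['red', 'green', 'blue', 'purple', 'black', 'white'];
var grammar = '#JSGF V1.0; grammar colors; public <color> = ' + colors.join(' | ') + ' ;';

var grammarList = new SpeechGrammarList();
grammarList.addFromString(grammar, 1); // the second argument is the grammar's weight
this.recognition.grammars = grammarList;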

Handling Particular Actions

Again, just to reiterate: this is hacky code. I wouldn't consider using it in the wild.

So, I've added/replaced the following code to my view:

, isPLP: function ()
  {
    return this.PLP && this.PLP.getUrl();
  }

, handleSuccess: function (event)
  {
    var last = event.results.length - 1;
    var command = event.results[last][0].transcript;
    var that = this;

    console.log('Command: ' + command)
    console.log('Confidence: ' + event.results[0][0].confidence);

    if (~command.indexOf('search for'))
    {
      that.performSearch(command);
    }
  }

, performSearch: function (command)
  {
    var keyword = command.replace('search for ', '')
    if (this.isPLP())
    {
      this.PLP.setSearchText({searchText: keyword})
    }
    else
    {
      Backbone.history.navigate('/search?keywords=' + keyword, {trigger: true})
    }
  }

So, in my crude code I have added a handler for when the shopper starts a command with "search for". If it's present in the string, we pass it to performSearch(), which then triggers the keyword search. Note that if we're on a PLP page, we can simply use the extensibility API method, otherwise we use the quick-and-dirty method of writing the URL manually and sending it directly to Backbone (yuck!).

Anyway, it works as a proof-of-concept: I was able to search for stuff with my voice! What about other actions?

, getFacetColors: function ()
  {
    var configColors = {};

    if (this.isPLP())
    {
      configColors = _.findWhere(this.PLP.getAllFilters(), {id: 'custitem31'})
    }
    return configColors ? configColors.values : {}
  }
  // refinements could eventually be handled in a more flexible way, ie using "refine by" as the keyword and then creating something that handles the type of refinement
, refineByColor: function (command)
  {
    if (this.isPLP())
    {
      // Because I'm British, it uses British spelling on my devices
      // Also, this assumes that you use multi-select for colors
      var color = command.replace('refine by colour ', '');
      var facetColors = this.getFacetColors();
      var filters = {};
      var colorArray = [];
      var newFilters = {};
      var isColor = !!_.findWhere(facetColors, {label: color});

      if (isColor)
      {
        colorArray.push(color);
        newFilters.custitem31 = colorArray;
      }

      this.PLP.getFilters().forEach(function (filter)
      {
        filters[filter.id] = filter.value
      });
      _.extend(filters, newFilters);

      this.PLP.setFilters({filters: filters});
    }
  }

I have two new functions: one to pull out the full complement of color refinements and the other to apply it to the current collection of search refinements. The first bit is pretty straightforward: use the extensibility API to go through all of the possible filters and find the one that matches my custom field ID for the colors (in my case, this is custitem31).

Then, in the second function, we have a couple of interesting things. Firstly, because I am actually British and I'm using devices configured for British spelling, I have to account for the fact that the string the speech recognition service returns is going to contain British spellings; in our case, this means that "colour" is going to be returned instead of "color". Rather than deal with this in some sort of proper way, I'm just going to hardcode it into my site (yuck!).

Anyway, once we know the user wants to refine by color, we check whether the returned string is a valid color, where 'valid' means 'in our list of refinable colors'. If it is, we then need to prepare that color for the extensibility API. Now, you'll note that I am putting the value into an array, and then an object. This is because the color refinement is configured on my site to enable multiple selections — this is a configuration option. In other words, some sites allow shoppers to refine by multiple values per facet; others prefer only one value at a time. I was going to build some code that figures this out based on the value returned from the configuration, but as I'm already deep down the rabbit hole of hacky code, I'm just going to hardcode it in for my site.

Then, once we have that, we need to apply it as a refinement. The extensibility API allows you to set filters, but it doesn't let you add to the existing filters. So, there's just a little bit of code so that we get all the existing filters, format them, merge them with our new filters, and then apply them. The result is that we can now specify "refine by color purple" and the site will... refine by the color purple.

For the final part of the demo, I coded in a generic command: "click on". It's a bit of a catch-all command, but it works by basically commanding the system to find a particular link, button or label with the provided text and then process a click event.

In order to find the right element, we're going to use jQuery's :contains(). Actually, we're not. Annoyingly, it is case-sensitive: strings from the speech recognition are returned in lowercase, and the text used in links and buttons is not all lowercase. So, I had a look online and borrowed someone's code that extends jQuery to provide something like :contains(), but one that matches text regardless of its case.

So, in the view's initialize() method, I am putting in:

jQuery.extend(jQuery.expr[':'],
{
  'containsNC': function(elem, i, match, array)
  {
    return (elem.textContent || elem.innerText || '').toLowerCase().indexOf((match[3] || '').toLowerCase()) >= 0;
  }
});

After that, we add a new method to our view:

, clickOn: function (command)
  {
    var text = command.replace('click on ', '');
    var $el = jQuery('a:containsNC("' + text + '"), button:containsNC("' + text + '"), label:containsNC("' + text + '")')

    if ($el.length > 0)
    {
      $el[0].click();
    }
  }

After taking the text we're looking for out of the command, we try to find elements on the page that match it. I've created selectors for anchor, button and label (for PDP options) tags, and it'll return a jQuery object with everything it finds. If it finds any, then it'll click the first one, which is pretty dumb, but it works in a lot of scenarios (eg if you want to log in, or add to cart). Anyway, you get the point.
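For completeness, the keyword dispatch in handleSuccess() ends up looking something like this sketch (the version in the downloadable extension may differ slightly):

// Sketch of the extended dispatch in handleSuccess()
if (~command.indexOf('search for'))
{
  that.performSearch(command);
}
else if (~command.indexOf('refine by colour'))
{
  // British spelling, as returned by my devices
  that.refineByColor(command);
}
else if (~command.indexOf('click on'))
{
  that.clickOn(command);
}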

The point of creating such a demo goes beyond simple curiosity about how well the API works — it lets us explore whether shoppers would actually find such a service useful. One person I spoke to about it said that it would really only be useful if it was smarter; for example, if she could say to a chatbot, "I'm looking for something for a party" and then have it return outfit suggestions, then that would be good. Otherwise, issuing simple commands via voice would not be particularly useful. I think that's a fair assessment, which ultimately puts this sort of thing very much in the novelty category.

There may be applications for users with accessibility requirements, but a quick search reveals that there are already tools for users who rely on this. Still, it's at least an exciting demonstration of what is possible. Who knows, perhaps some breakthrough will happen that makes this more than a novelty.

Search Relevance

Now: a pivot to data.

Some of the most likely, but perhaps most muted, improvements we'll see will come in our ability to gather, model and use data. I talked a bit about this earlier, so let's take a look at a couple of uses for a machine learning application of data; the first is search relevance.

At the moment, our default ordering for search results is the mysterious 'relevance' system. Exactly what 'relevance' means is hard to define. But what if this was only the starting point for relevance, and we had additional data to go on that might improve the ordering of results?

Well, we're already looking at ways for merchandisers to do this themselves with search boosting. Search boosting essentially lets you determine additional weights to attach to items so that they either rise in the rankings, or sink lower in them. Got a product you want to expose to shoppers more frequently? Raise it up! Have one that you're not keen on promoting? Sink it down.
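Conceptually, a boost is nothing more exotic than an extra weight applied to the engine's base relevance score before the results are sorted; a toy sketch (results and boosts are assumed to exist already):

// Toy sketch: a boost is just an extra weight applied to the base
// relevance score before sorting; >1 rises in the rankings, <1 sinks
function boostedScore (item, boosts)
{
  var boost = boosts[item.id] || 1;
  return item.relevance * boost;
}

results.sort(function (a, b)
{
  return boostedScore(b, boosts) - boostedScore(a, boosts);
});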

This is all well and good, but it raises a philosophical question: to what end? What are you trying to achieve specifically? For example, if you boost a particular item, are you hoping that it increases:

  • Awareness (surfacing an item to shoppers)?
  • Interactions with shoppers (clicking through to its PDP)?
  • Conversions?

Ultimately, you could just say all three, but you may have different goals. You may simply want customers to be more aware of the range of your inventory. You may want to test a new product and see if people are interested in it. You could also have questions like, "If I promote product X, will conversions for product Y go down?". Indeed, while search boosting won't necessarily answer these questions for you, this sort of data may prove interesting, and it could prove valuable to machine learning systems when trying to determine what the 'best' order for products is.

What this means is that we move beyond simply relying on the intuition of the merchandiser, and instead empower them to make informed decisions. We should be able to track the performance of items based on the factors that led to their surfacing. In other words, we could discover, for example, that when a particular product is surfaced for specific keyword searches, it performs well, but not so well in others.

All of this is, broadly speaking, 'analytics'. However, the trick comes when we factor in machine learning. If we find ourselves in a situation where we are able to generate, store, and analyze this data, then we could create models to teach a machine learning system. Then we could see what suggestions it makes to improve awareness, interaction and conversions.
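The raw material for such a model might be nothing more than records like the one below, one per item and keyword pairing, updated as shoppers interact with the results (the field names are illustrative):

// Illustrative training record for a search relevance model: one per
// (item, keyword) pair, updated as shoppers interact with the results
var record = {
  itemId: 'SHOE-001',
  keyword: 'wedding shoes',
  impressions: 5400,   // times the item was surfaced for this keyword
  clicks: 310,         // click-throughs to its PDP
  conversions: 42,     // purchases attributed to this surfacing
  averagePosition: 3.2 // where it tended to appear in the results
};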

Personalization

In a similar hypothetical vein is the vague area of 'personalization', or what you might call 'hyper-personalization'. Generally speaking, this means that whenever a shopper visits, we use whatever data we have about them to change their user experience. But it means going beyond simple demographic information, or the shopper's previous purchases: it means building up more data and making better changes and recommendations.

Sure, recommending products based on previous purchases seems obvious, but how smart is the algorithm that powers it? If I've bought shirts from a retailer before, then sure, offer me more shirts — a guy could always buy more of them. But if I just bought a toaster, do you need to recommend me more toasters? How many toasters does one man need?
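A slightly smarter recommender only needs one extra piece of knowledge: which categories people buy repeatedly and which they don't. A toy sketch (the category flags are invented for illustration):

// Toy sketch: drop recommendations from categories the shopper has just
// bought from, unless that category is one people buy repeatedly
var REPEAT_PURCHASE_CATEGORIES = ['shirts', 'socks', 'coffee']; // invented for illustration

function filterRecommendations (candidates, recentPurchases)
{
  var recentCategories = recentPurchases.map(function (item)
  {
    return item.category;
  });

  return candidates.filter(function (item)
  {
    var justBought = recentCategories.indexOf(item.category) !== -1;
    var isRepeatPurchase = REPEAT_PURCHASE_CATEGORIES.indexOf(item.category) !== -1;

    return !justBought || isRepeatPurchase; // no second toaster; more shirts are fine
  });
}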

The key is data, and the relationships between data points. What other user experience changes could you make based on the data you have about a shopper? For B2B customers, this could mean that you tailor your site — dynamically — for your shoppers based on who they are, and what they are to you. Indeed, gathering that data basically underpins this whole operation and will surely need to form the foundation of any future AI/ML projects in ecommerce.

Final Thoughts

There are a whole load of other potential uses for this technology; I thought about fleshing them all out, but this is already a long post, so here are some more ideas:

  • Fraud prevention — teach a bot to learn what fraudulent transactions look like and then have it automatically flag/prevent orders that look fraudulent
  • Segmentation/targeting — find new groups of shoppers by having a bot analyze your shopper data, so you can target them better
  • Automatic re-order — if you sell consumables, you could predict when a shopper is going to need them again and offer to automatically re-order them
  • Stock forecasting — if you sell out of a product at a particular time of the year, machine learning could predict when that's going to happen and order more from your suppliers
  • Dynamic pricing — automatically adjust pricing based on the shopper, or if seasonal/market events warrant it

Again, I can't say if any or all of these will make it into SuiteCommerce or, at least, if they will anytime soon. However, we are thinking about these things.

Of all the ones most likely to happen, I think the biggest feature will be chatbots. I mean, they already exist, right? They just need to be a bit more sophisticated. They need to move beyond hardcoded responses and start making decisions for themselves. However, even in their 'limited' (but still impressive) capacity, they can certainly be handy frontline support staff who can do a good job fielding basic questions.

The big 'behind-the-scenes' work will be data gathering. We need to either start capturing more data, figure out better access to it, train bots to understand it, or do a mix of the three. I look forward to seeing what gets developed over the next few years.