Trustpilot: Building AI for your company means not falling in love

Trustpilot, an online review platform, faced a problem that most companies are dealing with these days: how to build AI from scratch to meet business goals or solve problems. There’s no one-size-fits-all answer, of course; it depends entirely on what you need to solve, whether you build or bolt on a solution, and so on.

Size is another concern. A giant like Microsoft can snap up Semantic Machines and drastically improve its smart assistant capabilities rapidly — almost a case of the rich getting richer. But it’s more difficult if you’re not a tech giant. Trustpilot is by no means tiny, with seven offices and some 700 employees globally, but it’s also not a large enterprise company. Although it’s fresh off of a $55 million funding round, its resources aren’t infinite. And unlike the Microsofts of the world, it’s not a company that makes technology. It provides a service using technology, and as technology evolves alongside changing customer wants and business needs, companies like Trustpilot have to hustle to keep up.

The idea of taking your business from one that’s devoid of AI to one that depends on AI can be daunting. Having traversed this very journey, Ramin Vatanparast, Trustpilot’s chief product officer, had some thoughts on the subject and examples of what Trustpilot has done, which he shared in a presentation and fireside chat at Transform 2019.

Solve a problem, don’t fall in love with the technology

It’s all too easy to become enamored of AI. It’s the latest thing right now, and its potential is shiny, vast, and seemingly limitless. For businesses, that allure may be coupled with a nagging need to keep up with the Joneses, too. But Vatanparast said that Trustpilot consciously avoided falling in love with any particular technologies and instead focused on the outcomes the company wanted to achieve.

In fact, Trustpilot came to AI out of necessity rather than desire. As the company grew over the past 12 years, so did the amount of data it had to deal with. Vatanparast said that Trustpilot snags an online review every two seconds and boasts more than 3 billion impressions on those reviews each month. He expects that by the end of 2020, Trustpilot will clear 100 million total reviews. Trustpilot currently has 560TB of data. Of that, 55GB is unstructured or semi-unstructured data; 30TB is “cleaned, processed, or tagged”; and 17TB is in a data warehouse and available to customers to generate insights.

The company’s goal was to create the most trusted online review platform. Solving its data problems stood in the way of that goal, and Trustpilot determined that the only way it was going to be able to handle that data was by using AI.

In embarking on the AI journey, Trustpilot tried to avoid pitfalls. “Most of the companies are jumping on the revenue-based solution. So they’re looking to apply AI where there’s a possibility of generating revenue,” said Vatanparast. But Trustpilot focused more on using AI to improve its core product — online reviews — and ensuring that those reviews were trustworthy.

Cultural and structural shifts

With that vision in mind, Trustpilot started down the path to AI slowly and deliberately. “First, we created a layer on the company foundation,” said Vatanparast. That began with creating an “AI culture” within the company, which required educating the staff. They brought in a data scientist to work more closely with other teams, helping to demystify AI internally. Individuals and teams were encouraged to learn more about AI and try out ideas without the threat of consequences. “It’s OK to fail in the beginning, as we are in the beginning of the journey,” he said.

Eventually, experimentation gave way to more practical considerations and the need to actually make something useful. As Vatanparast noted, if the AI you’re working with isn’t aimed at meeting specific goals or solving problems the company is trying to address, you’re going to have problems getting it into production. But resources were a problem — although Vatanparast spun that as a positive: “We have limited resources, which sometimes is good. It helps you to focus more.” He added, “We tried to avoid [building] a laser cutter if we needed a knife.”

Into the weeds

For Trustpilot, there were two key areas that needed the artificial intelligence treatment: detecting and removing fake or spammy reviews, and providing better insights from review data.

The answer to the first issue is the Trustpilot Fraud Engine. Trustpilot already had a human team inspecting its reviews for fakes and spam, along with crowdsourced moderation by consumers and companies. That resulted in more than 5,000 notifications per month, but the volume was such that the company wasn’t able to scale it.

So Trustpilot built the Fraud Engine, using technologies and techniques including supervised and unsupervised ML, predictive analysis, statistical outlier detection, graph analysis, and neural networks. They had to create “fake score” parameters; if a given review hit a certain threshold, the system would remove it automatically and inform a human reviewer of the action. Now, 81% of the fake reviews and spam posts on Trustpilot’s platform are caught by AI.
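As a rough illustration of that threshold rule, here is a minimal Python sketch. It is not Trustpilot’s implementation: the cutoff value, the `moderate` function, and the `ModerationResult` container are all hypothetical names invented for this example, and the fake scores are assumed to arrive from some upstream model as numbers between 0 and 1.

```python
from dataclasses import dataclass, field

# Hypothetical cutoff; the real value is not public.
REMOVAL_THRESHOLD = 0.9


@dataclass
class ModerationResult:
    removed: list = field(default_factory=list)        # review ids taken down automatically
    notifications: list = field(default_factory=list)  # messages queued for human reviewers
    kept: list = field(default_factory=list)           # review ids left on the platform


def moderate(scored_reviews, threshold=REMOVAL_THRESHOLD):
    """Apply the threshold rule to (review_id, fake_score) pairs."""
    result = ModerationResult()
    for review_id, fake_score in scored_reviews:
        if fake_score >= threshold:
            # Remove automatically, then tell a human what was done.
            result.removed.append(review_id)
            result.notifications.append(
                f"Review {review_id} auto-removed (fake score {fake_score:.2f})"
            )
        else:
            result.kept.append(review_id)
    return result


result = moderate([("r1", 0.97), ("r2", 0.12), ("r3", 0.90)])
print(result.removed)        # ['r1', 'r3']
print(result.notifications)
```

The notification list is what keeps a human in the loop: removal is automatic, but every action leaves a record that someone can review and reverse.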

The second thing Trustpilot needed to handle via AI was its Review Insights. “The problem was that successful companies who are working with Trustpilot … are getting around 1,000-10,000 reviews per month,” explained Vatanparast. He said that it’s become a challenge for those companies that were relying on the virtuous cycle of consumer feedback to help them improve their products. Facing those thousands of comments, however, they’ve struggled to decipher which ones are useful, which ones they need to reply to, and how to apply those findings to improving their products and services.

Put another way, Trustpilot had a hole in its platform. The data was there and available, but it wasn’t helpful. So the company built a sentiment classification model that detects positive and negative feedback in reviews. The system can understand from the reviews how consumers are feeling toward a product or service, and then businesses can make changes and track to see whether people are responding positively to those changes or not.

According to Vatanparast, the sentiment part is crucial, because even within a five-star review — which, going by stars alone, would appear to be a perfect review — there would be some negative sentiments expressed. It’s essentially, “I’m happy about everything, but — ,” he said. And that feedback, from a loyal or mostly satisfied customer, is more instructive “negative” feedback than you can glean from a one-star review, where the customer is just upset and may be grousing.

Equipped with the sentiment classification model, Trustpilot has been able to analyze 35 million reviews and detect 85 million “sentiments.”
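To see how a single review can carry more than one sentiment, consider the following Python sketch. It swaps in a tiny hand-made word list for a trained classifier, so it is a toy stand-in rather than Trustpilot’s model; the word sets and the `clause_sentiments` function are hypothetical.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
POSITIVE = {"great", "fast", "happy", "friendly", "love"}
NEGATIVE = {"slow", "late", "broken", "rude", "damaged"}


def clause_sentiments(review_text):
    """Split a review into clauses and label each one separately."""
    # Break on sentence punctuation and on the word "but".
    clauses = re.split(r"[.!?;]|\bbut\b", review_text.lower())
    labelled = []
    for clause in clauses:
        words = set(re.findall(r"[a-z']+", clause))
        if not words:
            continue  # skip empty fragments left over from the split
        positives = len(words & POSITIVE)
        negatives = len(words & NEGATIVE)
        if positives > negatives:
            label = "positive"
        elif negatives > positives:
            label = "negative"
        else:
            label = "neutral"
        labelled.append((clause.strip(" ,"), label))
    return labelled


five_star_review = "Great product and friendly support, but the delivery was late."
for clause, label in clause_sentiments(five_star_review):
    print(f"{label:8} | {clause}")
```

Scoring clause by clause is what surfaces the “I’m happy about everything, but” pattern: the star rating says five, while the second clause still registers as a complaint.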

And then there’s the ethics

You can’t escape the issue of ethics in AI, even in places like Trustpilot’s review system where it doesn’t seem obvious that it would matter. Recalling some of the ethical challenges Trustpilot faced, Vatanparast said, “One example is around how you build a model and how accurate the model is. How do you treat the data and go after behaviors?”

The AI may flag 1,000 reviews on Trustpilot’s platform as fake or spammy, but 10 of those might actually be trustworthy. Is it better to keep the fraud detection model tight and accept that 1% false positive rate in order to remove those 990 offenders? Or should you loosen the model to avoid any false positives but then let more fake reviews slip through? Vatanparast wasn’t explicit about where Trustpilot falls on that particular question, but he did say that the company is constantly asking those sorts of questions and adjusting accordingly. But it’s a balance.
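That trade-off can be made concrete with a short Python sketch. The scores, labels, and cutoffs below are invented for illustration and say nothing about Trustpilot’s actual model; `flag_stats` is a hypothetical helper. Strictly speaking, the 1% figure above is the share of flagged reviews that turn out to be genuine, which is how the sketch computes it.

```python
def flag_stats(scored_reviews, cutoff):
    """Return (flagged, wrongly_flagged, missed_fakes) for a given cutoff.

    scored_reviews is a list of (fake_score, is_actually_fake) pairs.
    """
    flagged = [(score, is_fake) for score, is_fake in scored_reviews if score >= cutoff]
    wrongly_flagged = sum(1 for _, is_fake in flagged if not is_fake)
    missed_fakes = sum(
        1 for score, is_fake in scored_reviews if is_fake and score < cutoff
    )
    return len(flagged), wrongly_flagged, missed_fakes


# The article's numbers: 1,000 reviews flagged, 10 of them genuine.
article_flagged, article_genuine = 1000, 10
share_wrongly_flagged = article_genuine / article_flagged
print(f"{share_wrongly_flagged:.0%} of flagged reviews are genuine")  # 1%

# Made-up data: (fake_score, is_actually_fake).
toy_reviews = [
    (0.95, True), (0.92, True), (0.85, False), (0.80, True),
    (0.60, False), (0.40, True), (0.10, False),
]
aggressive = flag_stats(toy_reviews, cutoff=0.5)  # flags more, risks genuine reviews
cautious = flag_stats(toy_reviews, cutoff=0.9)    # flags less, lets fakes through
print(aggressive)  # (5, 2, 1)
print(cautious)    # (2, 0, 2)
```

On this toy data, the aggressive cutoff wrongly removes two genuine reviews but misses only one fake, while the cautious cutoff removes no genuine reviews and misses two fakes. Neither setting wins on both counts, which is the balance Vatanparast describes.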

That also raises the issue of transparency. As Trustpilot is honing its models, it needs to show the math, as it were. If the process is constantly shifting, however slightly, then the results will change. For the purpose of maintaining trust with companies and customers using the Trustpilot platform, that cannot be a total black box. Again, though, there’s a balance: How much information is too much to share?

The next challenge

There is virtually endless room for improvement with an AI you deploy in your business. For Trustpilot, the next hurdle has to do with improving its language model.

In the AI field, there’s much fervent discussion about the need for clean data to train models. It’s a classic garbage in, garbage out situation. Ironically, Trustpilot needs a language model that works on junky data. “You could use Wikipedia to generate a language model,” Vatanparast said, noting that such a model would have mostly correct grammar, usage, and spelling. “But you can’t apply that model to the [Trustpilot] reviews, because the reviews are not clean data. They’re not structured. They definitely in many cases don’t have the right spelling.”

He used the word “cheap” as an example. Used in different ways, “cheap” can mean “inexpensive,” which is a positive sentiment. But it can also mean “poor quality,” which is negative. Thus, out of context, the word “cheap” is useless, or at least significantly problematic, for creating any kind of reliable sentiment measure.
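A small Python sketch shows why context matters here. It is a toy heuristic, not a language model: it looks at the few words around “cheap” and checks them against two hypothetical cue lists. The cue words, the window size, and the `cheap_sentiment` function are all assumptions made for this example.

```python
import re

# Hypothetical cue words suggesting which sense of "cheap" is meant.
QUALITY_CUES = {"feels", "felt", "looks", "plastic", "broke", "flimsy", "material"}
PRICE_CUES = {"price", "prices", "deal", "shipping", "cost", "affordable", "delivery"}


def cheap_sentiment(sentence, window=3):
    """Guess the sentiment of 'cheap' from up to `window` words on each side."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    if "cheap" not in tokens:
        return None
    i = tokens.index("cheap")
    context = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
    quality_hits = len(context & QUALITY_CUES)
    price_hits = len(context & PRICE_CUES)
    if quality_hits > price_hits:
        return "negative"   # "cheap" as in poor quality
    if price_hits > quality_hits:
        return "positive"   # "cheap" as in inexpensive
    return "ambiguous"      # no usable context


print(cheap_sentiment("Really cheap prices and quick delivery"))    # positive
print(cheap_sentiment("The case feels cheap and broke in a week"))  # negative
print(cheap_sentiment("It was cheap"))                              # ambiguous
```

The third call is the case Vatanparast is pointing at: with no surrounding cues, the word alone gives no reliable signal. Hand-written cue lists like these also break down quickly on misspelled, unstructured review text.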

The English language is full of those sorts of oddities and quirks, so the task there is significant enough, but then there are numerous other languages to deal with in the same way. Trustpilot collects reviews from users all over the world in many different languages. “Creating models around the English language is definitely much easier, but when you look to other languages, it becomes tougher,” Vatanparast said.

Trustpilot has reached out to the IBM Watson team for help on Nordic languages, hoping to hand over its “messy” (and anonymized) data to improve Watson’s language models, and then ideally Trustpilot can use the updated models to be more accurate in its own system. It’s a process that ostensibly benefits both parties, and Trustpilot hopes to repeat the process with other organizations to keep improving the AI that is now core to its business.