Debunking NLP: Translation
August 17, 2017
By: Ellen Falci, Product Manager, NLP/Enrichment, Clarabridge
This blog post is part 2 of our Debunking Natural Language Processing (NLP) series. Throughout this series, Ellen will highlight several features that help Clarabridge users go beyond simple topic analysis. This series will show you how new types of analysis aren’t so farfetched after all!
Bad translations are ubiquitous in our globalized world. There are seemingly endless listicles showcasing comical mistranslations. My favorite, though, might be TranslationParty, which takes your sentence and translates it back and forth through Google Translate between English and Japanese until it reaches equilibrium. The result is inevitably a bastardized but hysterical interpretation of your original sentence.
Translation services used to be a human-centric activity. A bilingual speaker would read the original document and then translate it into the target language. With the advent of the Internet and improvements in computing power, translation services have become machine-centric. They have matured greatly over the past few years and have arrived in the consumer market in many different forms. We see them in free web tools like Google Translate or Bing Translate that support text-to-text translation. More recently, we’ve also seen the growth of speech-to-speech and image-to-text translation tools. The futuristic image of having an earpiece translate audio in near real time is no longer science fiction. And, for common language pairs like English to Spanish or French to German, the quality is actually pretty decent.
However, neither human nor machine translation is perfect. Both are vulnerable to misinterpretation. Machine translation shines at translating key words, especially nouns. Verbs and adjectives, on the other hand, are more difficult to translate automatically; they often have subtleties that are lost in translation. A pronoun, preposition or conjunction is not hard to translate in isolation, but machine translation often leaves these words completely reorganized throughout the sentence. Add idioms, hyperbole, sarcasm and named entities to your sentence and you have significantly complicated the task for both human and machine. When analyzing a translated text, a human can make inferences about the meaning even when the grammar or vocabulary is flawed. However, when we ask a Natural Language Processing (NLP) engine to interpret an imperfect translation, we are asking for trouble.
Natural Language Processing (NLP) is a very difficult task. It’s taken the industry many decades to produce suitable, scalable systems that work on well-formatted, natively written text. As we’ve established, translating text often warps key words and grammatical elements, which makes NLP even more challenging. Feeding poorly translated text through an NLP engine may result in failures to identify negation, modification, sentiment, associated words and topics. The subtleties of meaning, emotion and intent are inevitably lost, jeopardizing your ability to relate to your customer. While NLP engines (including Clarabridge) can be used to analyze both native and translated customer feedback, the output from native text will always be superior in quality and accuracy.
There certainly are cases where translating text is an essential component of business. Marketers must translate web content to target new markets. News agencies must translate headlines in international environments. This pattern does not apply to Customer Experience Management or Customer Experience Analytics. In the CX ecosystem, native language text reigns supreme. You should make every effort to preserve your customers’ intended messages in their truest form. By doing so, you improve your ability to understand them fully and offer them the most empathetic service possible.
But, Ellen, I don’t have anyone on my team who speaks [insert your exotic language here]!
Fair point. It’s better to understand your customers partially than to ignore them completely for not speaking your lingua franca. In this situation, I encourage you to translate the keywords in your topic categories rather than translating the text itself. This procedure is most successful when there is a human in the loop. Translating topics is not the same as translating prose. A translator must consider the domain and data source when translating keywords. (It was a comical moment when we were QA-ing our Automotive template and discovered that “starter” had inadvertently been translated as “appetizer.” The translator had not properly considered the domain of the data!) By translating the keywords that you’re querying for instead of translating the original data, you don’t lose the meaning from your customer’s voice and you can still present your data and insights through whatever lens you prefer.
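To make the keyword approach concrete, here is a minimal Python sketch. It is not how Clarabridge implements topic categories; the glossary, the function names and the Spanish terms are all hypothetical and only illustrate the idea. A translator would still need to review every entry for the domain in question. The sketch keys each English keyword by domain, so that "starter" resolves to a car part in an automotive model and to a menu course in a restaurant model, and then matches those native-language terms against the untranslated feedback.

```python
import re

# Hypothetical, human-reviewed glossary: (domain, English keyword) -> native-language terms.
# The Spanish terms are illustrative only and assume review by a domain-aware translator.
GLOSSARY = {
    ("automotive", "starter"): ["motor de arranque"],
    ("restaurant", "starter"): ["entrante", "aperitivo"],
}


def translate_keywords(keywords, domain):
    """Look up the native-language terms for each English topic keyword in one domain."""
    translated = {}
    for keyword in keywords:
        terms = GLOSSARY.get((domain, keyword))
        if terms is None:
            # Fail loudly rather than guess: a missing entry needs a human translator.
            raise KeyError(f"No reviewed translation for {keyword!r} in domain {domain!r}")
        translated[keyword] = terms
    return translated


def match_topics(native_text, translated):
    """Return the English keywords whose native-language terms occur in the original text."""
    lowered = native_text.lower()
    hits = []
    for keyword, terms in translated.items():
        # Whole-word (or whole-phrase) match against the customer's own words.
        if any(re.search(r"\b" + re.escape(term) + r"\b", lowered) for term in terms):
            hits.append(keyword)
    return hits


if __name__ == "__main__":
    feedback = "El motor de arranque falla cada mañana."
    automotive_terms = translate_keywords(["starter"], "automotive")
    print(match_topics(feedback, automotive_terms))
```

The customer's sentence is never translated here. Only the query terms are, so the original wording stays available for analysis, and reporting can still use the English keyword as the label.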
Machine translation technologies are rapidly evolving and may reach a point where meaning and intent are not warped through translation. But in the meantime, I implore you not to sacrifice meaning for expediency. Your customers will thank you for not playing the telephone game with their feedback!
Come back in two weeks for the next installment of my Debunking NLP series!
Ellen Falci is Clarabridge’s NLP/Enrichment Product Manager. Follow Ellen on Twitter at @ellenfalci.