Debunking Natural Language Processing: Sarcasm
March 11, 2019
When I tell people that I work on text analytics products, their first question is often “But, how do you handle sarcasm?” My typical response—“About as well as a human, which is to say not very well at all”— is equally sincere and sarcastic, a poetic homage to the difficulty of the linguistic problem.
Sarcasm is mired in a complex web of deep cultural knowledge, emotional sensitivity and individual awareness. Some cultures (looking at you, my British friends) find comfort in the dryness of the humor, whereas others find it distasteful or even disrespectful. Sarcasm can be subtle or overt; flattering or insulting; funny or offensive. In sarcastic expressions, the words used are only a small piece of the puzzle. Tone matters, body language matters, context matters. Given the high degree of tangible and intangible awareness necessary for sarcasm to succeed, it is perhaps unsurprising that non-native speakers struggle to master both the delivery and the understanding of sarcasm in learned languages.
All of these components make sarcasm an extremely difficult linguistic problem to study. Linguists have classified sarcasm into specific sub-types including irony, satire, passive aggression and flattery. They’ve determined that your brain works differently when processing sarcastic comments in comparison to sincere ones. Others have identified the facial tics that betray an otherwise earnest face. Most of this research, though, is conducted through individual face-to-face analysis. Conducting broad analyses of sarcasm through text analytics is nearly impossible. Text-only communication lacks tonal and visual cues, making it highly susceptible to misunderstanding and misinterpretation. Other clues for correct interpretation of a comment may be baked directly into the medium and shared among some of its participants, but these clues may be imperceptible or opaque to outsiders.
An NLP engine is not a native speaker of any human language. It understands the rules or the patterns that its human overlords have programmed into it, but it lacks any linguistic intuition. It can understand the words used, the relationships between those words, and maybe even emotion and intent, but it fails when meaning transcends content. It may understand its context or its purpose but will undoubtedly fail when other niche cultural or societal knowledge is injected into a witty retort. As humans, we could all listen to or read the same passage and walk away with different understandings of its intent. An NLP engine fares about the same. In some situations, it will interpret an ironic comment correctly; in other cases, it will completely miss the mark and produce the exact opposite sentiment from what a native speaker would expect.
The Clarabridge NLP engine errs on the side of sincerity, but, given the flexibility in our sentiment engine, users have the power to customize rules to support common sarcastic phrases that appear in their dataset. I’ve found success in customizing rules for two specific types of sarcasm.
1. Speakers, in an attempt to underscore their emotions, may associate positive actions with negative aspects (or vice versa), as in the sentences “I love sitting in traffic” or “going to the dentist is the best.” Clarabridge understands word associations, parts of speech and sentiment, and allows users to leverage this word-level metadata in the construction of sentiment rules. Users could construct a rule that inverts the sentiment of every positive verb associated with “dentist” or “traffic” or “[insert emotionally charged word from your industry here].”
2. Social media posts are now often suffixed with #sarcasm, #sarcastic or #not to aid a reader in interpreting an otherwise ambiguous post. Users can tune sentiment based on specific hashtags and the positions of those hashtags within posts.
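To make the two rule types concrete, here is a minimal sketch of how they might be approximated in plain Python. This is not the Clarabridge rule syntax or API; the word lists, the token handling, and the scoring scheme are all illustrative assumptions.

```python
# Toy sarcasm-aware sentiment heuristic illustrating the two rule types above.
# The lexicons below are small illustrative samples, not a real sentiment model.

POSITIVE_WORDS = {"love", "best", "great", "awesome"}
# Emotionally charged aspects that rarely co-occur sincerely with praise
# (swap in the charged words from your own industry).
CHARGED_ASPECTS = {"traffic", "dentist", "delay", "cancellation"}
SARCASM_TAGS = {"#sarcasm", "#sarcastic", "#not"}

def sentiment(text: str) -> int:
    """Return +1 (positive), -1 (negative) or 0 (neutral)."""
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    score = sum(1 for t in tokens if t in POSITIVE_WORDS)
    # Rule 1: invert praise that targets an emotionally charged aspect.
    if score > 0 and any(t in CHARGED_ASPECTS for t in tokens):
        score = -score
    # Rule 2: an explicit sarcasm hashtag flips the whole post.
    if any(t in SARCASM_TAGS for t in tokens):
        score = -score
    return (score > 0) - (score < 0)

print(sentiment("I love sitting in traffic"))        # -1
print(sentiment("Going to the dentist is the best")) # -1
print(sentiment("Great service #not"))               # -1
print(sentiment("I love this product"))              # 1
```

A real deployment would of course use the engine’s own part-of-speech and association metadata rather than bare token matching, but the control flow is the same: start from sincere sentiment, then apply targeted inversion rules for the sarcasm patterns you know appear in your data.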
Sarcasm is, without a doubt, an important part of our communicative tools as social beings. However, sarcastic expressions are relatively rare in most types of text. The ability to detect sarcasm through computational means should not be a make-or-break point when deciding which NLP tool to use. We at Clarabridge will continue to investigate sarcasm in customer feedback and how we may be able to improve sentiment accuracy for these expressions, but for now I’ll quote the Comic Book Guy from “The Simpsons”: “Sarcasm detector? Now that’s a really useful invention.”