Written by:
Brendan Haggerty
9/22/2009 6:18 PM

I was watching the news this morning on Channel 4 (http://www.nbcwashington.com) as they talked about the latest controversy surrounding my beloved Redskins [please, no comments; isn’t it painful enough having to watch them play?]. Redskins linebacker Robert Henson posted the following Tweet after the game on Sunday: "All you fake half hearted Skins fan can .. I won't go there but I dislike you very strongly, don't come to Fed Ex to boo dim wits!!" (http://sports.espn.go.com/nfl/news/story?id=4492151). Joe Krebs made the point implicitly in the way he read the Tweet, and then explicitly noted the irony: since Henson did not use a comma between “boo” and “dim wits,” he was actually calling himself and his teammates dim wits, not the fans who booed.
While we can applaud Joe's attempt to enforce grammatical rules in the social media world, this is a perfect example of the challenges customers face as they mine social media content for valuable feedback. We do not use perfect grammar when we tweet and blog. Anyone can grab their 5th-grade grammar textbook and implement programs to parse perfectly written text; however, the winners in social media text mining software will be those companies that can effectively interpret intent when faced with real-world issues such as texting abbreviations, sarcasm, and incorrect grammar.
These are not trivial problems to solve in software, and we cannot throw out the hard-and-fast syntactic rules we learned in school (even though most of them left my brain long ago). We must merge the syntax of the text, the habits of the speaker, and the semantic context to truly determine intent and sentiment. Typically, it is not feasible to assess the habits of the speaker; instead, we need to focus on the venue of the feedback (Twitter, online reviews, surveys) and the industry (hospitality, financial services, retail, airlines) to develop domain-specific rules for interpreting text. Some examples of this are:
- Feedback in the financial industry tends to be much more serious and direct than feedback in the hospitality industry. Many hotel reviews are actually descriptions of a trip, not of the hotel itself.
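To make the idea of venue- and industry-specific rules a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the abbreviation table, the word weights, and the function names `normalize` and `score` are hypothetical, and a real product would need far richer rules (and some handling of sarcasm, which this ignores entirely). It only shows the shape of the approach: clean up informal text first, then score it against a lexicon chosen by where the feedback came from.

```python
import re

# Hypothetical abbreviation table; a real system would need a much
# larger, curated list.
ABBREVIATIONS = {"u": "you", "gr8": "great", "thx": "thanks"}

# Hypothetical sentiment weights keyed by (venue, industry). The same
# word can carry a different weight depending on where it appears.
DOMAIN_LEXICONS = {
    ("twitter", "hospitality"): {"great": 1.0, "dirty": -1.5, "sick": 1.0},
    ("survey", "financial services"): {"great": 1.0, "fees": -1.0, "sick": -1.5},
}


def normalize(text):
    """Lowercase the text, split it into tokens, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [ABBREVIATIONS.get(token, token) for token in tokens]


def score(text, venue, industry):
    """Sum the domain-specific weights of the tokens; unknown domains score 0.0."""
    lexicon = DOMAIN_LEXICONS.get((venue, industry), {})
    return sum(lexicon.get(token, 0.0) for token in normalize(text))


if __name__ == "__main__":
    print(score("Thx, the pool was gr8", "twitter", "hospitality"))
    print(score("That rooftop bar was sick", "twitter", "hospitality"))
    print(score("The fees make me sick", "survey", "financial services"))
```

Keying the lexicon on the (venue, industry) pair is just one way to organize such rules; the point is that the word "sick" can be praise in a tweet about a hotel bar and a complaint in a banking survey, so a single global word list would get one of them wrong.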
Parsing and interpretation of text will never be 100% correct, but as companies look for solutions to mine social media, I think the most important capabilities to look for in a vendor will be:
- Flexibility to tune – This is still an evolving field, and the ability of your software to grow with you and be tuned to your specific needs will go a long way toward determining how happy you end up.
What do you think? What are the key features you evaluate when comparing text analytics packages?