Teaching Computer to Understand Social Media Content: Are We Almost There Yet?

      By Alisa Kongthon
      September 27, 2016

      Undoubtedly, social media has changed the way we communicate. Before social media, we were limited to communicating with people we knew personally, and we were limited in the means we could use to reach them. We had to deal with many things that younger generations do not: waiting for a letter to arrive in the mailbox, spending loads of money on a long-distance call, or waiting days to get a roll of 35mm film developed.

      But with the Internet and social media, we can now communicate with hundreds or even thousands of people all over the world with the click of a button. It is common to see people who have thousands of friends on Facebook and thousands of followers on Twitter or Instagram. We can share our pictures and video clips with our friends on social networks the moment we take them. People are more willing to share their personal information than ever before, and opinions and comments can reach a wider audience in real time.

      An in-depth analysis of the content of these opinions could help us understand people's preferences on many different topics, including the news, social and political issues, and commercial products and services. Humans can read and interpret the meaning of this content very easily, but can only process a limited amount of information. With the tremendous amount of content being generated on social media, it is simply not possible for a human to read more than a small fraction of what is relevant. Thanks to advances in technology, computers can now assist humans in doing so.

      Opinion mining and sentiment analysis are approaches used to extract opinions from social media and analyze them. They can be utilized in several ways. In terms of marketing, for example, opinion mining and sentiment analysis can help a company evaluate the success of advertising or new product campaigns and identify which models of a product or service are popular. Feedback from customers can be taken into account in developing marketing strategies, product development plans and improvements to customer service. By identifying this kind of information in a systematic way, the company can gain a much clearer picture of public opinion than typical surveys or focus group interviews provide.

      One interesting thing about social media is that it changes the way we converse and write. Many words look as if they are misspelled or grammatically incorrect. Abbreviations, acronyms and “emoticons” such as OMG, LOL, :) and ;) are also commonly used on social media. These expedite communication by shortening exchanges, so that we do not have to type out whole expressions or sentences. Some new words such as “selfie,” “wefie,” “tweet,” “hashtag” and “unfriend” originated from social media. The following sentence is typical of the language commonly used nowadays in cyberspace: “I had a g8t time 2day :)  Cu b4 lunch tmrw!” We could say that people who communicate on social media often seem to be speaking a new language.
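
      One common first step toward handling this kind of writing is to map shorthand back to standard words before any further analysis. The sketch below is only an illustration of that idea, not a description of any particular tool: the lookup tables `SHORTHAND` and `EMOTICONS` and the function `normalize` are hypothetical names, and the entries are limited to the shorthand in the example sentence above.

```python
import re

# Hypothetical lookup table of social-media shorthand (illustrative only).
# A real system would need a far larger and regularly updated list.
SHORTHAND = {
    "gr8": "great",
    "g8t": "great",
    "2day": "today",
    "cu": "see you",
    "b4": "before",
    "tmrw": "tomorrow",
}

# Hypothetical placeholders so emoticons survive as explicit tokens.
EMOTICONS = {":)": "<smile>", ";)": "<wink>"}


def normalize(text: str) -> str:
    """Replace known shorthand and emoticons, token by token."""
    out = []
    for token in text.split():
        if token in EMOTICONS:
            out.append(EMOTICONS[token])
            continue
        # Split off trailing punctuation so "tmrw!" still matches "tmrw".
        match = re.fullmatch(r"(.*?)([.!?,]*)", token)
        word, punct = match.group(1), match.group(2)
        out.append(SHORTHAND.get(word.lower(), word) + punct)
    return " ".join(out)


print(normalize("I had a g8t time 2day :)  Cu b4 lunch tmrw!"))
```

      A fixed table like this only covers shorthand that someone has already listed, which is one reason constantly evolving social media language is hard for computers to keep up with.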

      Human language itself is very complex. Teaching a computer to understand even standard human language is definitely not an easy task. On top of that, the new form of language used on social media is filled with sarcasm and non-traditional words and expressions. It is therefore even more difficult for computers to understand online content.

      The simplest way to teach a computer to understand sentiment is to assign a degree of positivity or negativity to each word. For example, “beautiful” would be labeled as a positive word, as would “happy” or “lovely”. “Hate” or “terrible” would be identified as negative words. The problem with this methodology is that it does not take into account the context in which the word appears in the text. For example, without understanding the context, the computer cannot grasp irony or sarcasm, because sarcasm is not normally intended to be obvious. A statement must be surrounded by some sort of context to be understood as sarcastic.
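
      The word-level approach described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a real sentiment tool: the `LEXICON` table, its numeric scores and the function names are hypothetical, and the words are the ones mentioned in the paragraph above.

```python
# Hypothetical mini-lexicon: scores above zero are positive, below zero negative.
# The numeric values are made up for illustration.
LEXICON = {
    "beautiful": 1.0,
    "happy": 0.8,
    "lovely": 0.8,
    "hate": -1.0,
    "terrible": -0.9,
}


def sentiment_score(text: str) -> float:
    """Sum the lexicon scores of the words in the text; unknown words count as 0."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    return sum(LEXICON.get(w, 0.0) for w in words)


def label(text: str) -> str:
    """Turn the summed score into a coarse positive/negative/neutral label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label("What a beautiful day, I am so happy!"))
print(label("I hate this terrible service."))
```

      Because `sentiment_score` looks at each word in isolation, any sentence containing “lovely” scores as positive regardless of what surrounds it, which is exactly the weakness described above.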

      Consider the following sentence: “What a lovely view! I can see a dumpster from my hotel room.” A human can easily tell that this is a sarcastic statement. But how about a machine? Without context analysis, the computer will assign a positive sentiment to this statement because of the word “lovely”. Bringing context into the picture is probably the best approach to identifying sarcasm with reliable results. Even then, given a set of sarcastic statements, a computer could learn (or memorize) the fact that these statements are sarcastic without ever understanding why. A question therefore remains regarding how reliably a computer can identify sarcasm within a new data set or topic.
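
      The gap between memorizing and understanding can be made concrete with a deliberately extreme sketch. The class below is hypothetical and far cruder than any real learning method: it simply stores the sarcastic statements it has been shown. The second test sentence is invented here purely for illustration.

```python
def _canonical(statement: str) -> str:
    """Lowercase and collapse whitespace so trivial variations still match."""
    return " ".join(statement.lower().split())


class MemorizingSarcasmDetector:
    """Hypothetical detector that only remembers statements it was given."""

    def __init__(self) -> None:
        self._seen = set()

    def fit(self, sarcastic_statements) -> None:
        # "Learning" here is nothing more than storing the examples.
        for statement in sarcastic_statements:
            self._seen.add(_canonical(statement))

    def is_sarcastic(self, statement: str) -> bool:
        # A statement is flagged only if it was seen during fit().
        return _canonical(statement) in self._seen


detector = MemorizingSarcasmDetector()
detector.fit(["What a lovely view! I can see a dumpster from my hotel room."])

# The memorized statement is recognized.
print(detector.is_sarcastic("What a lovely view! I can see a dumpster from my hotel room."))

# An invented statement with the same sarcastic intent is not.
print(detector.is_sarcastic("Lovely, my window looks straight onto the dumpsters."))
```

      Practical systems generalize much better than this lookup, but the same question applies to them: how well does what was learned carry over to statements and topics that were never in the training set?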

      The majority of opinion mining and sentiment analysis research has focused on text written in English. Many free or commercial social media monitoring and analysis tools also support only the English language. In Thailand, where the number of social media users has increased drastically in the past few years, there is definitely a need for tools to help analyze this massive content explosion.

      At the National Electronics and Computer Technology Center (NECTEC), we have developed various technologies as the building blocks needed to produce effective and creative solutions to these very challenging problems. These technologies include natural language processing for the Thai language, text mining, opinion mining and sentiment analysis. Thai belongs to the class of non-segmented languages, in which words are written continuously without any explicit delimiting characters (e.g. spaces and punctuation). This poses a greater challenge than the analysis of English text, since a phrase can become ambiguous or meaningless if we segment its words incorrectly.
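
      The segmentation problem can be illustrated with a small dictionary-based sketch. It is not NECTEC's method, only a minimal example under stated assumptions: the four-word dictionary and the function name `segmentations` are hypothetical, and the string “ตากลม” is a commonly cited ambiguous example that can be read either as “ตา กลม” (round eyes) or as “ตาก ลม” (exposed to the wind).

```python
def segmentations(text: str, dictionary: set) -> list:
    """Return every way to split `text` into a sequence of dictionary words."""
    if not text:
        return [[]]  # one valid way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in dictionary:
            # Keep this prefix as a word and segment the remainder recursively.
            for rest in segmentations(text[end:], dictionary):
                results.append([prefix] + rest)
    return results


# Hypothetical four-word dictionary: eye, round, to expose, wind.
THAI_WORDS = {"ตา", "กลม", "ตาก", "ลม"}

for words in segmentations("ตากลม", THAI_WORDS):
    print(" | ".join(words))
```

      Both splits use only valid words, so a dictionary alone cannot decide between them. Choosing the intended reading requires looking at the surrounding text, and a wrong choice changes which words any later sentiment analysis sees.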

      Although the current version of our sentiment analysis tool still applies a lexicon-based approach (i.e., assigning a degree of positivity or negativity to each word) together with syntactic patterns to analyze Thai social media content, we are now on the verge of exploring a context-based approach. With our current algorithm, we are able to analyze comparative statements. We can also extract the aspects of a company’s product or service that customers like or dislike and associate each opinion with the aspect it refers to. We are very enthusiastic about our results. Yet we are still eager to improve the algorithm, in the hope that one day computers will be able to understand Thai social media language as a human would.

