Felipe Bravo Marquez, of the Machine Learning Group at the University of Waikato (New Zealand), will give a colloquium titled "Acquiring and Exploiting Lexical Knowledge for Twitter Sentiment Analysis".
Acquiring and Exploiting Lexical Knowledge for Twitter Sentiment Analysis
The most popular sentiment analysis task in Twitter is the automatic classification of tweets into sentiment categories such as positive, negative, and neutral. State-of-the-art solutions to this problem are based on supervised machine learning models trained from manually annotated examples. These models are affected by a label sparsity problem, because the manual annotation of tweets is labour-intensive and time-consuming.
This presentation addresses the label sparsity problem for Twitter polarity classification by automatically building two types of resources that can be exploited when labelled data is scarce: opinion lexicons, which are lists of words labelled by sentiment, and synthetically labelled tweets. In the first part of the presentation, we show how to build Twitter-specific opinion lexicons by training word-level classifiers using representations that exploit different sources of information such as (a) the morphological information conveyed by part-of-speech (POS) tags, (b) associations between words and the sentiment expressed in the tweets that contain them, and (c) distributional representations calculated from unlabelled tweets. Experimental results show that the generated lexicons produce significant improvements for tweet-level polarity classification. In the second part, we develop distant supervision methods for generating synthetic training data for Twitter polarity classification by exploiting unlabelled tweets and prior lexical knowledge. Positive and negative training instances are generated by averaging unlabelled tweets annotated according to a given polarity lexicon. We study different mechanisms for selecting the candidate tweets to be averaged. Our experimental results show that the training data generated by the proposed models produce classifiers that perform significantly better than classifiers trained on tweets annotated with emoticons, a popular distant supervision approach for Twitter sentiment analysis.
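As a rough illustration of the second part, the sketch below labels unlabelled tweets with a polarity lexicon and averages randomly chosen tweets of the same polarity into synthetic training instances. It is only a minimal sketch under stated assumptions, not the speaker's actual method: the toy lexicon, the bag-of-words representation, the uniform sampling with replacement, and all function names are hypothetical choices made for this example.

```python
import random
from collections import Counter

# Hypothetical toy polarity lexicon (illustrative only, not from the talk).
LEXICON = {"good": 1, "love": 1, "great": 1, "bad": -1, "hate": -1, "awful": -1}


def lexicon_label(tweet):
    """Return +1 or -1 if the tweet's lexicon words share one polarity, else None."""
    polarities = {LEXICON[w] for w in tweet.lower().split() if w in LEXICON}
    return polarities.pop() if len(polarities) == 1 else None


def average_instance(tweets):
    """Average the bag-of-words vectors of several tweets into one instance.

    Bag-of-words is an assumption; the abstract does not name a representation.
    """
    total = Counter()
    for tweet in tweets:
        total.update(tweet.lower().split())
    return {word: count / len(tweets) for word, count in total.items()}


def generate_instances(unlabelled, n_instances, k, seed=0):
    """Build n_instances synthetic examples per polarity, each averaging k tweets."""
    rng = random.Random(seed)
    pools = {1: [], -1: []}
    for tweet in unlabelled:
        label = lexicon_label(tweet)
        if label is not None:  # skip tweets with no or mixed lexicon evidence
            pools[label].append(tweet)
    data = []
    for label, pool in pools.items():
        if not pool:
            continue
        for _ in range(n_instances):
            # Assumed selection mechanism: uniform sampling with replacement.
            sample = rng.choices(pool, k=k)
            data.append((average_instance(sample), label))
    return data


if __name__ == "__main__":
    tweets = [
        "I love this great phone",
        "what an awful day",
        "good movie",
        "I hate traffic",
        "good but awful",  # mixed polarity, discarded
        "nothing here",    # no lexicon words, discarded
    ]
    for features, label in generate_instances(tweets, n_instances=2, k=2):
        print(label, features)
```

The resulting (features, label) pairs could then be fed to any standard classifier. The uniform sampling step stands in for the candidate-selection mechanisms that the abstract says are studied in the talk.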
The talk will take place in the Laboratorio de Programación Avanzada (LPA) at Campus San Joaquín (Room B038) and will be streamed by videoconference to the Auditorio Claudio Matamoros (F-106) at Casa Central.
Everyone is cordially invited!