MimicProp: Learning to Incorporate Lexicon Knowledge into Distributed Word Representation for Social Media Analysis

Muheng Yan
Ali Mert Ertugrul
Meiqi Guo
Wen-Ting Chung
Published at ICWSM 2020

Abstract

Lexicon-based methods and word embeddings are the two widely used approaches for analyzing texts in social media. The choice of an approach can have a significant impact on the reliability of the text analysis. For example, lexicons provide manually curated, domain-specific attributes about a limited set of words, while word embeddings learn to encode some loose semantic interpretations for a much broader set of words. Text analysis can benefit from a representation that offers both the broad coverage of word embeddings and the domain knowledge of lexicons. This paper presents MimicProp, a new graph-mode method that learns a lexicon-aligned word embedding. Our approach improves over prior graph-based methods in terms of its interpretability (i.e., lexicon attributes can be recovered) and generalizability (i.e., new words can be learned to incorporate lexicon knowledge). It also effectively improves the performance of downstream analysis applications, such as text classification.
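The abstract does not spell out MimicProp's algorithm. The sketch below therefore illustrates only the general idea behind the graph-based baselines it is compared against. Seed words carry lexicon values, a k-nearest-neighbour graph is built from embedding similarity, and the scores spread to words outside the lexicon. It is a simplified label-propagation scheme, not MimicProp itself. The toy vectors, the seed scores, and the function names `knn_graph` and `propagate` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_graph(emb, k):
    """Link each word to its k most cosine-similar words, keeping positive weights only."""
    graph = {}
    for w, vec in emb.items():
        sims = sorted(((cosine(vec, emb[o]), o) for o in emb if o != w), reverse=True)
        graph[w] = [(o, s) for s, o in sims[:k] if s > 0]
    return graph


def propagate(emb, seeds, k=2, iters=50):
    """Spread lexicon scores from seed words to every word over the kNN graph.

    Hypothetical baseline: seed words stay clamped to their lexicon value, and each
    other word takes the similarity-weighted mean of its neighbours' current scores.
    """
    graph = knn_graph(emb, k)
    scores = {w: seeds.get(w, 0.0) for w in emb}
    for _ in range(iters):
        new = {}
        for w in emb:
            if w in seeds:
                new[w] = seeds[w]
                continue
            total = sum(s for _, s in graph[w])
            new[w] = sum(s * scores[o] for o, s in graph[w]) / total if total else 0.0
        scores = new
    return scores


# Toy 2-d "embeddings" and a two-word sentiment lexicon (both made up for illustration).
emb = {
    "happy": [1.0, 0.1],
    "joyful": [0.9, 0.2],
    "glad": [0.95, 0.15],
    "sad": [-1.0, 0.1],
    "gloomy": [-0.9, 0.2],
}
seeds = {"happy": 1.0, "sad": -1.0}
scores = propagate(emb, seeds)
print(scores)
```

In this toy setup, "joyful" and "glad" drift toward the positive seed and "gloomy" takes the negative one. A scheme like this assigns scores only to words present in the graph and never changes the embeddings themselves. The abstract describes MimicProp as going further on both points: it learns a lexicon-aligned embedding and generalizes to new words.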

Materials