Description
ABSTRACT
Word embeddings — distributed word representations that can be learned from unlabelled data — have been shown to have high utility in many natural language processing applications. In this paper, we perform an extrinsic evaluation of four popular word embedding methods in the context of four sequence labelling tasks: part-of-speech tagging, syntactic chunking, named entity recognition, and multiword expression identification. A particular focus of the paper is analysing the effects of task-based updating of word representations. We show that when using word embeddings as features, as few as several hundred training instances are sufficient to achieve competitive results, and that word embeddings lead to improvements for out-of-vocabulary words and out-of-domain data. Perhaps more surprisingly, our results indicate there is little difference between the different word embedding methods, and that simple Brown clusters are often competitive with word embeddings across all tasks we consider.
INTRODUCTION
Recently, distributed word representations have grown to become a mainstay of natural language processing (NLP), and have been shown to have empirical utility in a myriad of tasks (Collobert and Weston, 2008; Turian et al., 2010; Baroni et al., 2014; Andreas and Klein, 2014).
Year: 2015
Publisher: Association for Computational Linguistics
By: Lizhen Qu, Gabriela Ferraro, Liyuan Zhou, Weiwei Hou, Nathan Schneider and Timothy Baldwin
File Information: English / 11 pages / Size: 383 KB
Download: click