توضیحات
ABSTRACT
Historically Natural Language Processing (NLP) focuses on unstructured data (speech and text) understanding while Data Mining (DM) mainly focuses on massive, structured or semi-structured datasets. The general research directions of thesetwo fields also have followed different philosophies and principles. For example, NLP aims at deep understanding of individual words, phrases and sentences (“micro-level”), whereas DM aims to conduct a high-level understanding, discovery and synthesis of the most salient information from a large set of documents when working on text data (“macro-level”). But they share the same goal of distilling knowledge from data. In the past five years, these two areas have had intensive interactions and thus mutually enhanced each other through many successful text mining tasks. This positive progress mainly benefits from some innovative intermediate representations such as “heterogeneous information networks” [Han et al., 2010, Sun et al., 2012b]. However, successful collaborations between any two fields require substantial mutual understanding, patience and passion among researchers. Similar to the applications of machine learning techniques in NLP, there is usually a gap of at least several years between the creation of a new DM approach and its first successful application in NLP. More importantly, many DM approaches such as gSpan [Yan and Han, 2002] and RankClus [Sun et al., 2009a] have demonstrated their power on structured data. But they remain relatively unknown in the NLP community, even though there are many obvious potential applications. On the other hand, compared to DM, the NLP community has paid more attention to developing large-scale data annotations, resources, shared tasks which cover a wide range of multiple genres and multiple domains. NLP can also provide the basic building blocks for many DM tasks such as text cube construction [Tao et al., 2014]. Therefore in many scenarios, for the same approach the NLP experiment setting is often much closer to real-world applications than its DM counterpart.
INTRODUCTION
Yizhou Sun is an assistant professor in the College of Computer and Information Science of Northeastern University. She received her Ph.D. in Computer Science from the University of Illinois at Urbana Champaign in 2012. Her principal research interest is in mining information and social networks, and more generally in data mining, database systems, statistics, machine learning, information retrieval, and network science, with a focus on modeling novel problems and proposing scalable algorithms for large scale, real-world applications. Yizhou has over 60 publications in books, journals, and major conferences. Tutorials based on her thesis work on mining heterogeneous information networks have been given in several premier conferences, including EDBT 2009, SIGMOD 2010, KDD 2010, ICDE 2012, VLDB 2012, and ASONAM 2012. She received 2012 ACM SIGKDD Best Student Paper Award, 2013 ACM SIGKDD Doctoral Dissertation Award, and 2013 Yahoo ACE (Academic Career Enhancement) Award.
Year: 2015
Publisher : ACL,7th IJCNLP
By : Jiawei Han, Heng Ji, Yizhou Sun
File Information: English Language/2 Page / size:152 KB
Download: click
سال : 2015
ناشر : ACL,7th IJCNLP
کاری از : Jiawei Han, Heng Ji, Yizhou Sun
اطلاعات فایل : زبان انگلیسی / 2 صفحه / حجم : KB 152
لینک دانلود : روی همین لینک کلیک کنید
نقد و بررسیها
هنوز بررسیای ثبت نشده است.