Description
ABSTRACT
Machine learning and data mining are research areas of computer science whose rapid development is due to advances in data analysis research, growth in the database industry, and the resulting market need for methods capable of extracting valuable knowledge from large data stores. This chapter gives an informal introduction to machine learning and data mining, and describes selected machine learning and data mining methods illustrated by examples. After a brief general introduction, Sect. 1.2 sketches the historical background of the research area, followed by an outline of the knowledge discovery process and the emerging standards in Sect. 1.3. Section 1.4 establishes the basic terminology and provides a categorization of different learning tasks. Predictive and descriptive data mining techniques are illustrated by means of simplified examples of data mining tasks in Sects. 1.5 and 1.6, respectively. In Sect. 1.7, we highlight the importance of relational data mining techniques. The chapter concludes with some speculations about future developments in data mining.
INTRODUCTION
Machine learning (Mitchell, 1997) is a mature and well-recognized research area of computer science, mainly concerned with the discovery of models, patterns, and other regularities in data. Machine learning approaches can be roughly categorized into two groups:

Symbolic approaches. Inductive learning of symbolic descriptions, such as rules (Clark & Niblett, 1989; Michalski, Mozetic, Hong, & Lavrač, 1986), decision trees (Quinlan, 1986), or logical representations (De Raedt, 2008; Lavrač & Džeroski, 1994a; Muggleton, 1992). Textbooks that focus on this line of research include (Langley, 1996; Mitchell, 1997; Witten & Frank, 2005).

Statistical approaches. Statistical or pattern-recognition methods, including k-nearest neighbor or instance-based learning (Aha, Kibler, & Albert, 1991; Dasarathy, 1991), Bayesian classifiers (Pearl, 1988), neural network learning (Rumelhart & McClelland, 1986), and support vector machines (Schölkopf & Smola, 2001; Vapnik, 1995). Textbooks in this area include (Bishop, 1995; Duda, Hart, & Stork, 2000; Hastie, Tibshirani, & Friedman, 2001; Ripley, 1996).

Although the approaches taken in these fields are often quite different, their effectiveness in learning is often comparable (Michie, Spiegelhalter, & Taylor, 1994). Many methods also cross the boundary between the two groups. For example, there are decision tree (Breiman, Friedman, Olshen, & Stone, 1984) and rule learning (Friedman & Fisher, 1999) algorithms that are firmly based in statistics. Similarly, ensemble techniques such as boosting (Freund & Schapire, 1997), bagging (Breiman, 1996), or random forests (Breiman, 2001a) may combine the predictions of multiple logical models on a sound statistical basis (Bennett et al., 2008; Mease & Wyner, 2008; Schapire, Freund, Bartlett, & Lee, 1998). This book is concerned only with the first group of methods, which result in symbolic, human-understandable patterns and models.
Year: 2012
Publisher: Springer
By: J. Fürnkranz, Lavrač & Grobelnik
File Information: English / 18 pages / Size: 465 KB