Description
ABSTRACT
The Weka machine learning workbench provides a general-purpose environment for automatic classification, regression, clustering, and feature selection—common data mining problems in bioinformatics research. It contains an extensive collection of machine learning algorithms and data pre-processing methods complemented by graphical user interfaces for data exploration and the experimental comparison of different machine learning techniques on the same problem. Weka can process data given in the form of a single relational table. Its main objectives are to (a) assist users in extracting useful information from data and (b) enable them to easily identify a suitable algorithm for generating an accurate predictive model from it.
INTRODUCTION
Bioinformatics research entails many problems that can be cast as machine learning tasks. In classification or regression, the task is to predict the outcome associated with a particular individual given a feature vector describing that individual; in clustering, individuals are grouped together because they share certain properties; and in feature selection, the task is to select those features that are important in predicting the outcome for an individual. The Weka data mining suite provides algorithms for all three of these problem types. In the bioinformatics arena it has been used for automated protein annotation (Kretschmann et al., 2001; Bazzan et al., 2002), probe selection for gene expression arrays (Tobler et al., 2002), experiments with automatic cancer diagnosis (Li et al., 2003a), developing a computational model for frame-shifting sites (Bekaert et al., 2003), plant genotype discrimination (Taylor et al., 2002), classifying gene expression profiles (Li and Wong, 2002), and extracting rules from them (Li et al., 2003b). Many of the algorithms in Weka are described by Witten and Frank (2000).
Real datasets vary: no single algorithm is superior on all data mining problems. The algorithm needs to match the structure of the problem to yield useful information or an accurate model. The aim in developing Weka was to permit maximum flexibility when trying machine learning methods on new datasets. This includes algorithms for learning different types of models (e.g. decision trees, rule sets, linear discriminants), feature selection schemes (fast filtering as well as wrapper approaches), and pre-processing methods (e.g. discretization, arbitrary mathematical transformations, and combinations of attributes).
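The idea of trying several learning methods on the same dataset can be illustrated with a small sketch. The code below does not use Weka or its API; it is a plain Python illustration under stated assumptions: the table is synthetic, the two learners (a majority-class baseline and a 1-nearest-neighbour classifier) are chosen only for brevity, and every function name (`make_table`, `k_fold_accuracy`, `train_majority`, `train_one_nn`, `compare`) is hypothetical. It estimates each learner's accuracy by k-fold cross-validation on a single table of feature vectors and class labels.

```python
import random
from collections import Counter


def make_table(n_rows=60, seed=0):
    """Build a toy single table: each row is (feature_vector, class_label).

    Assumption: synthetic data, two balanced classes. Feature 0 carries the
    class signal; feature 1 is pure noise.
    """
    rng = random.Random(seed)
    table = []
    for i in range(n_rows):
        label = i % 2
        informative = rng.gauss(4.0 * label, 0.5)
        noise = rng.gauss(0.0, 1.0)
        table.append(((informative, noise), label))
    return table


def train_majority(train):
    """Baseline learner: always predict the most frequent training label."""
    majority = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: majority


def train_one_nn(train):
    """1-nearest-neighbour learner using squared Euclidean distance."""
    def predict(x):
        nearest = min(
            train,
            key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
        )
        return nearest[1]
    return predict


def k_fold_accuracy(table, train_fn, k=5):
    """Average test accuracy of one learner over k folds of the same table."""
    accuracies = []
    for fold in range(k):
        test = [row for i, row in enumerate(table) if i % k == fold]
        train = [row for i, row in enumerate(table) if i % k != fold]
        predict = train_fn(train)
        correct = sum(1 for x, y in test if predict(x) == y)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k


def compare(table):
    """Evaluate every learner on the same table with the same folds."""
    learners = {"majority": train_majority, "one_nn": train_one_nn}
    return {name: k_fold_accuracy(table, fn) for name, fn in learners.items()}


results = compare(make_table())
for name, acc in results.items():
    print(f"{name}: {acc:.2f}")
```

Because both learners see identical folds, any difference in the reported accuracies reflects the learners rather than the data split. On a table with a different structure the ranking could change, which is the reason for comparing several methods rather than fixing one in advance.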
Year: 2004
Publisher: University of Waikato
By: Eibe Frank, Mark Hall, Len Trigg, Geoffrey Holmes, and Ian H. Witten
File information: English / 2 pages / size: 213 KB