Name: DATA MINING FOR IMBALANCED DATASETS: AN OVERVIEW
SKU: 1243
Availability: InStock

DATA MINING FOR IMBALANCED DATASETS: AN OVERVIEW

0 تومان

A dataset is imbalanced if the classification categories are not approximately equally represented. Recent years brought increased interest in applying machine learning techniquesto difficult”real-world” problems, many of which are characterized by imbalanced data. Additionally the distribution of the testing data may differ from that of the training data, and the true misclassification costs may be unknown at learning time. Predictive accuracy, a popular choice for evaluating performanceof a classifier, might not be appropriatewhen the data is imbalanced andlor the costs of different errors vary markedly. In this Chapter, we discuss some of the sampling techniquesused for balancing the datasets, and the performance measures more appropriate for mining imbalanced datasets.

توضیحات

ABSTRACT

A dataset is imbalanced if the classification categories are not approximately qually represented. Recent years brought increased interest in applying machine learning techniquesto difficult”real-world” problems, many of which are characterized by imbalanced data. Additionally the distribution of the testing data may differ from that of the training data, and the true misclassification costs may be unknown at learning time. Predictive accuracy, a popular choice for evaluating performanceof a classifier, might not be appropriatewhen the data is imbalanced andlor the costs of different errors vary markedly. In this Chapter, we discuss some of the sampling techniquesused for balancing the datasets, and the performance measures more appropriate for mining imbalanced datasets.

INTRODUCTION

The issue with imbalance in the class distribution became more pronounced with the applications of the machine learning algorithms to the real world. These applications range from telecommunications management (Ezawaet al., 1996), bioinformatics (Radivojac et al., 2004), text classification (Lewis and Catlett, 1994; Dumais et al., 1998; Mladeni6 and Grobelnik, 1999; Cohen, 1995b), speechrecognition (Liu et al., 2004), to detection of oil spills in satellite images (Kubat et al., 1998). The imbalance can be an artifact of class distribution and/or different costs of errors or examples. It has received attention from machine learningand Data Mining community in form of Workshops (Japkowicz, 2000b; Chawla et al., 2003a; Dietterich et al., 2003; Fem et al. 2004) and Special Issues (Chawla et al., 2004a). The range of papers in these venues exhibitedthe pervasive and ubiquitous nature of the class imbalanceissues faced by the Data Mining community. Samplingmethodologies continue to be popular in the research work. However, the research continues to evolve with different applications,as each application providesa compellingproblem. One focus of the initial workshops was primarily the performance evaluation criteria for mining imbalanced datasets. The limitation of the accuracy as the performance measure was quickly established. ROC curves soon emerged as a popular choice (Fem et al., 2004). 2004) and Special Issues (Chawla et al., 2004a). The range of papers in these venues exhibitedthe pervasive and ubiquitous nature of the class imbalanceissues faced by the Data Mining community. Samplingmethodologies continue to be popular in the research work. However, the research continues to evolve with different applications,as each application providesa compellingproblem. One focus of the initial workshops was primarily the performance evaluation criteria for mining imbalanced datasets. The limitation of the accuracy as the performance measure was quickly established. ROC curves soon emerged as a popular choice (Fem et al., 2004).

Year: 2004

Publishe: Universityof Notre Dame

By: Nitesh V. Chawla

File Information: English Language/ 15 Page / size:2,855KB

Download: click

سال : 2004

ناشر : Universityof Notre Dame

کاری از : Nitesh V. Chawla

اطلاعات فایل : زبان انگلیسی / 15 صفحه / حجم :2,855KB

لینک دانلود : روی همین لینک کلیک کنید

نقد و بررسی‌ها

هنوز بررسی‌ای ثبت نشده است.

اولین کسی باشید که دیدگاهی می نویسد “DATA MINING FOR IMBALANCED DATASETS: AN OVERVIEW”

برای فرستادن دیدگاه، باید وارد شده باشید.

DATA MINING FOR IMBALANCED DATASETS: AN OVERVIEW

توضیحات

نقد و بررسی‌ها

کتاب آموزش راهنمای استفاده از جوملا1.5(از مبتدی تا پیشرفته)

دانلود کتاب آموزش طراحی سایت با دروپال

دانلود رایگان کتاب آموزش CSS

کتاب آموزش برنامه نویسی پایتون

درباره فروشگاه

ارتباط با ما

DATA MINING FOR IMBALANCED DATASETS: AN OVERVIEW

توضیحات

نقد و بررسی‌ها

محصولات مرتبط

کتاب آموزش راهنمای استفاده از جوملا1.5(از مبتدی تا پیشرفته)

دانلود کتاب آموزش طراحی سایت با دروپال

دانلود رایگان کتاب آموزش CSS

کتاب آموزش برنامه نویسی پایتون

درباره فروشگاه

ارتباط با ما