• A Comprehensive Survey of Data Mining-based Fraud Detection Research


    This survey paper categorises, compares, and summarises almost all published technical and review articles in automated fraud detection within the last 10 years. It defines the professional fraudster, formalises the main types and subtypes of known fraud, and presents the nature of data evidence collected within affected industries.

  • A data mining approach for grouping and analyzing trajectories of care using claim data


    Background: With the increasing burden of chronic diseases, analyzing and understanding trajectories of care is essential for efficient planning and fair allocation of resources. We propose an approach based on mining claim data to support the exploration of trajectories of care.

    Methods: A clustering of trajectories of care for breast cancer was performed with Formal Concept Analysis. We exported data from the French national casemix system, covering all inpatient admissions in the country. Patients admitted for breast cancer surgery in 2009 were selected and their trajectory of care was recomposed with all hospitalizations occurring within one year after surgery. The main diagnoses of hospitalizations were used to produce morbidity profiles. Cumulative hospital costs were computed for each profile.

  • A data mining approach for grouping and analyzing trajectories of care using claim data: the example of breast cancer


    Health-care systems face a crisis of an increasing burden of chronic diseases aggravated by aging populations [1]. It is important that policy makers and healthcare managers can make decisions based on sufficient knowledge and understanding of chronic care activities. This is especially true in the field of cancer, where incidence, therapeutics, practices and costs can vary quickly [2,3]. On the one hand, policy makers need cost-effectiveness and cost-of-illness analyses for planning and fair allocation of funding. On the other hand, care providers should be able to adapt their resources and costs while they share patients in multidisciplinary and coordinated approaches.

  • A Delivery Framework for Health Data Mining and Analytics


    The iHealth Explorer tool, developed by CSIRO and DoHA, delivers web-service-style data mining and analytic facilities over a web interface, providing desktop access to sophisticated analyses over very large data collections. The tool allows users to access large transactional datasets to create profiles of selected patients. The patients’ profiles, together with windowed event sequence data, can then be analyzed using a user-chosen data mining tool. The results of the analysis can then be visualized using various knowledge representation methods. Although the initial implementation of the tool focuses on adverse drug reaction exploration, the tool and its embedded data mining tools can be used in broad areas of health data analysis.

  • A hybrid evolutionary algorithm for attribute selection in data mining


    Real-life data sets are often interspersed with noise, making the subsequent data mining process difficult. The task of the classifier can be simplified by eliminating attributes that are deemed redundant for classification, as retaining only pertinent attributes reduces the size of the dataset and subsequently allows more comprehensible analysis of the extracted patterns or rules. In this article, a new hybrid approach comprising two conventional machine learning algorithms is proposed to carry out attribute selection. Genetic algorithms (GAs) and support vector machines (SVMs) are integrated effectively based on a wrapper approach. Specifically, the GA component searches for the best attribute set by applying the principles of an evolutionary process. The SVM then classifies the patterns in the reduced datasets, corresponding to the attribute subsets represented by the GA chromosomes. The proposed GA-SVM hybrid is subsequently validated using datasets obtained from the UCI machine learning repository. Simulation results demonstrate that the GA-SVM hybrid produces good classification accuracy and a higher level of consistency that is comparable to other established algorithms. In addition, improvements are made to the hybrid by using a correlation measure between attributes as a fitness measure to replace the weaker members in the population with newly formed chromosomes. This injects greater diversity and increases the overall fitness of the population. The improved mechanism is also validated on the same data sets used in the first stage. The results justify the improvements in the classification accuracy and demonstrate its potential to be a good classifier for future data mining purposes.
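
    The wrapper idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a nearest-centroid classifier stands in for the SVM so that the example needs only the standard library, the correlation-based replacement step is omitted, and every name and parameter (ga_select, pop_size, penalty, the toy data) is an assumption made for illustration.

```python
import random

def centroid_accuracy(rows, labels, mask):
    """Training accuracy of a nearest-centroid classifier (a stand-in for
    the SVM) that only sees the attributes switched on in `mask`."""
    dims = [i for i, bit in enumerate(mask) if bit]
    if not dims:
        return 0.0
    centroids = {}
    for cls in set(labels):
        members = [r for r, l in zip(rows, labels) if l == cls]
        centroids[cls] = [sum(r[d] for r in members) / len(members) for d in dims]

    def predict(row):
        return min(centroids, key=lambda c: sum(
            (row[d] - m) ** 2 for d, m in zip(dims, centroids[c])))

    return sum(predict(r) == l for r, l in zip(rows, labels)) / len(rows)

def fitness(rows, labels, mask, penalty=0.01):
    """Wrapper fitness: classifier accuracy minus a small cost per attribute."""
    return centroid_accuracy(rows, labels, mask) - penalty * sum(mask)

def ga_select(rows, labels, pop_size=12, generations=20, mutation=0.1, seed=0):
    """Search attribute bitmasks (chromosomes) with a simple GA."""
    rng = random.Random(seed)
    n = len(rows[0])

    def score(mask):
        return fitness(rows, labels, mask)

    pop = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size - 1)]
    pop.append((1,) * n)  # include the "keep every attribute" baseline
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        nxt = pop[:2]  # elitism: the two fittest chromosomes survive unchanged
        while len(nxt) < pop_size:
            a, b = rng.sample(pop[:pop_size // 2], 2)  # parents from the fitter half
            cut = rng.randrange(1, n)                  # one-point crossover
            child = a[:cut] + b[cut:]
            child = tuple(bit ^ (rng.random() < mutation) for bit in child)
            nxt.append(child)
        pop = nxt
    return max(pop, key=score)

# Toy data: attribute 0 separates the classes, the other attributes are noise.
rows = [(0.10, 0.7, 0.2, 0.9), (0.20, 0.1, 0.8, 0.3), (0.15, 0.5, 0.5, 0.6),
        (0.05, 0.9, 0.1, 0.2), (0.90, 0.6, 0.3, 0.8), (0.80, 0.2, 0.9, 0.1),
        (0.85, 0.4, 0.4, 0.7), (0.95, 0.8, 0.6, 0.4)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

best = ga_select(rows, labels)
print("selected attribute mask:", best)
```

    Because the two fittest chromosomes are always carried over, the best fitness never decreases between generations, so the returned mask scores at least as well as the baseline that keeps every attribute.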

  • Achieving Full Security in Privacy-Preserving Data Mining


    In privacy-preserving data mining, a number of parties would like to jointly learn a function of their private data sets in a way that no information about their inputs, beyond the output itself, is revealed as a result of such computation.

  • An Integrated Data Mining System for Toxicity Prediction of Speciality Organic Chemicals


    The paper describes a novel, user-friendly integrated data mining system that can be used to analyse in-house, commercially sensitive toxicity data sets. The system provides a unified user interface and environment so that users are freed from time-consuming data transformation and exchange between tools, pre-processing and preparation, as well as tool selection.

  • An Unsupervised Approach to Modeling Personalized Contexts of Mobile Users


    Mobile context modeling is a process of recognizing and reasoning about contexts and situations in a mobile environment, which is critical for the success of context-aware mobile services. While there is prior work on mobile context modeling, the use of unsupervised learning techniques for mobile context modeling is still under-explored. Indeed, unsupervised techniques have the ability to learn personalized contexts which are difficult to predefine. To that end, in this paper, we propose an unsupervised approach to modeling personalized contexts of mobile users.

  • Association Rule Mining: A Survey


    Data mining [Chen et al. 1996] is the process of extracting interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from large information repositories such as relational databases, data warehouses, XML repositories, etc. Data mining is also known as one of the core processes of Knowledge Discovery in Database (KDD).

  • Automatic Subspace Clustering of High Dimensional Data


    Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high dimensional datasets.
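
    The notion of dense regions in subspaces can be illustrated with a simplified, grid-based sketch. It is not the CLIQUE algorithm as published: it only searches for dense grid cells bottom-up, pruning at the level of whole subspaces, and it omits the steps that connect dense cells into clusters and produce minimized DNF descriptions. The grid resolution xi, the density threshold tau, the function names and the toy points are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def cell_of(point, dims, xi, lo, hi):
    """Grid cell of `point` projected onto `dims`, with xi intervals per dimension."""
    width = (hi - lo) / xi
    return tuple(min(int((point[d] - lo) / width), xi - 1) for d in dims)

def dense_units(points, xi=4, tau=0.3, lo=0.0, hi=1.0):
    """Map each subspace (tuple of dimensions) to its dense cells.
    A cell is dense when it holds more than a fraction tau of all points."""
    n = len(points)
    result = {}
    level = [(d,) for d in range(len(points[0]))]  # start with 1-D subspaces
    while level:
        found = []
        for dims in level:
            counts = Counter(cell_of(p, dims, xi, lo, hi) for p in points)
            dense = {cell for cell, c in counts.items() if c / n > tau}
            if dense:
                result[dims] = dense
                found.append(dims)
        # Apriori-style step: a k-D subspace is a candidate only if every
        # (k-1)-D subspace inside it contained at least one dense cell.
        k = len(level[0]) + 1
        found_set = set(found)
        candidates = set()
        for a, b in combinations(sorted(found), 2):
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)
                if all(sub in found_set for sub in combinations(cand, k - 1)):
                    candidates.add(cand)
        level = sorted(candidates)
    return result

# Toy data: four points cluster in dimensions 0 and 1, while dimension 2
# is spread evenly and therefore contains no dense interval.
points = [(0.10, 0.10, 0.10), (0.12, 0.15, 0.35), (0.15, 0.12, 0.60),
          (0.20, 0.20, 0.85), (0.40, 0.60, 0.15), (0.60, 0.90, 0.40),
          (0.90, 0.40, 0.65), (0.70, 0.70, 0.90)]

for dims, cells in dense_units(points).items():
    print(dims, sorted(cells))
```

    Since the counting depends only on which cell each point falls into, the result does not change when the input records are reordered, which mirrors the order-insensitivity property the abstract mentions.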

  • Crime vs. demographic factors revisited: Application of data mining methods


    The aim of this article is to inquire about correlations between criminal phenomena and demographic factors. This international-level comparative study used a dataset covering 56 countries and 28 attributes. The data were processed with the Self-Organizing Map (SOM), assisted by other clustering methods, and several statistical methods for obtaining comparable results. The article is an exploratory application of the SOM in mapping criminal phenomena through processing of multivariate data. We found that the SOM was able to group the present data efficiently and to characterize the resulting groups. Other machine learning methods were applied to verify the groups computed with the SOM. The correlations obtained between attributes were chiefly weak.

  • Data Mining Applications in Healthcare


    Data mining has been used intensively and extensively by many organizations. In healthcare, data mining is becoming increasingly popular, if not increasingly essential. Data mining applications can greatly benefit all parties involved in the healthcare industry. For example, data mining can help healthcare insurers detect fraud and abuse, healthcare organizations make customer relationship management decisions, physicians identify effective treatments and best practices, and patients receive better and more affordable healthcare services.


    A dataset is imbalanced if the classification categories are not approximately equally represented. Recent years brought increased interest in applying machine learning techniques to difficult "real-world" problems, many of which are characterized by imbalanced data. Additionally, the distribution of the testing data may differ from that of the training data, and the true misclassification costs may be unknown at learning time. Predictive accuracy, a popular choice for evaluating the performance of a classifier, might not be appropriate when the data is imbalanced and/or the costs of different errors vary markedly. In this chapter, we discuss some of the sampling techniques used for balancing the datasets, and the performance measures more appropriate for mining imbalanced datasets.
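
    The point about predictive accuracy can be made concrete with a small sketch. It shows a trivial classifier that always predicts the majority class scoring high accuracy while finding no minority cases, followed by random oversampling as one simple balancing technique. The 95/5 class split, the function names and the choice of precision, recall and F1 as the alternative measures are assumptions made for illustration, not a summary of the specific methods the chapter covers.

```python
import random
from collections import Counter

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def precision_recall_f1(y_true, y_pred, positive=1):
    """Measures that focus on the positive (minority) class.
    Undefined ratios are reported as 0.0."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# 95 negatives and 5 positives; the "classifier" always predicts negative.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

print("accuracy:", accuracy(y_true, y_pred))
print("precision, recall, F1:", precision_recall_f1(y_true, y_pred))

_, balanced_labels = random_oversample(list(range(100)), y_true)
print("class counts after oversampling:", Counter(balanced_labels))
```

    Oversampling by duplication is only one option; undersampling the majority class or generating synthetic minority examples are alternatives with different trade-offs.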



    This article presents a data mining methodology for driving-condition monitoring via CAN-bus data that is based on the general data mining process. The approach is applicable to many driving condition problems, and the example of road type classification without the use of location information is investigated. Location information from Global Positioning Satellites and related map data are often not available for business reasons, or cannot represent the full dynamics of road conditions.

  • Data Mining in Bioinformatics using Weka


    The Weka machine learning workbench provides a general-purpose environment for automatic classification, regression, clustering, and feature selection, which are common data mining problems in bioinformatics research. It contains an extensive collection of machine learning algorithms and data pre-processing methods complemented by graphical user interfaces for data exploration and the experimental comparison of different machine learning techniques on the same problem. Weka can process data given in the form of a single relational table.