Data mining archive

Design and Implementation of Data Warehouse with Data Model using Survey-based Services Data

Various business organizations and government bodies are enhancing their decision-making capabilities using data warehouses. For government bodies, a data warehouse makes policy formulation much easier by grounding it in available data, such as survey-based services data. In this paper we present survey-based service data together with the design and implementation of a data warehouse framework for data mining and business intelligence reporting. In designing the data warehouse, we developed a multidimensional data model for creating multiple data marts and designed an ETL process for populating the data marts from the data source. Developing multiple data marts enables easier report generation by identifying common dimensions among the data marts. The cross-join capabilities of the data marts through common dimensions demonstrate the ability to easily drill across the data marts for cross-data analysis and reporting. In addition, we have incorporated data quality checking on the data source, as well as data detection rules that filter out data with an unmatched schema or out-of-range values before it is stored in the data warehouse for analysis.
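The drill-across idea in this abstract can be illustrated with a small sketch. It is not the paper's implementation: the two marts, the `region` dimension, the measures and the 1 to 5 score range are all hypothetical, chosen only to show how a range rule filters rows and how two marts are joined through a dimension they share.

```python
from collections import defaultdict

# Hypothetical fact rows from two data marts that share a "region" dimension.
satisfaction_mart = [
    {"region": "North", "score": 4},
    {"region": "North", "score": 5},
    {"region": "South", "score": 3},
    {"region": "South", "score": 9},  # out of range, should be rejected
]
usage_mart = [
    {"region": "North", "visits": 120},
    {"region": "South", "visits": 80},
    {"region": "South", "visits": 40},
]

def passes_range_rule(row, field, low, high):
    """Data detection rule: keep the row only if field is a number in [low, high]."""
    value = row.get(field)
    return isinstance(value, (int, float)) and low <= value <= high

def aggregate(rows, dimension, measure, reducer):
    """Group one mart's rows by a dimension and reduce the measure per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[measure])
    return {key: reducer(values) for key, values in groups.items()}

def drill_across(left, right):
    """Join two aggregated marts on the dimension values they have in common."""
    return {key: (left[key], right[key]) for key in sorted(left.keys() & right.keys())}

clean_scores = [r for r in satisfaction_mart if passes_range_rule(r, "score", 1, 5)]
avg_score = aggregate(clean_scores, "region", "score", lambda v: sum(v) / len(v))
total_visits = aggregate(usage_mart, "region", "visits", sum)
report = drill_across(avg_score, total_visits)
print(report)  # {'North': (4.5, 120), 'South': (3.0, 120)}
```

Each mart is aggregated separately before the join, so the report never multiplies rows from one mart by rows from the other.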

Mining association rules for the quality improvement of the production process

Academics and practitioners have a common interest in the continuing development of methods and computer applications that support or perform knowledge-intensive engineering tasks. Operations management dysfunctions and lost production time are problems of enormous magnitude that affect the performance and quality of industrial systems as well as their cost of production. Association rule mining is a data mining technique used to find useful and invaluable information in huge databases. This work develops a better conceptual base for improving the application of association rule mining methods to extract knowledge on operations and information management. The emphasis of the paper is on the improvement of operations processes. The application example details an industrial experiment in which association rule mining is used to analyze the manufacturing process of a fully integrated provider of drilling products. The study reports some interesting new results with data mining and knowledge discovery techniques applied to a drill production process. Experimental results on real-life data sets show that the proposed approach is useful in finding effective knowledge associated with the causes of dysfunctions.
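As a rough illustration of what association rule mining computes, the sketch below enumerates rules by brute force and scores them with the standard support and confidence measures. The production records, item names and thresholds are hypothetical, and the abstract does not say which mining algorithm the paper uses, so this should be read as a generic example rather than the authors' method.

```python
from itertools import combinations

# Hypothetical production records: each set lists conditions seen in one batch.
records = [
    {"tool_wear", "high_temp", "defect"},
    {"tool_wear", "defect"},
    {"high_temp"},
    {"tool_wear", "high_temp", "defect"},
    {"tool_wear"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions)

def mine_rules(transactions, min_support, min_confidence):
    """Brute-force rules 'antecedent -> consequent' over itemsets of size 2 and 3."""
    items = sorted(set().union(*transactions))
    rules = []
    for size in (2, 3):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            hits = count(itemset, transactions)
            support = hits / len(transactions)
            if hits == 0 or support < min_support:
                continue
            for consequent in combo:
                antecedent = itemset - {consequent}
                # confidence = P(consequent | antecedent), estimated from counts
                confidence = hits / count(antecedent, transactions)
                if confidence >= min_confidence:
                    rules.append((tuple(sorted(antecedent)), consequent, support, confidence))
    return rules

rules = mine_rules(records, min_support=0.5, min_confidence=0.7)
for antecedent, consequent, support, confidence in rules:
    print(antecedent, "->", consequent, f"support={support:.2f} confidence={confidence:.2f}")
```

With these thresholds only the pair `tool_wear` and `defect` is frequent enough, and it yields a rule in each direction. Real miners such as Apriori prune the search instead of enumerating every itemset.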

Amoeba-Based Knowledge Discovery

We propose an amoeba-based knowledge discovery or data mining system that is implemented using an amoeboid organism and an associated control system. The amoeba system can be considered one of the new non-traditional computing paradigms, and it can perform intriguing, massively parallel computing that utilizes the chaotic behavior of the amoeba. Our system is a hybrid of a traditional knowledge-based unit implemented on an ordinary computer with an amoeba-based search unit and an optical control unit interface. The solutions in our system can have a one-to-one mapping to solutions of other well-known areas such as neural networks and genetic algorithms. This mapping feature allows the amoeba to use and apply techniques developed in other areas. Various forms of knowledge discovery processes are introduced. Also, a new type of knowledge discovery technique, called “autonomous metaproblem solving,” is discussed.

Knowledge management vs. data mining: Research trend, forecast and citation approach

Knowledge management (KM) and data mining (DM) have become more important today; however, there are few comprehensive studies and categorization schemes that discuss the characteristics of both. Using a bibliometric approach, this paper analyzes KM and DM research trends, forecasts and citations from 1989 to 2009 by locating the headings “knowledge management” and “data mining” in topics in the SSCI database. Examining these two topics in SSCI journals over that period, we found 1393 articles on KM and 1181 articles on DM. This paper classifies the KM and DM articles using the following eight categories: publication year, citation, country/territory, document type, institute name, language, source title and subject area. It examines their distributions in order to explore the differences between KM and DM, how the two technologies developed in this period, and their tendencies in light of these results. The paper also performs the K–S test to check whether the distribution of author article production follows Lotka’s law. The research findings can be extended to investigate author productivity by analyzing variables such as chronological and academic age, number and frequency of previous publications, access to research grants, job status, etc. In this way, the characteristics of high, medium and low publishing activity of authors can be identified. These findings will also help to judge scientific research trends and understand the scale of development of research in KM and DM by comparing the increase in article authors.
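The K–S check against Lotka's law mentioned in this abstract can be sketched as follows. Several choices here are assumptions and not taken from the paper: the exponent is fixed at the classical value of 2 instead of being fitted, the theoretical distribution is normalized over the observed range of paper counts, the critical value uses the large-sample approximation 1.36/√N for the 0.05 level, and the author counts are invented for illustration.

```python
import math

def lotka_ks(author_counts, exponent=2.0):
    """Compare observed author productivity with Lotka's law.

    author_counts maps n (papers per author) to the number of authors with n papers.
    Returns (D, critical), where D is the largest gap between the observed and
    theoretical cumulative proportions and critical is 1.36 / sqrt(N).
    """
    total_authors = sum(author_counts.values())
    n_max = max(author_counts)
    # Lotka's law: share of authors with n papers is proportional to 1 / n**exponent.
    norm = sum(n ** -exponent for n in range(1, n_max + 1))
    observed_cum = 0.0
    expected_cum = 0.0
    d_max = 0.0
    for n in range(1, n_max + 1):
        observed_cum += author_counts.get(n, 0) / total_authors
        expected_cum += (n ** -exponent) / norm
        d_max = max(d_max, abs(observed_cum - expected_cum))
    critical = 1.36 / math.sqrt(total_authors)
    return d_max, critical

# Hypothetical data: 600 authors with one paper, 150 with two, and so on.
sample = {1: 600, 2: 150, 3: 70, 4: 40, 5: 25}
d, critical = lotka_ks(sample)
print("consistent with Lotka's law" if d < critical else "deviates from Lotka's law")
```

If D stays below the critical value, the data give no reason to reject Lotka's law at that level. The approximation is meant for continuous distributions, so it is only a rough guide for count data like this.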

Four Decades of Data Mining in Network and Systems Management

How has the interdisciplinary data mining field been practiced in Network and Systems Management (NSM)? In science and technology, data mining is widely used in areas such as bioinformatics, genetics, the Web and, more recently, astroinformatics. However, its application in NSM has been limited. In this article, we provide an account of how data mining has been applied in managing networks and systems for the past four decades, presumably since its birth. We look into the field’s applications in the key NSM activities: discovery, monitoring, analysis, reporting and domain knowledge acquisition. Finally, we discuss our perspective on the issues that are considered critical for the effective application of data mining in modern systems, which are characterized by heterogeneity and high dynamism.

Database Preprocessing and Comparison between Data Mining Methods

Database preprocessing is very important for making efficient use of memory. Compression is one of the preprocessing steps needed to reduce the memory required to store and load data for processing. The compression method introduced in this paper was tested on proposed examples to show the effect of repetition in the database, as well as of database size; the results showed that the compression ratio increases as repetition increases. Compression is one of the important data preprocessing activities before data mining is applied. Data mining methods such as Naïve Bayes, Nearest Neighbor and Decision Tree are tested. The implementation of the three methods showed that the Naïve Bayes method is effective when the data attributes are categorical, and that it can be used successfully in machine learning. Nearest Neighbor is most suitable when the data attributes are continuous or categorical. The third method tested, the Decision Tree, is a simple predictive method that classifies data using simple rules. The success of a data mining implementation depends on the completeness of the database, represented by a data warehouse, which must be organized according to the important characteristics of a data warehouse.
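The abstract does not describe the compression method itself, so the sketch below uses plain dictionary encoding as a stand-in to illustrate the reported trend that more repetition gives a higher compression ratio. The size model (UTF-8 bytes per value, one byte per code) and the example columns are assumptions made for this illustration.

```python
def dictionary_encode(column):
    """Replace each value with an index into a list of distinct values."""
    dictionary = []
    index_of = {}
    codes = []
    for value in column:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes

def compression_ratio(column, code_bytes=1):
    """Original size divided by encoded size under a simple size model.

    Each value costs its UTF-8 length; each code costs code_bytes, which is
    only realistic for up to 256 distinct values when code_bytes is 1.
    """
    dictionary, codes = dictionary_encode(column)
    original = sum(len(v.encode("utf-8")) for v in column)
    encoded = sum(len(v.encode("utf-8")) for v in dictionary) + code_bytes * len(codes)
    return original / encoded

# Hypothetical columns of the same length with different amounts of repetition.
low_repetition = ["Tehran", "Shiraz", "Tabriz", "Isfahan", "Mashhad", "Ahvaz"]
high_repetition = ["Tehran"] * 5 + ["Shiraz"]

print(round(compression_ratio(low_repetition), 2))   # 0.86
print(round(compression_ratio(high_repetition), 2))  # 2.0
```

With no repeated values the encoding is larger than the original, because every value is stored once in the dictionary and once more as a code. The saving appears only as values repeat.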

Using Data Mining to Detect Insurance Fraud

Insurance companies lose millions of dollars each year through fraudulent claims, largely because they do not have a way to easily determine which claims are legitimate and which may be fraudulent. To ensure that adjusters target claims which have the greatest likelihood of adjustment, many insurance companies have incorporated IBM SPSS data mining into their investigating and auditing processes. This report describes how data mining techniques can enable you to improve accuracy and save time, money and resources.

A hybrid evolutionary algorithm for attribute selection in data mining

Real-life data sets are often interspersed with noise, making the subsequent data mining process difficult. The task of the classifier could be simplified by eliminating attributes that are deemed to be redundant for classification, as the retention of only pertinent attributes would reduce the size of the dataset and subsequently allow more comprehensible analysis of the extracted patterns or rules. In this article, a new hybrid approach comprising two conventional machine learning algorithms is proposed to carry out attribute selection. Genetic algorithms (GAs) and support vector machines (SVMs) are integrated effectively based on a wrapper approach. Specifically, the GA component searches for the best attribute set by applying the principles of an evolutionary process. The SVM then classifies the patterns in the reduced datasets, corresponding to the attribute subsets represented by the GA chromosomes. The proposed GA-SVM hybrid is subsequently validated using datasets obtained from the UCI machine learning repository. Simulation results demonstrate that the GA-SVM hybrid produces good classification accuracy and a higher level of consistency that is comparable to other established algorithms. In addition, improvements are made to the hybrid by using a correlation measure between attributes as a fitness measure to replace the weaker members in the population with newly formed chromosomes. This injects greater diversity and increases the overall fitness of the population. Similarly, the improved mechanism is also validated on the same data sets used in the first stage. The results justify the improvements in the classification accuracy and demonstrate its potential to be a good classifier for future data mining purposes.
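The wrapper approach described here can be sketched in a few dozen lines. To stay self-contained, the sketch swaps the SVM for a leave-one-out 1-nearest-neighbour classifier as the fitness function, and it omits the paper's correlation-based replacement step. The dataset, population size, mutation rate and other parameters are hypothetical, so this illustrates the general GA wrapper pattern and not the authors' GA-SVM hybrid.

```python
import random

def loo_accuracy(rows, labels, mask):
    """Leave-one-out 1-nearest-neighbour accuracy on the attributes where mask is True."""
    selected = [i for i, keep in enumerate(mask) if keep]
    if not selected:
        return 0.0
    correct = 0
    for i, row in enumerate(rows):
        best_dist, best_label = None, None
        for j, other in enumerate(rows):
            if i == j:
                continue
            dist = sum((row[k] - other[k]) ** 2 for k in selected)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        correct += best_label == labels[i]
    return correct / len(rows)

def ga_select(rows, labels, generations=20, pop_size=12, mutation_rate=0.1, seed=0):
    """Search attribute subsets (bit-mask chromosomes) with a simple GA.

    Assumes at least two attributes, since one-point crossover needs a cut point.
    """
    rng = random.Random(seed)
    n_attrs = len(rows[0])

    def fitness(mask):
        return loo_accuracy(rows, labels, mask)

    # Start from the full attribute set plus random subsets.
    population = [[True] * n_attrs] + [
        [rng.random() < 0.5 for _ in range(n_attrs)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_population = [ranked[0]]  # elitism: the best chromosome always survives
        while len(next_population) < pop_size:
            # Tournament selection: the fitter of two random chromosomes is a parent.
            parent_a = max(rng.sample(ranked, 2), key=fitness)
            parent_b = max(rng.sample(ranked, 2), key=fitness)
            cut = rng.randrange(1, n_attrs)  # one-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            # Bit-flip mutation toggles each attribute with a small probability.
            child = [(not bit) if rng.random() < mutation_rate else bit for bit in child]
            next_population.append(child)
        population = next_population
    best = max(population, key=fitness)
    return best, fitness(best)

# Hypothetical data: attribute 0 separates the classes, the others are noise.
rows = [
    [0.10, 0.9, 0.30, 0.50], [0.20, 0.1, 0.80, 0.40], [0.15, 0.5, 0.10, 0.90],
    [0.90, 0.8, 0.20, 0.50], [0.80, 0.2, 0.90, 0.30], [0.85, 0.4, 0.15, 0.95],
]
labels = [0, 0, 0, 1, 1, 1]
best_mask, best_accuracy = ga_select(rows, labels)
print(best_mask, best_accuracy)
```

Because the initial population contains the full attribute set and the best chromosome is carried into every generation, the returned subset never scores worse than using all attributes under this fitness function.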