A Case-based Data Warehousing Courseware
Data warehousing is one of the important approaches to data integration and data preprocessing. The objective of this project is to develop a web-based interactive courseware that helps beginner data warehouse designers reinforce the key concepts of data warehousing through a case study. The case study builds a data warehouse for a university student enrollment prediction data mining system. The data warehouse generates summary reports that serve as input data files for a data mining system that predicts future student enrollment. The data sources include (1) enrollment data from California State University, Sacramento and (2) related public data for California. The courseware builds the data warehouse systematically through a set of 4 demonstrations covering the following data warehousing topics: fundamentals, design principles, building an enterprise data warehouse using an incremental approach, and aggregation.
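The aggregation topic mentioned above can be illustrated with a minimal sketch. It assumes a tiny, made-up enrollment fact table; the rows, the column layout, and the function name `summarize_by` are hypothetical and are not taken from the courseware itself.

```python
from collections import defaultdict

# Hypothetical fact rows: (term, college, enrolled_students).
# The values are invented for illustration only.
ENROLLMENT_FACTS = [
    ("2009-Fall", "Engineering", 120),
    ("2009-Fall", "Business", 200),
    ("2010-Fall", "Engineering", 135),
    ("2010-Fall", "Business", 190),
    ("2010-Fall", "Engineering", 15),
]


def summarize_by(facts, key_index):
    """Roll up the enrollment measure along one dimension (0 = term, 1 = college)."""
    totals = defaultdict(int)
    for row in facts:
        totals[row[key_index]] += row[2]
    return dict(totals)


# Summary reports of the kind a downstream prediction system could read as input.
by_term = summarize_by(ENROLLMENT_FACTS, 0)
by_college = summarize_by(ENROLLMENT_FACTS, 1)
```

Each summary collapses the detailed rows to one total per dimension value, which is the basic idea behind an aggregated report in a data warehouse.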
A Taxonomy of Data Scheduling in Data Grids and Data Centers: Problems and Intelligent Resolution Techniques
Scheduling in traditional distributed systems has mainly been studied with respect to system performance parameters, without data transmission requirements. With the emergence of Data Grids (DGs) and Data Centers, data-aware scheduling has become a major research issue. DGs arise quite naturally to support the needs of scientific communities to share, access, process, and manage large, geographically distributed data collections. In fact, DGs can be seen as precursors of the Data Centers of Cloud Computing platforms, which serve as a basis for large-scale collaboration. In such computational infrastructures, efficiently processing the large amount of data is a real challenge. One of the key issues contributing to the efficiency of massive processing is scheduling with data transmission requirements. Data-aware scheduling, although similar in nature to Grid scheduling, is giving rise to the definition of a new family of optimization problems. New requirements such as data transmission, decoupling of data from processing, data replication, data access, and security are the basis for defining a whole taxonomy of data scheduling problems from an optimization perspective. In this work we present the modelling of such requirements and define data scheduling problems. We exemplify the methodology for the case of data-aware independent batch task scheduling and present several heuristic resolution methods for the problem.
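To make data-aware independent batch task scheduling concrete, here is a minimal sketch. The abstract does not name its heuristics, so this uses the well-known min-min heuristic as a stand-in and extends the completion time with a data transfer term. The task and machine fields (`work`, `data_mb`, `data_site`, `speed`, `bandwidth`) and all numbers are assumptions for illustration.

```python
def completion_time(task, machine, ready):
    """Estimated finish time: machine ready time + data transfer + computation."""
    # Assumption: no transfer cost when the input data already sits on the machine.
    if task["data_site"] == machine["name"]:
        transfer = 0.0
    else:
        transfer = task["data_mb"] / machine["bandwidth"]
    return ready[machine["name"]] + transfer + task["work"] / machine["speed"]


def min_min_schedule(tasks, machines):
    """Min-min: repeatedly commit the (task, machine) pair that finishes earliest."""
    ready = {m["name"]: 0.0 for m in machines}
    pending = list(tasks)
    plan = []
    while pending:
        finish, task_name, machine_name = min(
            (completion_time(t, m, ready), t["name"], m["name"])
            for t in pending
            for m in machines
        )
        ready[machine_name] = finish
        plan.append((task_name, machine_name, finish))
        pending = [t for t in pending if t["name"] != task_name]
    return plan, max(ready.values())


# Hypothetical batch: work in operations, data in MB, speed in ops/s, bandwidth in MB/s.
MACHINES = [
    {"name": "A", "speed": 10.0, "bandwidth": 5.0},
    {"name": "B", "speed": 20.0, "bandwidth": 5.0},
]
TASKS = [
    {"name": "t1", "work": 100.0, "data_mb": 50.0, "data_site": "A"},
    {"name": "t2", "work": 100.0, "data_mb": 200.0, "data_site": "A"},
    {"name": "t3", "work": 40.0, "data_mb": 10.0, "data_site": "B"},
]

plan, makespan = min_min_schedule(TASKS, MACHINES)
```

In this toy batch the faster machine B does not win tasks t1 and t2, because moving their input data from A costs more than the computation it would save. That trade-off is what separates data-aware scheduling from purely compute-oriented scheduling.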
Application of Association Rule Mining for Replication in Scientific Data Grid
Grid computing is the most popular infrastructure in many emerging fields of science and engineering where extensive data-driven experiments are conducted by thousands of scientists all over the world. Efficient transfer and replication of these petabyte-scale data sets are fundamental challenges in the Scientific Grid. Data grid technology was developed to permit data sharing across many organizations in geographically dispersed locations. Replication helps thousands of researchers all over the world access those data sets more efficiently, and it is essential for ensuring data reliability and availability across the grid. It does so by creating additional copies of the same data sets across the grid. In this paper, we propose a data mining based replication approach to reduce data access time. Our proposed algorithm mines the hidden association rules among different files for replica optimization, which proves highly efficient for different access patterns. The algorithm is simulated using the data grid simulator OptorSim, developed by the European Data Grid project. Our algorithm is then compared with existing approaches, which it outperforms.
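The idea of mining association rules among files can be sketched as follows. This is not the algorithm from the paper: it is a simplified pairwise miner based on support and confidence, and the access sessions, the thresholds, and the function names `mine_pair_rules` and `files_to_prefetch` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(sessions, min_support, min_confidence):
    """Return rules x -> [(y, confidence)] for file pairs that are often accessed together."""
    n = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        files = sorted(set(session))  # count each file once per session
        item_counts.update(files)
        pair_counts.update(combinations(files, 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        if count / n < min_support:
            continue  # the pair is too rare to act on
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.setdefault(x, []).append((y, confidence))
    return rules


def files_to_prefetch(rules, requested):
    """Files worth replicating alongside `requested`, highest confidence first."""
    return [y for y, _ in sorted(rules.get(requested, []), key=lambda r: -r[1])]


# Hypothetical access log: each inner list is the set of files one job requested.
SESSIONS = [
    ["f1", "f2", "f3"],
    ["f1", "f2"],
    ["f1", "f2", "f4"],
    ["f3", "f4"],
]

rules = mine_pair_rules(SESSIONS, min_support=0.5, min_confidence=0.8)
```

A replica manager could call `files_to_prefetch(rules, "f1")` when `f1` is requested and copy the associated files to the same site before they are asked for.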
Chameleon: A Resource Scheduler in A Data Grid Environment
Grid computing is moving in two directions. The Computational Grid focuses on reducing the execution time of applications that require a great number of computer processing cycles. The Data Grid provides a way to solve large-scale data management problems. Data-intensive applications such as High Energy Physics and Bioinformatics require both Computational and Data Grid features. Job scheduling in the Grid has mostly been discussed from the perspective of the Computational Grid, whereas scheduling on the Data Grid has only recently become a focus of Grid computing activities. In a Data Grid environment, an effective scheduling mechanism that considers both computational and data storage resources must be provided for large-scale data-intensive applications. In this paper, we describe a new scheduling model that considers both the amount of computational resources and data availability in a Data Grid environment. We implemented a scheduler, called Chameleon, based on the proposed application scheduling model. Chameleon shows performance improvements in data-intensive applications that require both a large number of processors and data replication mechanisms. The results achieved with Chameleon are presented.
COMPARISON PLAN FOR DATA WAREHOUSE SYSTEM ARCHITECTURES
Many organizations look for a proper way to make better and faster decisions about their businesses. A data warehouse has unique features such as data mining and ad hoc querying on data collected and integrated from many of the computerized systems used in an organization. A data warehouse can be built using a number of architectures, each with its own advantages and disadvantages. This paper investigates five different data warehouse architectures: centralized data warehouse, independent data mart, dependent data mart, homogeneous distributed data warehouse, and heterogeneous distributed data warehouse. An optimal plan is then described, and all of the data warehouse architectures mentioned above are parametrically measured by this method. Finally, as a case study, a company's problem with its data access system is studied, and the best architecture is chosen to accommodate the needs of the company's system.
Flash Memory SSD based DBMS for Data Warehouses and Data Marts
Flash memory based high-capacity SSDs open the door to better performance and high reliability for large enterprise applications. The hardware characteristics of flash memory do not allow disk-based schemes to be applied directly. To employ such schemes, we need to revise them to some degree to make them effective for flash storage media. In this paper, we aim to implement a DBMS on flash memory SSDs for large enterprise applications. The paper relates SSD characteristics to the storage features of data warehouses and data marts, and it proposes a data storage architecture for variable-length records and for data retrieval using a virtual sequential access method with multilevel indexing on flash memory based SSDs for such applications. We demonstrate lower overhead, high reliability, and higher throughput compared to hard disk drives.
Job Scheduling for Dynamic Data Replication Strategy in Heterogeneous Federation Data Grid Systems
Data grid technology has the advantage of allowing data sharing and collaboration activities across many organizations in geographically dispersed locations. Because of the collaboration activities in a distributed data grid, improving data access is one of the issues that needs to be addressed wisely to achieve better optimization. In this paper, a replication model for federation data grid systems called Sub-Grid-Federation is proposed to improve access latency by accessing data from an area identified as the 'Network Core Area' (NCA). The access latency of Sub-Grid-Federation has been evaluated through mathematical proof and simulated using the OptorSim simulator. Hence, Sub-Grid-Federation is a better alternative for implementing collaboration and data sharing in data grid systems.
Managing Data Source Quality for Data Warehouse in Manufacturing
Data quality and data source management are key success factors for a data warehouse project. Many data warehouse projects fail due to poor data quality. It is often assumed that such problems can be fixed later, and as a result a great deal of time is spent correcting errors. If low-quality data is fed into the data warehouse system, the results will not be accurate when these data are used in decision making. Many data warehouse and business intelligence project failures are due to wrong or low-quality data. This paper therefore builds on several frameworks such as Total Data Quality Management (TDQM), ISO 9001:2008, and the Quality Management System (QMS) in order to address data quality problems at an early stage and to find the best procedure for managing data sources. To derive a standard procedure for managing data sources based on the ISO 9001:2008 standard, the process of managing data sources is identified and compared to the ISO 9001:2008 Quality Management System (QMS) requirements. As a result, this process is viewed as a kind of production process and is related to the concepts of quality management known from the manufacturing and service domains. More precisely, a high-quality management system for managing data sources is proposed. This system is based on the ISO 9001:2008 standard, and it is hoped that it can help organizations implement and operate a quality management system. By applying the ISO 9001:2008 framework to the process of managing data sources, this approach resembles the manufacturing concept, which gives it an added advantage over traditional approaches to managing data sources.
OPTIMAL DISTRIBUTED DATA WAREHOUSE SYSTEM ARCHITECTURE
Many organizations look for a proper way to make better and faster decisions about their businesses. A data warehouse has unique features such as data mining and ad hoc querying on data collected and integrated from many of the computerized systems used in an organization. A data warehouse can be built using a number of architectures, each with its own advantages and disadvantages. This paper investigates five different data warehouse architectures: centralized data warehouse, independent data mart, dependent data mart, homogeneous distributed data warehouse, and heterogeneous distributed data warehouse. A comparison plan is then described, and all of the data warehouse architectures mentioned above are parametrically measured by this plan. An optimal architecture is then proposed in which we try to resolve the drawbacks of previous distributed architectures in order to reach a desired design. It must not be forgotten, however, that a desired design will not suit every situation in every project and should be customized for particular environments.
Study on Methods of Building Data Warehouse for Multi-Mine Group Companies
In light of the characteristics of group companies with multiple mines, we present a proposal for implementing scientific management, referred to as a center-distributed data warehouse or hybrid data warehouse, in which a local data warehouse is built for each mine enterprise to support its main production management and decision-making, while a global data warehouse extracts information from the local ones. Applying object-oriented theory, we first divide the basic detail data into six aggregated classes of entities: engineering information, geological information, mineral resources, other equipment and material information, safety and environment information, and financial information management. Then, given the difficulty and complexity of building a data warehouse, it is important to create data marts for monitoring department-level transaction data with a strong division-of-labor character. To avoid information silos, the distributed data warehouses are built first, and data subsequently extracted from them is used to construct dependent data marts according to the business requirements of the various departments. Furthermore, we establish bridges among these distributed data warehouses to enable data communication, and thereby complete the overall data warehouse plan for multi-mine group companies.
Towards A Scalable Scientific Data Grid Model and Services
Scientific Data Grids mostly deal with large computational problems. They provide geographically distributed resources for large-scale data-intensive applications that generate large scientific data sets. This requires scientists in modern scientific computing communities to be involved in managing massive amounts of very large, geographically distributed data collections. Research in the area of grids has produced various ideas and solutions to address these requirements. However, the number of participants (scientists and institutes) involved in this kind of environment is now increasing tremendously. This situation has led to a scalability problem. To overcome this problem, we need a data grid model that scales well as the number of users increases. Peer-to-peer (P2P) is one architecture that promises scalability and a dynamic environment. In this paper, we present a P2P model for the Scientific Data Grid that uses P2P services to address the scalability problem. Using this model, we study and propose various decentralized discovery strategies intended to address the scalability problem. We also investigate the impact of data replication, which addresses the data distribution and reliability problem for our Scientific Data Grid model, on the proposed discovery strategies.