A big data analytics framework for scientific data management
The Ophidia project is a research effort addressing big data analytics requirements, issues, and challenges for eScience. We present here the Ophidia analytics framework, which is responsible for atomically processing, transforming and manipulating array-based data. This framework provides a common way to run analytics tasks on big datasets across large clusters. The paper highlights the design principles, algorithm, and most relevant implementation aspects of the Ophidia analytics framework. Some experimental results, related to a couple of data analytics operators in a real cluster environment, are also presented.
A Comparison of Approaches to Large-Scale Data Analysis
There is currently considerable enthusiasm around the MapReduce (MR) paradigm for large-scale data analysis. Although the basic control flow of this framework has existed in parallel SQL database management systems (DBMS) for over 20 years, some have called MR a dramatically new computing model [8, 17].
A Comprehensive View of Hadoop MapReduce Scheduling Algorithms
Hadoop is a Java-based programming framework that supports the storage and processing of large data sets in a distributed computing environment, and it is well suited to high volumes of data. It uses HDFS to store data and MapReduce to process it. MapReduce is a popular programming model for supporting data-intensive applications on shared-nothing clusters. The main objective of the MapReduce programming model is to parallelize job execution across multiple nodes.
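As a rough illustration of the MapReduce model described in this abstract, the sketch below counts words in plain Python. It is not Hadoop code and is not taken from the paper: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and a local thread pool stands in for the nodes of a shared-nothing cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by key across every map output."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(item):
    """Reduce: collapse the list of values for one key into a single count."""
    key, values = item
    return key, sum(values)


def word_count(lines, workers=2):
    # Assumption: one input split per worker, a stand-in for HDFS blocks.
    chunks = [lines[i::workers] for i in range(workers)]
    # Assumption: threads play the role of cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
        reduced = list(pool.map(reduce_phase, shuffle(mapped).items()))
    return dict(reduced)


if __name__ == "__main__":
    print(word_count(["big data", "big clusters process data"]))
```

Each map call sees only its own split, and each reduce call sees only the values for one key, which is what lets a real framework spread both phases across machines.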
A genetic algorithm-based job scheduling model for big data analytics
Big data analytics (BDA) applications are a new category of software applications that process large amounts of data using scalable parallel processing infrastructure to obtain hidden value. Hadoop is the most mature open-source big data analytics framework, which implements the MapReduce programming model to process big data with MapReduce jobs. Big data analytics jobs are often continuous and not mutually separated.
A Secure Cloud Computing Based Framework for Big Data Information Management of Smart Grid
Smart grid is a technological innovation that improves efficiency, reliability, economics, and sustainability of electricity services. It plays a crucial role in modern energy infrastructure. The main challenges of smart grids, however, are how to manage different types of front-end intelligent devices such as power assets and smart meters efficiently, and how to process the huge amount of data received from these devices. Cloud computing, a technology that provides computational resources on demand, is a good candidate to address these challenges since it has several good properties such as energy saving, cost saving, agility, scalability, and flexibility. In this paper, we propose a secure cloud computing based framework for big data information management in smart grids, which we call “Smart-Frame.” The main idea of our framework is to build a hierarchical structure of cloud computing centers to provide different types of computing services for information management and big data analysis. In addition to this structural framework, we present a security solution based on identity-based encryption, signature and proxy re-encryption to address critical security issues of the proposed framework.
A Unified MapReduce Domain-Specific Language for Distributed and Shared Memory Architectures
MapReduce is a suitable and efficient parallel programming pattern for big data analysis. In recent years, many frameworks/languages have implemented this pattern to achieve high performance in data mining applications, particularly for distributed memory architectures (e.g., clusters). Nevertheless, the processor industry is now able to offer powerful processing on single machines (e.g., multi-core). Thus, these applications may address the parallelism at another architectural level.
AI2: Training a big data machine to defend
We present an analyst-in-the-loop security system, where analyst intuition is put together with state-of-the-art machine learning to build an end-to-end active learning system. The system has four key features: a big data behavioral analytics platform, an ensemble of outlier detection methods, a mechanism to obtain feedback from security analysts, and a supervised learning module. When these four components are run in conjunction on a daily basis and are compared to an unsupervised outlier detection method, detection rate improves by an average of 3.41×, and false positives are reduced fivefold. We validate our system with a real-world data set consisting of 3.6 billion log lines. These results show that our system is capable of learning to defend against unseen attacks.
An Economic-based Resource Management and Scheduling for Grid Computing Applications
Resource management and scheduling play a crucial role in achieving high utilization of resources in grid computing environments. Due to the heterogeneity of resources, scheduling an application is a significantly complicated and challenging task in a grid system. Most research in this area focuses on improving the performance of the grid system. Some allocation models have been proposed based on divisible load theory with different types of workloads and a single originating processor. In this paper we introduce a new resource allocation model with multiple load-originating processors as an economic model. Solutions for an optimal allocation of load fractions to nodes are obtained via a linear programming approach to minimize the cost to grid users. It is found that the resource allocation model can efficiently and effectively allocate workloads to proper resources. Experimental results show that the proposed model obtains better solutions in terms of cost and time.
Bayes and Big Data: The Consensus Monte Carlo Algorithm
A useful definition of “big data” is data that is too big to comfortably process on a single machine, either because of processor, memory, or disk bottlenecks. Graphics processing units can alleviate the processor bottleneck, but memory or disk bottlenecks can only be eliminated by splitting data across multiple machines.
Big Data – Opportunities and Challenges
This paper summarizes opportunities and challenges of big data. It identifies important research directions and includes a number of questions that have been debated by the panel.
Big data analytics in E-commerce: a systematic review and agenda for future research
Big Data and Cloud Computing: Current State and Future Opportunities
Scalable database management systems (DBMS), both for update-intensive application workloads and for decision support systems for descriptive and deep analytics, are a critical part of the cloud infrastructure and play an important role in ensuring the smooth transition of applications from traditional enterprise infrastructures to next-generation cloud infrastructures. Though scalable data management has been a vision for more than three decades and much research has focussed on large-scale data management in traditional enterprise settings, cloud computing brings its own set of novel challenges that must be addressed to ensure the success of data management solutions in the cloud environment. This tutorial presents an organized picture of the challenges faced by application developers and DBMS designers in developing and deploying internet-scale applications. Our background study encompasses both classes of systems: (i) for supporting update-heavy applications, and (ii) for ad-hoc analytics and decision support. We then focus on providing an in-depth analysis of systems for supporting update-intensive web applications and provide a survey of the state of the art in this domain. We crystallize the design choices made by some successful large-scale database management systems, analyze the application demands and access patterns, and enumerate the desiderata for a cloud-bound DBMS.