Optimizing Hadoop for the cluster
The total number of clusters running Hadoop increases every day. The reason for this is that companies have found a simple model that works well.
Overcoming Hadoop Scaling Limitations through Distributed Task Execution
Data-driven programming models like MapReduce have gained popularity in large-scale data processing. Although great efforts through the Hadoop implementation and framework decoupling (e.g., YARN, Mesos) have allowed Hadoop to scale to tens of thousands of commodity cluster processors, the centralized designs of the resource manager, the task scheduler, and the HDFS metadata management adversely affect Hadoop’s scalability to tomorrow’s extreme-scale data centers.
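For context, a minimal sketch of the MapReduce programming model the abstract refers to, written in plain Python rather than against Hadoop's actual API; the function names and the simulated in-memory shuffle are illustrative only.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit (word, 1) pairs for every word in one input split."""
    for word in document.split():
        yield word.lower(), 1

def reduce_phase(key, values):
    """Reduce step: sum all counts emitted for one key."""
    return key, sum(values)

def run_job(documents):
    """Simulate the shuffle: group intermediate pairs by key, then reduce."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_phase(doc):
            grouped[key].append(value)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

if __name__ == "__main__":
    splits = ["hadoop scales out", "hadoop schedules tasks", "tasks run on commodity nodes"]
    print(run_job(splits))  # e.g. {'hadoop': 2, 'tasks': 2, ...}
```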
Performance Characterization of Hadoop and DataMPI Based on Amdahl’s Second Law
Amdahl’s second law has been seen as a useful guideline for designing and evaluating balanced computer systems for decades.
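As a rough reference, one common statement of Amdahl's balanced-system rules of thumb (the exact ratios are approximate and vary across sources) relates compute speed to memory capacity and I/O bandwidth:

```latex
% Amdahl's balanced-system rules of thumb (one common formulation;
% the unit ratios are approximate and debated):
%   ~1 byte of memory per instruction/second of CPU speed
%   ~1 bit/second of I/O bandwidth per instruction/second of CPU speed
\[
  \frac{\text{memory capacity [bytes]}}{\text{CPU speed [instructions/s]}} \approx 1,
  \qquad
  \frac{\text{I/O bandwidth [bits/s]}}{\text{CPU speed [instructions/s]}} \approx 1
\]
```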
PREDICTIVE MODELING WITH BIG DATA: Is Bigger Really Better?
With the increasingly widespread collection and processing of “big data,” there is natural interest in using these data assets to improve decision making. One of the best understood ways to use data to improve decision making is via predictive analytics.
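The "is bigger really better?" question can be probed with a simple learning-curve experiment; the sketch below assumes scikit-learn is available and uses a synthetic dataset and model purely as placeholders.

```python
# A toy learning-curve experiment: does adding training data keep improving
# a predictive model? (Dataset and model are illustrative placeholders.)
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=20000, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

for n in (100, 1000, 5000, 15000):
    model = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"train size {n:>6}: held-out accuracy {acc:.3f}")
```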
Promoting Distributed Accountability in the Cloud
Cloud computing enables highly scalable services to be easily consumed over the Internet on an as-needed basis. A major feature of cloud services is that users’ data are usually processed remotely on unknown machines that users do not own or operate. While enjoying the convenience brought by this new emerging technology, users’ fear of losing control of their own data (particularly financial and health data) can become a significant barrier to the wide adoption of cloud services. To address this problem, in this paper we propose a novel, highly decentralized information accountability framework to keep track of the actual usage of users’ data in the cloud.
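As a loose illustration of the accountability idea (not the framework proposed in the paper), one can imagine data objects that carry their own access log so the owner can audit usage later; the class and field names below are hypothetical.

```python
# Illustrative only: a tiny data wrapper that records every access so the
# data owner can audit usage later. A sketch of the general accountability
# idea, not the decentralized framework proposed in the paper.
from datetime import datetime, timezone

class AccountableData:
    def __init__(self, owner, payload):
        self.owner = owner
        self._payload = payload
        self.access_log = []                 # log travels with the data

    def read(self, accessor, purpose):
        """Hand out the payload, but record who asked, why, and when."""
        self.access_log.append({
            "who": accessor,
            "purpose": purpose,
            "when": datetime.now(timezone.utc).isoformat(),
        })
        return self._payload

record = AccountableData(owner="alice", payload={"balance": 1200})
record.read("cloud-analytics-service", "monthly report")
print(record.access_log)
```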
Quality Assurance for Big Data Application – Issues, Challenges, and Needs
With the fast advance of big data technology and analytics solutions, building high-quality big data computing services in different application domains is becoming a very hot research and application topic among academic and industry communities and government agencies.
Recent Job Scheduling Algorithms in Hadoop Cluster Environments: A Survey
This paper discusses how cloud computing is emerging as a new computing paradigm. Hadoop MapReduce has become a robust computation model for processing large data sets on distributed commodity hardware clusters such as clouds. In typical Hadoop implementations the default first-in, first-out (FIFO) scheduler is available, where jobs are scheduled in FIFO order, with support for additional priority-based schedulers and other pluggable schedulers.
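A minimal model of the FIFO scheduling behavior described above, in plain Python; this is not Hadoop's scheduler code, just a sketch of strict submission-order dispatch.

```python
# Minimal model of FIFO job scheduling: jobs run strictly in submission order.
from collections import deque

class FIFOScheduler:
    def __init__(self):
        self.queue = deque()

    def submit(self, job_id):
        """Append a job to the tail of the queue."""
        self.queue.append(job_id)

    def next_job(self):
        """Return the oldest waiting job, or None if the queue is empty."""
        return self.queue.popleft() if self.queue else None

scheduler = FIFOScheduler()
for job in ("job-1", "job-2", "job-3"):
    scheduler.submit(job)
print(scheduler.next_job())   # job-1 -- earliest submission runs first
```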
RECOMMENDER SYSTEM FOR ANIMATED VIDEO
Finding entertainment in the domain of animation is a challenging process that often forces many customers to ask others for assistance in forums or chat rooms. The manual process of recommending shows to one another based on often flimsy examples prevents many users from efficiently identifying media that suits their taste. Recommender systems, software tools and techniques for providing suggestions of items to a user, present an exceptional use case for resolving this problem. In this paper we compare the various implementations and benefits of using collaborative filtering (the user-based approach). Improving the identification of animation that customers are interested in is a problem domain that recommender systems are well suited for, benefiting both customers and vendors of such media. The aim of this study is to provide a proof-of-concept system capable of providing valuable recommendations based on show rankings.
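A small sketch of the user-based collaborative filtering approach the paper compares; the rating matrix, show names, and similarity measure (cosine) are illustrative assumptions, not the paper's dataset or implementation.

```python
# Sketch of user-based collaborative filtering on a tiny rating matrix
# (show names and ratings are made up for illustration).
import numpy as np

ratings = {
    "ann":  {"ShowA": 5, "ShowB": 3, "ShowC": 4},
    "bob":  {"ShowA": 4, "ShowB": 3, "ShowC": 5, "ShowD": 2},
    "cara": {"ShowB": 1, "ShowD": 5},
}

def cosine(u, v):
    """Cosine similarity over the shows both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    a = np.array([u[s] for s in common], dtype=float)
    b = np.array([v[s] for s in common], dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend(target, k=1):
    """Score unseen shows by similarity-weighted ratings of other users."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == target:
            continue
        sim = cosine(ratings[target], other_ratings)
        for show, r in other_ratings.items():
            if show not in ratings[target]:
                scores[show] = scores.get(show, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("ann"))   # e.g. ['ShowD']
```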
Recommender Systems and the Social Web
In the past, classic recommender systems relied solely on the user models they were able to construct by themselves and suffered from the “cold start” problem. Advances of the recent decade, among them internet connectivity and data sharing, now enable them to bootstrap their user models from external sources such as user modeling servers or other recommender systems. However, this approach has only been demonstrated by research prototypes. Recent developments have brought a new source for bootstrapping recommender systems: social web services. The variety of social web services, each with its unique user model characteristics, could aid bootstrapping recommender systems in different ways. In this paper we propose a mapping of how each of the classical user modeling approaches can benefit from the user models of today’s active services, and supply an example of a possible application.
Revisiting the Data Lifecycle with Big Data Curation
As science becomes more data-intensive and collaborative, researchers increasingly use larger and more complex data to answer research questions. The capacity of storage infrastructure, the increased sophistication and deployment of sensors, the ubiquitous availability of computer clusters, the development of new analysis techniques, and larger collaborations allow researchers to address grand societal challenges in a way that is unprecedented.
Ricardo: Integrating R and Hadoop
Many modern enterprises are collecting data at the most detailed level possible, creating data repositories ranging from terabytes to petabytes in size. The ability to apply sophisticated statistical analysis methods to this data is becoming essential for marketplace competitiveness. This need to perform deep analysis over huge data repositories poses a significant challenge to existing statistical software and data management systems. On the one hand, statistical software provides rich functionality for data analysis and modeling, but can handle only limited amounts of data; e.g., popular packages like R and SPSS operate entirely in main memory. On the other hand, data management systems—such as MapReduce-based systems—can scale to petabytes of data, but provide insufficient analytical functionality. We report our experiences in building Ricardo, a scalable platform for deep analytics. Ricardo is part of the eXtreme Analytics Platform (XAP) project at the IBM Almaden Research Center, and rests on a decomposition of data-analysis algorithms into parts executed by the R statistical analysis system and parts handled by the Hadoop data management system. This decomposition attempts to minimize the transfer of data across system boundaries. Ricardo contrasts with previous approaches, which try to get along with only one type of system, and allows analysts to work on huge datasets from within a popular, well supported, and powerful analysis environment. Because our approach avoids the need to re-implement either statistical or data-management functionality, it can be used to solve complex problems right now.
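As a rough illustration of the decomposition idea (plain Python standing in for both Hadoop and R, so this is not Ricardo's actual interface): aggregate the large dataset into small sufficient statistics in a data-parallel pass, then fit the model locally on those aggregates, so only tiny summaries cross the system boundary.

```python
# Illustration of the decomposition idea described above: push the
# data-parallel aggregation to the "big data" side and keep the small-data
# statistical fit local. (Plain Python stands in for Hadoop and R here.)
import random

# Pretend this is a huge distributed dataset of (x, y) pairs.
data = [(x, 2.0 * x + 1.0 + random.gauss(0, 0.1)) for x in range(100000)]

def aggregate(partition):
    """Map/combine step: sufficient statistics for a least-squares line."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in partition:
        n += 1; sx += x; sy += y; sxx += x * x; sxy += x * y
    return n, sx, sy, sxx, sxy

# "Reduce": sum the per-partition statistics (two fake partitions here).
parts = [aggregate(data[:50000]), aggregate(data[50000:])]
n, sx, sy, sxx, sxy = (sum(t) for t in zip(*parts))

# Local "analyst side" fit on the tiny aggregate, not on the raw data.
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n
print(f"fitted line: y = {slope:.3f} x + {intercept:.3f}")
```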