Listing 1 - 10 of 12

Book
Simplify big data analytics with Amazon EMR : a beginner's guide to learning and implementing Amazon EMR for building data analytics solutions
Author:
ISBN: 180107772X 9781801077729 Year: 2022 Publisher: Birmingham, England : Packt Publishing, Limited,

Abstract

Design scalable big data solutions using Hadoop, Spark, and AWS cloud-native services

Key Features
Build data pipelines that require distributed processing capabilities on a large volume of data
Discover the security features of EMR such as data protection and granular permission management
Explore best practices and optimization techniques for building data analytics solutions in Amazon EMR

Book Description
Amazon EMR, formerly Amazon Elastic MapReduce, provides a managed Hadoop cluster in Amazon Web Services (AWS) that you can use to implement batch or streaming data pipelines. By gaining expertise in Amazon EMR, you can design and implement data analytics pipelines with persistent or transient EMR clusters in AWS. This book is a practical guide to Amazon EMR for building data pipelines. You'll start by understanding the Amazon EMR architecture, cluster nodes, features, and deployment options, along with their pricing. Next, the book covers the various big data applications that EMR supports. You'll then focus on the advanced configuration of EMR applications, hardware, networking, security, troubleshooting, logging, and the different SDKs and APIs it provides. Later chapters will show you how to implement common Amazon EMR use cases, including batch ETL with Spark, real-time streaming with Spark Streaming, and handling UPSERT in an S3 data lake with Apache Hudi. Finally, you'll orchestrate your EMR jobs and plan the migration of an on-premises Hadoop cluster to EMR. You'll also explore best practices and cost optimization techniques while implementing your data analytics pipeline in EMR. By the end of this book, you'll be able to build and deploy Hadoop- or Spark-based apps on Amazon EMR and also migrate your existing on-premises Hadoop workloads to AWS.

What you will learn
Explore Amazon EMR features, architecture, Hadoop interfaces, and EMR Studio
Configure, deploy, and orchestrate Hadoop or Spark jobs in production
Implement the security, data governance, and monitoring capabilities of EMR
Build applications for batch and real-time streaming data analytics solutions
Perform interactive development with a persistent EMR cluster and Notebook
Orchestrate an EMR Spark job using AWS Step Functions and Apache Airflow

Who this book is for
This book is for data engineers, data analysts, data scientists, and solution architects who are interested in building data analytics solutions with the Hadoop ecosystem services and Amazon EMR. Prior experience in Python, Scala, or Java programming and a basic understanding of Hadoop and AWS will help you make the most out of this book.


Book
MapReduce design patterns
Authors: ---
ISBN: 9781449327170 1449327176 Year: 2013 Publisher: Beijing : O'Reilly,

Abstract


Book
面向MapReduce的Hadoop优化 [Optimizing Hadoop for MapReduce] : Chinese Edition.
Authors: ---
ISBN: 1836204507 9781836204503 Year: 2024 Publisher: Birmingham : Packt Publishing, Limited,

Abstract

Detailed summary in vernacular field only.


Dissertation
Development of a web service for storing, managing and processing user data from mobile applications
Authors: --- --- --- ---
Year: 2013 Publisher: Gent : s.n.,

Abstract

This thesis documents the development of a web service for storing, managing and processing user data. The user data comes from mobile services and applications. The success of these mobile services and applications is determined by the user experience, expressed by Quality of Experience parameters. It is primarily these parameters, in combination with the Quality of Service parameters, that are analyzed. For the storage of this data, a comparative study is made of different data formats: HBase, MySQL, Cassandra and MongoDB. The final choice is based on the characteristics of the transmitted data. The thesis describes how the data is stored and which techniques are used to retrieve the data and process it into an analyzable representation, using Apache Hadoop and Apache Hive. Finally, an analyst can use the web service to analyze the data, for example to detect an evolution in the user experience.


Book
Distributed and Parallel Architectures for Spatial Data
Authors: --- --- ---
Year: 2020 Publisher: Basel, Switzerland : MDPI - Multidisciplinary Digital Publishing Institute,

Abstract

This book aims at promoting new and innovative studies, proposing new architectures or innovative evolutions of existing ones, and illustrating experiments on current technologies in order to improve the efficiency and effectiveness of distributed and cluster systems when they deal with spatiotemporal data.


Book
Distributed and Parallel Architectures for Spatial Data
Authors: --- --- ---
Year: 2020 Publisher: Basel, Switzerland : MDPI - Multidisciplinary Digital Publishing Institute,

Abstract

This book aims at promoting new and innovative studies, proposing new architectures or innovative evolutions of existing ones, and illustrating experiments on current technologies in order to improve the efficiency and effectiveness of distributed and cluster systems when they deal with spatiotemporal data.


Book
Distributed and Parallel Architectures for Spatial Data
Authors: --- --- ---
Year: 2020 Publisher: Basel, Switzerland : MDPI - Multidisciplinary Digital Publishing Institute,

Abstract

This book aims at promoting new and innovative studies, proposing new architectures or innovative evolutions of existing ones, and illustrating experiments on current technologies in order to improve the efficiency and effectiveness of distributed and cluster systems when they deal with spatiotemporal data.


Book
Optimizing hadoop for MapReduce : learn how to configure your hadoop cluster to run optimal MapReduce jobs
Author:
ISBN: 1783285664 9781783285662 9781783285655 1783285656 Year: 2014 Publisher: Birmingham, England : Packt Publishing Ltd,

Abstract

This book is an example-based tutorial that deals with optimizing Hadoop for MapReduce job performance. If you are a Hadoop administrator, developer, MapReduce user, or beginner, this book is the best choice available if you wish to optimize your clusters and applications. Prior knowledge of creating MapReduce applications is not necessary, but it will help you better understand the concepts and the snippets of MapReduce class template code.


Book
Hadoop mapreduce v2 cookbook : explore the hadoop mapreduce v2 ecosystem to gain insights from very large datasets
Authors: --- --- --- --- --- et al.
ISBN: 1783285486 9781783285488 9781783285471 Year: 2015 Publisher: Birmingham, England ; Mumbai, [India] : Packt Publishing,

Abstract

If you are a Big Data enthusiast and wish to use Hadoop v2 to solve your problems, then this book is for you. This book is for Java programmers with little to moderate knowledge of Hadoop MapReduce. This is also a one-stop reference for developers and system admins who want to quickly get up to speed with using Hadoop v2. It would be helpful to have a basic knowledge of software development using Java and a basic working knowledge of Linux.


Book
Learning YARN : moving beyond MapReduce : learn resource management and big data processing using YARN
Authors: ---
ISBN: 9781784394585 1784394580 9781784393960 1784393967 Year: 2015 Publisher: Birmingham : Packt Publishing,

Abstract

Moving beyond MapReduce - learn resource management and big data processing using YARN

About This Book
Deep dive into YARN components, schedulers, life cycle management and security architecture
Create your own Hadoop-YARN applications and integrate big data technologies with YARN
Step-by-step guide to provision, manage, and monitor Hadoop-YARN clusters with ease

Who This Book Is For
This book is intended for those who want to understand what YARN is and how to efficiently use it for the resource management of large clusters. For cluster administrators, this book gives a detailed explanation of provisioning and managing YARN clusters. If you are a Java developer or an open source contributor, this book will help you to drill down into the YARN architecture, write your own YARN applications and understand the application execution phases. This book will also help big data engineers explore YARN integration with real-time analytics technologies such as Spark and Storm.

What You Will Learn
Explore YARN features and offerings
Manage big data clusters efficiently using the YARN framework
Create single as well as multi-node Hadoop-YARN clusters on Linux machines
Understand YARN components and their administration
Gain insights into application execution flow over a YARN cluster
Write your own distributed application and execute it over a YARN cluster
Work with schedulers and queues for efficient scheduling of applications
Integrate big data projects like Spark and Storm with YARN

In Detail
Today, enterprises generate huge volumes of data. In order to provide effective services and to make smarter and more intelligent decisions from these huge volumes of data, enterprises use big-data analytics. In recent years, Hadoop has been used for massive data storage and efficient distributed processing of data. The Yet Another Resource Negotiator (YARN) framework solves the design problems related to resource management faced by the Hadoop 1.x framework by providing a more scalable, efficient, flexible, and highly available resource management framework for distributed data processing. This book starts with an overview of the YARN features and explains how YARN provides a business solution for growing big data needs. You will learn to provision and manage single, as well as multi-node, Hadoop-YARN clusters in the easiest way. You will walk through the YARN administration, life cycle management, application execution, REST APIs, schedulers, security framework and so o...
