Listing 1 - 10 of 12
Design scalable big data solutions using Hadoop, Spark, and AWS cloud-native services.
Key Features: Build data pipelines that require distributed processing on large volumes of data. Discover the security features of EMR, such as data protection and granular permission management. Explore best practices and optimization techniques for building data analytics solutions in Amazon EMR.
Book Description: Amazon EMR, formerly Amazon Elastic MapReduce, provides a managed Hadoop cluster in Amazon Web Services (AWS) that you can use to implement batch or streaming data pipelines. By gaining expertise in Amazon EMR, you can design and implement data analytics pipelines with persistent or transient EMR clusters in AWS. This book is a practical guide to building data pipelines with Amazon EMR. You'll start by learning about the Amazon EMR architecture, cluster nodes, features, and deployment options, along with their pricing. Next, the book covers the big data applications that EMR supports. You'll then move on to the advanced configuration of EMR applications, hardware, networking, and security, as well as troubleshooting, logging, and the SDKs and APIs that EMR provides. Later chapters show you how to implement common Amazon EMR use cases, including batch ETL with Spark, real-time streaming with Spark Streaming, and handling UPSERTs in an S3 data lake with Apache Hudi. Finally, you'll orchestrate your EMR jobs and plan the migration of on-premises Hadoop clusters to EMR. Along the way, you'll explore best practices and cost optimization techniques for your data analytics pipelines in EMR. By the end of this book, you'll be able to build and deploy Hadoop- or Spark-based applications on Amazon EMR and migrate your existing on-premises Hadoop workloads to AWS.
What you will learn: Explore Amazon EMR features, architecture, Hadoop interfaces, and EMR Studio. Configure, deploy, and orchestrate Hadoop or Spark jobs in production. Implement the security, data governance, and monitoring capabilities of EMR. Build applications for batch and real-time streaming data analytics. Perform interactive development with a persistent EMR cluster and a Notebook. Orchestrate an EMR Spark job using AWS Step Functions and Apache Airflow.
Who this book is for: This book is for data engineers, data analysts, data scientists, and solution architects who want to build data analytics solutions with Hadoop ecosystem services and Amazon EMR. Prior experience with Python, Scala, or Java and a basic understanding of Hadoop and AWS will help you get the most out of this book.
Electronic data processing --- Cluster analysis --- Software patterns. --- Computer algorithms. --- Distributed processing. --- Data processing. --- Apache Hadoop. --- MapReduce.
Detailed summary in vernacular field only.
Electronic data processing --- Cluster analysis --- Distributed processing. --- Data processing. --- Apache Hadoop. --- MapReduce (Computer file)
This thesis documents the development of a web service that processes, stores, and manages user data coming from mobile services and applications. The success of these services and applications is determined by the user experience, expressed through Quality of Experience (QoE) parameters; the analysis focuses primarily on these parameters in combination with Quality of Service (QoS) parameters. For storing the user data, the thesis makes a comparative study of several data stores: HBase, MySQL, Cassandra, and MongoDB, with the final choice based on the characteristics of the transmitted data. It then describes how the data is stored and which techniques are used to query it and process it into an analyzable representation, using Apache Hadoop and Apache Hive.
An analyst can use the web service to retrieve the data, for example to detect how the user experience evolves over time.
Cassandra. --- Data warehouse. --- Databases. --- Hadoop. --- Hive. --- Informatics. --- Java. --- MapReduce. --- RESTful. --- Web service.
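The thesis abstract describes processing raw QoE records into an analyzable representation with Hadoop and Hive. As a rough illustration of that kind of processing, here is a minimal in-memory Python sketch of the MapReduce model (map, shuffle, reduce) that averages a QoE score per application and day. It does not use the Hadoop API and is not the thesis's implementation; the record layout (app, day, score), the sample values, and the function names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are all hypothetical. In Hive, a comparable result would come from a `GROUP BY` query using `AVG`.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout: (app, day, qoe_score). This is an illustrative
# assumption, not the schema used in the thesis.
Record = Tuple[str, str, float]
Key = Tuple[str, str]


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[Key, float]]:
    """Emit one ((app, day), score) pair per record, like a mapper."""
    for app, day, score in records:
        yield (app, day), score


def shuffle(pairs: Iterable[Tuple[Key, float]]) -> Dict[Key, List[float]]:
    """Group values by key and sort the keys, like the framework's shuffle/sort."""
    groups: Dict[Key, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: Dict[Key, List[float]]) -> Dict[Key, float]:
    """Collapse each key's scores into their mean, like a reducer."""
    return {key: mean(values) for key, values in groups.items()}


def run_job(records: Iterable[Record]) -> Dict[Key, float]:
    """Chain the three phases into a single in-memory 'job'."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    # Made-up sample events, for illustration only.
    events = [
        ("video", "day-1", 4.0),
        ("video", "day-1", 3.0),
        ("video", "day-2", 2.5),
        ("maps", "day-1", 5.0),
    ]
    for (app, day), avg in run_job(events).items():
        print(app, day, avg)
```

Averaging per (app, day) key is one simple way to make trends visible: a falling daily average for an application is the kind of evolution in user experience the abstract mentions. In a real Hadoop job, the framework performs the shuffle across the cluster instead of in a single in-memory dictionary.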
This book aims to promote new and innovative studies, propose new architectures or innovative evolutions of existing ones, and present experiments on current technologies that improve the efficiency and effectiveness of distributed and cluster systems when they deal with spatiotemporal data.
History of engineering & technology --- spatial big data --- parallel processing --- MapReduce --- arable land quality (ALQ) --- GIS --- big data --- IoT --- Hadoop --- geospatial big data --- geospatial applications --- buffer analysis --- real-time --- visualization-oriented --- tile-pyramid --- parallel computing --- soil erosion modelling --- mobility --- data warehouses --- spatiotemporal OLAP --- mobility analytics --- location-based aggregate queries --- distributed processing technique --- grid structure --- MapReduce-based aggregate query algorithm --- watershed analysis --- multiple flow accumulation --- DEM --- CUDA --- OpenACC --- GPU --- sustainable development --- Agenda 2063 --- geoportal --- monitoring and evaluation --- geospatial data
This book aims to promote new and innovative studies, propose new architectures or innovative evolutions of existing ones, and present experiments on current technologies that improve the efficiency and effectiveness of distributed and cluster systems when they deal with spatiotemporal data.
spatial big data --- parallel processing --- MapReduce --- arable land quality (ALQ) --- GIS --- big data --- IoT --- Hadoop --- geospatial big data --- geospatial applications --- buffer analysis --- real-time --- visualization-oriented --- tile-pyramid --- parallel computing --- soil erosion modelling --- mobility --- data warehouses --- spatiotemporal OLAP --- mobility analytics --- location-based aggregate queries --- distributed processing technique --- grid structure --- MapReduce-based aggregate query algorithm --- watershed analysis --- multiple flow accumulation --- DEM --- CUDA --- OpenACC --- GPU --- sustainable development --- Agenda 2063 --- geoportal --- monitoring and evaluation --- geospatial data
This book is an example-based tutorial on optimizing Hadoop for MapReduce job performance. If you are a Hadoop administrator, developer, MapReduce user, or beginner and want to optimize your clusters and applications, this book is written for you. Prior knowledge of creating MapReduce applications is not required, but it will help you better understand the concepts and the MapReduce class template code snippets.
Electronic data processing --- Cluster analysis --- Open source software. --- Free software (Open source software) --- Open code software --- Opensource software --- Computer software --- Distributed computer systems in electronic data processing --- Distributed computing --- Distributed processing in electronic data processing --- Computer networks --- Distributed processing. --- Data processing. --- Apache Hadoop. --- MapReduce (Computer file) --- Hadoop
If you are a big data enthusiast who wants to use Hadoop v2 to solve your problems, this book is for you. It is written for Java programmers with little to moderate knowledge of Hadoop MapReduce, and it also serves as a one-stop reference for developers and system administrators who want to get up to speed quickly with Hadoop v2. A basic knowledge of software development in Java and a basic working knowledge of Linux will be helpful.
Electronic data processing --- File organization (Computer science) --- File management (Computer science) --- File systems (Computer science) --- Organization, File (Computer science) --- Database management --- Distributed computer systems in electronic data processing --- Distributed computing --- Distributed processing in electronic data processing --- Computer networks --- Distributed processing. --- Apache Hadoop. --- MapReduce (Computer file) --- Hadoop
Moving beyond MapReduce: learn resource management and big data processing using YARN.
About This Book: Deep dive into YARN components, schedulers, life cycle management, and security architecture. Create your own Hadoop-YARN applications and integrate big data technologies with YARN. Follow a step-by-step guide to provisioning, managing, and monitoring Hadoop-YARN clusters.
Who This Book Is For: This book is intended for those who want to understand what YARN is and how to use it efficiently for resource management on large clusters. For cluster administrators, it gives a detailed explanation of provisioning and managing YARN clusters. If you are a Java developer or an open source contributor, it will help you drill down into the YARN architecture, write your own YARN applications, and understand the application execution phases. It will also help big data engineers explore YARN integration with real-time analytics technologies such as Spark and Storm.
What You Will Learn: Explore YARN features and offerings. Manage big data clusters efficiently using the YARN framework. Create single-node as well as multi-node Hadoop-YARN clusters on Linux machines. Understand YARN components and their administration. Gain insight into the application execution flow on a YARN cluster. Write your own distributed application and execute it on a YARN cluster. Work with schedulers and queues for efficient scheduling of applications. Integrate big data projects such as Spark and Storm with YARN.
In Detail: Today's enterprises generate huge volumes of data. To provide effective services and make smarter decisions from this data, they use big data analytics. In recent years, Hadoop has been used for massive data storage and efficient distributed data processing.
The Yet Another Resource Negotiator (YARN) framework solves the resource management design problems of the Hadoop 1.x framework by providing a more scalable, efficient, flexible, and highly available resource management framework for distributed data processing. This book starts with an overview of YARN's features and explains how YARN provides a business solution for growing big data needs. You will learn to provision and manage single-node as well as multi-node Hadoop-YARN clusters in the easiest way. You will walk through YARN administration, life cycle management, application execution, REST APIs, schedulers, the security framework and so o...
Big data. --- Open source software. --- Electronic data processing --- Distributed computer systems in electronic data processing --- Distributed computing --- Distributed processing in electronic data processing --- Computer networks --- Free software (Open source software) --- Open code software --- Opensource software --- Computer software --- Data sets, Large --- Large data sets --- Data sets --- Distributed processing. --- Apache Hadoop. --- MapReduce. --- Hadoop