Narrow your search

Library

UGent (9)

KU Leuven (6)

Odisee (6)

Thomas More Kempen (6)

Thomas More Mechelen (6)

UCLL (6)

ULB (6)

ULiège (6)

VIVES (6)

KBC (4)


Resource type

book (9)


Language

English (9)


Year

2019 (1)

2018 (4)

2017 (1)

2016 (2)

2015 (1)

Listing 1 - 9 of 9

Book
Spark : the definitive guide : big data processing made simple
Authors: ---
ISBN: 9781491912218 Year: 2018 Publisher: Beijing : O'Reilly,


Book
High performance spark : best practices for scaling and optimizing Apache Spark
Authors: ---
ISBN: 9781491943205 Year: 2017 Publisher: Beijing : O'Reilly,


Book
Learning spark
Authors: --- --- ---
ISBN: 9781449358624 1449358624 Year: 2015 Publisher: Beijing : O'Reilly,

Abstract

This book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. You'll learn how to express parallel jobs with just a few lines of code, and you'll cover applications ranging from simple batch jobs to stream processing and machine learning.
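
As a rough illustration of the "parallel jobs in a few lines of code" idea mentioned above, here is a minimal Scala sketch of an RDD word count. It is not taken from the book; the application name and input path are placeholders.

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCount {
      def main(args: Array[String]): Unit = {
        // Run locally; "input.txt" is a placeholder path
        val sc = new SparkContext(new SparkConf().setAppName("WordCount").setMaster("local[*]"))
        val counts = sc.textFile("input.txt")   // lines as an RDD
          .flatMap(_.split("\\s+"))             // split lines into words
          .map(word => (word, 1))               // pair each word with a count of 1
          .reduceByKey(_ + _)                   // sum the counts per word in parallel
        counts.take(10).foreach(println)
        sc.stop()
      }
    }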


Book
Big Data SMACK : A Guide to Apache Spark, Mesos, Akka, Cassandra, and Kafka
Authors: ---
ISBN: 1484221745 1484221753 Year: 2016 Publisher: Berkeley, CA : Apress : Imprint: Apress,

Abstract

Integrate full-stack open-source fast data pipeline architecture and choose the correct technology, Spark, Mesos, Akka, Cassandra, and Kafka (SMACK), in every layer. Fast data is becoming a requirement for many enterprises. So far, however, the focus has largely been on collecting, aggregating, and crunching large data sets in a timely manner. In many cases organizations need more than one paradigm to perform efficient analyses. Big Data SMACK explains each technology and, more importantly, how to integrate them. It provides detailed coverage of the practical benefits of these technologies and incorporates real-world examples. The book focuses on the problems and scenarios solved by the architecture, as well as the solutions provided by each technology. This book covers the five main concepts of data pipeline architecture and how to integrate, replace, and reinforce every layer:
The engine: Apache Spark
The container: Apache Mesos
The model: Akka
The storage: Apache Cassandra
The broker: Apache Kafka
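
To make the layer roles above concrete, here is a hedged Scala sketch of one possible broker-to-engine hookup: Spark Structured Streaming subscribing to a Kafka topic. It is not from the book; it assumes the spark-sql-kafka connector is on the classpath, and the broker address ("localhost:9092") and topic name ("events") are placeholders.

    import org.apache.spark.sql.SparkSession

    object SmackIngestSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("SmackIngestSketch").master("local[*]").getOrCreate()

        // Broker layer -> engine layer: subscribe to a (hypothetical) Kafka topic "events"
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "events")
          .load()
          .selectExpr("CAST(value AS STRING) AS payload")

        // Printed to the console here for illustration; a full SMACK pipeline would
        // persist to the storage layer (Cassandra) instead, e.g. via the DataStax connector.
        val query = events.writeStream.outputMode("append").format("console").start()
        query.awaitTermination()
      }
    }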


Book
Pro Spark Streaming : The Zen of Real-Time Analytics Using Apache Spark
Author: Zubair Nabi
ISBN: 1484214803 148421479X Year: 2016 Publisher: Berkeley, CA : Apress : Imprint: Apress,

Abstract

Learn the right cutting-edge skills and knowledge to leverage Spark Streaming to implement a wide array of real-time, streaming applications. This book walks you through end-to-end real-time application development using real-world applications, data, and code. Taking an application-first approach, each chapter introduces use cases from a specific industry and uses publicly available datasets from that domain to unravel the intricacies of production-grade design and implementation. The domains covered in Pro Spark Streaming include social media, the sharing economy, finance, online advertising, telecommunication, and IoT. In the last few years, Spark has become synonymous with big data processing. DStreams enhance the underlying Spark processing engine to support streaming analysis with a novel micro-batch processing model. Pro Spark Streaming by Zubair Nabi will enable you to become a specialist in latency-sensitive applications by leveraging the key features of DStreams, micro-batch processing, and functional programming. To this end, the book includes ready-to-deploy examples and actual code. Pro Spark Streaming will act as the bible of Spark Streaming.
What You'll Learn:
Discover Spark Streaming application development and best practices
Work with the low-level details of discretized streams
Optimize production-grade deployments of Spark Streaming via configuration recipes and instrumentation using Graphite, collectd, and Nagios
Ingest data from disparate sources including MQTT, Flume, Kafka, Twitter, and a custom HTTP receiver
Integrate and couple with HBase, Cassandra, and Redis
Take advantage of design patterns for side effects and maintaining state across the Spark Streaming micro-batch model
Implement real-time and scalable ETL using data frames, SparkSQL, Hive, and SparkR
Use streaming machine learning, predictive analytics, and recommendations
Mesh batch processing with stream processing via the Lambda architecture
Who This Book Is For: Data scientists, big data experts, BI analysts, and data architects.
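
As a small, generic illustration of the DStream micro-batch model described above (not an example from the book), the following Scala sketch counts words arriving on a local text socket in one-second batches; the host and port are placeholders.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object DStreamWordCount {
      def main(args: Array[String]): Unit = {
        // local[2]: one thread for the receiver, one for processing
        val conf = new SparkConf().setAppName("DStreamWordCount").setMaster("local[2]")
        // Each micro-batch covers one second of incoming data
        val ssc = new StreamingContext(conf, Seconds(1))

        // DStream over a plain text socket (feed it locally with: nc -lk 9999)
        val lines = ssc.socketTextStream("localhost", 9999)
        val counts = lines.flatMap(_.split(" "))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
        counts.print()

        ssc.start()
        ssc.awaitTermination()
      }
    }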


Book
Practical Apache Spark : Using the Scala API
Authors: ---
ISBN: 1484236521 1484236513 Year: 2018 Publisher: Berkeley, CA : Apress : Imprint: Apress,

Abstract

Work with Apache Spark using Scala to deploy and set up single-node, multi-node, and high-availability clusters. This book discusses various components of Spark such as Spark Core, DataFrames, Datasets and SQL, Spark Streaming, Spark MLlib, and R on Spark with the help of practical code snippets for each topic. Practical Apache Spark also covers the integration of Apache Spark with Kafka with examples. You’ll follow a learn-to-do-by-yourself approach to learning – learn the concepts, practice the code snippets in Scala, and complete the assignments given to get an overall exposure. On completion, you’ll have knowledge of the functional programming aspects of Scala, and hands-on expertise in various Spark components. You’ll also become familiar with machine learning algorithms with real-time usage.
You will:
Discover the functional programming features of Scala
Understand the complete architecture of Spark and its components
Integrate Apache Spark with Hive and Kafka
Use Spark SQL, DataFrames, and Datasets to process data using traditional SQL queries
Work with different machine learning concepts and libraries using Spark's MLlib packages.
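
The "traditional SQL queries over DataFrames and Datasets" point above can be illustrated with a short, self-contained Scala sketch; it is not from the book, and the Sale schema and sample rows are made up for the example.

    import org.apache.spark.sql.SparkSession

    // Hypothetical record type used only for this example
    case class Sale(region: String, amount: Double)

    object SqlOnDataFrames {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("SqlOnDataFrames").master("local[*]").getOrCreate()
        import spark.implicits._

        // A small in-memory Dataset stands in for a real table
        val sales = Seq(Sale("EU", 120.0), Sale("US", 80.0), Sale("EU", 60.0)).toDS()

        // Register it as a temporary view and query it with plain SQL
        sales.createOrReplaceTempView("sales")
        spark.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region").show()

        spark.stop()
      }
    }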


Book
Next-Generation Big Data : A Practical Guide to Apache Kudu, Impala, and Spark
Author:
ISBN: 9781484231470 1484231473 1484231465 Year: 2018 Publisher: Berkeley, CA : Apress,

Abstract

Utilize this practical and easy-to-follow guide to modernize traditional enterprise data warehouse and business intelligence environments with next-generation big data technologies. Next-Generation Big Data takes a holistic approach, covering the most important aspects of modern enterprise big data. The book covers not only the main technology stack but also the next-generation tools and applications used for big data warehousing, data warehouse optimization, real-time and batch data ingestion and processing, real-time data visualization, big data governance, data wrangling, big data cloud deployments, and distributed in-memory big data computing. Finally, the book has extensive and detailed coverage of big data case studies from Navistar, Cerner, British Telecom, Shopzilla, Thomson Reuters, and Mastercard.
What You'll Learn:
Install Apache Kudu, Impala, and Spark to modernize enterprise data warehouse and business intelligence environments, complete with real-world, easy-to-follow examples, and practical advice
Integrate HBase, Solr, Oracle, SQL Server, MySQL, Flume, Kafka, HDFS, and Amazon S3 with Apache Kudu, Impala, and Spark
Use StreamSets, Talend, Pentaho, and CDAP for real-time and batch data ingestion and processing
Utilize Trifacta, Alteryx, and Datameer for data wrangling and interactive data processing
Turbocharge Spark with Alluxio, a distributed in-memory storage platform
Deploy big data in the cloud using Cloudera Director
Perform real-time data visualization and time series analysis using Zoomdata, Apache Kudu, Impala, and Spark
Understand enterprise big data topics such as big data governance, metadata management, data lineage, impact analysis, and policy enforcement, and how to use Cloudera Navigator to perform common data governance tasks
Implement big data use cases such as big data warehousing, data warehouse optimization, Internet of Things, real-time data ingestion and analytics, complex event processing, and scalable predictive modeling
Study real-world big data case studies from innovative companies, including Navistar, Cerner, British Telecom, Shopzilla, Thomson Reuters, and Mastercard
Who This Book Is For: BI and big data warehouse professionals interested in gaining practical and real-world insight into next-generation big data processing and analytics using Apache Kudu, Impala, and Spark; and those who want to learn more about other advanced enterprise topics.
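
As a hedged illustration of the Kudu-plus-Spark combination discussed above (not an example from the book), the sketch below reads a Kudu table into a Spark DataFrame and queries it with Spark SQL. It assumes the kudu-spark connector is on the classpath; the Kudu master address and table name are placeholders.

    import org.apache.spark.sql.SparkSession

    object KuduReadSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("KuduReadSketch").master("local[*]").getOrCreate()

        // Placeholder Kudu master address and table name
        val events = spark.read
          .options(Map("kudu.master" -> "kudu-master:7051",
                       "kudu.table"  -> "impala::default.events"))
          .format("org.apache.kudu.spark.kudu")
          .load()

        // Once loaded, the Kudu table is an ordinary DataFrame and can be queried with Spark SQL
        events.createOrReplaceTempView("events")
        spark.sql("SELECT COUNT(*) AS n FROM events").show()

        spark.stop()
      }
    }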


Book
Beginning Apache Spark 2 : With Resilient Distributed Datasets, Spark SQL, Structured Streaming and Spark Machine Learning library
Author:
ISBN: 1484235797 1484235789 Year: 2018 Publisher: Berkeley, CA : Apress : Imprint: Apress,

Abstract

Develop applications for the big data landscape with Spark and Hadoop. This book also explains the role of Spark in developing scalable machine learning and analytics applications with Cloud technologies. Beginning Apache Spark 2 gives you an introduction to Apache Spark and shows you how to work with it. Along the way, you’ll discover resilient distributed datasets (RDDs); use Spark SQL for structured data; and learn stream processing and build real-time applications with Spark Structured Streaming. Furthermore, you’ll learn the fundamentals of Spark ML for machine learning and much more. After you read this book, you will have the fundamentals to become proficient in using Apache Spark and know when and how to apply it to your big data applications.
You will:
Understand Spark’s unified data processing platform
Use and manipulate RDDs
Deal with structured data using Spark SQL
Build real-time applications using Spark Structured Streaming
Develop intelligent applications with the Spark Machine Learning library.
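
For a flavor of the Structured Streaming material described above, here is a minimal Scala sketch (a generic example, not taken from the book) that treats a local text socket as an unbounded table and maintains running word counts; host and port are placeholders.

    import org.apache.spark.sql.SparkSession

    object StructuredWordCount {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("StructuredWordCount").master("local[*]").getOrCreate()
        import spark.implicits._

        // Unbounded DataFrame over a local text socket (feed it with: nc -lk 9999)
        val lines = spark.readStream
          .format("socket")
          .option("host", "localhost")
          .option("port", 9999)
          .load()

        // The same Dataset operations used for batch jobs apply to the stream
        val counts = lines.as[String]
          .flatMap(_.split(" "))
          .groupBy("value")
          .count()

        val query = counts.writeStream.outputMode("complete").format("console").start()
        query.awaitTermination()
      }
    }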


Book
Scala Programming for Big Data Analytics : Get Started With Big Data Analytics Using Apache Spark
Author:
ISBN: 1484248104 1484248090 Year: 2019 Publisher: Berkeley, CA : Apress : Imprint: Apress,

Abstract

Gain the key language concepts and programming techniques of Scala in the context of big data analytics and Apache Spark. The book begins by introducing you to Scala and establishes a firm contextual understanding of why you should learn this language, how it stands in comparison to Java, and how Scala is related to Apache Spark for big data analytics. Next, you’ll set up the Scala environment and examine your first Scala programs. You’ll start with code blocks that allow you to group and execute related statements together as a block and see the implications for Scala’s type system. The author discusses functions at length and highlights a number of associated concepts such as zero-arity functions, single-line functions, and anonymous functions. Along the way you’ll see the development life cycle of a Scala program. This involves compiling and building programs using the industry-standard Scala Build Tool (SBT). You’ll cover guidelines related to dependency management with SBT, as this is critical for building large Apache Spark applications. Scala Programming for Big Data Analytics concludes by demonstrating how you can use these concepts to write programs that run on the Apache Spark framework. These programs provide distributed and parallel computing, which is critical for big data analytics.
You will:
See the fundamentals of Scala as a general-purpose programming language
Understand functional programming and object-oriented programming constructs in Scala
Comprehend the use and various features of the Scala REPL (shell)
Use Scala collections and functions
Employ functional programming constructs.
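
A few of the Scala constructs named above (code blocks as expressions, single-line functions, zero-arity functions, anonymous functions) can be shown in one small, self-contained sketch; it is illustrative only and not drawn from the book.

    object ScalaBasics extends App {
      // A code block groups statements; its value is the value of its last expression
      val area: Double = {
        val radius = 2.0
        math.Pi * radius * radius
      }

      // A single-line (single-expression) function
      def square(x: Int): Int = x * x

      // A zero-arity function: no parameters, called only for its result
      def greeting(): String = "hello"

      // An anonymous function passed to a higher-order collection method
      val doubled = List(1, 2, 3).map(n => n * 2)

      println(s"$area ${square(4)} ${greeting()} $doubled")
    }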
