Leverage Apache Spark within a modern data engineering ecosystem. This hands-on guide will teach you how to write fully functional applications, follow industry best practices, and understand the rationale behind those practices. With Apache Spark as the foundation, you will follow a step-by-step journey, beginning with the basics of data ingestion, processing, and transformation, and ending with an entire local data platform running Apache Spark, Apache Zeppelin, Apache Kafka, Redis, MySQL, Minio (S3), and Apache Airflow.
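As a rough illustration of the ingestion step described above, the following PySpark sketch reads a raw CSV file from a local MinIO bucket over the s3a connector and writes it back to the lake as Parquet. The endpoint, credentials, bucket names, and paths are hypothetical placeholders rather than code from the book, and the hadoop-aws package must be on the Spark classpath for the s3a:// scheme to work.

```python
from pyspark.sql import SparkSession

# Spark session pointed at a local MinIO (S3-compatible) endpoint.
spark = (
    SparkSession.builder
    .appName("ingest-to-lake")
    .config("spark.hadoop.fs.s3a.endpoint", "http://localhost:9000")  # local MinIO
    .config("spark.hadoop.fs.s3a.access.key", "minio")                # placeholder key
    .config("spark.hadoop.fs.s3a.secret.key", "minio123")             # placeholder secret
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# Read a raw CSV file and persist it as Parquet (hypothetical bucket and paths).
raw = spark.read.option("header", "true").csv("s3a://raw/orders.csv")
raw.write.mode("overwrite").parquet("s3a://lake/orders/")
```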
Data mining. --- Spark (Electronic resource : Apache Software Foundation) --- Algorithmic knowledge discovery --- Factual data analysis --- KDD (Information retrieval) --- Knowledge discovery in data --- Knowledge discovery in databases --- Mining, Data --- Database searching --- Apache Spark (Electronic resource : Apache Software Foundation)
"Depuis 2015, Spark s'impose comme le standard de-facto pour le big data : en apportant simplicité d'usage, puissance de calcul, analyses en temps réel, algorithmes de machine learning et deep learning, le tout accessible en Python. Spark est devenu la porte d'entrée incontournable des projets de valorisation de données. Alors que vient de sortir Spark 3avec son lot d'innovations (Koalas, DeltaLake, et gestion des GPU), les environnements simplifiés « clicks boutons » sont légion (DataBricks, Dataiku, RapidMiner, etc.). Mais pour les utiliser à bon escient, il vous faudra comprendre son fonctionnement interne de Spark afin de paramétrer correctement votre cluster et vos applications. C'est ce que propose ce livre : vous emmener dans une compréhension fine des tenants et aboutissants de Spark. L'analyse des données n'est utile que dans des cas business précis. C'est pourquoi nous insistons sur une méthode d'analyse des données qui vous permettra de connaître les étapes d'un projet de machine learning, et les questions indispensables à se poser pour réussir une analyse pertinente. Nous l'illustrons via un exemple complet d'une entreprise (virtuelle) de location de vélo en libre service. Ainsi, en lisant ce livre, vous maîtriserez l'outil et la méthode adéquats pour valoriser vos données de manière éclairée, vous assurant une meilleure efficacité et rentabilité de vos projets data."
Take a journey toward discovering, learning, and using Apache Spark 3.0. In this book, you will gain expertise on the powerful and efficient distributed data processing engine inside Apache Spark; its user-friendly, comprehensive, and flexible programming model for processing data in batch and streaming; and the scalable machine learning algorithms and practical utilities for building machine learning applications. Beginning Apache Spark 3 begins by explaining different ways of interacting with Apache Spark, along with Spark concepts and architecture and the Spark unified stack. Next, it offers an overview of Spark SQL before moving on to its advanced features. It covers tips and techniques for dealing with performance issues, followed by an overview of the structured streaming processing engine. It concludes with a demonstration of how to develop machine learning applications using Spark MLlib and how to manage the machine learning development lifecycle. The book is packed with practical examples and code snippets to help you master concepts and features immediately after they are covered in each section. After reading it, you will have the knowledge required to build your own big data pipelines, applications, and machine learning applications. What you will learn: master the Spark unified data analytics engine and its various components, and understand how they work in tandem to provide a scalable, fault-tolerant, and performant data processing engine; leverage the user-friendly and flexible programming model to perform simple to complex data analytics using DataFrames and Spark SQL; develop machine learning applications using Spark MLlib; and manage the machine learning development lifecycle using MLflow. Who this book is for: data scientists, data engineers, and software developers.
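The DataFrame and Spark SQL workflow mentioned above can be summarized in a short PySpark sketch: build a DataFrame, register it as a temporary view, and query it with plain SQL. The view and column names are illustrative and not taken from the book.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-sql-demo").getOrCreate()

# A tiny DataFrame with illustrative columns, exposed to SQL as a view.
people = spark.createDataFrame(
    [("alice", 34), ("bob", 45), ("carol", 29)],
    ["name", "age"],
)
people.createOrReplaceTempView("people")

# The same data can now be queried with ordinary SQL.
spark.sql("SELECT name FROM people WHERE age > 30").show()
```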
Spark (Electronic resource : Apache Software Foundation) --- Big data. --- Distributed databases. --- Distributed data bases --- Distributed database systems --- Databases --- Cyberinfrastructure --- Data sets, Large --- Large data sets --- Data sets --- Open source software. --- Machine learning. --- Learning, Machine --- Artificial intelligence --- Machine theory --- Free software (Open source software) --- Open code software --- Opensource software --- Computer software --- Apache Spark (Electronic resource : Apache Software Foundation)
Analyze vast amounts of data in record time using Apache Spark with Databricks in the cloud. Learn the fundamentals, and more, of running analytics on large clusters in Azure and AWS, using Apache Spark with Databricks on top. Discover how to squeeze the most value out of your data at a mere fraction of what classical analytics solutions cost, while at the same time getting the results you need, incrementally faster. This book explains how the confluence of these pivotal technologies gives you enormous power over huge datasets at low cost. You will begin by learning how cloud infrastructure makes it possible to scale your code to large numbers of processing units, without having to pay for the machinery in advance. From there you will learn how Apache Spark, an open source framework, can put all those CPUs to work for data analytics. Finally, you will see how services such as Databricks provide the power of Apache Spark without your having to know anything about configuring hardware or software. By removing the need for expensive experts and hardware, your resources can instead be allocated to actually finding business value in the data. This book guides you through advanced topics such as analytics in the cloud, data lakes, data ingestion, architecture, machine learning, and tools, including Apache Spark, Apache Hadoop, Apache Hive, Python, and SQL. Valuable exercises help reinforce what you have learned. What you will learn: discover the value of big data analytics that leverage the power of the cloud; get started with Databricks using SQL and Python in either Microsoft Azure or AWS; understand the underlying technology, and how the cloud and Spark fit into the bigger picture; see how these tools are used in the real world; and run basic analytics, including machine learning, on billions of rows at a fraction of the cost, or even for free. This book is for data engineers, data scientists, and cloud architects who want or need to run advanced analytics in the cloud. It is assumed that the reader has data experience but perhaps minimal exposure to Apache Spark and Azure Databricks. The book is also recommended for people who want to get started in the analytics field, as it provides a strong foundation. Robert Ilijason is a 20-year veteran of the business intelligence (BI) segment. He has worked as a contractor for some of Europe's biggest companies and has conducted large-scale analytics projects within the areas of retail, telecom, banking, government, and more. Robert has seen his share of analytics trends come and go over the years, but unlike most of them, he strongly believes that Apache Spark in the cloud, especially with Azure Databricks, is a game changer.
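As a hypothetical example of the kind of notebook cell this workflow leads to, the sketch below aggregates a managed table with the DataFrame API. The catalog table and column names are invented, and it assumes a Databricks-style notebook where a `spark` session is already provided.

```python
from pyspark.sql import functions as F

# Read a (hypothetical) managed table and compute daily revenue per store.
sales = spark.read.table("retail.sales")
daily = (
    sales.groupBy("store_id", "sale_date")
    .agg(F.sum("amount").alias("revenue"))
    .orderBy("sale_date")
)
daily.show(10)
```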
Spark (Electronic resource : Apache Software Foundation) --- Apache Spark (Electronic resource : Apache Software Foundation) --- Big data. --- Microsoft software. --- Microsoft .NET Framework. --- Open source software. --- Computer programming. --- Big Data/Analytics. --- Microsoft and .NET. --- Open Source. --- Computers --- Electronic computer programming --- Electronic data processing --- Electronic digital computers --- Programming (Electronic computers) --- Coding theory --- Free software (Open source software) --- Open code software --- Opensource software --- Computer software --- Data sets, Large --- Large data sets --- Data sets --- Programming
Integrate full-stack open-source fast data pipeline architecture and choose the correct technology—Spark, Mesos, Akka, Cassandra, and Kafka (SMACK)—in every layer. Fast data is becoming a requirement for many enterprises. So far, however, the focus has largely been on collecting, aggregating, and crunching large data sets in a timely manner. In many cases organizations need more than one paradigm to perform efficient analyses. Big Data SMACK explains each technology and, more importantly, how to integrate them. It provides detailed coverage of the practical benefits of these technologies and incorporates real-world examples. The book focuses on the problems and scenarios solved by the architecture, as well as the solutions provided by each technology. It covers the five main concepts of data pipeline architecture and how to integrate, replace, and reinforce every layer: the engine (Apache Spark), the container (Apache Mesos), the model (Akka), the storage (Apache Cassandra), and the broker (Apache Kafka).
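One possible way to wire the broker layer (Kafka) to the engine layer (Spark) is sketched below using Structured Streaming in PySpark; it is an illustration, not code from the book. The broker address and topic name are placeholders, and the spark-sql-kafka package must be available on the classpath.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("smack-ingest").getOrCreate()

# Subscribe to a Kafka topic (placeholder broker and topic) and decode the payload.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
    .selectExpr("CAST(value AS STRING) AS payload")
)

# Print each micro-batch to the console until the job is stopped.
query = events.writeStream.format("console").start()
query.awaitTermination()
```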
Computer science. --- Data structures (Computer science). --- Database management. --- Computer Science. --- Big Data. --- Database Management. --- Data Structures. --- Big data. --- Data sets, Large --- Large data sets --- Data base management --- Data services (Database management) --- Database management services --- DBMS (Computer science) --- Generalized data management systems --- Services, Database management --- Systems, Database management --- Systems, Generalized database management --- Electronic data processing --- Information structures (Computer science) --- Structures, Data (Computer science) --- Structures, Information (Computer science) --- File organization (Computer science) --- Abstract data types (Computer science) --- Informatics --- Science --- Spark (Electronic resource : Apache Software Foundation) --- Apache Mesos (Electronic resource) --- Akka (Electronic resource) --- Apache Cassandra. --- Apache Kafka. --- Cassandra (Electronic resource) --- Apache Spark (Electronic resource : Apache Software Foundation) --- Mesos (Electronic resource) --- Data sets
Learn the right cutting-edge skills and knowledge to leverage Spark Streaming to implement a wide array of real-time, streaming applications. This book walks you through end-to-end real-time application development using real-world applications, data, and code. Taking an application-first approach, each chapter introduces use cases from a specific industry and uses publicly available datasets from that domain to unravel the intricacies of production-grade design and implementation. The domains covered in Pro Spark Streaming include social media, the sharing economy, finance, online advertising, telecommunications, and IoT. In the last few years, Spark has become synonymous with big data processing. DStreams enhance the underlying Spark processing engine to support streaming analysis with a novel micro-batch processing model. Pro Spark Streaming by Zubair Nabi will enable you to become a specialist in latency-sensitive applications by leveraging the key features of DStreams, micro-batch processing, and functional programming. To this end, the book includes ready-to-deploy examples and actual code, and aims to act as the bible of Spark Streaming. What you'll learn: discover Spark Streaming application development and best practices; work with the low-level details of discretized streams; optimize production-grade deployments of Spark Streaming via configuration recipes and instrumentation using Graphite, collectd, and Nagios; ingest data from disparate sources including MQTT, Flume, Kafka, Twitter, and a custom HTTP receiver; integrate and couple with HBase, Cassandra, and Redis; take advantage of design patterns for side effects and maintaining state across the Spark Streaming micro-batch model; implement real-time and scalable ETL using data frames, SparkSQL, Hive, and SparkR; use streaming machine learning, predictive analytics, and recommendations; and mesh batch processing with stream processing via the Lambda architecture. Who this book is for: data scientists, big data experts, BI analysts, and data architects.
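A minimal DStream sketch of the micro-batch model described above: a word count over text arriving on a local socket, processed in ten-second batches. The host, port, and batch interval are illustrative, not taken from the book.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="dstream-wordcount")
ssc = StreamingContext(sc, batchDuration=10)      # ten-second micro-batches

# Text lines arriving on a socket, e.g. fed by `nc -lk 9999`.
lines = ssc.socketTextStream("localhost", 9999)
counts = (
    lines.flatMap(lambda line: line.split())
    .map(lambda word: (word, 1))
    .reduceByKey(lambda a, b: a + b)
)
counts.pprint()                                   # print each batch's counts

ssc.start()
ssc.awaitTermination()
```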
Computer science. --- Data mining. --- Application software. --- Computer Science. --- Computer Appl. in Administrative Data Processing. --- Data Mining and Knowledge Discovery. --- Streaming technology (Telecommunications) --- Big data. --- Spark (Electronic resource : Apache Software Foundation) --- Data sets, Large --- Large data sets --- Streamed media --- Streaming media --- Streaming resources --- Apache Spark (Electronic resource : Apache Software Foundation) --- Application computer programs --- Application computer software --- Applications software --- Apps (Computer software) --- Computer software --- Algorithmic knowledge discovery --- Factual data analysis --- KDD (Information retrieval) --- Knowledge discovery in data --- Knowledge discovery in databases --- Mining, Data --- Database searching --- Informatics --- Science --- Data transmission systems --- Multimedia systems --- Data sets --- Information systems. --- Big Data.
Work with Apache Spark using Scala to deploy and set up single-node, multi-node, and high-availability clusters. This book discusses various components of Spark such as Spark Core, DataFrames, Datasets and SQL, Spark Streaming, Spark MLlib, and R on Spark, with the help of practical code snippets for each topic. Practical Apache Spark also covers the integration of Apache Spark with Kafka, with examples. You'll follow a learn-to-do-by-yourself approach: learn the concepts, practice the code snippets in Scala, and complete the assignments to gain overall exposure. On completion, you'll have knowledge of the functional programming aspects of Scala and hands-on expertise in the various Spark components. You'll also become familiar with machine learning algorithms and their real-time usage. You will: discover the functional programming features of Scala; understand the complete architecture of Spark and its components; integrate Apache Spark with Hive and Kafka; use Spark SQL, DataFrames, and Datasets to process data using traditional SQL queries; and work with different machine learning concepts and libraries using Spark's MLlib packages.
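To illustrate the MLlib material, here is a minimal pipeline sketch (the book's own snippets are in Scala; this PySpark version is only an equivalent illustration). The feature columns and training rows are invented.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-demo").getOrCreate()

# Tiny, invented training set with two numeric features and a binary label.
train = spark.createDataFrame(
    [(1.0, 0.5, 1.0), (0.0, 2.3, 0.0), (1.5, 0.1, 1.0), (0.2, 3.1, 0.0)],
    ["f1", "f2", "label"],
)

# Assemble the feature vector and fit a logistic regression in one pipeline.
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")
model = Pipeline(stages=[assembler, lr]).fit(train)
model.transform(train).select("label", "prediction").show()
```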
Scala (Computer program language) --- Functional programming languages --- Object-oriented programming languages --- Multiparadigm programming (Computer science) --- Big data. --- Open source software. --- Computer programming. --- Computer science. --- Big Data. --- Open Source. --- Programming Languages, Compilers, Interpreters. --- Informatics --- Science --- Computers --- Electronic computer programming --- Electronic data processing --- Electronic digital computers --- Programming (Electronic computers) --- Coding theory --- Free software (Open source software) --- Open code software --- Opensource software --- Computer software --- Data sets, Large --- Large data sets --- Data sets --- Programming --- Spark (Electronic resource : Apache Software Foundation) --- SPARK (Electronic resource) --- Apache Spark (Electronic resource : Apache Software Foundation) --- Programming languages (Electronic computers). --- Computer languages --- Computer program languages --- Computer programming languages --- Machine language --- Languages, Artificial
Data science is an interdisciplinary field that applies numerous techniques, such as machine learning, neural networks, and deep learning, to create value based on extracting knowledge and insights from available data. Advances in data science have a significant impact on healthcare. While advances in the sharing of medical information result in better and earlier diagnoses as well as more patient-tailored treatments, information management is also affected by trends such as increased patient centricity (with shared decision making), self-care (e.g., using wearables), and integrated care delivery. The delivery of health services is being revolutionized through the sharing and integration of health data across organizational boundaries. Via data science, researchers can deliver new approaches to merge, analyze, and process complex data and gain more actionable insights, understanding, and knowledge at the individual and population levels. This Special Issue focuses on how data science is used in healthcare (e.g., through predictive modeling) and on related topics, such as data sharing and data management.
Medicine --- Pharmacology --- data sharing --- data management --- data science --- big data --- healthcare --- depression --- psychological treatment --- task sharing --- primary care --- pilot study --- non-specialist health worker --- training --- digital technology --- mental health --- COVID-19 --- SARS-CoV-2 --- pneumonia --- computed tomography --- case fatality rate --- social distancing --- smoking --- metabolically healthy obese phenotype --- metabolic syndrome --- obesity --- coronavirus --- machine learning --- social media --- apache spark --- Twitter --- Arabic language --- distributed computing --- smart cities --- smart healthcare --- smart governance --- Triple Bottom Line (TBL) --- thoracic pain --- tree classification --- cross-validation --- hand-foot-and-mouth disease --- early-warning model --- neural network --- genetic algorithm --- sentinel surveillance system --- outbreak prediction --- artificial intelligence --- vascular access surveillance --- arteriovenous fistula --- end stage kidney disease --- dialysis --- kidney failure --- chronic kidney disease (CKD) --- end-stage kidney disease (ESKD) --- kidney replacement therapy (KRT) --- risk prediction --- naïve Bayes classifiers --- precision medicine --- machine learning models --- data exploratory techniques --- breast cancer diagnosis --- tumors classification
Smart cities operate with more resource-efficient management and economies than ordinary cities. As such, advanced business models have emerged around smart cities, which has led to the creation of smart enterprises and organizations that depend on advanced technologies. This book includes 21 selected, peer-reviewed articles spanning the wide spectrum of artificial intelligence applications in smart cities. The chapters address the following areas of interest: vehicular traffic prediction, social big data analysis, smart city management, driving and routing, localization, safety, health, and quality of life.
Information technology industries --- spatio-temporal --- residual networks --- bus traffic flow prediction --- advance rate --- shield performance --- principal component analysis --- ANFIS-GA --- tunnel --- online learning --- extreme learning machine --- cyclic dynamics --- transfer learning --- knowledge preservation --- Feature Adaptive --- optimization --- Bacterial Foraging algorithm --- Swarm Intelligence algorithm --- Isolated Microgrid --- traffic surveillance video --- state analysis --- Grassmann manifold --- neural network --- machine-learning --- quality of life --- Better Life Index --- bagging --- ensemble learning --- pedestrian attributes --- surveillance image --- semantic attributes recognition --- multi-label learning --- large-scale database --- traffic congestion detection --- minimizing traffic congestion --- traffic prediction --- deep learning --- urban mobility --- ITS --- Vehicle-to-Infrastructure --- neural networks --- LSTM --- embeddings --- trajectories --- motion behavior --- smart tourism --- driver’s behavior detection --- texting and driving --- convolutional neural network --- smart car --- smart cities --- smart infotainment --- driver distraction --- cameras --- convolution --- detection --- image recognition --- DSS --- diabetes prediction --- homecare assistance information system --- muti-attribute analysis --- artificial training dataset --- machine learning --- big data --- data analysis --- sensors --- Internet of Things --- vehicular networks --- VDTN --- routing --- message scheduling --- traffic flow prediction --- wavenet --- TrafficWave --- RNN --- GRU --- SAEs --- risk assessment --- neural architecture search --- recurrent neural network --- automated driving vehicle --- decision support system --- artificial intelligence --- disaster management --- Smart city --- program management --- integrated model --- smart city --- intelligence transportation system --- computer vision --- potential pedestrian safety --- data mining --- healthcare --- Apache Spark --- disease detection --- symptoms detection --- Arabic language --- Saudi dialect --- Twitter --- high performance computing (HPC) --- spatial-temporal dependencies --- traffic periodicity --- graph convolutional network --- traffic speed prediction --- vehicular traffic --- surveillance video --- big data analysis --- autonomous driving --- life quality --- pattern recognition