Spark - The Definitive Guide: Big data processing made simple 1st Edition


In Stock


Original price: 510.00 Dhs. Current price: 400.00 Dhs.

Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of the open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals.

Categories: Data Engineering, Technology & Programming Tag: Books


Description

You’ll explore the basic operations and common functions of Spark’s structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and will explore machine learning techniques and scenarios for employing MLlib, Spark’s scalable machine-learning library.

Get a gentle overview of big data and Spark
Learn about DataFrames, SQL, and Datasets (Spark’s core APIs) through worked examples
Dive into Spark’s low-level APIs, RDDs, and execution of SQL and DataFrames
Understand how Spark runs on a cluster
Debug, monitor, and tune Spark clusters and applications
Learn the power of Structured Streaming, Spark’s stream-processing engine
Learn how you can apply MLlib to a variety of problems, including classification or recommendation

What Is Apache Spark?

Apache Spark is a unified computing engine and a set of libraries for parallel data processing on computer clusters. As of this writing, Spark is the most actively developed open source engine for this task, making it a standard tool for any developer or data scientist interested in big data. Spark supports multiple widely used programming languages (Python, Java, Scala, and R), includes libraries for diverse tasks ranging from SQL to streaming and machine learning, and runs anywhere from a laptop to a cluster of thousands of servers. This makes it an easy system to start with and to scale up to big data processing at incredibly large scale.

Although the project has existed for multiple years (first as a research project started at UC Berkeley in 2009, then at the Apache Software Foundation since 2013), the open source community is continuing to build more powerful APIs and high-level libraries over Spark, so there is still a lot to write about the project. We decided to write this book for two reasons. First, we wanted to present the most comprehensive book on Apache Spark, covering all of the fundamental use cases with easy-to-run examples. Second, we especially wanted to explore the higher-level ‘structured’ APIs that were finalized in Apache Spark 2.0 (namely DataFrames, Datasets, Spark SQL, and Structured Streaming), which older books on Spark don’t always include. We hope this book gives you a solid foundation to write modern Apache Spark applications using all the available tools in the project.

Who This Book Is For

We designed this book mainly for data scientists and data engineers looking to use Apache Spark. The two roles have slightly different needs, but in reality, most application development covers a bit of both, so we think the material will be useful in both cases. Specifically, in our minds, the data scientist workload focuses more on interactively querying data to answer questions and build statistical models, while the data engineer’s job focuses on writing maintainable, repeatable production applications, either to use the data scientist’s models in practice or just to prepare data for further analysis (e.g., building a data ingest pipeline). However, we often see with Spark that these roles blur. For instance, data scientists are able to package production applications without too much hassle, and data engineers use interactive analysis to understand and inspect their data to build and maintain pipelines.

While we tried to provide everything data scientists and engineers need to get started, there are some things we didn’t have space to focus on in this book. First, this book does not include in-depth introductions to some of the analytics techniques you can use in Apache Spark, such as machine learning. Instead, we show you how to invoke these techniques using libraries in Spark, assuming you already have a basic background in machine learning. Many full, standalone books exist to cover these techniques in formal detail, so we recommend starting with those if you want to learn about these areas. Second, this book focuses more on application development than on operations and administration (e.g., how to manage an Apache Spark cluster with dozens of users). Nonetheless, we have tried to include comprehensive material on monitoring, debugging, and configuration in Parts V and VI of the book to help engineers get their applications running efficiently and tackle day-to-day maintenance. Finally, this book places less emphasis on the older, lower-level APIs in Spark (specifically RDDs and DStreams), introducing most of the concepts using the newer, higher-level structured APIs instead. Thus, the book may not be the best fit if you need to maintain an old RDD or DStream application, but it should be a great introduction to writing new applications.

Book details

Authors: Bill Chambers, Matei Zaharia
Publisher: O’Reilly Media
Publication date: April 3, 2018
Edition: 1st
Print length: 603 pages
Language: English
Format: Paperback


Original Books | Free delivery over 300 MAD | All Morocco 💫

(+212) 682-08-02-05


Who might be interested

Guide infirmier des urgences

Orthopédie Traumatologie – Conforme à la réforme R2C de l’EDN

Neuro-imagerie diagnostique (Imagerie médicale : Précis)

Neurophysiologie: De la physiologie à l’exploration fonctionnelle

Médecine cardio-vasculaire: Réussir les ECNi (French Edition) 1st Edition

Free Delivery

100% Secure

Expert Customer

payment


Contact Us

Email: contact@Rababooks.com

WhatsApp: +212 068 208 0205


Raba Books Morocco's
