Библиографическое описание:Gupta, Rajesh. Hands-On Data Analysis with Scala: Perform Data Collection, Processing, Manipulation, and Visualization with Scala / Rajesh Gupta. - Электрон. текстовые дан. - Birmingham : Packt publishing Ltd, 2019. - 1 online resource (288 pages). - Загл. с титул. экрана. - ISBN 1789344263. - ISBN 9781789344264 : Б. ц. - Текст : электронный. Natural language processing for data analysis Перевод заглавия: Практический анализ данных с помощью Scala: выполняйте сбор, обработку, манипулирование и визуализацию данных с помощью Scala Примечания о происхождении: Коллекция цифровых книг Ebsco ebook (централизованная подписка 2023 г., бессрочный доступ), НБ СФУ
Аннотация:This book will help you perform effective data analysis with Scala using practical examples. You will come across different challenges and their effective solutions for a variety of data processing tasks - be it data exploration, data manipulation, or real-time data analysis using Apache Spark.
Эта книга поможет вам эффективно анализировать данные с помощью Scala на практических примерах. Вы познакомитесь с различными проблемами и их эффективными решениями для различных задач обработки данных - будь то исследование данных, манипулирование данными или анализ данных в реальном времени с помощью Apache Spark.
Cover; Title Page; Copyright and Credits; Dedication; About Packt; Contributors; Table of Contents; Preface; Section 1: Scala and Data Analysis Life Cycle; Chapter 1: Scala Overview; Getting started with Scala; Running Scala code online; Scastie; ScalaFiddle; Installing Scala on your computer; Installing command-line tools; Installing IDE; Overview of object-oriented and functional programming; Object-oriented programming using Scala; Functional programming using Scala; Scala case classes and the collection API; Scala case classes; Scala collection API; Array; List; Map
Overview of Scala libraries for data analysisApache Spark; Breeze; Breeze-viz; DeepLearning; Epic; Saddle; Scalalab; Smile; Vegas; Summary; Chapter 2: Data Analysis Life Cycle; Data journey; Sourcing data; Data formats; XML; JSON; CSV; Understanding data; Using statistical methods for data exploration; Using Scala; Other Scala tools; Using data visualization for data exploration; Using the vegas-viz library for data visualization; Other libraries for data visualization; Using ML to learn from data; Setting up Smile; Running Smile; Creating a data pipeline; Summary; Chapter 3: Data Ingestion
Data extractionPull-oriented data extraction; Push-oriented data delivery; Data staging; Why is the staging important?; Cleaning and normalizing; Enriching; Organizing and storing; Summary; Chapter 4: Data Exploration and Visualization; Sampling data; Selecting the sample; Selecting samples using Saddle; Performing ad hoc analysis; Finding a relationship between data elements; Visualizing data; Vegas viz for data visualization; Spark Notebook for data visualization; Downloading and installing Spark Notebook; Creating a Spark Notebook with simple visuals; More charts with Spark Notebook
Box plotHistogram; Bubble chart; Summary; Chapter 5: Applying Statistics and Hypothesis Testing; Basics of statistics; Summary level statistics; Correlation statistics; Vector level statistics; Random data generation; Pseudorandom numbers; Random numbers with normal distribution; Random numbers with Poisson distribution; Hypothesis testing; Summary; Section 2: Advanced Data Analysis and Machine Learning; Chapter 6: Introduction to Spark for Distributed Data Analysis; Spark setup and overview; Spark core concepts; Spark Datasets and DataFrames; Sourcing data using Spark; Parquet file format
Avro file formatSpark JDBC integration; Using Spark to explore data; Summary; Chapter 7: Traditional Machine Learning for Data Analysis; ML overview; Characteristics of ML; Categories or types of ML; Decision trees; Implementing decision trees; Decision tree algorithms; Implementing decision tree algorithms in our example; Evaluating the results; Using our model with a decision tree; Random forest; Random forest algorithms; Ridge and lasso regression; Characteristics of ridge regression; Characteristics of lasso regression; k-means cluster analysis