Spark Development and Data Analysis

 
 

Course overview

 
Data scientists, engineers, and analysts build information platforms to provide deep insight and answer previously unimaginable questions. Spark and Hadoop are transforming how they work by allowing interactive and integrative data analysis at scale.
 
You will learn how Spark and Hadoop enable data scientists, engineers, and analysts to help companies reduce costs, increase profits, improve products, retain customers, and identify new opportunities.
 
You will learn what data scientists, engineers, and analysts do, the problems they solve, and the tools and techniques they use. Through in-class simulations, participants apply data analysis methods to real-world challenges in different industries and, ultimately, prepare for big data application development and big data analyst roles in the field.
 
 
 

Outline

 

Part I Fundamentals

 
Module 1 - Spark Introduction and Basic Programming
 
Introduction to Spark
 
What is Spark?
 
A Brief History of Spark
 
Programming with RDDs
 
Module 2 - Advanced Spark Programming
 
Spark Storage - Loading and saving data
 
Advanced Spark Programming          
 
Standalone applications
 
Module 3 - Spark SQL
 
          Linking with Spark SQL
 
            Using Spark SQL in Applications
 
            JDBC/ODBC server
 
            User-Defined Functions
 
            Spark SQL Performance
 
Module 4 - Spark Streaming
 
          Architecture and abstraction
 
          Input/output operations
 
          Streaming UI
 
          Performance Considerations
 
Module 5 - Tuning and Debugging Spark
 
          Configuring Spark
 
          Key Performance considerations
 
Module 6 - Running on Cluster
 
          Runtime Architecture
 
          Cluster Manager
 
 

Part II Applications

 
Module 7 - Machine Learning
 
 Designing a Machine Learning System
 
 Building a Recommendation Engine with Spark      
 
MLlib Decision Trees
 
Module 8 - Prediction with Decision Trees
 
          Decision Trees
 
          Training Examples
          Preparing the data
 
          A First Decision Tree
 
          Tuning Decision Trees
 
          Making Predictions
 
          Conclusions
 
Module 9 - Anomaly Detection with K-means Clustering
 
          Anomaly Detection
 
            K-means clustering
 
            A First Take on Clustering
 
            Choosing k
 
            Visualization
 
            Feature Normalisation
 
            Clustering in action
 
Module 10 - Exploring Property Location Data
 
             Loading data
 
Variables to explore
 
Exploring property value
 
Exploring lot size
 
Exploring costs   
 
Exploring the year a property was built
 
Exploring rent and income     

Module 11 - Estimating Financial Risk through Monte Carlo Simulation
 
 Building the model
 
Getting the data
 
Preprocessing
 
Determining the factor weights
 
Visualizing the results
 
Evaluating results
 
Module 12 - Interactive Data Analysis with Zeppelin
  
Appendix - Scala Programming Essentials
 