Apache Spark with Python – Big Data with PySpark and Spark

Learn Apache Spark and Python through 12+ hands-on examples of analyzing big data with PySpark and Spark

What Will I Learn?

  • An overview of the architecture of Apache Spark.
  • Develop Apache Spark 2.0 applications using RDD transformations and actions and Spark SQL (see the sketch after this list).
  • Work with Apache Spark’s primary abstraction, resilient distributed datasets (RDDs), to process and analyze large data sets.
  • Analyze structured and semi-structured data using DataFrames, and develop a thorough understanding of Spark SQL.
  • Advanced techniques to optimize and tune Apache Spark jobs by partitioning, caching, and persisting RDDs.
  • Scale up Spark applications on a Hadoop YARN cluster through Amazon’s Elastic MapReduce service.
  • Share information across different nodes of an Apache Spark cluster using broadcast variables and accumulators.
  • Write Spark applications using the Python API – PySpark.
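
As a taste of the RDD API covered above, here is a minimal PySpark word-count sketch. It is illustrative only, not course material; the local master setting, the app name, and the input file access.log are placeholder assumptions.

    from pyspark import SparkConf, SparkContext

    # Placeholder configuration: run locally on all available cores.
    conf = SparkConf().setMaster("local[*]").setAppName("WordCountSketch")
    sc = SparkContext(conf=conf)

    lines = sc.textFile("access.log")               # hypothetical input file
    words = lines.flatMap(lambda l: l.split())      # transformation: one record per word
    pairs = words.map(lambda w: (w, 1))             # transformation: (word, 1) pairs
    counts = pairs.reduceByKey(lambda a, b: a + b)  # transformation: sum counts per word

    # Transformations are lazy; this action triggers the actual computation.
    for word, count in counts.take(10):
        print(word, count)

    sc.stop()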

Requirements

  • A computer running Windows, macOS, or Linux
  • Prior Python programming experience

Description

What is this course about:

This course covers 10+ hands-on big data examples. You will learn how to frame data analysis problems as Spark problems, and much more.

What will you learn from this course:

  • An overview of the architecture of Apache Spark.
  • Develop Apache Spark 2.0 applications with PySpark using RDD transformations and actions and Spark SQL.
  • Work with Apache Spark’s primary abstraction, resilient distributed datasets (RDDs), to process and analyze large data sets.
  • Deep dive into advanced techniques to optimize and tune Apache Spark jobs by partitioning, caching, and persisting RDDs.
  • Scale up Spark applications on a Hadoop YARN cluster through Amazon’s Elastic MapReduce service.
  • Analyze structured and semi-structured data using Datasets and DataFrames, and develop a thorough understanding of Spark SQL.
  • Share information across different nodes of an Apache Spark cluster using broadcast variables and accumulators (see the sketch after this list).
  • Best practices for working with Apache Spark in the field.
  • Big data ecosystem overview.
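
To make the Spark SQL and shared-variable topics concrete, here is another minimal sketch, again illustrative rather than course material; the file people.json, the country-code lookup table, and the sample codes are placeholder assumptions.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("SparkSQLSketch").getOrCreate()

    # DataFrames: schema-aware collections queried through Spark SQL.
    people = spark.read.json("people.json")  # hypothetical input file
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name, age FROM people WHERE age >= 18").show()

    # Broadcast variable: ship a small read-only lookup table to every node once.
    country_names = spark.sparkContext.broadcast({"US": "United States", "DE": "Germany"})
    # Accumulator: a counter the executors add to and the driver reads back.
    unresolved = spark.sparkContext.accumulator(0)

    def resolve(code):
        name = country_names.value.get(code)
        if name is None:
            unresolved.add(1)  # incremented on the executors, summed on the driver
        return name

    codes = spark.sparkContext.parallelize(["US", "DE", "XX"])
    print(codes.map(resolve).collect(), "unresolved:", unresolved.value)

    spark.stop()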

Who is the target audience?

  • Software engineers who want to develop Apache Spark 2.0 applications using Spark Core and Spark SQL.
  • Data scientists or data engineers who want to advance their careers by improving their big data processing skills.

Content From: http://www.udemy.com/apache-spark-with-python-big-data-with-pyspark-and-spark/
