
Master's course - DATA915: Big Data Frameworks

Description

The module Big Data Frameworks is composed of two courses:

  • Big Data with Hadoop (27 hours)
  • Data Science with Spark (13 hours)

Big Data with Hadoop:

Apache Hadoop has evolved into a Big Data platform on top of which many building blocks are being developed. This course presents the Hadoop ecosystem, the Hadoop Distributed File System (HDFS), and many of the tools built on it:

  • MapReduce and YARN
  • Hive and HBase
  • Kafka, Flume, NiFi, Flink, Oozie, etc.

Students will also discover various subjects such as security, resource allocation and data governance in Hadoop.

Data Science with Spark:

Apache Spark is rapidly becoming the computation engine of choice for big data. This course presents:

  • Spark's architecture and Spark Core: RDDs (Resilient Distributed Datasets), transformations, and actions
  • Spark and structured data: Spark SQL and Spark DataFrames
  • Spark machine learning libraries (MLlib and ML)
  • Spark Streaming

Learning objectives

The objectives of this course are the following:
  • Discover the different components of a Big Data cluster and how they interact.
  • Understand Big Data paradigms.
  • Understand the benefits of open source solutions.
  • Develop a Big Data project from scratch.
  • Master Spark, its data models and its different methods of operation.
  • Learn how to use Spark to analyze data, develop Machine Learning pipelines and, finally, process streaming data.
  • Understand and implement distributed algorithms.
  • Understand the advantages of SQL/NoSQL databases.

40 hours of in-person teaching

Minimum / maximum enrollment:

/20

Degree(s) concerned

Associated track

For students in the Non-degree Exchange program

Java, Python, Machine Learning and basic knowledge in Linux system administration and SQL

For students in the Data Science program

Java, Python, Machine Learning and basic knowledge in Linux system administration and SQL

For students in the Engineering Degree (Diplôme d'ingénieur) program

Java, Python, Machine Learning and basic knowledge in Linux system administration and SQL

Grading format

Numerical, out of 20

Letter grade / European grade

For students in the Engineering Degree (Diplôme d'ingénieur) program

Your assessment and credit requirements:

The final mark of the module is a weighted average of 2 marks:

  • Big Data with Hadoop (weight 2)
  • Data Science with Spark (weight 1)

Each course is evaluated by a midterm exam (coefficient 0.3, 1 hour) and a continuous evaluation (coefficient 0.7, labs and mini-projects).
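
The grading rule above can be worked through with a short calculation. The following is a minimal sketch, not an official grading tool: it assumes marks are out of 20, that the weighted average is normalized by the sum of the weights (2 + 1), and that no rounding rule is applied. The function names and the sample marks are hypothetical and chosen only for illustration.

```python
# Weights and coefficients as stated in the course description.
HADOOP_WEIGHT = 2        # Big Data with Hadoop
SPARK_WEIGHT = 1         # Data Science with Spark
MIDTERM_COEF = 0.3       # midterm exam (1 hour)
CONTINUOUS_COEF = 0.7    # continuous evaluation (labs and mini-projects)
PASS_MARK = 10           # the course unit is passed if the final mark >= 10


def course_mark(midterm: float, continuous: float) -> float:
    """Mark of one course, out of 20 (hypothetical helper)."""
    return MIDTERM_COEF * midterm + CONTINUOUS_COEF * continuous


def final_mark(hadoop: float, spark: float) -> float:
    """Module mark: weighted average of the two course marks.

    Assumption: the average is divided by the sum of the weights.
    """
    total_weight = HADOOP_WEIGHT + SPARK_WEIGHT
    return (HADOOP_WEIGHT * hadoop + SPARK_WEIGHT * spark) / total_weight


# Hypothetical sample marks, for illustration only.
hadoop = course_mark(midterm=12, continuous=14)  # 0.3*12 + 0.7*14, about 13.4
spark = course_mark(midterm=8, continuous=11)    # 0.3*8 + 0.7*11, about 10.1
final = final_mark(hadoop, spark)                # (2*13.4 + 10.1) / 3, about 12.3

print(f"final mark: {final:.2f}, passed: {final >= PASS_MARK}")
```

Because Big Data with Hadoop carries twice the weight of Data Science with Spark, a one-point change in the Hadoop mark moves the final mark by two thirds of a point, against one third for the Spark mark.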

The course unit is passed if the final mark >= 10
  • ECTS credits earned: 5 ECTS
  • 3A option credits earned: 5

The mark obtained counts toward your GPA.

For students in the Data Science program

Your assessment and credit requirements:

The final mark of the module is a weighted average of 2 marks:

  • Big Data with Hadoop (weight 2)
  • Data Science with Spark (weight 1)

Each course is evaluated by a midterm exam (coefficient 0.3, 1 hour) and a continuous evaluation (coefficient 0.7, labs and mini-projects).

The course unit is passed if the final mark >= 10
  • ECTS credits earned: 5 ECTS

The mark obtained counts toward your GPA.

For students in the Non-degree Exchange program

Your assessment and credit requirements:

The final mark of the module is a weighted average of 2 marks:

  • Big Data with Hadoop (weight 2)
  • Data Science with Spark (weight 1)

Each course is evaluated by a midterm exam (coefficient 0.3, 1 hour) and a continuous evaluation (coefficient 0.7, labs and mini-projects).

The course unit is passed if the final mark >= 10
  • ECTS credits earned: 5 ECTS

The mark obtained counts toward your GPA.
