FERRARO PETRILLO UMBERTO
(syllabus)
Introduction to Big Data
• Part I: Managing Big Data
- Distributed Systems
- Distributed Relational DBMSs
- Limitations of the relational approach
- Introduction to NoSQL DBMSs
- The Key-Value approach
-- Redis:
--- design principles
--- data modelling
--- data querying
-- Cassandra:
--- architecture
--- design principles
--- data modelling
--- data querying
--- Java binding
- The Document-based approach
-- MongoDB:
--- architecture
--- design principles
--- the JSON format
--- data modelling
--- data querying
--- Java binding
-- Neo4J:
--- architecture
--- design principles
--- data modelling
--- data querying
--- Java binding
• Part II: Big Data Computing
- Processing Big Data
- The distributed approach
- The MapReduce paradigm
- Apache Hadoop
-- architecture
-- design principles
- Apache Spark
-- architecture
-- design principles
-- distributed programming with Spark:
--- Resilient Distributed Datasets (RDDs)
--- Actions and Transformations
--- distributed joins
--- DataFrames and Datasets
--- case studies:
---- Implementing a distributed data exploration framework
---- Implementing a distributed ETL procedure
---- Implementing a distributed recommendation engine
---- Implementing a graph analytics framework
Notes, scientific articles, and other teaching material provided by the instructor during the course.