Course:

BIG DATA ANALYTICS (objectives)

Code

1047773

Module: BIG DATA ANALYTICS (objectives)

Language

ENG

Type of certificate

Proficiency certificate

Credits

Scientific disciplinary sector

INF/01

Classroom hours

Study hours

Educational activity

Related and supplementary educational activities

Single channel

Instructor	FERRARO PETRILLO UMBERTO (syllabus) The first part of the course will provide an introduction to the Python programming language, covering the basic syntax of the language, functions, modules, data structures, and I/O management. These notions will then be used to tackle more complex tasks in the analytics scenario. On the one hand, students will be introduced to the formats commonly used to exchange data through files or over the network, such as CSV and JSON. On the other hand, they will learn to develop applications that gather data from the web through web scraping or web mining techniques. Finally, all these techniques will be put into practice by developing a couple of case studies involving the extraction and processing of data from real-world social networks. This part of the course will also feature a brief introduction to the MapReduce paradigm and to its reference implementation, Hadoop, which is useful for processing Big Data.
The second part of the course starts with an introduction to the applications of big data and the Internet of Things. Students will learn some peculiarities of these data and some useful pre-processing methods; other problematic aspects will also be considered: privacy, selection bias, data security, data storage, and computing power. Moreover, the high dimensionality of the data introduces unique computational and statistical challenges that must be addressed. Reducing the dimensionality of the original data matrix can be useful in the initial steps of an analysis, and students will learn the most useful techniques, depending on the aim of the analysis. Companies are no longer satisfied with extracting detailed information from their archives; they now require the application of complex predictive models (the analytics). Some predictive models that are very effective with big data will be introduced: Forests, Gradient Boosting, and Neural Networks. To apply these models, techniques for avoiding overfitting will be illustrated. Finally, the outlined analysis strategies will be applied to real data sets of different types: numeric, textual, and image.
Start and end dates of teaching activities	-
Delivery mode	Traditional
Attendance	Not mandatory
Assessment methods	Written exam, oral exam