BIG DATA ANALYTICS
(objectives)
Learning goals. The course will illustrate the main techniques for Big Data management, with particular emphasis on NoSQL databases. It will also address the problem of collecting Big Data from various sources, such as the web and online social networks. This requires introducing the formats commonly used to encode unstructured, semi-structured and structured data, as well as the techniques that can be used to automate their processing. Next, pre-processing techniques, including denoising and imputation of missing data, will be considered. The course will then cover dimensionality reduction techniques based on feature extraction and feature selection. Finally, some supervised and unsupervised statistical learning models for the analysis of Big Data will be presented. Real-world problems will be addressed throughout the course using suitable software.
Knowledge and understanding. Students will learn how to apply some statistical learning techniques for dimensionality reduction based on feature extraction and feature selection. They will also know and understand some powerful supervised and unsupervised statistical learning models for analysing Big Data.
Applying knowledge and understanding. Students will be able to manage Big Data collected from various sources. They will learn how to apply dimensionality reduction techniques based on feature extraction and feature selection. Moreover, they will be able to choose and apply some powerful statistical learning models to analyse Big Data.
Making judgements. Students will develop critical skills by applying a wide range of machine learning and statistical models. They will also develop critical judgement by comparing alternative solutions to the same problem obtained with different learning approaches. They will learn to critically interpret the results obtained by applying these procedures to real data sets.
Communication skills. By studying and carrying out practical exercises, students acquire the technical and scientific language of the discipline, which they must use appropriately in the intermediate and final written tests and in the oral exam. Communication skills are also developed through group activities.
Learning skills. Students who pass the exam will have acquired an analytical approach that enables them to tackle Big Data analysis with statistical models and machine learning methods.
Code: 1047773
Module: BIG DATA ANALYTICS
Language: English
Certificate type: Certificate of proficiency
Credits: 3
Scientific-disciplinary sector: INF/01
Classroom hours: 24
Study hours: -
Type of educational activity: Related and supplementary educational activities
Channel: Single channel
Lecturer: FERRARO PETRILLO UMBERTO
(syllabus)
The first part of the course provides an introduction to the Python programming language, covering basic syntax, functions, modules, data structures and I/O management. These notions will then be used to tackle more complex tasks related to the analytics scenario. On the one hand, students will learn about the formats commonly used to exchange data through files or over the network, such as CSV and JSON. On the other hand, they will learn how to develop applications that gather data from the web through web scraping or web mining techniques. Finally, all these techniques will be put into practice in a couple of case studies involving the extraction and processing of data from real-world social networks. This part of the course also includes a brief introduction to the MapReduce paradigm and to its reference implementation, Hadoop, which is used for processing Big Data.
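The MapReduce paradigm mentioned above is usually introduced through the classic word-count example. The following is a minimal, single-process sketch in plain Python, not Hadoop code and not taken from the course material; the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are hypothetical and simply mimic the three stages that a framework such as Hadoop would distribute across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit a (word, 1) pair for every whitespace-separated, lower-cased token.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle: group all intermediate values by key, as the framework
    # would do between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values associated with one key into a single result.
    return key, sum(values)


def word_count(lines):
    # Chain the three phases; in a real cluster each phase would run in parallel.
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Big Data needs big tools", "data tools"]
    print(word_count(docs))  # {'big': 2, 'data': 2, 'needs': 1, 'tools': 2}
```

Because each call to `map_phase` depends on one line only, and each call to `reduce_phase` on one key only, both stages can be split across machines; only the shuffle step requires moving data between them.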
The second part of the course starts with an introduction to the applications of Big Data and the Internet of Things. Students will learn some peculiarities of these data and some useful pre-processing methods; other problematic aspects will also be considered: privacy, selection bias, data security, data storage and computing power. Moreover, the high dimensionality of the data introduces unique computational and statistical challenges that must be addressed. Reducing the dimensionality of the original data matrix can be useful in the initial steps of the analysis, and students will learn the most useful techniques for doing so, depending on the aim of the analysis. Companies are no longer satisfied with extracting detailed information from their archives; they now require the application of complex predictive models (analytics). Some predictive models that are very effective with Big Data will be introduced: Random Forests, Gradient Boosting and Neural Networks. Techniques to avoid overfitting when applying these models will also be illustrated. Finally, the analysis strategies outlined will be applied to real data sets of different types: numeric, textual and image data.
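Principal component analysis (PCA) is one widely used feature-extraction technique for reducing the dimensionality of a data matrix; the syllabus does not name the specific techniques covered, so PCA is only an illustrative choice here. The sketch below assumes NumPy is available and computes PCA through the singular value decomposition of the column-centred data matrix; the function name `pca` and the synthetic data are hypothetical.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto the first n_components principal components.

    Returns the projected scores and the fraction of total variance
    explained by each retained component.
    """
    # Centre each column (feature) on its mean; PCA operates on centred data.
    X_centered = X - X.mean(axis=0)
    # The right singular vectors (rows of Vt) are the principal axes,
    # ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Squared singular values are proportional to the variance along each axis.
    explained = (S ** 2) / np.sum(S ** 2)
    return X_centered @ components.T, explained[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data: 200 observations of 5 features driven by
    # 2 latent factors plus a small amount of noise.
    latent = rng.normal(size=(200, 2))
    mixing = rng.normal(size=(2, 5))
    X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))
    scores, ratio = pca(X, n_components=2)
    print(scores.shape)  # (200, 2)
    print(ratio.sum())   # close to 1: two components retain almost all the variance
```

Here five observed features are reduced to two derived features with little loss of information, which is the kind of reduction that can simplify the initial steps of an analysis before fitting a predictive model.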
Teaching start and end dates: -
Delivery mode: Traditional
Attendance: Not mandatory
Assessment methods: written test, oral test
Module: BIG DATA ANALYTICS
Language: English
Certificate type: Certificate of proficiency
Credits: 3
Scientific-disciplinary sector: SECS-S/01
Classroom hours: 24
Study hours: -
Type of educational activity: Related and supplementary educational activities
Channel: Single channel
Lecturer: DI CIACCIO AGOSTINO
Textbooks:
- How to Think Like a Computer Scientist: Learning with Python 3, 3rd edition (P. Wentworth, J. Elkner, A. B. Downey, C. Meyers)
- Deep Learning with Python (F. Chollet)
- An Introduction to Statistical Learning (G. James, D. Witten, T. Hastie, R. Tibshirani)
- Texts and notes provided by the lecturers
Teaching start and end dates: -
Delivery mode: Traditional
Attendance: Not mandatory
Assessment methods: written test, oral test, project evaluation