Teacher
DI CIACCIO AGOSTINO
(syllabus)
The first part of the course will provide an introduction to the Python programming language. This will include the following notions:
basic syntax of the language; functions; modules; data structures; I/O management.
These notions will then be used to tackle more complex tasks in the analytics setting. On the one hand, the student will learn about the formats commonly used to exchange data through files or over the network, such as CSV and JSON. On the other hand, the student will learn how to develop applications that gather data from the web through web scraping or web mining techniques. Finally, all these techniques will be put into practice in a couple of case studies involving the extraction and processing of data from real-world social networks. This part of the course will also feature a brief introduction to the MapReduce paradigm and to its reference implementation, Hadoop, which is useful for processing Big Data.
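As a rough illustration of the topics in this first part, the sketch below reads a small CSV sample, converts it to JSON, and counts words with explicit map, shuffle, and reduce steps. It is not course material: the sample data, the column names `user` and `text`, and all function names are hypothetical, and the code runs in a single process using only the Python standard library rather than on Hadoop.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical sample data: a tiny CSV export of social-network posts.
CSV_TEXT = """user,text
alice,big data is big
bob,python for data
"""


def csv_to_records(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def map_phase(record):
    """Emit a (word, 1) pair for every word in the record's text field."""
    return [(word, 1) for word in record["text"].split()]


def shuffle_phase(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run the three phases in sequence over a list of records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


records = csv_to_records(CSV_TEXT)
as_json = json.dumps(records)             # the same records in JSON form
counts = word_count(json.loads(as_json))  # word counts from the JSON copy
print(counts)
```

In a real MapReduce framework the map and reduce functions would run in parallel on different machines and the shuffle would be handled by the framework; here the three phases are ordinary function calls so that the data flow stays visible.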
The second part of the course starts with an introduction to the applications of big data and the Internet of Things. The student will learn some peculiarities of these data and some useful pre-processing methods; other problematic aspects will also be considered: privacy, selection bias, data security, data storage, and computing power. Moreover, the high dimensionality of the data introduces unique computational and statistical challenges that must be faced. Reducing the dimensionality of the original data matrix can be useful in the initial steps of the analysis, and the student will learn the most useful techniques, depending on the aim of the analysis. Companies are no longer satisfied with extracting detailed information from their archives; they now require the application of complex predictive models (analytics).
Some predictive models that are very effective with big data will be introduced: Random Forests, Gradient Boosting, and Neural Networks. In order to apply these models, techniques to avoid overfitting will be illustrated. Finally, the outlined analysis strategies will be applied to real data sets of different types: numeric, textual, and image.
- How to Think Like a Computer Scientist: Learning with Python 3. 3rd Edition (Peter Wentworth, Jeffrey Elkner, Allen B. Downey and Chris Meyers)
- Deep Learning with Python (F. Chollet)
- An Introduction to Statistical Learning (G. James, D. Witten, T. Hastie, R. Tibshirani)
- Texts and notes provided by the teachers