This is a supporting codebase for the data analysis courses in the MS in Business Analytics program at Central European University.
The code follows and supplements the textbook Data Analysis for Business, Economics, and Policy by Gábor Békés (CEU) and Gábor Kézdi (U. Michigan), Cambridge University Press, 2021.
Note: this is an alternative to the book's original coding material: da-coding-python.
This repo exists to provide downloadable coding course content that any instructor can rely on to follow the book's topics, almost chapter by chapter.
The content is divided into two parts.
- Classes 00-06 introduce major concepts in Python, including how to use Python on the user's laptop. These classes do not follow the book's first six chapters; rather, while students are picking up the theory, they prepare them for using Python's data analytics tools. Class 06 is a standalone analytical exercise whose topic (Premier League 2021/22 matches) is not covered in the book.
- Classes 07-18 rigorously follow the book's syllabus. These classes take the reader through the analytical workflow covered in the book. Most (but not all) of the chart and table outputs reproduce visualizations in the book.
The course material does not require any previous knowledge of Python or of computer programming in general. The first seven classes introduce basic programming concepts and their implementations in Python. These classes won't make anyone a programmer, but completing them will prepare users for their data analysis journey in Python.
It is difficult to interpret the classes by themselves without reading and understanding the underlying course material in Data Analysis for Business, Economics, and Policy. Reading the codebase and the book's chapters in parallel, however, will help the reader put the pieces in place.
Class | Main points & learning outcomes |
---|---|
Class 0 | How to set up the environment; Python, Jupyter notebooks, virtual environments |
Class 1 | coding principles, variables, basic operations |
Class 2 | files & file system, time-related variables, error handling |
Class 3 | user-defined functions, numpy, classes |
Class 4 | Pandas dataframes |
Class 5 | charts |
Class 6 | How to do data analysis in Python? A hands-on exercise |
Class 7 | simple linear regression |
Class 8 | complicated regression patterns: non-linear relationships and data transformations |
Class 9 | generalizing regression results, comparing alternative regression models, visualizations and output |
Class 10 | multiple linear regression, using statsmodels and stargazer for tidy regression outputs |
Class 11 | modelling probabilities, assessing their performance, calculating marginal effects |
Class 12 | time series regressions, intricacies of time series modelling, handling seasonal patterns |
Class 13 | a framework for comparing and presenting multiple linear regression model options |
Class 14 | first venture into machine learning: lasso, cross-validation, and the scikit-learn library |
Class 15 | building and visualizing classification and regression trees |
Class 16 | random forest models, introducing gridsearch and hyperparameter tuning |
Class 17 | estimating probabilities using linear and non-linear models (logit), introducing AUC and the ROC curve |
Class 18 | modelling deterministic and stochastic time series, ARIMA, fbprophet |
Feel free to use the code as you wish, in its current or in any modified form.