peterduronelly/python-for-data-analysis

Python For Data Analysis

This is a supporting codebase for the data analysis courses in the MS in Business Analytics program at Central European University.

Overview

The code follows and supplements the textbook Data Analysis for Business, Economics, and Policy by Gábor Békés (CEU) and Gábor Kézdi (U. Michigan), Cambridge University Press, 2021.

Note: this is an alternative to the book's original coding material: da-coding-python.

This repo was created to provide downloadable coding course content that any instructor can rely on to follow the book's topics, almost chapter by chapter.

The content is divided into essentially two parts.

  • Classes 00-06 introduce major concepts in Python, including how to run Python on the user's own laptop. These classes do not follow the book's first six chapters; rather, while students are picking up the theory, the classes prepare them to use Python's data analytics tools. Class 06 is a standalone analytical exercise whose topic (Premier League 2021/22 matches) is not covered in the book.
  • Classes 07-18 rigorously follow the book's syllabus. These classes take the reader through the analytical workflow covered in the book. Most (but not all) of the chart and table outputs reproduce visualizations in the book.

How to use

The course material does not require any previous knowledge of Python or of computer programming in general. The first seven classes introduce basic programming concepts and their implementations in Python. These classes won't make anyone a programmer, but completing them will prepare the user for their data analysis journey in Python.

It is difficult to interpret the classes by themselves without reading and understanding the underlying course material in Data Analysis for Business, Economics, and Policy. Reading the codebase and the book's chapters in parallel, however, will help the reader put the pieces in their places.

Course content

Class        Main points & learning outcomes
Class 0      how to set up the environment: Python, Jupyter notebooks, virtual environments
Class 1      coding principles, variables, basic operations
Class 2      files & file system, time-related variables, error handling
Class 3      user-defined functions, numpy, classes
Class 4      Pandas dataframes
Class 5      charts
Class 6      how to do data analysis in Python: a hands-on exercise
Class 7      simple linear regression
Class 8      complicated regression patterns: non-linear relationships and data transformations
Class 9      generalizing regression results, comparing alternative regression models, visualizations and output
Class 10     multiple linear regression, using statsmodels and stargazer for tidy regression outputs
Class 11     modelling probabilities, assessing model performance, calculating marginal effects
Class 12     time series regressions, intricacies of time series modelling, handling seasonal patterns
Class 13     a framework for comparing and presenting multiple linear regression model options
Class 14     first venture into machine learning: lasso, cross-validation, and the scikit-learn library
Class 15     building and visualizing classification and regression trees
Class 16     random forest models, introducing grid search and hyperparameter tuning
Class 17     estimating probabilities using linear and non-linear models (logit), introducing AUC and the ROC curve
Class 18     modelling deterministic and stochastic time series, ARIMA, fbprophet

Note

Feel free to use the code as you wish, in its current form or in any modified form.
