TheMrityunjayPathak/Netflix-Data-Analysis

Netflix Data Analysis

Netflix is a leading entertainment platform offering a diverse global library of TV shows and movies.

This dataset offers a snapshot of its content, including titles, genres, ratings, durations, and more.

Dataset

The dataset used for this analysis is sourced from Kaggle and includes information on Netflix TV Shows and Movies.

Link to the Dataset : Netflix TV Shows and Movies

Problem Statement

  • To analyze Netflix content data and uncover insights into how the platform's catalog has evolved over time.

Setting up the Environment

This project requires Jupyter Notebook, which you can install and launch from the terminal.

  • Install the Notebook
pip install notebook
  • Run the Notebook
jupyter notebook

Libraries required for the Project

NumPy

  • Go to the terminal and run this command
pip install numpy

Pandas

  • Go to the terminal and run this command
pip install pandas

Matplotlib

  • Go to the terminal and run this command
pip install matplotlib

Seaborn

  • Go to the terminal and run this command
pip install seaborn

Getting Started

  • Clone this repository to your local machine by using the following command :
git clone https://github.com/TheMrityunjayPathak/Netflix-Data-Analysis.git

Steps involved in the Project

Importing Libraries

  • Importing necessary libraries like numpy, pandas, matplotlib and seaborn.

Reading CSV File

  • Reading the CSV file using the pd.read_csv() function.
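
A minimal sketch of this step, assuming a small in-memory sample in place of the Kaggle file. The column names and the file name netflix_titles.csv mentioned in the comment are assumptions for illustration, not confirmed by this README.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for the Kaggle CSV file.
sample_csv = io.StringIO(
    "show_id,type,title,date_added,rating,duration,listed_in\n"
    's1,Movie,Example Movie,"January 1, 2020",PG-13,90 min,"Dramas, Comedies"\n'
    's2,TV Show,Example Show,"March 15, 2019",TV-MA,2 Seasons,"TV Dramas"\n'
)

# With the real file this would be something like
# pd.read_csv("netflix_titles.csv"), where the file name is an assumption.
df = pd.read_csv(sample_csv)

# Peek at the first rows to confirm the file was parsed as expected.
print(df.head())
```

Quoted fields such as the date and genre list contain commas, which pd.read_csv() keeps intact as single values.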

Overview of the Dataset

  • Information about the shape and size of the dataset.

  • Types of columns present in the dataset (numerical, categorical, text).

  • Detailed info about the dataset using the df.info() method.
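
The overview checks above could look roughly like this. The toy DataFrame and its column names are hypothetical stand-ins for the real dataset.

```python
import pandas as pd

# Hypothetical toy data; the real dataset has more rows and columns.
df = pd.DataFrame({
    "title": ["Example Movie", "Example Show", "Another Movie"],
    "type": ["Movie", "TV Show", "Movie"],
    "release_year": [2020, 2019, 2018],
    "rating": ["PG-13", "TV-MA", None],
})

n_rows, n_cols = df.shape  # number of rows and columns
n_cells = df.size          # total number of cells (rows * columns)

# Separate numerical columns from everything else (categorical / text).
numeric_cols = df.select_dtypes(include="number").columns.tolist()
non_numeric_cols = df.select_dtypes(exclude="number").columns.tolist()

# Prints column names, non-null counts and dtypes.
df.info()
```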

Handling Null values in the Dataset

  • Filling the null values in categorical columns with the most frequent category.
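
One way to sketch this kind of imputation, assuming rating and country are among the categorical columns with gaps (the column choice and values here are illustrative only):

```python
import pandas as pd

# Hypothetical toy data with missing categorical values.
df = pd.DataFrame({
    "rating": ["TV-MA", "TV-MA", None, "PG-13"],
    "country": ["United States", None, "India", "United States"],
})

for col in ["rating", "country"]:
    # mode() ignores missing values; [0] picks the most frequent category.
    most_frequent = df[col].mode()[0]
    df[col] = df[col].fillna(most_frequent)
```

Filling with the mode keeps every row but inflates the count of the dominant category, which is worth keeping in mind when reading the percentages later on.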

Changing DataType of Columns

  • Converting the date_added column to pandas datetime format.
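
A possible sketch of the conversion, assuming the dates are stored as text like "September 25, 2021" and may carry stray whitespace. Both the date format and the whitespace are assumptions about the raw data.

```python
import pandas as pd

# Hypothetical sample values; one has a leading space, one is missing.
df = pd.DataFrame({
    "date_added": ["September 25, 2021", " January 1, 2020", None],
})

# Strip whitespace, then parse. errors="coerce" turns anything that
# cannot be parsed (including missing values) into NaT.
df["date_added"] = pd.to_datetime(
    df["date_added"].str.strip(),
    format="%B %d, %Y",
    errors="coerce",
)
```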

Utilizing existing information to create new Columns

  • Extracting year, month and day from the date_added column.

  • Splitting the listed_in column on commas (,) and selecting the first value as the genre.

  • Splitting the cast column on commas (,) and selecting the first value as the lead actor.
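
The derived columns described above might be built along these lines. The new column names (year_added, month_added, day_added, genre, lead_actor) are hypothetical, and the data is a toy sample.

```python
import pandas as pd

# Hypothetical toy data with date_added already converted to datetime.
df = pd.DataFrame({
    "date_added": pd.to_datetime(["2021-09-25", "2020-01-01"]),
    "listed_in": ["Dramas, International Movies", "TV Comedies"],
    "cast": ["Actor A, Actor B", "Actor C"],
})

# Date parts via the .dt accessor.
df["year_added"] = df["date_added"].dt.year
df["month_added"] = df["date_added"].dt.month
df["day_added"] = df["date_added"].dt.day

# Split on commas, keep the first entry, and trim surrounding spaces.
df["genre"] = df["listed_in"].str.split(",").str[0].str.strip()
df["lead_actor"] = df["cast"].str.split(",").str[0].str.strip()
```

Taking only the first listed value is a simplification: a title tagged with several genres is counted under just one of them.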

Splitting the Dataset

  • Splitting the dataset based on type of content (like TV Shows and Movies).
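
A short sketch of the split, assuming the type column holds the values "Movie" and "TV Show" (the variable names movies and tv_shows are illustrative):

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "title": ["Example Movie", "Example Show", "Another Movie"],
    "type": ["Movie", "TV Show", "Movie"],
})

# Boolean masks select each content type; .copy() avoids
# chained-assignment warnings when the subsets are modified later.
movies = df[df["type"] == "Movie"].copy()
tv_shows = df[df["type"] == "TV Show"].copy()
```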

Statistical Analysis

  • No. of TV Shows and Movies available on Netflix.

  • No. of shows in each rating category.

  • No. of shows released each year.
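
These counts can be sketched with value_counts(). The data below is a toy sample, and release_year is assumed to be the column holding the release year.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie"],
    "rating": ["TV-MA", "TV-MA", "PG-13", "TV-MA"],
    "release_year": [2019, 2019, 2020, 2018],
})

# Titles per content type and per rating category (most frequent first).
type_counts = df["type"].value_counts()
rating_counts = df["rating"].value_counts()

# Titles per release year, ordered chronologically.
year_counts = df["release_year"].value_counts().sort_index()
```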

Data Visualization

  • No. of TV Shows and Movies available on Netflix.

  • No. of shows in each Rating Category.

  • No. of shows uploaded on Netflix each year.

  • No. of shows uploaded on Netflix each month.

  • No. of shows uploaded on Netflix each day.

  • No. of shows available on Netflix in each country.

  • No. of Movies released on Netflix in each genre.

  • No. of TV Shows released on Netflix in each genre.

  • No. of Movies per lead actor on Netflix.

  • No. of TV Shows per lead actor on Netflix.

  • Avg. length of Movies in each genre.

  • Avg. length of TV Shows in each genre.

  • Distribution of length of Movies on Netflix.

  • Distribution of seasons of TV Shows on Netflix.
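
As a rough illustration of how one of the charts listed above could be drawn, here is a bar chart of titles per content type. It uses plain matplotlib on toy data; the project also lists seaborn, and the notebook's actual plotting calls and styling are not shown in this README, so treat this as a sketch rather than the original code.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"type": ["Movie", "TV Show", "Movie", "Movie"]})
type_counts = df["type"].value_counts()

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(type_counts.index.tolist(), type_counts.values)
ax.set_xlabel("Type of content")
ax.set_ylabel("No. of titles")
ax.set_title("No. of TV Shows and Movies (toy data)")
fig.tight_layout()
```

In a Jupyter notebook the Agg line can be dropped and the figure is displayed inline.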

Conclusion

Here are some key findings from the analysis :

  • Cleaned and analyzed a dataset of 8000+ Netflix Movies and TV Shows.

  • More than 60% of content on Netflix is rated for mature audiences.

    • Suggests that Netflix targets adult viewers to boost engagement and retention.
  • More than 25% of Movies and TV Shows are released on the 1st day of the month.

    • Shows a consistent release schedule, likely to align with subscription cycles.
  • More than 40% of the content on Netflix is exclusive to the United States.

    • Shows a strong focus on the U.S. market and content availability by location.
  • More than 20% of the content on Netflix falls under the "Drama" genre.

    • Confirms that "Drama" is a key part of Netflix's content library.
  • More than 23% of the content on Netflix was released in 2019 alone.

    • Indicates a major content push that year, possibly tied to growth or user acquisition goals.
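
Percentage figures like the ones above can be derived with value_counts(normalize=True). The sketch below uses toy data, so its numbers say nothing about the real dataset, and the set of ratings treated as "mature" is an assumption, not the notebook's definition.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "rating": ["TV-MA", "TV-MA", "PG-13", "TV-14", "TV-MA"],
})

# Share of each rating category as a percentage of all titles.
rating_share = df["rating"].value_counts(normalize=True) * 100

# Assumed definition of "mature" ratings, for illustration only.
mature_ratings = {"TV-MA", "R", "NC-17"}
mature_share = df["rating"].isin(mature_ratings).mean() * 100
```

The same pattern applies to the other findings, for example the share of titles per genre or per year.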