GitHub - LucDoh/CrowdRank: Crowdsourcing product recommendations from Reddit using NLP.

Description

CrowdRank is a package and app for interpreting community sentiment about brands and products from Reddit data. Given a single product name (headphones, computers, laptops, TVs), CrowdRank skims through thousands of relevant comments, aggregates and scores the results, and returns a ranking of the best brands in that product space.
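To make the "aggregate and score" step concrete, here is a minimal sketch of how per-comment brand mentions could be turned into a ranking. It assumes brand names and sentiment scores have already been extracted upstream (for example by NER and a sentiment model). The function name `rank_brands`, the `min_mentions` cutoff, and the mean-sentiment scoring rule are illustrative assumptions, not CrowdRank's actual implementation.

```python
from collections import defaultdict


def rank_brands(mentions, min_mentions=2):
    """Rank brands by mean sentiment (hypothetical scoring rule).

    mentions: iterable of (brand_name, sentiment_score) pairs, where the
    score is a float in [-1, 1] produced by an upstream sentiment model.
    Returns a list of (brand, mention_count, mean_sentiment) tuples,
    best brand first.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for brand, sentiment in mentions:
        key = brand.strip().lower()  # crude normalization of brand names
        totals[key] += sentiment
        counts[key] += 1

    ranking = [
        (brand, counts[brand], totals[brand] / counts[brand])
        for brand in totals
        if counts[brand] >= min_mentions  # drop rarely mentioned brands
    ]
    # Sort by mean sentiment, breaking ties by number of mentions.
    ranking.sort(key=lambda row: (row[2], row[1]), reverse=True)
    return ranking


# Made-up example data, for illustration only.
mentions = [
    ("Sony", 0.75), ("sony", 0.25),
    ("Bose", 0.5), ("Bose", 1.0),
    ("Beats", -0.5),
]
ranking = rank_brands(mentions)
```

With the made-up data above, "beats" is filtered out because it has a single mention, and the remaining brands are ordered by their average score.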

Stack: Python, AWS EC2 & S3, JSON
NLP Models: SpaCy [Named Entity Recognition], VADER [Sentiment Analysis].
Packages: Requests, Streamlit, Pandas, Fuzzywuzzy...

This project was completed in 4 weeks while the author was an AI Fellow at Insight Data Science. More information is available in these slides.

Data

Data is queried with Pushshift's API, which indexes over 4 billion comments dating back to 2007.
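As a rough illustration of what such a query looks like, the sketch below builds a comment-search URL. The endpoint and the parameter names (`q`, `subreddit`, `size`, `after`) follow Pushshift's historically documented comment search and are assumptions here; the API's availability and access rules may have changed, and `build_comment_query` is a hypothetical helper, not part of CrowdRank.

```python
from urllib.parse import urlencode

# Assumed endpoint, based on Pushshift's historical public documentation.
PUSHSHIFT_COMMENT_SEARCH = "https://api.pushshift.io/reddit/search/comment/"


def build_comment_query(keyword, subreddit=None, size=100, after=None):
    """Build a comment-search URL for a keyword (hypothetical helper).

    keyword:   search term, e.g. a product name
    subreddit: optionally restrict the search to one subreddit
    size:      maximum number of comments to return
    after:     optional lower time bound, e.g. "30d" or an epoch timestamp
    """
    params = {"q": keyword, "size": size}
    if subreddit:
        params["subreddit"] = subreddit
    if after:
        params["after"] = after
    return PUSHSHIFT_COMMENT_SEARCH + "?" + urlencode(params)


url = build_comment_query("headphones", subreddit="headphones", size=50)
```

The resulting URL could then be fetched with Requests and the JSON response saved locally or to S3.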

The web app stores comment and post data on S3, while the package stores data locally. Brand rankings exist for 9 products (including headphones, laptops, computers, monitors, TVs, keyboards, and mice), inferred from 35K+ comments across almost 20 subreddits.

Motivation

210 million Americans shop online every year, and 80% of them do research before purchasing an item. There are thousands of products in the same category (e.g. Wireless Headphones) rated over 4 stars, making it almost impossible to sort through them. What if we could tap into the collective knowledge of communities to help users quickly find the best brands and products?

Web app

Run the web app inside /scripts:

streamlit run crowdrank_app.py

Or simply use the app here:

http://54.177.99.61:8501

(hosted on an EC2 instance with data on S3)

Package

Run the package from inside scripts to see the best brands for a product (e.g. laptops):

python crowdrank_simple.py laptops

CrowdRank has 3 main modules (ingester, interpreter, and postprocessing) that do the bulk of the work. Supporting functions live in the helpers and visualizer modules. The above script will be replaced by a single call to ranker:

from crowdrank import ranker  
df_ranking = ranker.rank('laptops')

Installing

Clone the repository:

git clone https://github.com/LucDoh/CrowdRank.git

Make subdirectories within data:

mkdir comment_data interpreted_data submission_data results
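On systems where `mkdir` with multiple arguments is awkward, the same directories can be created from Python. This is a small sketch; `make_data_dirs` is a hypothetical helper, not part of the package, and it assumes you pass the path to the repository's data directory.

```python
from pathlib import Path

# Subdirectory names taken from the mkdir command above.
SUBDIRS = ["comment_data", "interpreted_data", "submission_data", "results"]


def make_data_dirs(data_root):
    """Create the expected subdirectories under data_root.

    Existing directories are left untouched. Returns the list of paths.
    """
    root = Path(data_root)
    created = []
    for name in SUBDIRS:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Example (run from the repository root):
# make_data_dirs("data")
```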

Install requirements:

pip install -r requirements.txt  
python -m spacy download en_core_web_md

