"Real or NOT" : NLP to Identify Public Emergency Related Tweets (Part 1)

Introduction

The aim is to predict which Tweets are about real public emergencies (e.g. earthquakes, floods, terrorist events) and which are not.

(The terms 'Public Emergency' and 'Disaster' are used interchangeably.)

Twitter has become an important communication channel in times of emergency. The ubiquity of smartphones enables people to announce an emergency they’re observing in real time. Because of this, more agencies (e.g. disaster relief organizations and news agencies) are interested in programmatically monitoring Twitter.

But it’s not always clear whether a person’s words are actually announcing a disaster.

Source: Real or Not? NLP with Disaster Tweets, A Kaggle Competition

The dataset has the contents of the Tweet (text variable), the location from where the Tweet was posted (location variable), and a keyword associated with the Tweet (keyword variable).

The model is built on the contents of the Tweet alone because, from a domain standpoint, the location and keyword information may not always be available when the algorithm is actually deployed.
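As an illustration of using only the Tweet text, here is a minimal sketch that reads a CSV and keeps just the `text` column and the label. The function name `load_text_and_labels`, the label column name `target`, and the sample rows are assumptions made for this example; they are not taken from the notebook.

```python
import csv
import io

def load_text_and_labels(csv_file, text_col="text", label_col="target"):
    """Read tweets from an open CSV file, keeping only the text and the label.

    The `keyword` and `location` columns are deliberately ignored.
    `label_col="target"` is an assumption about the label column's name.
    """
    texts, labels = [], []
    for row in csv.DictReader(csv_file):
        texts.append(row[text_col])
        labels.append(int(row[label_col]))
    return texts, labels

# Tiny made-up sample standing in for train.csv (not real data).
SAMPLE_CSV = """id,keyword,location,text,target
1,flood,,Flood waters are rising fast on Main Street,1
2,,London,This new album is a total disaster lol,0
"""

texts, labels = load_text_and_labels(io.StringIO(SAMPLE_CSV))
```

With a real file, the same function could be called on `open("train.csv", newline="", encoding="utf-8")`.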

Goal

This is the second exploration of the dataset. In the first attempt, I used an ML-based approach (an ensemble of ensemble methods), which yielded 80% accuracy on the test data.

The goal is to get a better model (~83%) using either a single Deep Learning model or an ensemble of several.

Approach & Results

In this notebook, I have tried out two variants of LSTM-based approaches: one with pre-trained embeddings and one without. The model which incorporates pre-trained embeddings from the GloVe model had an accuracy of 83% on the test set.

I have also created a set of helper functions for data preprocessing, vocabulary building and creating an embedding matrix from pre-trained embeddings.
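To make the role of these helpers concrete, here is a minimal standard-library sketch of the three steps. All function names, the cleaning rules, the reserved `<pad>`/`<unk>` tokens and the toy 2-dimensional vectors are assumptions for illustration; the notebook's own helpers may differ.

```python
import re
from collections import Counter

PAD, UNK = "<pad>", "<unk>"  # reserved tokens at indices 0 and 1 (assumed convention)

def preprocess(text):
    """Lowercase, drop URLs and @mentions, and split into alphabetic tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"[a-z']+", text)

def build_vocab(token_lists, min_count=1):
    """Map each token seen at least `min_count` times to an integer index."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {PAD: 0, UNK: 1}
    for tok, n in counts.most_common():
        if n >= min_count:
            vocab[tok] = len(vocab)
    return vocab

def encode(tokens, vocab, max_len):
    """Convert tokens to indices, truncating or right-padding to `max_len`."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))

def build_embedding_matrix(vocab, glove_lines, dim):
    """Build one row per vocabulary token from GloVe-format lines ("word v1 v2 ...").

    Tokens without a pre-trained vector keep an all-zero row.
    """
    matrix = [[0.0] * dim for _ in range(len(vocab))]
    for line in glove_lines:
        word, *values = line.split()
        if word in vocab and len(values) == dim:
            matrix[vocab[word]] = [float(v) for v in values]
    return matrix

# Usage on two made-up tweets and made-up 2-d vectors (not real GloVe values).
tweets = ["Forest fire near La Ronge @user http://t.co/x", "I love this fire mixtape"]
token_lists = [preprocess(t) for t in tweets]
vocab = build_vocab(token_lists)
ids = encode(token_lists[1], vocab, max_len=6)
glove = ["fire 0.1 0.2", "forest 0.3 0.4"]
matrix = build_embedding_matrix(vocab, glove, dim=2)
```

In practice the nested list would typically be converted to an array before being handed to an embedding layer as its initial weights.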

The best-performing model had an accuracy of 83% on the test data.

Next Steps

The next step is to get better results by trying out a few more approaches, listed below. These will be incorporated in the second edition (Part 2) of the notebook.

An Ensemble model (dense - word focussed + lstm - sequence focussed)
An N-GRAM model (especially n = 2)
Using Attention based frameworks
(Optional) A Functional model (using Keras functional API)
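For the N-gram item in the list above, a minimal sketch of how bigrams (n = 2) could be extracted from a tokenized Tweet. The function name `ngrams` and the sample sentence are assumptions for illustration only; this is not the planned Part 2 implementation.

```python
def ngrams(tokens, n=2):
    """Return the contiguous n-grams in `tokens` as a list of tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Made-up example sentence, already lowercased and split on whitespace.
tokens = "forest fire near la ronge".split()
bigrams = ngrams(tokens, n=2)
```

Bigrams such as `("forest", "fire")` keep some local word order that single-word features lose, which is one reason an n = 2 model may help here.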

Repository: raamav/Text-Classification

Folders and files

Latest commit

History

Repository files navigation

"Real or NOT" : NLP to Identify Public Emergency Related Tweets (Part 1)

Introduction

Goal

Approach & Results

Next Steps

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages