DATASET INFORMATION
We compared several methods for sentiment analysis on tweets (a binary classification problem). The training dataset is expected to be a CSV file with the columns Demojized, Captions, Latitude, Longitude, Created Time, Sentiment, Sentiment Score, Full text. The Demojized column contains the raw tweet text with the emojis translated to text. Captions contains the caption of the Instagram post mentioned in the tweet. Full Text contains the demojized text and the caption combined and translated to English. Sentiment Score ranges from -1 (negative) to 1 (positive), and Sentiment is the remark assigned according to the range in which the Sentiment Score falls.
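As a rough illustration of the layout described above, the sketch below checks that a CSV has the expected columns and that every Sentiment Score lies in [-1, 1]. It uses only the standard library so it runs without the project dependencies. The exact header spelling, the helper name `validate_rows`, and the sample row are assumptions made for the example, not part of the project.

```python
import csv
import io

# Column names as listed above; the exact spelling in real files is an assumption.
EXPECTED_COLUMNS = [
    "Demojized", "Captions", "Latitude", "Longitude",
    "Created Time", "Sentiment", "Sentiment Score", "Full text",
]


def validate_rows(handle):
    """Check the header and the score range, then return the parsed rows."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = list(reader)
    for index, row in enumerate(rows, start=1):
        score = float(row["Sentiment Score"])
        if not -1.0 <= score <= 1.0:
            raise ValueError(f"data row {index}: score {score} outside [-1, 1]")
    return rows


# Made-up sample row, for illustration only.
sample = io.StringIO(
    "Demojized,Captions,Latitude,Longitude,Created Time,"
    "Sentiment,Sentiment Score,Full text\n"
    "good day :sun:,beach,12.9,77.6,2020-01-01,Positive,0.7,good day sun beach\n"
)
rows = validate_rows(sample)
```

For a real file, pass an open file handle instead of the in-memory sample, for example `validate_rows(open(path, newline="", encoding="utf-8"))`.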
REQUIREMENTS
There are some general library requirements for the project and some which are specific to individual methods. The general requirements are as follows.
- numpy
- pandas
- goslate
- textblob
- emoji
GENERATING DATASET
- Run image_urls.ipynb on test data. This generates a dataset containing all the redirected image URLs.
- Run translate.ipynb on test data. This produces the demojized text, the captions, and the other columns required in the training dataset.
- Run sentiment.ipynb on training data. This generates the sentiment score and remark for the full text.
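The scoring step can be sketched roughly as follows, assuming the score comes from TextBlob (which is listed in the requirements, though the notebook itself is not shown here). The score ranges used for the remark are not given in this README, so the `neutral_band` cut-off, the label names, and both function names are hypothetical.

```python
def label_from_score(score: float, neutral_band: float = 0.05) -> str:
    """Map a score in [-1, 1] to a remark. The cut-offs are hypothetical."""
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"


def score_text(full_text: str) -> float:
    """Return TextBlob's polarity for English text, a float in [-1.0, 1.0]."""
    # Imported lazily so the labelling helper works without textblob installed.
    from textblob import TextBlob

    return TextBlob(full_text).sentiment.polarity
```

With textblob installed, `label_from_score(score_text(text))` gives a remark for one row of the Full Text column.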
After the above steps, you should have two extra files for each test dataset: <Name>_captions.csv and <Name>_Data.csv, which contain the image URLs and the final dataset respectively.
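A small helper along these lines can confirm that both files were produced. It assumes that <Name> is the input file's name without its extension and that the outputs are written next to the input. Neither assumption is stated in this README, and the function names are made up for the example.

```python
from pathlib import Path


def expected_outputs(dataset_path: str) -> dict:
    """Build the two output paths, assuming <Name> is the input file's stem."""
    src = Path(dataset_path)
    return {
        "captions": src.with_name(f"{src.stem}_captions.csv"),
        "data": src.with_name(f"{src.stem}_Data.csv"),
    }


def missing_outputs(dataset_path: str) -> list:
    """Return the expected output paths that do not exist on disk."""
    return [p for p in expected_outputs(dataset_path).values() if not p.exists()]
```

An empty list from `missing_outputs` means both files are present.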
... In Progress