Text-Data-Normalisation

This project was completed for the Code Vector screening test.

Task 1

Downloaded 50 LinkedIn profiles, which are saved in the LinkedInProfiles folder of the repository.

converter.py contains the script for Tasks 2 and 3.

Task 2

Extracted the text from the first line and saved it in the first column of the task file.
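
As a rough illustration of this step, here is a minimal sketch using only the standard library. It assumes that "first line" means the first non-empty line of each profile's extracted text; the function names `first_line` and `write_first_lines` are hypothetical and are not taken from converter.py.

```python
import csv
import io


def first_line(text):
    """Return the first non-empty line of a text, stripped of whitespace."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ""


def write_first_lines(profile_texts, out_file):
    """Write one row per profile, with its first line in the first column."""
    writer = csv.writer(out_file)
    for text in profile_texts:
        writer.writerow([first_line(text)])


if __name__ == "__main__":
    # Hypothetical sample texts standing in for extracted profile text.
    samples = ["\nJane Doe\nData Scientist", "John Roe\nEngineer"]
    buffer = io.StringIO()
    write_first_lines(samples, buffer)
    print(buffer.getvalue())
```

In a real run, `out_file` would be a file opened with `newline=""`, as the `csv` module documentation recommends.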

Task 3

Extracted the 10 most frequently used words from every profile's data, excluding stopwords.
Used the tf-idf scoring methodology to score every word and return the 10 most important words (excluding stopwords), i.e. the words with the highest tf-idf scores, where tf-idf stands for term frequency-inverse document frequency.
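
The two rankings described above can be sketched as follows. This is an illustrative sketch, not the code in tfidf.py: the tiny stopword list, the tokenisation rule, the unsmoothed `log(N / df)` form of idf, and all function names are assumptions made for the example.

```python
import math
import re
from collections import Counter

# A tiny illustrative stopword list; a real run would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "for", "with"}


def tokenize(text):
    """Lowercase the text, split into alphabetic words, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_frequent(text, n=10):
    """Return the n most frequent non-stopword words in the text."""
    return [word for word, _ in Counter(tokenize(text)).most_common(n)]


def inverse_document_frequency(documents):
    """Map each word to log(N / df), where df counts documents containing it."""
    doc_count = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))
    return {word: math.log(doc_count / count) for word, count in df.items()}


def top_tfidf(text, idf, n=10):
    """Return the n words with the highest tf * idf score in the text."""
    tokens = tokenize(text)
    if not tokens:
        return []
    counts = Counter(tokens)
    scores = {w: (c / len(tokens)) * idf.get(w, 0.0) for w, c in counts.items()}
    # Sort by descending score, breaking ties alphabetically for stable output.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:n]


if __name__ == "__main__":
    # Hypothetical mini-corpus standing in for the profile texts.
    docs = [
        "python developer python django",
        "java developer spring",
        "python data science",
    ]
    idf = inverse_document_frequency(docs)
    print(top_frequent(docs[0], 1))
    print(top_tfidf(docs[0], idf, 2))
```

In this sketch a word that occurs in every document gets an idf of zero, and a word absent from the corpus is also scored zero; smoothing the idf is a common alternative when that behaviour is not wanted.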

The functions written for the tf-idf implementation live in tfidf.py. Install the dependencies listed in requirements.txt in your environment and run python converter.py to generate the required task file. The code is commented for explanation.

Task 4

The API is built with Django REST Framework. It lives in the Django project converter_api, and the Django app convert contains the required endpoints. To serve it locally, make sure the requirements are installed, then run python manage.py runserver.

The endpoint at 'pdf_to_text/' takes a PDF file (pdf_file) as input and returns the text it contains.
The endpoint at 'text_to_info/' takes text as input and returns the top 10 most frequent words and the top 10 most important words in that text. Importance is scored with tf-idf, with idf calculated from the data collected in Task 3.
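
For orientation, a client call to 'text_to_info/' might look like the sketch below, which uses only the standard library. Several details are assumptions rather than facts from the project: that the endpoint accepts a POST with a JSON body, that the field is named `text`, that the app's URLs are mounted at the site root, and that the server runs on Django's default development address. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed local development address; Django's runserver defaults to port 8000.
BASE_URL = "http://127.0.0.1:8000/"


def build_text_to_info_request(text, base_url=BASE_URL):
    """Build (but do not send) a POST request for the text_to_info/ endpoint."""
    payload = json.dumps({"text": text}).encode("utf-8")  # "text" key is assumed
    return urllib.request.Request(
        url=base_url + "text_to_info/",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode a JSON response; needs a running server."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the development server running, `send(build_text_to_info_request("some profile text"))` would return the decoded response. If the request is rejected, check the field name and URL prefix against the convert app's code.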
