Experiments with RNNs, LSTMs, and related NLP techniques. Please have a look at the following word2vec playgrounds:
File: rossmann_timeSeries_noExtData.ipynb
Objectives:
i) Feature engineering on time-series data: elapsed-event-time and rolling summaries
ii) Categorical embeddings
iii) Using fastai on tabular data
iv) Understanding the 1-cycle policy
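The two feature-engineering ideas above can be illustrated without the notebook's actual data. This is a minimal stdlib-only sketch (the column names `promo` and `sales` are invented for illustration; the notebook presumably does the equivalent with pandas/fastai): elapsed-event-time counts the days since the most recent event flag, and a rolling summary is a trailing-window mean.

```python
from datetime import date, timedelta

# Toy daily series with a binary event flag (names are assumptions)
dates = [date(2015, 7, 1) + timedelta(days=i) for i in range(7)]
promo = [0, 1, 0, 0, 1, 0, 0]
sales = [100, 150, 120, 110, 160, 130, 125]

# Elapsed-event-time: days since the most recent promo
# (None before the first promo is ever seen)
elapsed, last = [], None
for d, p in zip(dates, promo):
    if p:
        last = d
    elapsed.append((d - last).days if last is not None else None)

# Rolling summary: trailing 3-day mean of sales
window = 3
rolling_mean = []
for i in range(len(sales)):
    chunk = sales[max(0, i - window + 1): i + 1]
    rolling_mean.append(sum(chunk) / len(chunk))
```

With pandas the same features would typically come from a cumulative group trick for elapsed time and `Series.rolling(window).mean()` for the summary; the logic above is just the spelled-out version.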
File: 2_simple_rnn_IMDB.ipynb
Objective(s):
A. Familiarising with document processing using gensim
B. Converting the tokens within each document to corresponding 'token-ids' (integer tokens). (For text cleaning, please refer to the wikiclustering file in folder: 10.nlp_workshop/text_clustering. Keras also has a Tokenizer class that can be used for integer-tokenization; see file: 8.rnn/3.keras_tokenizer_class.py. nltk can also tokenize; see file: 10.nlp_workshop/word2vec/nlp_workshop_word2vec.py)
C. Creating a bag-of-words model
D. Discovering document similarity
File: tf data API
Objectives:
i) Learning to work with tensors
ii) Learning to work with the tf.data API
iii) Text Classification -- work in progress
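The core idea behind the tf.data API is building an input pipeline by chaining lazy transformations such as `map` and `batch` onto a dataset. A minimal pure-Python sketch of that chaining idea (not using TensorFlow itself, just generators standing in for `tf.data.Dataset` stages):

```python
def from_list(data):
    # Stand-in for tf.data.Dataset.from_tensor_slices: yield one element at a time
    yield from data

def map_fn(ds, fn):
    # Stand-in for Dataset.map: apply fn lazily to each element
    for x in ds:
        yield fn(x)

def batch(ds, size):
    # Stand-in for Dataset.batch: group consecutive elements
    buf = []
    for x in ds:
        buf.append(x)
        if len(buf) == size:
            yield buf
            buf = []
    if buf:
        yield buf  # final partial batch

pipeline = batch(map_fn(from_list(range(10)), lambda x: x * x), 4)
batches = list(pipeline)
# batches == [[0, 1, 4, 9], [16, 25, 36, 49], [64, 81]]
```

In TensorFlow the equivalent would be `tf.data.Dataset.range(10).map(lambda x: x * x).batch(4)`; the laziness is the point — nothing is computed until the pipeline is iterated.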
File: textClassification_bidirectional_LSTM.ipynb
Objectives:
i) Learning to work with tfds.load
ii) Learning to work with tf.data API
iii) Text Classification -- work in progress (though it already works)
File: 0_basic_document_processing.ipynb
Objective(s):
A. Familiarising with document processing using gensim
B. Converting the tokens within each document to corresponding 'token-ids' (integer tokens). (For text cleaning, please refer to the wikiclustering file in folder: 10.nlp_workshop/text_clustering. Keras also has a Tokenizer class that can be used for integer-tokenization; see file: 8.rnn/3.keras_tokenizer_class.py. nltk can also tokenize; see file: 10.nlp_workshop/word2vec/nlp_workshop_word2vec.py)
C. Creating a bag-of-words model
D. Discovering document similarity
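Objectives B–D above can be sketched without gensim itself. The sketch below (stdlib-only; the toy documents are invented) builds a token→integer-id mapping like gensim's `corpora.Dictionary`, converts each document to bag-of-words `(token_id, count)` pairs like `Dictionary.doc2bow`, and measures document similarity with cosine similarity over those sparse vectors:

```python
from collections import Counter
from math import sqrt

docs = [
    "human machine interface for computer applications",
    "a survey of user opinion of computer system response time",
    "the generation of random binary unordered trees",
]

# B. token -> integer-id mapping (what gensim's corpora.Dictionary builds)
tokenized = [d.lower().split() for d in docs]
token2id = {}
for toks in tokenized:
    for t in toks:
        token2id.setdefault(t, len(token2id))

# C. bag-of-words: each document as sorted (token_id, count) pairs,
# analogous to Dictionary.doc2bow()
def doc2bow(tokens):
    counts = Counter(token2id[t] for t in tokens)
    return sorted(counts.items())

bows = [doc2bow(toks) for toks in tokenized]

# D. document similarity: cosine between two sparse BoW vectors
def cosine(a, b):
    da, db = dict(a), dict(b)
    dot = sum(da[k] * db[k] for k in da.keys() & db.keys())
    norm = sqrt(sum(v * v for v in da.values())) * sqrt(sum(v * v for v in db.values()))
    return dot / norm if norm else 0.0
```

The first two documents share the token "computer" and so score a positive similarity, while the first and third share nothing and score zero — the same behaviour gensim's similarity indices expose at scale.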
File: kingMinusWoman.ipynb
Objective(s)
Experimentation with pre-created word2vec file
Works with gensim 3.8.3
Tests: paris - france + germany (should be close to berlin); brought - bring + seek (should be close to sought)
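The analogy tests above reduce to vector arithmetic plus a nearest-neighbour search by cosine similarity. A self-contained sketch with invented 3-d toy embeddings (real word2vec vectors have hundreds of dimensions, and in gensim this is `KeyedVectors.most_similar(positive=[...], negative=[...])`):

```python
from math import sqrt

# Toy embeddings, invented for illustration only
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "man":   [0.9, 0.1, 0.1],
    "woman": [0.9, 0.1, 0.9],
    "queen": [0.9, 0.8, 0.9],
    "apple": [0.1, 0.2, 0.05],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def analogy(a, b, c):
    """Return the word whose vector is closest to a - b + c,
    excluding the query words themselves (as gensim does)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))

print(analogy("king", "man", "woman"))  # -> queen
```

Excluding the query words from the candidates matters: the raw result of `a - b + c` is often closest to `a` itself, which is why gensim filters the inputs out before ranking.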