Unofficial PyTorch implementation of
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators by Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning.
I pretrain ELECTRA-small from scratch and have successfully replicated the paper's results on GLUE.
Model | CoLA | SST | MRPC | STS | QQP | MNLI | QNLI | RTE | Avg. |
---|---|---|---|---|---|---|---|---|---|
ELECTRA-Small | 54.6 | 89.1 | 83.7 | 80.3 | 88.0 | 79.7 | 87.7 | 60.8 | 78.0 |
ELECTRA-Small (mine) | 57.2 | 87.1 | 82.1 | 80.4 | 88.0 | 78.9 | 87.9 | 63.1 | 78.08 |
Results for models on the GLUE test set.
- You don't need to download and process datasets manually; the script takes care of that for you automatically. (Thanks to huggingface/nlp and huggingface/transformers)
- AFAIK, the closest reimplementation to the original one, taking care of many easily overlooked details (described below).
- AFAIK, the only one that has successfully validated itself by replicating the results in the paper.
- Comes with jupyter notebooks, in which you can explore the code and inspect the processed data.
- You don't need to download and preprocess anything yourself; all you need to do is run the training script.
Note: This project is actually for my personal research, so I didn't try to make it easy to use for all users, but I did try to make it easy to read and modify.
```
pip install fastai nlp transformers hugdatafast
```
- `python pretrain.py`
- Set `pretrained_checkpoint` in `finetune.py` to use the checkpoint you've pretrained and saved in `electra_pytorch/checkpoints/pretrain`.
- `python finetune.py` (with `do_finetune` set to `True`)
- Go to neptune, pick the best of the 10 runs for each task, and set `th_runs` in `finetune.py` according to the numbers in the names of the runs you picked.
- `python finetune.py` (with `do_finetune` set to `False`); this outputs predictions on the test set. You can then compress the `.tsv`s in `electra_pytorch/test_outputs/<group_name>/*.tsv` (see the sketch after this list) and send them to the GLUE site to get the test score.
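For reference, compressing the prediction files for submission can be done with a few lines of Python. This sketch is not part of the repo; the path below is a placeholder matching the output layout described further down, so replace `<group_name>` with the group name you configured.

```python
# Minimal sketch (not part of the repo) for zipping the GLUE prediction files.
from pathlib import Path
import zipfile

out_dir = Path("electra_pytorch/test_outputs/<group_name>")  # adjust to your group name
with zipfile.ZipFile(out_dir / "submission.zip", "w") as zf:
    for tsv in sorted(out_dir.glob("*.tsv")):
        zf.write(tsv, arcname=tsv.name)  # keep only the file name inside the zip
```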
-
I didn't use CLI arguments, so configure options enclosed within
MyConfig
in the python files to your needs before run them. (There're comments below it showing the options for vanilla settings) -
You will need a Neptune account and create a neptune project on the website to record GLUE finetuning results. Don't forget to replace
richarddwang/electra-glue
with your neptune project's name -
The python files
pretrian.py
,finetune.py
are in fact converted fromPretrain.ipynb
andFinetune_GLUE.ipynb
. You can also use those notebooks to explore ELECTRA training and finetuning.
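To illustrate the in-file configuration pattern (no CLI arguments), here is a hypothetical stand-in for `MyConfig`; the real class and the actual option names live at the top of `pretrain.py` / `finetune.py`, so the fields below (`device`, `size`, `seed`) are placeholders only.

```python
# Hypothetical stand-in, for illustration only: a dict with attribute access,
# mimicking how options are grouped in MyConfig inside pretrain.py / finetune.py.
class MyConfig(dict):
    def __getattr__(self, name):
        return self[name]
    def __setattr__(self, name, value):
        self[name] = value

# Edit values like these directly in the scripts before running them.
c = MyConfig({"device": "cuda:0", "size": "small", "seed": 11081})
print(c.device, c.size, c.seed)
```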
Below are the details of the original implementation/paper that are easy to overlook and that I have taken care of. I found these details indispensable for successfully replicating the results of the paper.
- Use the Adam optimizer without bias correction (bias correction is the default for the PyTorch and fastai Adam optimizers); a sketch follows this list.
- There is a bug in how the original implementation decays learning rates through layers. See `_get_layer_lrs`.
- Use gradient clipping.
- For the MRPC and STS tasks, it appends the same dataset with sentence1 and sentence2 swapped to the original dataset, and calls it "double_unordered" (sketched after this list).
- For pretraining data preprocessing, it concatenates and truncates sentences to fit the max length, and stops concatenating when it reaches the end of a document.
- For pretraining data preprocessing, it splits the text into sentence A and sentence B with some probability, and also changes the max length with some probability.
- For finetuning data preprocessing, it follows BERT's way of truncating the longer of sentence A and sentence B to fit the max length.
- The output layer is initialized with TensorFlow v1's default initialization, which is Xavier.
- It uses gumbel softmax to sample generations from the generator (sketched after this list).
- It doesn't mask the way BERT does: 85% of the selected positions are replaced with [MASK] and the remaining 15% are kept unchanged (sketched after this list).
- It doesn't do warmup and then linear decay separately, but does them together, which means the learning rate also decays while it is warming up (sketched after this list). See here.
- It uses a dropout and a linear layer for the GLUE output layer, not what `ElectraClassificationHead` uses.
- It didn't tie input and output embeddings for its generator, which is a common practice applied by many models.
- It ties not only the word/position/token-type embeddings but also the layer norm in the embedding layer of the generator and discriminator.
- All public ELECTRA checkpoints are actually ++ models. See this issue.
- It downscales the generator by hidden size, number of attention heads, and intermediate size, but not the number of layers.
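To make a few of the details above concrete, here are some sketches. First, an Adam update step without bias correction might look like the following; this is a sketch of the variant described above, not the optimizer code used in the repo.

```python
import torch

def adam_step_no_bias_correction(param, grad, exp_avg, exp_avg_sq,
                                 lr, betas=(0.9, 0.999), eps=1e-6):
    """One Adam update without the usual 1/(1 - beta^t) bias-correction factors."""
    beta1, beta2 = betas
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)               # first moment
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)  # second moment
    # Note: no debiasing of exp_avg / exp_avg_sq here.
    param.addcdiv_(exp_avg, exp_avg_sq.sqrt().add_(eps), value=-lr)
```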
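The "double_unordered" trick for MRPC/STS amounts to appending a swapped copy of every sentence pair. A minimal sketch on a list of dicts, with column names `sentence1`/`sentence2` assumed:

```python
def double_unordered(examples):
    """Append a copy of each example with sentence1 and sentence2 swapped."""
    swapped = [{**ex, "sentence1": ex["sentence2"], "sentence2": ex["sentence1"]}
               for ex in examples]
    return examples + swapped

# e.g. double_unordered([{"sentence1": "a", "sentence2": "b", "label": 1}])
```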
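Sampling the generator's outputs with gumbel softmax can be sketched as the Gumbel-max trick over the logits, which is equivalent to sampling from the softmax distribution (a sketch, not the repo's exact sampling code):

```python
import torch

def gumbel_sample(logits):
    """Sample token ids from `logits` of shape (..., vocab_size) via the Gumbel-max trick."""
    gumbel = torch.distributions.Gumbel(0.0, 1.0).sample(logits.shape).to(logits.device)
    return (logits + gumbel).argmax(dim=-1)
```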
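The 85%/15% masking rule above (replace with [MASK] or keep unchanged, no random-token replacement) can be sketched like this, ignoring special tokens and padding for brevity:

```python
import torch

def mask_tokens(input_ids, mask_token_id, select_prob=0.15, mask_prob=0.85):
    """Pick ~15% of positions as MLM targets; replace 85% of those with [MASK],
    keep the rest unchanged. Simplified: special tokens/padding are ignored."""
    is_selected = torch.rand(input_ids.shape, device=input_ids.device) < select_prob
    use_mask = torch.rand(input_ids.shape, device=input_ids.device) < mask_prob
    masked_ids = input_ids.clone()
    masked_ids[is_selected & use_mask] = mask_token_id
    return masked_ids, is_selected  # is_selected marks the generator's prediction targets
```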
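Finally, the learning rate schedule detail (warmup and linear decay applied together) corresponds to a schedule roughly like this, again a sketch of the behavior described above rather than the repo's exact code:

```python
def lr_with_simultaneous_warmup_and_decay(step, total_steps, warmup_steps, base_lr):
    """Linear decay starts from step 0 and the warmup factor is multiplied on top,
    so the learning rate already decays while it is still warming up."""
    decay = 1.0 - step / total_steps
    warmup = min(1.0, step / warmup_steps)
    return base_lr * decay * warmup
```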
If you pretrain, finetune, and generate test results, `electra_pytorch` will generate these files for you:
```
project root
|
|── datasets
|   |── glue
|       |── <task>
|       ...
|
|── checkpoints
|   |── pretrain
|   |   |── <base_run_name>_<seed>_<percent>.pth
|   |   ...
|   |
|   |── glue
|       |── <group_name>_<task>_<ith_run>.pth
|       ...
|
|── test_outputs
    |── <group_name>
    |   |── CoLA.tsv
    |   ...
    |
    ...
```
```
@misc{clark2020electra,
    title={ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators},
    author={Kevin Clark and Minh-Thang Luong and Quoc V. Le and Christopher D. Manning},
    year={2020},
    eprint={2003.10555},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```