Refactoring old run_swag.py #1004
Conversation
…uad in pytorch_transformers
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #1004      +/-   ##
==========================================
- Coverage   81.16%   80.77%    -0.4%
==========================================
  Files          57       57
  Lines        8039     8092      +53
==========================================
+ Hits         6525     6536      +11
- Misses       1514     1556      +42
```
Continue to review full report at Codecov.
merge huggingface/master to update
roberta, xlnet for multiple choice
run_multiple_choice.py and utils_multiple_choice.py with RoBERTa and XLNet have been tested on RACE, SWAG, and the ARC Challenge.
This looks really great. Thanks for updating and testing this script, @erenup. A few questions and remarks:
@thomwolf Thank you!
# Conflicts:
#	pytorch_transformers/__init__.py
Run multiple choice add doc
Hi @thomwolf, docstrings for the multiple-choice models have been added, and an example of run_multiple_choice.py has been added to the examples README. Thank you.
```python
tr_loss += loss.item()
if (step + 1) % args.gradient_accumulation_steps == 0:
    scheduler.step()  # Update learning rate schedule
```
PyTorch `scheduler.step()` should be called after `optimizer.step()` (see pytorch/pytorch#20124)
Ok, this looks clean and almost ready to merge; I just added a quick comment on a fix needed in the code (the order of the calls to step). A few things for the merge, since we have re-organized the examples folder, can you:
…hoice_merge

# Conflicts:
#	examples/contrib/run_swag.py
Run multiple choice merge
Hi @thomwolf. I have moved run_multiple_choice.py and utils_multiple_choice.py to examples, moved run_swag.py to examples/contrib, and moved scheduler.step after optimizer.step. I have also tested examples/contrib/run_swag.py on the current pytorch-transformers: run_swag.py gets a normal dev result of 0.809 with the bert-base-uncased model. Thank you.
Awesome, thanks a lot for this contribution @erenup 🔥
Could you share your run configuration for the RACE and ARC datasets? I get an error at line 638, in create_examples: KeyError: 'para'
Hi @PantherYan. For ARC, you need to ask AI2 for the retrieved text file.
Thanks a lot for your prompt reply, much appreciated! For ARC, I have written an email to AI2 asking for help. Thank you!
Thank you for sharing your training configuration to guide us. I used the PyTorch backend and strictly followed your settings, except that I used roberta-base with batch_size = 2 (per_gpu_train_batch_size) × 4 (gpu_num), whereas you set train_batch_size = 8. In other words, you set batch_size = 8 and I set batch_size = 2.

```
data/nlp/MCQA/RACE/cached_test_roberta-base_384_race
11/01/2019 00:31:22 - INFO - transformers.configuration_utils - Configuration saved in models_race/roberta-base/checkpoint-12000/config.json
```

@erenup Could I see your training loss and test loss after 5 epochs?
Hi @PantherYan, I did not run the RACE dataset with roberta-base. In my experience, your RACE results with roberta-base make sense, since BERT-large can only reach about 71~72. You can check the leaderboard for reference.
@erenup I also met the problem of the missing item "para". Have you found a method for converting the raw corpus?
Please see @PantherYan's comments and mine.
Pytorch-transformers! Nice work!
Refactoring old run_swag.py.
Motivation:
I have seen the SWAG PR1 (#951) and the related issue #931.
According to @thomwolf's comments on PR1, I think it's necessary to adopt the code style of run_squad.py in run_swag.py so that we can easily take advantage of the new, powerful pytorch_transformers.
Changes:
I refactored the old run_swag.py following run_squad.py and tested it with the bert-base-uncased pretrained model on a Tesla P100.
Tests:
```bash
export SWAG_DIR=/path/to/SWAG
python -m torch.distributed.launch --nproc_per_node 1 run_swag.py \
    --train_file $SWAG_DIR/train.csv \
    --predict_file $SWAG_DIR/val.csv \
    --model_type bert \
    --model_name_or_path bert-base-uncased \
    --max_seq_length 80 \
    --do_train \
    --do_eval \
    --do_lower_case \
    --output_dir ../models/swag_output \
    --per_gpu_train_batch_size 32 \
    --per_gpu_eval_batch_size 32 \
    --learning_rate 2e-5 \
    --gradient_accumulation_steps 2 \
    --num_train_epochs 3.0 \
    --logging_steps 200 \
    --save_steps 200
```
Results:
I have also tested `--fp16`, and the acc is 0.801. Other args have been tested: `--evaluate_during_training`, `--eval_all_checkpoints`, `--overwrite_output_dir`, `--overwrite_cache`. Things that have not been tested: multi-GPU and distributed training, since I only have one GPU and one computer.
Questions:
It seems the performance is worse than the pytorch-pretrained-bert results. Is this gap normal (0.82 vs. 0.86)?
Future work:
I think it would be good to add a multiple-choice model to XLNet, since there are many multiple-choice datasets such as RACE.
Thank you all!