
Conversation

xiaoda99
Contributor

With the original code, all parameters are decayed because the condition "parameter_name in no_decay" will never be satisfied.
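
For context, a minimal sketch of the failure mode, using made-up parameter names and a hypothetical no_decay list (illustrations only, not the exact lists from the example scripts): the exact-membership test can never match a fully qualified name, so every parameter lands in the decayed group, while a substring test behaves as intended.

# Illustration only: names in the style returned by model.named_parameters(),
# and a hypothetical no_decay list of name fragments.
names = ['bert.embeddings.LayerNorm.weight',
         'bert.embeddings.LayerNorm.bias',
         'bert.encoder.layer.0.attention.self.query.weight',
         'bert.encoder.layer.0.attention.self.query.bias']
no_decay = ['bias', 'LayerNorm.weight']

# Original condition: exact membership, which a dotted name never satisfies.
print([n for n in names if n in no_decay])
# [] -> every parameter would receive weight decay

# Condition introduced by this fix: substring match against each fragment.
print([n for n in names if any(nd in n for nd in no_decay)])
# ['bert.embeddings.LayerNorm.weight', 'bert.embeddings.LayerNorm.bias',
#  'bert.encoder.layer.0.attention.self.query.bias']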

xiaoda99 changed the title from "Fix ineffective no_decay bug when using BertAdam" to "Fix ineffective no_decay bug when using BERTAdam" on Nov 18, 2018
thomwolf merged commit 061eeca into huggingface:master on Nov 20, 2018
@thomwolf
Member

thanks!

qwang70 pushed a commit to DRL36/pytorch-pretrained-BERT that referenced this pull request Mar 2, 2019
Fix ineffective no_decay bug when using BERTAdam
@xelibrion

Question: wouldn't .named_parameters() for the model return tuples of (name, param_tensor), where the names look similar to these

['bert.embeddings.word_embeddings.weight',
 'bert.embeddings.position_embeddings.weight',
 'bert.embeddings.token_type_embeddings.weight',
 'bert.embeddings.LayerNorm.weight',
 'bert.embeddings.LayerNorm.bias',
 'bert.encoder.layer.0.attention.self.query.weight',
 'bert.encoder.layer.0.attention.self.query.bias',
 'bert.encoder.layer.0.attention.self.key.weight',
 'bert.encoder.layer.0.attention.self.key.bias',
 'bert.encoder.layer.0.attention.self.value.weight',
 'bert.encoder.layer.0.attention.self.value.bias',
 'bert.encoder.layer.0.attention.output.dense.weight',
 'bert.encoder.layer.0.attention.output.dense.bias',
 'bert.encoder.layer.0.attention.output.LayerNorm.weight',
 'bert.encoder.layer.0.attention.output.LayerNorm.bias',
 ...
 ...
 'classifier.linear.weight',
 'classifier.linear.bias']

therefore requiring slightly smarter conditions than a plain in check? Something along these lines?

[p for n, p in param_optimizer if any(n.endswith(x) for x in no_decay)]

For reference, the grouping before and after this PR:

optimizer_grouped_parameters = [
-    {'params': [p for n, p in param_optimizer if n not in no_decay], 'weight_decay_rate': 0.01},
-    {'params': [p for n, p in param_optimizer if n in no_decay], 'weight_decay_rate': 0.0}
+    {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay_rate': 0.01},
+    {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 'weight_decay_rate': 0.0}
     ]
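
For completeness, a sketch of how the grouped parameters are then handed to the optimizer named in the title; the import path and the lr/warmup/t_total values below follow the repository's example scripts from memory and should be treated as assumptions rather than part of this PR.

from pytorch_pretrained_bert.optimization import BertAdam

num_train_steps = 1000  # placeholder: total number of optimization steps

# Each group's 'weight_decay_rate' overrides the optimizer-level default,
# so only the first group is actually decayed.
optimizer = BertAdam(optimizer_grouped_parameters,
                     lr=5e-5,
                     warmup=0.1,
                     t_total=num_train_steps)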


I think all(nd not in n for nd in no_decay) would be clearer
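
Both spellings select the same parameters (by De Morgan's law), and the endswith variant suggested above differs only when a no_decay fragment appears in the middle of a name; a quick illustration with made-up names and a hypothetical no_decay list (none of this is from the PR itself):

names = ['bert.embeddings.LayerNorm.bias',
         'bert.encoder.layer.0.output.dense.weight',
         'cls.bias_projection.weight']        # contrived: 'bias' appears mid-name
no_decay = ['bias', 'LayerNorm.weight']

for n in names:
    decay_substring = not any(nd in n for nd in no_decay)   # merged condition
    decay_negated = all(nd not in n for nd in no_decay)     # suggested rewrite
    decay_suffix = not any(n.endswith(nd) for nd in no_decay)
    assert decay_substring == decay_negated   # identical by De Morgan's law
    print(n, decay_substring, decay_suffix)

# Only 'cls.bias_projection.weight' is treated differently: decayed under the
# endswith test, exempt under the substring test.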

@xelibrion

Don't mind my comment, tested it further this morning and everything seems to work as expected!

SaulLu pushed a commit to SaulLu/transformers that referenced this pull request Jan 11, 2022
xloem pushed a commit to xloem/transformers that referenced this pull request Apr 9, 2023
* Update trainer and model flows to accommodate sparseml

Disable FP16 on QAT start (huggingface#12)

* Override LRScheduler when using LRModifiers

* Disable FP16 on QAT start

* keep wrapped scaler object for training after disabling

Using QATMatMul in DistilBERT model class (huggingface#41)

Removed double quantization of output of context layer. (huggingface#45)

Fix DataParallel validation forward signatures (huggingface#47)

* Fix: DataParallel validation forward signatures

* Update: generalize forward_fn selection

Best model after epoch (huggingface#46)

fix scaler check for non fp16 mode in trainer (huggingface#38)

Mobilebert QAT (huggingface#55)

* Remove duplicate quantization of vocabulary.

enable a QATWrapper for non-parameterized matmuls in BERT self attention (huggingface#9)

* Utils and auxiliary changes

update Zoo stub loading for SparseZoo 1.1 refactor (huggingface#54)

add flag to signal NM integration is active (huggingface#32)

Add recipe_name to file names

* Fix errors introduced in manual cherry-pick upgrade

Co-authored-by: Benjamin Fineran <[email protected]>
jameshennessytempus pushed a commit to jameshennessytempus/transformers that referenced this pull request Jun 1, 2023
ArthurZucker added a commit that referenced this pull request Apr 5, 2025