
Conversation

@shoarora
Owner

No description provided.

patrickvonplaten and others added 30 commits March 8, 2020 15:29
…ion_penalty_in_tf_generate

fix repetition penalty mask in tf
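The fix above targets the CTRL-style repetition penalty applied during TF generation. A minimal sketch of a masked penalty, assuming a single sequence for clarity (the function name and shapes are illustrative, not the library's internal API):

```python
import tensorflow as tf

def apply_repetition_penalty(next_token_logits, prev_token_ids, penalty=1.2):
    """Penalise tokens that were already generated (single sequence, for clarity)."""
    vocab_size = tf.shape(next_token_logits)[0]
    # Boolean mask over the vocabulary: True where a token id has been generated before.
    prev_token_mask = tf.reduce_max(tf.one_hot(prev_token_ids, vocab_size), axis=0) > 0
    # CTRL-style penalty: negative logits are multiplied, positive logits divided.
    penalized = tf.where(next_token_logits < 0,
                         next_token_logits * penalty,
                         next_token_logits / penalty)
    return tf.where(prev_token_mask, penalized, next_token_logits)

logits = tf.constant([2.0, -1.0, 0.5, 3.0])
print(apply_repetition_penalty(logits, prev_token_ids=tf.constant([0, 1])))
# token 0: 2.0 / 1.2, token 1: -1.0 * 1.2, tokens 2 and 3 untouched
```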
* Minimal example

* Proposal 2

* Proposal 2 for fast tokenizers

* Typings

* Docs

* Revert "Docs" for easier review

This reverts commit eaf0f97.

* Remove unnecessary assignments

* Tests

* Fix faulty type

* Remove prints

* return_outputs -> model_input_names

* Revert "Revert "Docs" for easier review"

This reverts commit 6fdc694.

* code quality
* 1. seqeval required by ner pl example. install from examples/requirements. 2. unrecognized arguments: save_steps

* pl checkpoint callback FileNotFoundError: create the output directory before passing it to the callback (see the sketch after this list)

* huggingface#3159 pl checkpoint path difference

* 1. Updated README for pl 2. pl script now also correctly displays logs 3. pass GPU ids instead of number of GPUs

* Updated results in readme

* 1. updated readme 2. removing deprecated pl methods 3. finalizing scripts

* comment length check

* using deprecated validation_end for stable results

* style related changes
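A concrete illustration of the checkpoint-directory fix mentioned above, as a sketch only: the `ModelCheckpoint` arguments follow the current pytorch-lightning API (older releases used `filepath` instead of `dirpath`), and the paths are illustrative; the original example wires these through argparse.

```python
import os
import pytorch_lightning as pl

output_dir = "./ner_output"  # illustrative; the example script reads this from argparse

# Create the directory up front so the checkpoint callback does not hit a
# FileNotFoundError when it writes its first checkpoint.
os.makedirs(output_dir, exist_ok=True)

checkpoint_callback = pl.callbacks.ModelCheckpoint(
    dirpath=output_dir,
    monitor="val_loss",
    save_top_k=1,
)

# Passing explicit GPU ids (rather than a GPU count) mirrors the change above.
trainer = pl.Trainer(accelerator="gpu", devices=[0, 1], callbacks=[checkpoint_callback])
```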
…tion_tests_lm_generate_torch_tf

Add integration tests lm generate torch tf
- Fix path of tokenizer
- Clarify that the model is not trained on the evaluation set
- Clarify that the model is not trained on the evaluation dataset
Co-Authored-By: Thomas Wolf <[email protected]>
patrickvonplaten and others added 29 commits March 25, 2020 21:32
* add new default configs

* change prefix default to None
* solve conflicts

* move warnings below

* incorporate changes

* add pad_to_max_length to pipelines

* add bug fix for T5 beam search

* add prefix patterns

* make style

* fix conflicts

* adapt pipelines for task specific parameters

* improve docstring

* remove unused patterns
* add bert bahasa readme

* update readme

* update readme

* added xlnet
* fix merge conflicts

* add t5 summarization example

* change parameters for t5 summarization

* make style

* add first code snippet for translation

* only add prefixes

* add prefix patterns

* make style

* renaming

* fix conflicts

* remove unused patterns

* solve conflicts

* fix merge conflicts

* remove translation example

* remove summarization example

* make sure tensors are in numpy for float comparison

* re-add t5 config

* fix t5 import config typo

* make style

* remove unused numpy statements

* update docstring

* import translation pipeline
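A minimal sketch of the kind of usage the summarization and translation commits above enable (the checkpoint name and generation lengths are illustrative, not taken from the PR):

```python
from transformers import pipeline

# T5 through the summarization pipeline; the task prefix ("summarize: ") is added internally.
summarizer = pipeline("summarization", model="t5-small", tokenizer="t5-small")
article = ("The tower is 324 metres tall, about the same height as an 81-storey building, "
           "and the tallest structure in Paris.")
print(summarizer(article, max_length=30, min_length=5))

# Translation follows the same pattern, with the task-specific prefix handled for you.
translator = pipeline("translation_en_to_de", model="t5-small", tokenizer="t5-small")
print(translator("How old are you?"))
```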
* Add the missing token classification for XLM

* fix styling

* Add XLMForTokenClassification to AutoModelForTokenClassification class

* Fix docstring typo for non-existing class

* Add the missing token classification for XLM

* fix styling

* fix styling

* Add XLMForTokenClassification to AutoModelForTokenClassification class

* Fix docstring typo for non-existing class

* Add missing description for AlbertForTokenClassification

* fix styling

* Add missing docstring for AlBert

* Slow tests should be slow

Co-authored-by: Sakares Saengkaew <[email protected]>
Co-authored-by: LysandreJik <[email protected]>
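A rough sketch of what the new AutoModelForTokenClassification entry makes possible: loading an XLM checkpoint through the auto class. The checkpoint name is an existing XLM model, the label count is illustrative, and the token-classification head is freshly initialised.

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-mlm-en-2048")
model = AutoModelForTokenClassification.from_pretrained("xlm-mlm-en-2048", num_labels=9)

inputs = tokenizer("HuggingFace is based in New York City", return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)  # (batch_size, sequence_length, num_labels)
```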
* rebase to master

* change tf to pytorch

* change to pytorch

* small fix

* renaming

* add gpu training possibility

* renaming

* improve README

* incorporate Collin's feedback

* better Readme

* better README.md
* add translation example

* make style

* adapt docstring

* add gpu device as input for example

* small renaming

* better README
* delete lm_head, skips weight tying
* Fixed s3
* Dummy inputs to model.device

* Move self.device to ModuleUtilsMixin
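A rough sketch of the idea behind moving `self.device` into a shared mixin: the device is derived from the module's own parameters, so dummy inputs can be created on the right device without passing it around. The mixin and model names here are hypothetical, not the library's classes.

```python
import torch
from torch import nn

class DeviceMixin:
    @property
    def device(self) -> torch.device:
        # Assumes the module has at least one parameter.
        return next(self.parameters()).device

class TinyModel(DeviceMixin, nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x)

model = TinyModel()
dummy_inputs = torch.ones(1, 4, device=model.device)  # dummy inputs follow the model's device
print(model(dummy_inputs).shape)  # torch.Size([1, 2])
```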
…ingface#3400)

* trim seq_len below 1024 if there are columns full of pad_token_id
* Centralize trim_batch so SummarizationDataset can use it too
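Under the assumption that the centralised helper simply drops columns consisting entirely of `pad_token_id`, a small sketch (the function name follows the commit message; the exact signature in the repo may differ):

```python
import torch

def trim_batch(input_ids, pad_token_id, attention_mask=None):
    """Drop columns that contain only pad_token_id, shortening the effective seq_len."""
    keep = input_ids.ne(pad_token_id).any(dim=0)  # True for columns with at least one real token
    if attention_mask is None:
        return input_ids[:, keep]
    return input_ids[:, keep], attention_mask[:, keep]

# Example: a batch padded to length 6 where the last three columns are all padding.
batch = torch.tensor([[5, 7, 9, 0, 0, 0],
                      [3, 4, 0, 0, 0, 0]])
print(trim_batch(batch, pad_token_id=0).shape)  # torch.Size([2, 3])
```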
For some reason Sphinx extremely dislikes this and crashes.
* add t5 docs basis

* improve docs

* add t5 docs

* improve t5 docstring

* add t5 tokenizer docstring

* finish docstring

* make style

* add pretrained models

* correct typo

* make examples work

* finalize docs
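In the spirit of the docs and examples finalised above, a small usage snippet (the checkpoint and generation settings are illustrative, not copied from the added docs):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

text = "summarize: studies have shown that owning a dog is good for you."
input_ids = tokenizer(text, return_tensors="pt").input_ids
summary_ids = model.generate(input_ids, max_length=20, num_beams=4, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```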
…ggingface#2991)

* Use tokenizer.num_added_tokens to count the number of added special tokens instead of hardcoding the numbers.

Signed-off-by: Morgan Funtowicz <[email protected]>

* run_ner.py - Do not add a label to the labels_ids if word_tokens is empty.

This can happen when using bert-base-multilingual-cased with an input containing a single space.
In this case, the tokenizer outputs an empty word_tokens list, leading to inconsistent behavior
where label_ids ends up with one more entry than the tokens vector (see the sketch below).

Signed-off-by: Morgan Funtowicz <[email protected]>
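A compact sketch of the run_ner.py behaviour described above, with an illustrative label set and input; the key line is the early `continue` on an empty word_tokens:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
label_map = {"O": 0, "B-PER": 1}   # illustrative label set
pad_token_label_id = -100

words = ["Obama", " ", "spoke"]    # the middle "word" tokenises to nothing
labels = ["B-PER", "O", "O"]

tokens, label_ids = [], []
for word, label in zip(words, labels):
    word_tokens = tokenizer.tokenize(word)
    if not word_tokens:            # the fix: an empty word_tokens gets no label either
        continue
    tokens.extend(word_tokens)
    # real label on the first sub-token, padding label on the remaining ones
    label_ids.extend([label_map[label]] + [pad_token_label_id] * (len(word_tokens) - 1))

assert len(tokens) == len(label_ids)
```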
* force bleu

* fix wrong file name

* rename file

* different filenames for each example test

* test files should clean up after themselves

* test files should clean up after themselves

* do not force bleu

* correct typo

* fix isort
@shoarora shoarora merged commit daff82d into add-shoarora-model-cards Mar 30, 2020