
Conversation

@shoarora
Owner

No description provided.

patrickvonplaten and others added 30 commits March 8, 2020 15:29
…ion_penalty_in_tf_generate

fix repetition penalty mask in tf
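The fix above targets the CTRL-style repetition penalty applied during TF generation. A minimal sketch of a masked penalty, assuming a single sequence for clarity (the function name and shapes are illustrative, not the library's internal API):

```python
import tensorflow as tf

def apply_repetition_penalty(next_token_logits, prev_token_ids, penalty=1.2):
    """Penalise tokens that were already generated (single sequence, for clarity)."""
    vocab_size = tf.shape(next_token_logits)[0]
    # Boolean mask over the vocabulary: True where a token id has been generated before.
    prev_token_mask = tf.reduce_max(tf.one_hot(prev_token_ids, vocab_size), axis=0) > 0
    # CTRL-style penalty: negative logits are multiplied, positive logits divided.
    penalized = tf.where(next_token_logits < 0,
                         next_token_logits * penalty,
                         next_token_logits / penalty)
    return tf.where(prev_token_mask, penalized, next_token_logits)

logits = tf.constant([2.0, -1.0, 0.5, 3.0])
print(apply_repetition_penalty(logits, prev_token_ids=tf.constant([0, 1])))
# token 0: 2.0 / 1.2, token 1: -1.0 * 1.2, tokens 2 and 3 untouched
```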
* Minimal example

* Proposal 2

* Proposal 2 for fast tokenizers

* Typings

* Docs

* Revert "Docs" for easier review

This reverts commit eaf0f97.

* Remove unnecessary assignments

* Tests

* Fix faulty type

* Remove prints

* return_outputs -> model_input_names

* Revert "Revert "Docs" for easier review"

This reverts commit 6fdc694.

* code quality
* 1. seqeval required by ner pl example. install from examples/requirements. 2. unrecognized arguments: save_steps

* pl checkpoint callback FileNotFoundError: create the output directory before passing it to the callback (see the sketch after this list)

* huggingface#3159 pl checkpoint path difference

* 1. Updated README for pl 2. pl script now also correctly displays logs 3. pass GPU ids instead of number of GPUs

* Updated results in readme

* 1. updated readme 2. removing deprecated pl methods 3. finalizing scripts

* comment length check

* using deprecated validation_end for stable results

* style related changes
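A concrete illustration of the checkpoint-directory fix mentioned above, as a sketch only: the `ModelCheckpoint` arguments follow the current pytorch-lightning API (older releases used `filepath` instead of `dirpath`), and the paths are illustrative; the original example wires these through argparse.

```python
import os
import pytorch_lightning as pl

output_dir = "./ner_output"  # illustrative; the example script reads this from argparse

# Create the directory up front so the checkpoint callback does not hit a
# FileNotFoundError when it writes its first checkpoint.
os.makedirs(output_dir, exist_ok=True)

checkpoint_callback = pl.callbacks.ModelCheckpoint(
    dirpath=output_dir,
    monitor="val_loss",
    save_top_k=1,
)

# Passing explicit GPU ids (rather than a GPU count) mirrors the change above.
trainer = pl.Trainer(accelerator="gpu", devices=[0, 1], callbacks=[checkpoint_callback])
```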
…tion_tests_lm_generate_torch_tf

Add integration tests lm generate torch tf
- Fix path of tokenizer
- Clarify that the model is not trained on the evaluation set
- Clarify that the model is not trained on the evaluation dataset
Co-Authored-By: Thomas Wolf <[email protected]>
patrickvonplaten and others added 29 commits March 25, 2020 21:32
* add new default configs

* change prefix default to None
* solve conflicts

* move warnings below

* incorporate changes

* add pad_to_max_length to pipelines

* add bug fix for T5 beam search

* add prefix patterns

* make style

* fix conflicts

* adapt pipelines for task specific parameters

* improve docstring

* remove unused patterns
* add bert bahasa readme

* update readme

* update readme

* added xlnet
* fix merge conflicts

* add t5 summarization example

* change parameters for t5 summarization

* make style

* add first code snippet for translation

* only add prefixes

* add prefix patterns

* make style

* renaming

* fix conflicts

* remove unused patterns

* solve conflicts

* fix merge conflicts

* remove translation example

* remove summarization example

* make sure tensors are in numpy for float comparison

* re-add t5 config

* fix t5 import config typo

* make style

* remove unused numpy statements

* update docstring

* import translation pipeline
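A minimal sketch of the kind of usage the summarization and translation commits above enable (the checkpoint name and generation lengths are illustrative, not taken from the PR):

```python
from transformers import pipeline

# T5 through the summarization pipeline; the task prefix ("summarize: ") is added internally.
summarizer = pipeline("summarization", model="t5-small", tokenizer="t5-small")
article = ("The tower is 324 metres tall, about the same height as an 81-storey building, "
           "and the tallest structure in Paris.")
print(summarizer(article, max_length=30, min_length=5))

# Translation follows the same pattern, with the task-specific prefix handled for you.
translator = pipeline("translation_en_to_de", model="t5-small", tokenizer="t5-small")
print(translator("How old are you?"))
```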
* Add the missing token classification for XLM

* fix styling

* Add XLMForTokenClassification to AutoModelForTokenClassification class

* Fix docstring typo for non-existing class

* Add the missing token classification for XLM

* fix styling

* fix styling

* Add XLMForTokenClassification to AutoModelForTokenClassification class

* Fix docstring typo for non-existing class

* Add missing description for AlbertForTokenClassification

* fix styling

* Add missing docstring for AlBert

* Slow tests should be slow

Co-authored-by: Sakares Saengkaew <[email protected]>
Co-authored-by: LysandreJik <[email protected]>
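A rough sketch of what the new AutoModelForTokenClassification entry makes possible: loading an XLM checkpoint through the auto class. The checkpoint name is an existing XLM model, the label count is illustrative, and the token-classification head is freshly initialised.

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-mlm-en-2048")
model = AutoModelForTokenClassification.from_pretrained("xlm-mlm-en-2048", num_labels=9)

inputs = tokenizer("HuggingFace is based in New York City", return_tensors="pt")
logits = model(**inputs).logits
print(logits.shape)  # (batch_size, sequence_length, num_labels)
```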
* rebase to master

* change tf to pytorch

* change to pytorch

* small fix

* renaming

* add gpu training possibility

* renaming

* improve README

* incorporate Collin's feedback

* better Readme

* better README.md
* add translation example

* make style

* adapt docstring

* add gpu device as input for example

* small renaming

* better README
* delete lm_head, skips weight tying
* Fixed s3
* Dummy inputs to model.device

* Move self.device to ModuleUtilsMixin
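A rough sketch of the idea behind moving `self.device` into a shared mixin: the device is derived from the module's own parameters, so dummy inputs can be created on the right device without passing it around. The mixin and model names here are hypothetical, not the library's classes.

```python
import torch
from torch import nn

class DeviceMixin:
    @property
    def device(self) -> torch.device:
        # Assumes the module has at least one parameter.
        return next(self.parameters()).device

class TinyModel(DeviceMixin, nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x)

model = TinyModel()
dummy_inputs = torch.ones(1, 4, device=model.device)  # dummy inputs follow the model's device
print(model(dummy_inputs).shape)  # torch.Size([1, 2])
```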
…ingface#3400)

* trim seq_len below 1024 if there are columns full of pad_token_id
* Centralize trim_batch so SummarizationDataset can use it too
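Under the assumption that the centralised helper simply drops columns consisting entirely of `pad_token_id`, a small sketch (the function name follows the commit message; the exact signature in the repo may differ):

```python
import torch

def trim_batch(input_ids, pad_token_id, attention_mask=None):
    """Drop columns that contain only pad_token_id, shortening the effective seq_len."""
    keep = input_ids.ne(pad_token_id).any(dim=0)  # True for columns with at least one real token
    if attention_mask is None:
        return input_ids[:, keep]
    return input_ids[:, keep], attention_mask[:, keep]

# Example: a batch padded to length 6 where the last three columns are all padding.
batch = torch.tensor([[5, 7, 9, 0, 0, 0],
                      [3, 4, 0, 0, 0, 0]])
print(trim_batch(batch, pad_token_id=0).shape)  # torch.Size([2, 3])
```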
For some reason Sphinx extremely dislikes this and crashes.
* add t5 docs basis

* improve docs

* add t5 docs

* improve t5 docstring

* add t5 tokenizer docstring

* finish docstring

* make style

* add pretrained models

* correct typo

* make examples work

* finalize docs
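In the spirit of the docs and examples finalised above, a small usage snippet (the checkpoint and generation settings are illustrative, not copied from the added docs):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

text = "summarize: studies have shown that owning a dog is good for you."
input_ids = tokenizer(text, return_tensors="pt").input_ids
summary_ids = model.generate(input_ids, max_length=20, num_beams=4, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```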
…ggingface#2991)

* Use tokenizer.num_added_tokens to count the number of added special tokens instead of hardcoding the numbers.

Signed-off-by: Morgan Funtowicz <[email protected]>

* run_ner.py - Do not add a label to the labels_ids if word_tokens is empty.

This can happen when using bert-base-multilingual-cased with an input containing a single space.
In this case, the tokenizer outputs an empty word_tokens list, leading to inconsistent behavior
where label_ids ends up with one more entry than the tokens vector (see the sketch below).

Signed-off-by: Morgan Funtowicz <[email protected]>
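A compact sketch of the run_ner.py behaviour described above, with an illustrative label set and input; the key line is the early `continue` on an empty word_tokens:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
label_map = {"O": 0, "B-PER": 1}   # illustrative label set
pad_token_label_id = -100

words = ["Obama", " ", "spoke"]    # the middle "word" tokenises to nothing
labels = ["B-PER", "O", "O"]

tokens, label_ids = [], []
for word, label in zip(words, labels):
    word_tokens = tokenizer.tokenize(word)
    if not word_tokens:            # the fix: an empty word_tokens gets no label either
        continue
    tokens.extend(word_tokens)
    # real label on the first sub-token, padding label on the remaining ones
    label_ids.extend([label_map[label]] + [pad_token_label_id] * (len(word_tokens) - 1))

assert len(tokens) == len(label_ids)
```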
* force bleu

* fix wrong file name

* rename file

* different filenames for each example test

* test files should clean up after themselves

* test files should clean up after themselves

* do not force bleu

* correct typo

* fix isort
@shoarora shoarora merged commit daff82d into add-shoarora-model-cards Mar 30, 2020