stas00
Contributor

@stas00 stas00 commented Jul 12, 2023

**Important: the following notes are for my teammates and won't work for anybody else, as the data isn't ready for the public yet. It should be made public next week.**

Meanwhile, to try it out:

$ git clone https://github.com/huggingface/transformers -b add-model-idefics
$ cd transformers

$ cat generate.py
import torch
from transformers import IdeficsForVisionText2Text, AutoProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

checkpoint = "HuggingFaceM4/idefics-9b"
#checkpoint = "HuggingFaceM4/tiny-random-idefics"

model = IdeficsForVisionText2Text.from_pretrained(checkpoint, torch_dtype=torch.bfloat16).to(device)
processor = AutoProcessor.from_pretrained(checkpoint)

prompts = [
    [
        "User:",
        "https://hips.hearstapps.com/hmg-prod/images/cute-photos-of-cats-in-grass-1593184777.jpg",
        "Describe this image.",
        
        "Assistant: An image of two kittens in grass.",
        
        "User:",
        "https://hips.hearstapps.com/hmg-prod/images/dog-puns-1581708208.jpg",
        "Describe this image.",
        
        "Assistant:",
    ],
    [
        "User:",
        "https://hips.hearstapps.com/hmg-prod/images/dog-puns-1581708208.jpg",
        "Describe this image.",
        
        "Assistant: An image of a dog wearing funny glasses.",

        "User:",
        "https://hips.hearstapps.com/hmg-prod/images/cute-photos-of-cats-in-grass-1593184777.jpg",
        "Describe this image.",

        "Assistant:",
    ],
]

# batched mode
inputs = processor(prompts, return_tensors="pt").to(device)
# single sample mode
#inputs = processor(prompts[0], return_tensors="pt").to(device)

generated_ids = model.generate(**inputs, max_length=100)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)
for i, t in enumerate(generated_text):
    print(f"{i}:\n{t}\n")

and then run:

CUDA_VISIBLE_DEVICES=0 PYTHONPATH=src python generate.py
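
Each prompt in the script above is a flat list that interleaves text strings and image URLs, one list element per piece. A minimal sketch of a helper that assembles such a list from conversation turns is shown here; `build_prompt`, `Turn` and the argument names are hypothetical and are not part of this PR or of transformers:

```python
from typing import List, Optional, Tuple

# Hypothetical helper (not a transformers API): one turn is
# (image_url, user_text, assistant_reply). A reply of None leaves the final
# "Assistant:" open so the model can complete it.
Turn = Tuple[str, str, Optional[str]]


def build_prompt(turns: List[Turn]) -> List[str]:
    """Flatten turns into the interleaved text/image list used in generate.py."""
    prompt: List[str] = []
    for image_url, user_text, reply in turns:
        prompt.append("User:")
        prompt.append(image_url)
        prompt.append(user_text)
        prompt.append("Assistant:" if reply is None else f"Assistant: {reply}")
    return prompt


prompt = build_prompt(
    [
        (
            "https://hips.hearstapps.com/hmg-prod/images/cute-photos-of-cats-in-grass-1593184777.jpg",
            "Describe this image.",
            "An image of two kittens in grass.",
        ),
        (
            "https://hips.hearstapps.com/hmg-prod/images/dog-puns-1581708208.jpg",
            "Describe this image.",
            None,
        ),
    ]
)
```

The result has the same shape as `prompts[0]` in the script, so under that assumption it could be passed to the processor in single sample mode.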

Demos

A PR with examples/demos, including finetuning, is here: huggingface/notebooks#418

TODOs before merging

  • make the models public - which coincides with the announcement/release

@stas00 stas00 changed the title from "[WIP] new model: IDEFIX via HuggingFaceM4" to "[WIP] new model: IDEFICS via HuggingFaceM4" Jul 12, 2023
@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jul 12, 2023

The documentation is not available anymore as the PR was closed or merged.

@flozi00
Contributor

flozi00 commented Jul 13, 2023

Is it possible that it's a private repo? ;-) The m4 repo from the huggingface organisation does not exist.

@stas00
Contributor Author

stas00 commented Jul 13, 2023

Thank you for your interest, @flozi00 - please give us some time. It says WIP because it's not ready for public consumption. I edited the OP to clarify that.

@stas00 stas00 force-pushed the add-model-idefics branch from 289a799 to c0fee5f July 19, 2023 22:09
@stas00
Contributor Author

stas00 commented Jul 31, 2023

Thank you, @sgugger, @HugoLaurencon and @leot13 for your reviews - I have addressed everything you have raised.

Collaborator

@sgugger sgugger left a comment

Thanks a lot for all your work on this @stas00 !


def __init__(
self,
image_size: int = 224,
Contributor

@amyeroberts amyeroberts Aug 1, 2023

Because of the ambiguity in how size is handled in torchvision transforms (and reflected in our feature extractors), the image size parameter for image processors is a dictionary named size, which contains one of the following sets of keys:

  • {"height": h, "width": w}
  • {"shortest_edge": x}
  • {"shortest_edge": x, "longest_edge": y}

e.g. like here for PVT or here for CLIP.
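
To make the three accepted shapes concrete, here is a minimal sketch of how a single integer such as `image_size=224` could be turned into an explicit size dictionary and checked against the key sets listed above. The names `VALID_SIZE_KEYS`, `is_valid_size_dict` and `square_size` are hypothetical and only for illustration; they are not the library's own helpers:

```python
from typing import Dict

# The three key sets listed in the comment above.
VALID_SIZE_KEYS = (
    {"height", "width"},
    {"shortest_edge"},
    {"shortest_edge", "longest_edge"},
)


def is_valid_size_dict(size: Dict[str, int]) -> bool:
    """Return True if `size` uses exactly one of the accepted key sets."""
    return set(size.keys()) in VALID_SIZE_KEYS


def square_size(image_size: int) -> Dict[str, int]:
    """Turn a single int into an unambiguous square size dict."""
    return {"height": image_size, "width": image_size}


size = square_size(224)
print(size, is_valid_size_dict(size))
```

Spelling out the keys removes the guesswork about whether a bare integer means a square output or a shortest-edge resize.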

@VictorSanh
Contributor

prompts = [
    [
        "User:",
        "https://hips.hearstapps.com/hmg-prod/images/cute-photos-of-cats-in-grass-1593184777.jpg",
        "Describe this image."
        
        "Assistant: An image of two kittens in grass.",
        
        "User:",
        "https://hips.hearstapps.com/hmg-prod/images/dog-puns-1581708208.jpg",
        "Describe this image".
        
        "Assistant:",
    ],
    [
        "User:",
        "https://hips.hearstapps.com/hmg-prod/images/dog-puns-1581708208.jpg",
        "Describe this image."
        
        "Assistant: An image of a dog wearing funny glasses.",

        "User:",
        "https://hips.hearstapps.com/hmg-prod/images/cute-photos-of-cats-in-grass-1593184777.jpg",
        "Describe this image".

        "Assistant:",
    ],
]

For posterity: that part of the OP (which I unfortunately can't edit) is missing some commas at the end of some strings (for instance, "Describe this image". should be "Describe this image",). This is important for the tokenization, in particular when we call the processor with add_end_of_utterance_token=True.
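
A small standalone illustration of why the commas matter: Python joins adjacent string literals into a single string, so a missing comma silently merges two prompt elements into one. This sketch only shows the Python behaviour; how the processor then tokenizes the merged element is not reproduced here.

```python
# With the comma: three separate list elements.
with_commas = [
    "Describe this image.",
    "Assistant: An image of two kittens in grass.",
    "User:",
]

# Without the comma after the first string, Python concatenates the two
# adjacent literals into one element.
missing_comma = [
    "Describe this image."
    "Assistant: An image of two kittens in grass.",
    "User:",
]

print(len(with_commas))    # 3
print(len(missing_comma))  # 2
print(missing_comma[0])    # Describe this image.Assistant: An image of two kittens in grass.
```

The user turn and the assistant turn end up in the same list element, so the processor no longer sees them as two separate pieces of the prompt.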

@sgugger
Collaborator

sgugger commented Aug 9, 2023

I can edit if need be. You should also be able to push commits to this branch, since it's in the main fork and you have write permissions @VictorSanh :-)

Member

@gante gante left a comment

Nit: we have been avoiding using the variable name past in generate-related code, preferring the clearer past_key_values instead.

I've added suggested changes in the related lines :)

@stas00
Contributor Author

stas00 commented Aug 18, 2023

Thanks a lot, @gante, for the suggestions - merged

stas00 and others added 5 commits August 18, 2023 07:36
…25442)

* add image_embeddings option in generate-related methods

* style

* rename image_embeddings and allow perceiver embeddings precomputation

* compute embeddings within generate

* make is_encoder_decoder= True the default in config

* nested if else fix

* better triple check

* switch if elif order for pixel values / img embeds

* update model_kwargs perceiver only at the end

* use _prepare_model_inputs instead of encoder_decoder logic

* fix comment typo

* fix config default for is_encoder_decoder

* style

* add typehints

* precompute in forward

* doc builder

* style

* pop instead of get image hidden states

* Trigger CI

* Update src/transformers/models/idefics/modeling_idefics.py

Co-authored-by: Arthur <[email protected]>

* Update src/transformers/models/idefics/modeling_idefics.py

Co-authored-by: Arthur <[email protected]>

* fix * + indentation + style

* simplify a bit the use_resampler logic using comments

* update diocstrings

* Trigger CI

---------

Co-authored-by: Arthur <[email protected]>
Collaborator

@sgugger sgugger left a comment

Thanks again for all the work on this!

@stas00 stas00 merged commit 6c811a3 into main Aug 18, 2023
@stas00 stas00 deleted the add-model-idefics branch August 18, 2023 21:12
sgugger added a commit that referenced this pull request Aug 21, 2023
* rename

* restore

* mappings

* unedited tests+docs

* docs

* fixes

* fix auto-sync breakage

* cleanup

* wip

* wip

* add fetch_images

* remove einops dependency

* update

* fix

* fix

* fix

* fix

* fix

* re-add

* add batching

* rework

* fix

* improve

* add Leo as I am extending his work

* cleanup

* fix

* cleanup

* slow-test

* fix

* fix

* fixes

* deal with warning

* rename modified llama classes

* rework fetch_images

* alternative implementation

* cleanup

* strict version

* cleanup

* [`IDEFICS`] Fix idefics ci (#25056)

* Fix IDEFICS CI

* fix test file

* fixup

* some changes to make tests pass

* fix

* fixup

* Update src/transformers/models/idefics/configuration_idefics.py

Co-authored-by: Stas Bekman <[email protected]>

---------

Co-authored-by: Stas Bekman <[email protected]>

* remove compat checks

* style

* explain that Idefics is not for training from scratch

* require pt>=2.0

* fix idefics vision config (#25092)

* fix idefics vision config

* fixup

* clean

* Update src/transformers/models/idefics/configuration_idefics.py

---------

Co-authored-by: Stas Bekman <[email protected]>

* cleanup

* style

* cleanup

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <[email protected]>

* upcase

* sequence of images

* handle the case with no images

* Update src/transformers/image_processing_utils.py

Co-authored-by: Victor SANH <[email protected]>

* support pure lm take 2

* support tokenizer options

* parameterize num_channels

* fix upcase

* s|IdeficsForCausalLM|IdeficsForVisionText2Text|g

* manual to one line

* addressing review

* unbreak

* remove clip dependency

* fix test

* consistency

* PIL import

* Idefics prefix

* Idefics prefix

* hack to make tests work

* style

* fix

* fix

* revert

* try/finally

* cleanup

* clean up

* move

* [`IDEFICS`] Fix idefics config refactor (#25149)

* refactor config

* nuke init weights

* more refactor

* oops

* remove visual question answering pipeline support

* Update src/transformers/models/idefics/clip.py

Co-authored-by: Stas Bekman <[email protected]>

* Update src/transformers/models/idefics/modeling_idefics.py

* cleanup

* mv clip.py vision.py

* tidyup

---------

Co-authored-by: Stas Bekman <[email protected]>
Co-authored-by: Stas Bekman <[email protected]>

* fix

* license

* condition on pt

* fix

* style

* fix

* rm torchvision dependency, allow custom transforms

* address review

* rework device arg

* add_eos_token

* s/transforms/transform/

* fix top level imports

* fix return value

* cleanup

* cleanup

* fix

* style

* license

* license

* Update src/transformers/models/idefics/image_processing_idefics.py

Co-authored-by: Sylvain Gugger <[email protected]>

* add a wrapper to freeze vision layears

* tidyup

* use the correct std/mean settings

* parameterize values from config

* add tests/models/idefics/test_image_processing_idefics.py

* add test_processor_idefics.py

* cleanup

* cleanups

* fix

* fix

* move to the right group

* style

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <[email protected]>

* add perceiver config

* reset

* missing arg docs

* Apply suggestions from code review

Co-authored-by: Leo Tronchon <[email protected]>

* address review comments

* inject automatic end of utterance tokens (#25218)

* inject automatic end of utterance tokens

* fix

* fix

* fix

* rework to not use the config

* not end_of_utterance_token at the end

* Update src/transformers/models/idefics/processing_idefics.py

Co-authored-by: Sylvain Gugger <[email protected]>

* address review

* Apply suggestions from code review

Co-authored-by: Joao Gante <[email protected]>

* Update src/transformers/image_processing_utils.py

Co-authored-by: Nicolas Patry <[email protected]>

* [`Idefics`] add image_embeddings option in generate-related methods (#25442)

* add image_embeddings option in generate-related methods

* style

* rename image_embeddings and allow perceiver embeddings precomputation

* compute embeddings within generate

* make is_encoder_decoder= True the default in config

* nested if else fix

* better triple check

* switch if elif order for pixel values / img embeds

* update model_kwargs perceiver only at the end

* use _prepare_model_inputs instead of encoder_decoder logic

* fix comment typo

* fix config default for is_encoder_decoder

* style

* add typehints

* precompute in forward

* doc builder

* style

* pop instead of get image hidden states

* Trigger CI

* Update src/transformers/models/idefics/modeling_idefics.py

Co-authored-by: Arthur <[email protected]>

* Update src/transformers/models/idefics/modeling_idefics.py

Co-authored-by: Arthur <[email protected]>

* fix * + indentation + style

* simplify a bit the use_resampler logic using comments

* update diocstrings

* Trigger CI

---------

Co-authored-by: Arthur <[email protected]>

* fix rebase changes

* unbreak #25237 - to be fixed in follow up PRs

* is_composition = False

* no longer needed

---------

Co-authored-by: leot13 <[email protected]>
Co-authored-by: Younes Belkada <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>
Co-authored-by: Victor SANH <[email protected]>
Co-authored-by: Joao Gante <[email protected]>
Co-authored-by: Nicolas Patry <[email protected]>
Co-authored-by: Arthur <[email protected]>
parambharat pushed a commit to parambharat/transformers that referenced this pull request Sep 26, 2023