Multilingual support #15

Open
scr255 opened this issue Sep 14, 2022 · 3 comments

Comments

@scr255

scr255 commented Sep 14, 2022

Code for English:

from concept import ConceptModel
concept_model = ConceptModel()
concepts = concept_model.fit_transform(images, docs)
# Works correctly!

The guide suggests "Use Concept(embedding_model="clip-ViT-B-32-multilingual-v1") to select a model that supports 50+ languages.":

from concept import Concept
# ImportError: cannot import name 'Concept' from 'concept' --> I guess you mean to import ConceptModel

Importing ConceptModel:

from concept import ConceptModel
concept_model = ConceptModel(embedding_model="clip-ViT-B-32-multilingual-v1")
concepts = concept_model.fit_transform(images, docs)
# TypeError: 'JpegImageFile' object is not subscriptable
@MaartenGr
Owner

Hmmm, there might be something going wrong with the images that you pass to the model. Did the code work for you with the English version?

@scr255
Author

scr255 commented Sep 20, 2022

> Hmmm, there might be something going wrong with the images that you pass to the model. Did the code for you work with the English version?

Yes, the English model "clip-ViT-B-32" is working fine, while "clip-ViT-B-32-multilingual-v1" throws the error.

I've tried changing the dataset (all images in .jpeg format), and the same problem happens.

@MaartenGr
Owner

Unfortunately, it then seems that this specific model has an issue processing the images. You could try embedding the images with SentenceTransformers directly and then passing the embeddings to fit_transform via the image_embeddings parameter. That way, you can also check whether a specific image in your dataset is causing the problem.
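
A rough sketch of that suggestion, in case it helps. It encodes the images one at a time so that a failure points at a specific image instead of aborting the whole batch. The helper names find_failing_images and embed_with_sentence_transformers are made up for illustration and are not part of Concept or SentenceTransformers. The default model name "clip-ViT-B-32" is an assumption: as far as I know, the multilingual variant is a text encoder aligned to the image space of the original clip-ViT-B-32, so the images would be embedded with the original model.

```python
from typing import Any, Callable, List, Sequence, Tuple


def find_failing_images(
    images: Sequence[Any],
    encode_one: Callable[[Any], Any],
) -> Tuple[List[Any], List[Tuple[int, str]]]:
    """Encode images one by one; return embeddings and (index, error) pairs."""
    embeddings: List[Any] = []
    failures: List[Tuple[int, str]] = []
    for index, image in enumerate(images):
        try:
            embeddings.append(encode_one(image))
        except Exception as error:  # broad on purpose: record every failure
            failures.append((index, f"{type(error).__name__}: {error}"))
    return embeddings, failures


def embed_with_sentence_transformers(image_paths, model_name="clip-ViT-B-32"):
    """Hypothetical wrapper; needs sentence-transformers, Pillow and numpy."""
    # Imported lazily so the helper above works without these packages.
    import numpy as np
    from PIL import Image
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    embeddings, failures = find_failing_images(
        image_paths, lambda path: model.encode(Image.open(path))
    )
    # One row per image that encoded successfully.
    return np.array(embeddings), failures
```

If failures comes back empty, the resulting array could then be passed along the lines of concept_model.fit_transform(images, docs, image_embeddings=embeddings). If it is not empty, the listed indices show which images to inspect, and those images would need to be dropped from images as well so that the rows still line up.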
