Replies: 3 comments 1 reply
-
Hi @njelicic, I don't think you can draw conclusions based on the differences in cosine similarities without knowing the distribution of cosine similarities. It could be that a cosine similarity of 0.75 is near the top of your distilled model's range for cross-lingual sentence pairs. What matters more is that more similar sentences get higher cosine similarities, and vice versa. We have run extensive benchmarks, which are documented in our results. However, if you have access to real data that reflects the task you want to solve, I would always recommend running benchmarks yourself to see whether the performance is good enough. While we have benchmarked on a large number of tasks and datasets, there is no way to know whether the model will work for your task without testing and benchmarking it.
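For example, if you have a handful of labeled pairs from your own task, a quick sanity check is to look at how well the cosine similarities rank the pairs, e.g. with a Spearman correlation. This is a minimal sketch; the pairs, gold labels, and model path below are placeholders you would replace with your own data:

from model2vec import StaticModel
from scipy.spatial.distance import cosine
from scipy.stats import spearmanr

# Placeholder labeled pairs from your own task: (sentence1, sentence2, gold similarity in [0, 1]).
# In practice you would want many more pairs than this.
pairs = [
    ("I am going to the store", "Voy a la tienda", 1.0),
    ("I need help", "Necesito ayuda", 1.0),
    ("I am going to the store", "Elle étudie dur", 0.0),
    ("I need help", "Ich habe ein Auto", 0.0),
]

model = StaticModel.from_pretrained("m2v_model")  # path to your distilled model

predicted, gold = [], []
for s1, s2, label in pairs:
    e1, e2 = model.encode(s1), model.encode(s2)
    predicted.append(1 - cosine(e1, e2))
    gold.append(label)

# The ranking is what matters here, not the absolute similarity values.
correlation, p_value = spearmanr(predicted, gold)
print(f"Spearman correlation: {correlation:.3f} (p = {p_value:.3f})")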
-
I discovered this while testing model2vec in my RAG application (it was not retrieving any relevant cross-lingual documents) and tried to isolate a few examples for discussion. I did some more testing to demonstrate the effect. I created 4 small datasets: Cross-Lingual (translation pairs), Cross-Lingual Negative (unrelated sentences in different languages), Inter-Lingual (pairs with similar meaning in the same language), and Inter-Lingual Negative (unrelated sentences in the same language).
Next, I ran t-tests to compare the mean cosine similarities of the different datasets within each model. For the model2vec model, all pairs of distributions have statistically significantly different means (p < 0.05). The original model, however, shows no statistically significant difference in means for Cross-Lingual vs Inter-Lingual (p = 0.1251) or for Cross-Lingual Negative vs Inter-Lingual Negative (p = 0.7561). The distribution plots and the printed t-test results per model ("T-test Results Within Model2Vec", "T-test Results Within SentenceTransformer") show essentially the same picture. Here's the code to reproduce the results:
from sentence_transformers import SentenceTransformer
from model2vec import StaticModel
from scipy.spatial.distance import cosine
from scipy.stats import f_oneway, ttest_ind
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
models = {
"SentenceTransformer": SentenceTransformer("BAAI/bge-m3", device=0),
"model2vec": StaticModel.from_pretrained("m2v_model")
}
cross_lingual = [
{'sentence1': "I am going to the store", 'sentence2': "Voy a la tienda"}, # Spanish
{'sentence1': "She is studying hard", 'sentence2': "Elle étudie dur",}, # French
{'sentence1': "I love programming", 'sentence2': "Ich liebe Programmieren"}, # German
{'sentence1': "Good morning", 'sentence2': "Bom dia"}, # Portuguese
{'sentence1': "How are you?", 'sentence2': "Comment ça va?"}, # French
{'sentence1': "Where is the library?", 'sentence2': "Wo ist die Bibliothek?"}, # German
{'sentence1': "I need help", 'sentence2': "Necesito ayuda"}, # Spanish
{'sentence1': "Thank you very much", 'sentence2': "Muchas gracias"}, # Spanish
{'sentence1': "This is my cat", 'sentence2': "Ceci est mon chat"}, # French
{'sentence1': "My favorite color is blue", 'sentence2': "Mi color favorito es azul"}, # Spanish
{'sentence1': "Let's go out for lunch", 'sentence2': "Vamos a salir a almorzar"}, # Spanish
{'sentence1': "I am happy", 'sentence2': "Sono felice"}, # Italian
{'sentence1': "Good evening", 'sentence2': "Buenas noches"}, # Spanish
{'sentence1': "I am from the USA", 'sentence2': "Je viens des États-Unis"}, # French
{'sentence1': "What time is it?", 'sentence2': "Que hora es?"}, # Spanish
{'sentence1': "I have a car", 'sentence2': "Ich habe ein Auto"}, # German
{'sentence1': "I want to learn", 'sentence2': "Je veux apprendre"}, # French
{'sentence1': "My name is John", 'sentence2': "Meu nome é João"}, # Portuguese
{'sentence1': "I have a brother", 'sentence2': "J'ai un frère"}, # French
{'sentence1': "I am sleepy", 'sentence2': "Tengo sueño"}, # Spanish
{'sentence1': "Please help me", 'sentence2': "Por favor ayúdame"}, # Spanish
{'sentence1': "How old are you?", 'sentence2': "Quanti anni hai?"}, # Italian
{'sentence1': "I like music", 'sentence2': "Ik hou van muziek"}, # Dutch
]
inter_lingual = [
{'sentence1': "I am very tired", 'sentence2': "I feel exhausted"}, # English, same meaning
{'sentence1': "She enjoys reading", 'sentence2': "She likes to read books"}, # English, same meaning
{'sentence1': "I am learning Python", 'sentence2': "I am studying Python programming"}, # English, same meaning
{'sentence1': "The sky is blue", 'sentence2': "The clouds are white"}, # English, same meaning
{'sentence1': "I like coffee", 'sentence2': "I prefer coffee over tea"}, # English, same meaning
{'sentence1': "She is my friend", 'sentence2': "She is one of my best friends"}, # English, same meaning
{'sentence1': "It is very hot today", 'sentence2': "Today is extremely warm outside"}, # English, same meaning
{'sentence1': "I am tired", 'sentence2': "I need rest"}, # English, same meaning
{'sentence1': "The food was delicious", 'sentence2': "The meal was amazing"}, # English, same meaning
{'sentence1': "She sings beautifully", 'sentence2': "She has a lovely voice"}, # English, same meaning
{'sentence1': "Mi chiamo Luca", 'sentence2': "Il mio nome è Luca"}, # Italian, same meaning
{'sentence1': "Jag älskar att läsa", 'sentence2': "Jag tycker om att läsa böcker"}, # Swedish, same meaning
{'sentence1': "C'est un beau jour", 'sentence2': "Il fait beau aujourd'hui"}, # French, same meaning
{'sentence1': "Das Wetter ist schön", 'sentence2': "Es ist sonnig heute"}, # German, same meaning
{'sentence1': "Oggi è una giornata calda", 'sentence2': "Fa caldo oggi"}, # Italian, same meaning
{'sentence1': "今日は暑いです", 'sentence2': "今日はとても暑いです"}, # Japanese, same meaning
{'sentence1': "Быстро бегать полезно", 'sentence2': "Занятия спортом полезны"}, # Russian, same meaning
{'sentence1': "Estoy cansado", 'sentence2': "Tengo sueño"}, # Spanish, same meaning
{'sentence1': "I love pizza", 'sentence2': "I enjoy eating pizza"}, # English, same meaning
{'sentence1': "I like traveling", 'sentence2': "I love to visit new places"}, # English, same meaning
{'sentence1': "She is tired", 'sentence2': "She feels exhausted"}, # English, same meaning
{'sentence1': "I am learning Spanish", 'sentence2': "I am studying Spanish language"}, # English, same meaning
{'sentence1': "It is very cold today", 'sentence2': "The weather is freezing today"}, # English, same meaning
{'sentence1': "He plays football", 'sentence2': "He enjoys playing soccer"}, # English, same meaning
{'sentence1': "I am hungry", 'sentence2': "I want to eat something"}, # English, same meaning
{'sentence1': "I love nature", 'sentence2': "I enjoy the outdoors"}, # English, same meaning
{'sentence1': "Il pleut aujourd'hui", 'sentence2': "Il fait mauvais aujourd'hui"}, # French, same meaning
{'sentence1': "Eu gosto de ler", 'sentence2': "Eu amo livros"}, # Portuguese, same meaning
{'sentence1': "Schöne Blumen", 'sentence2': "Ich mag Blumen"}, # German, same meaning
{'sentence1': "Me gusta nadar", 'sentence2': "Me encanta nadar en el mar"}, # Spanish, same meaning
{'sentence1': "今日は暑い", 'sentence2': "今日は非常に暑い"}, # Japanese, same meaning
{'sentence1': "J'aime le chocolat", 'sentence2': "Le chocolat est délicieux"},
]
inter_lingual_negative = [
{'sentence1': "I am very tired", 'sentence2': "The car is parked outside"}, # English, different meaning
{'sentence1': "I love programming", 'sentence2': "The dog is barking loudly"}, # English, different meaning
{'sentence1': "I enjoy reading books", 'sentence2': "The sky is cloudy today"}, # English, different meaning
{'sentence1': "She is my friend", 'sentence2': "My cat is sleeping peacefully"}, # English, different meaning
{'sentence1': "I am learning Python", 'sentence2': "My favorite sport is basketball"}, # English, different meaning
{'sentence1': "I feel sad", 'sentence2': "She is making dinner for everyone"}, # English, different meaning
{'sentence1': "I want to go for a walk", 'sentence2': "The movie starts at 8pm"}, # English, different meaning
{'sentence1': "I am happy", 'sentence2': "My phone is on the table"}, # English, different meaning
{'sentence1': "It is raining", 'sentence2': "The sun is shining brightly"}, # English, different meaning
{'sentence1': "She sings beautifully", 'sentence2': "The train is leaving soon"}, # English, different meaning
{'sentence1': "Mi chiamo Luca", 'sentence2': "La pizza è deliziosa"}, # Italian, different meaning
{'sentence1': "Jag älskar att läsa", 'sentence2': "Fiskarna simmar i sjön"}, # Swedish, different meaning
{'sentence1': "C'est un beau jour", 'sentence2': "J'ai acheté une nouvelle voiture"}, # French, different meaning
{'sentence1': "Das Wetter ist schön", 'sentence2': "Ich fahre nach Berlin"}, # German, different meaning
{'sentence1': "Oggi è una giornata calda", 'sentence2': "Sto mangiando una mela"}, # Italian, different meaning
{'sentence1': "今日は暑いです", 'sentence2': "私は昨日本を読んだ"}, # Japanese, different meaning
{'sentence1': "Быстро бегать полезно", 'sentence2': "Мы поехали на дачу"}, # Russian, different meaning
{'sentence1': "Estoy cansado", 'sentence2': "Mi casa está cerca del parque"}, # Spanish, different meaning
{'sentence1': "I want some water", 'sentence2': "I like to swim in the ocean"}, # English, different meaning
{'sentence1': "It is raining", 'sentence2': "The sun is shining"}, # English, different meaning
{'sentence1': "I am learning French", 'sentence2': "She is cooking dinner"}, # English, different meaning
{'sentence1': "She loves ice cream", 'sentence2': "He loves to play basketball"}, # English, different meaning
{'sentence1': "I am so happy today", 'sentence2': "It is snowing outside"}, # English, different meaning
{'sentence1': "She is reading a book", 'sentence2': "He is running in the park"}, # English, different meaning
{'sentence1': "I want to watch a movie", 'sentence2': "My friend is visiting me"}, # English, different meaning
{'sentence1': "I am traveling to Paris", 'sentence2': "I am going to the supermarket"}, # English, different meaning
{'sentence1': "Je suis fatigué", 'sentence2': "Je mange une pomme"}, # French, different meaning
{'sentence1': "Ich spiele Gitarre", 'sentence2': "Ich koche Abendessen"}, # German, different meaning
{'sentence1': "Estoy cansado", 'sentence2': "Estoy comiendo pizza"}, # Spanish, different meaning
{'sentence1': "今日は暑い", 'sentence2': "私は旅行に行きます"}, # Japanese, different meaning
{'sentence1': "J'aime les chats", 'sentence2': "Je travaille demain"},
]
cross_lingual_negative = [
{'sentence1': "I am going to work", 'sentence2': "Ik hou van aardbeien"}, # English to Dutch (work vs. strawberries)
{'sentence1': "She is studying hard", 'sentence2': "Ik ben aan het zwemmen"}, # English to Dutch (studying vs. swimming)
{'sentence1': "I love programming", 'sentence2': "Me gusta bailar salsa"}, # English to Spanish (programming vs. salsa dancing)
{'sentence1': "Good morning", 'sentence2': "Je suis fatigué"}, # English to French (morning vs. tired)
{'sentence1': "How are you?", 'sentence2': "C'est mon anniversaire"}, # English to French (how are you? vs. birthday)
{'sentence1': "Where is the library?", 'sentence2': "Ich liebe Schokolade"}, # English to German (library vs. chocolate)
{'sentence1': "I need help", 'sentence2': "Du bist mein bester Freund"}, # English to German (help vs. best friend)
{'sentence1': "Thank you very much", 'sentence2': "Estoy enojado"}, # English to Spanish (thank you vs. angry)
{'sentence1': "This is my cat", 'sentence2': "Mon chien est très gentil"}, # English to French (cat vs. dog)
{'sentence1': "My favorite color is blue", 'sentence2': "Voy a la playa"}, # English to Spanish (blue vs. going to the beach)
{'sentence1': "Let's go out for lunch", 'sentence2': "Je vais courir au parc"}, # English to French (lunch vs. running in the park)
{'sentence1': "I am happy", 'sentence2': "Estoy triste y solo"} # English to Spanish (happy vs. sad and alone)
]
results = {model: {'res': {'Cross-Lingual': [], 'Cross-Lingual Negative': [], 'Inter-Lingual': [], 'Inter-Lingual Negative': []}} for model in models}
def evaluate_similarity(model_name, model, dataset, dataset_name):
    for d in dataset:
        sent1 = model.encode(d['sentence1'])
        sent2 = model.encode(d['sentence2'])
        dist = 1 - cosine(sent1, sent2)
        results[model_name]['res'][dataset_name].append(dist)

def plot_distributions(results):
    # One figure per model, with one histogram per category
    for model, data in results.items():
        plt.figure(figsize=(12, 8))
        for category, scores in data['res'].items():
            sns.histplot(scores, kde=True, label=f"{model} - {category}", bins=20, alpha=0.5)
        plt.legend()
        plt.xlabel("Cosine Similarity")
        plt.ylabel("Frequency")
        plt.title(f"Distribution of Cosine Similarities for {model}")
        plt.savefig(f'{model}.png')

for name, model in models.items():
    evaluate_similarity(name, model, cross_lingual, "Cross-Lingual")
    evaluate_similarity(name, model, cross_lingual_negative, "Cross-Lingual Negative")
    evaluate_similarity(name, model, inter_lingual, "Inter-Lingual")
    evaluate_similarity(name, model, inter_lingual_negative, "Inter-Lingual Negative")

plot_distributions(results)
def f_test_between_models(results):
    # Matrices to store F-test, t-test, and standard-deviation comparisons between models
    f_test_matrix_models = {}
    t_test_matrix_models = {}
    std_dev_comparison = {}
    # Iterate through each category (Cross-Lingual, Inter-Lingual, etc.)
    categories = list(results['SentenceTransformer']['res'].keys())
    for category in categories:
        scores_model1 = results['SentenceTransformer']['res'][category]
        scores_model2 = results['model2vec']['res'][category]
        # Compare standard deviations between the two models
        std_dev_model1 = np.std(scores_model1)
        std_dev_model2 = np.std(scores_model2)
        if std_dev_model1 > std_dev_model2:
            std_dev_comparison[category] = f"SentenceTransformer has a larger standard deviation (SD = {std_dev_model1:.4f})"
        else:
            std_dev_comparison[category] = f"model2vec has a larger standard deviation (SD = {std_dev_model2:.4f})"
        # f_oneway is a one-way ANOVA; with two groups it is equivalent to a two-sample t-test (F = t^2)
        f_stat, p_val_f = f_oneway(scores_model1, scores_model2)
        f_test_matrix_models[category] = (f_stat, p_val_f)
        # Two-sample t-test between models for each category
        t_stat, p_val_t = ttest_ind(scores_model1, scores_model2)
        t_test_matrix_models[category] = (t_stat, p_val_t)
    return f_test_matrix_models, t_test_matrix_models, std_dev_comparison

def f_test_within_models(results):
    # Matrices to store F-test and t-test results between categories within each model
    f_test_matrix_within = {}
    t_test_matrix_within = {}
    for model, data in results.items():
        f_test_matrix_within[model] = {}
        t_test_matrix_within[model] = {}
        # Extract the different categories for this model
        categories = list(data['res'].keys())
        scores = [data['res'][category] for category in categories]
        # Perform pairwise F-tests (one-way ANOVA) and t-tests within the same model
        for i, cat1 in enumerate(categories):
            for j, cat2 in enumerate(categories):
                if i < j:
                    f_stat, p_val_f = f_oneway(scores[i], scores[j])
                    f_test_matrix_within[model][(cat1, cat2)] = (f_stat, p_val_f)
                    t_stat, p_val_t = ttest_ind(scores[i], scores[j])
                    t_test_matrix_within[model][(cat1, cat2)] = (t_stat, p_val_t)
    return f_test_matrix_within, t_test_matrix_within
f_test_results_models, t_test_results_models, std_dev_comparison = f_test_between_models(results)
f_test_results_within, t_test_results_within = f_test_within_models(results)
print("\nF-test Results Between Models:")
for category, (f_stat, p_val) in f_test_results_models.items():
print(f" {category} | F-statistic: {f_stat:.4f}, p-value: {p_val:.4f}")
print("\nT-test Results Between Models:")
for category, (t_stat, p_val) in t_test_results_models.items():
print(f" {category} | T-statistic: {t_stat:.4f}, p-value: {p_val:.4f}")
print("\nStandard Deviation Comparison Between Models:")
for category, comparison in std_dev_comparison.items():
print(f" {category} | {comparison}")
for model, test_results in f_test_results_within.items():
print(f"\nF-test Results Within {model}:")
for (cat1, cat2), (f_stat, p_val) in test_results.items():
print(f" {cat1} vs {cat2} | F-statistic: {f_stat:.4f}, p-value: {p_val:.4f}")
for model, test_results in t_test_results_within.items():
print(f"\nT-test Results Within {model}:")
for (cat1, cat2), (t_stat, p_val) in test_results.items():
print(f" {cat1} vs {cat2} | T-statistic: {t_stat:.4f}, p-value: {p_val:.4f}") |
-
Hi,
After converting BAAI/bge-m3, the cross-lingual performance of the model drops significantly. I converted the model with Model2Vec's distillation, translated a few sentences, and compared the cosine similarity between them. The average cosine similarity between the translated pairs drops from 0.92 to 0.751 between the two models.
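For reference, a conversion and comparison along these lines reproduces the setup (a minimal sketch; the pca_dims value and the output path are illustrative assumptions, not necessarily the exact settings I used):

from model2vec.distill import distill
from sentence_transformers import SentenceTransformer
from scipy.spatial.distance import cosine

# Distill a static model from the original transformer (illustrative settings)
m2v_model = distill(model_name="BAAI/bge-m3", pca_dims=256)
m2v_model.save_pretrained("m2v_model")

# Compare cosine similarities for a translated pair with both models
original = SentenceTransformer("BAAI/bge-m3")
pair = ("I am going to the store", "Voy a la tienda")

for name, model in [("SentenceTransformer", original), ("model2vec", m2v_model)]:
    e1, e2 = model.encode(pair[0]), model.encode(pair[1])
    print(f"{name}: {1 - cosine(e1, e2):.3f}")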
I think that contextualization of the embeddings is necessary for this task. I can imagine that this would also impact other tasks, such as code retrieval? Perhaps you could add some more benchmarks to the repo so it is clear when model2vec should be used with caution?