Python, advanced

Fine-Tune Embeddings with SetFit

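Fine-tune a sentence embedding model on a small labelled dataset with the SetFit framework. SetFit first fine-tunes the embedding model contrastively on sentence pairs built from the labelled examples, then fits a classification head on the resulting embeddings.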

python
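from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Eight labelled examples: 1 = positive, 0 = negative.
texts = [
    'great product', 'loved it', 'terrible quality', 'awful experience',
    'excellent service', 'very bad', 'highly recommend', 'waste of money',
]
labels = [1, 1, 0, 0, 1, 0, 1, 0]

dataset = Dataset.from_dict({'text': texts, 'label': labels})

# `labels` maps each class index to a readable name, so predict() returns strings.
model = SetFitModel.from_pretrained(
    'sentence-transformers/all-MiniLM-L6-v2',
    labels=['negative', 'positive'],
)

# Trainer and TrainingArguments replace the deprecated SetFitTrainer (setfit >= 1.0).
args = TrainingArguments(num_iterations=20)  # rounds of contrastive pair generation
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    eval_dataset=dataset,  # same data as training, so this accuracy is optimistic
    metric='accuracy',
)
trainer.train()

metrics = trainer.evaluate()
print('Accuracy:', metrics['accuracy'])

predictions = model.predict(['amazing!', 'complete disaster'])
print('Predictions:', predictions)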
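
The `num_iterations` setting above governs how many rounds of sentence pairs are generated for the contrastive stage. The sketch below illustrates that pairing idea in plain Python. It is not SetFit's actual sampler: `make_pairs` is a hypothetical helper, and it assumes one same-label and one different-label pair per example per round.

```python
import random

def make_pairs(texts, labels, num_iterations=1, seed=0):
    """Return (text_a, text_b, similarity) triples for contrastive training.

    similarity is 1.0 when both texts share a label and 0.0 otherwise.
    Hypothetical helper for illustration, not part of the setfit package.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_iterations):
        for i, (text, label) in enumerate(zip(texts, labels)):
            # Candidates with the same label, excluding the example itself.
            same = [t for j, (t, l) in enumerate(zip(texts, labels))
                    if l == label and j != i]
            # Candidates with any other label.
            diff = [t for t, l in zip(texts, labels) if l != label]
            if same:
                pairs.append((text, rng.choice(same), 1.0))
            if diff:
                pairs.append((text, rng.choice(diff), 0.0))
    return pairs

texts = ['great product', 'loved it', 'terrible quality', 'awful experience']
labels = [1, 1, 0, 0]

pairs = make_pairs(texts, labels, num_iterations=2)
print(len(pairs))  # 4 examples x 2 pairs x 2 rounds = 16
```

Because every round yields new pairs from the same few examples, a handful of labelled sentences can produce a much larger contrastive training set.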

Use Cases

  • few-shot classification
  • custom embeddings
  • domain adaptation
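
For the custom-embeddings use case, the fine-tuned embedding model can also be used on its own to compare sentences. The sketch below ranks candidates by cosine similarity to a query. `rank_by_similarity` and `toy_encode` are hypothetical names, and the toy encoder is only a stand-in so the example runs without downloading a model. With a trained SetFit model, one would presumably pass `model.model_body.encode` as the `encode` argument instead.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_by_similarity(encode, query, candidates):
    """Return (score, text) tuples sorted from most to least similar to `query`.

    `encode` is any callable that maps a list of sentences to a list of vectors.
    """
    vectors = encode([query] + list(candidates))
    query_vec, cand_vecs = vectors[0], vectors[1:]
    scored = [(cosine(query_vec, vec), text)
              for vec, text in zip(cand_vecs, candidates)]
    return sorted(scored, reverse=True)

def toy_encode(sentences):
    """Stand-in encoder (assumption): counts of three keywords per sentence."""
    vocab = ['great', 'bad', 'refund']
    return [[float(s.lower().split().count(w)) for w in vocab] for s in sentences]

ranked = rank_by_similarity(
    toy_encode,
    'great great value',
    ['great service', 'bad refund policy'],
)
print(ranked[0][1])  # great service
```

Keeping the encoder as a parameter means the same ranking code works before and after fine-tuning, which makes it easy to check whether domain adaptation changed the nearest neighbours.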
