Fine-tuning Embeddings with NUDGE: A Practical Implementation
The NUDGE technique (covered in detail in our separate article) lets you adapt embeddings to your domain without retraining the embedding model itself. This guide walks through a practical implementation.
When to Use NUDGE
Perfect for:
- Domain-specific terminology where pre-trained embeddings underperform
- Proprietary or closed-source embedding models you can't fine-tune
- Scenarios where you need rapid iteration on retrieval performance
- Systems where re-indexing existing embeddings is expensive
Less ideal for:
- Very small datasets (< 50 query-document pairs)
- When full model fine-tuning is feasible and budget allows
Implementation Architecture
1. Collect training data
↓
2. Embed queries and documents with base model
↓
3. Run MaxS-EFT optimization (find movement directions)
↓
4. Run MaxA-EFT optimization (find optimal step size)
↓
5. Apply optimized movements to production embeddings
↓
6. Update your retrieval system (no re-embedding needed!)
Data Preparation
You need pairs of (query, relevant_document) examples:
Query: "How do I integrate Stripe payments?"
Document: "Payment integration with Stripe involves..."
Query: "What's the latency of API responses?"
Document: "API response times are optimized..."
Minimal viable dataset: 50-100 pairs
Ideal dataset: 200-500 pairs
Quality matters more than quantity: each pair should represent a real user need.
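In Python, such a dataset can be as simple as a list of tuples (an illustrative layout, not a format NUDGE prescribes):

# Illustrative layout: (query, relevant_document) string pairs.
training_pairs = [
    ("How do I integrate Stripe payments?",
     "Payment integration with Stripe involves..."),
    ("What's the latency of API responses?",
     "API response times are optimized..."),
]
queries = [q for q, _ in training_pairs]
documents = [d for _, d in training_pairs]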
Training Process
Step 1: Encode Your Data
Using your base embedding model (e.g., OpenAI's text-embedding-3-small):
query_embeddings = [embed(q) for q in queries]
doc_embeddings = [embed(d) for d in documents]
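A minimal runnable version of this step, assuming the official OpenAI Python client with an API key in the environment; any embedding model can stand in for embed:

import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts, model="text-embedding-3-small"):
    # Batch-embed a list of strings; returns an (n, dim) float array.
    response = client.embeddings.create(model=model, input=texts)
    return np.array([item.embedding for item in response.data])

query_embeddings = embed(queries)
doc_embeddings = embed(documents)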
Step 2: MaxS-EFT Optimization
Calculate how much each embedding should move toward its matching query:
For each (query, doc) pair:
    direction = normalize(query_embedding - doc_embedding)
    movement_proposal[doc] += direction
This phase ensures correct documents get closer to their queries.
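A NumPy sketch of the same computation, simplified to the case where query i matches document i (so no accumulation across pairs is needed); the validation sketch later restricts this to training pairs only:

import numpy as np

def movement_directions(q_emb, d_emb):
    # Unit vector pointing from each document toward its matching query.
    deltas = q_emb - d_emb
    norms = np.linalg.norm(deltas, axis=1, keepdims=True)
    return deltas / np.clip(norms, 1e-12, None)

directions = movement_directions(query_embeddings, doc_embeddings)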
Step 3: MaxA-EFT Optimization
Find the scaling factor (γ) that maximizes accuracy on a validation set:
For each candidate γ value:
    Apply: new_embeddings = embeddings + γ * movement_directions
    Evaluate: how many queries now retrieve the right document?
    Track: best γ so far
Use grid search for NUDGE-N, or the closed-form solution for NUDGE-M.
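A minimal grid search over γ, continuing the NumPy sketch; val_query_embeddings and val_labels are a held-out split (one way to build them is shown under Validation strategy below), and the candidate range is an assumption to tune:

import numpy as np

def retrieval_accuracy(q_emb, d_emb, labels):
    # Fraction of queries whose top-1 cosine match is the labeled doc.
    q = q_emb / np.linalg.norm(q_emb, axis=1, keepdims=True)
    d = d_emb / np.linalg.norm(d_emb, axis=1, keepdims=True)
    top1 = (q @ d.T).argmax(axis=1)
    return float((top1 == labels).mean())

best_gamma, best_acc = 0.0, -1.0
for gamma in np.linspace(0.0, 2.0, 41):  # candidate step sizes (assumed range)
    moved = doc_embeddings + gamma * directions
    acc = retrieval_accuracy(val_query_embeddings, moved, val_labels)
    if acc > best_acc:
        best_gamma, best_acc = gamma, acc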
Step 4: Apply Optimized Embeddings
Update your production embeddings with the final movement adjustments, where γ* is the best step size found in Step 3:
production_embeddings = original_embeddings + γ* × movement_directions
Key advantage: you never re-run the embedding model over your corpus; only the stored vectors are updated.
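In the running sketch, with best_gamma from the search above (the re-normalization line applies only when your index assumes unit-length vectors, as in the NUDGE-N setting):

# Apply the tuned step size to every document vector.
production_embeddings = doc_embeddings + best_gamma * directions
# Optional: re-normalize if your similarity metric expects unit vectors.
production_embeddings /= np.linalg.norm(
    production_embeddings, axis=1, keepdims=True)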
Integration with Retrieval Systems
Once you have optimized embeddings:
- Replace the vectors in your vector store (Pinecone, Weaviate, Milvus, etc.), as in the sketch below
- No need to re-process documents or re-run your embedding model
- Retrieval improves immediately, starting with the next query
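As one example, with the Pinecone Python client an in-place replacement is a plain upsert by ID; the index name and doc_ids are placeholders for your own setup:

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("my-docs")  # placeholder index name

# Overwrite each stored vector with its NUDGE-adjusted version.
index.upsert(vectors=[
    {"id": doc_id, "values": vector.tolist()}
    for doc_id, vector in zip(doc_ids, production_embeddings)
])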
Practical Considerations
Computing cost:
- Minimal—just matrix operations
- No GPU required (runs on CPU)
- Usually completes in < 1 second for 10K documents
Validation strategy:
- Split your data: 80% training, 20% validation (one way to do this is sketched below)
- Use the validation set only in MaxA-EFT to find the optimal γ
- Test final performance on hold-out test examples
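One way to build that split with NumPy, keeping the γ search honest by deriving movement directions from the training pairs only:

import numpy as np

rng = np.random.default_rng(0)
n = len(query_embeddings)
perm = rng.permutation(n)
train_idx, val_idx = perm[: int(0.8 * n)], perm[int(0.8 * n) :]

# Recompute directions from training pairs only; validation docs stay put.
directions = np.zeros_like(doc_embeddings)
directions[train_idx] = movement_directions(
    query_embeddings[train_idx], doc_embeddings[train_idx])

# Each validation query's relevant document keeps its original index.
val_query_embeddings = query_embeddings[val_idx]
val_labels = val_idx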
Monitoring:
- Track mean reciprocal rank (MRR) before and after tuning (a reference implementation follows this list)
- Monitor nDCG@10 for ranking quality
- Check if queries now retrieve expected documents
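A reference MRR implementation for the 1:1 setup used throughout these sketches:

import numpy as np

def mean_reciprocal_rank(q_emb, d_emb, labels):
    # Average of 1/rank of each query's labeled document under cosine score.
    q = q_emb / np.linalg.norm(q_emb, axis=1, keepdims=True)
    d = d_emb / np.linalg.norm(d_emb, axis=1, keepdims=True)
    scores = q @ d.T
    relevant = scores[np.arange(len(labels)), labels]
    # Rank = number of documents scoring at least as high as the labeled one.
    ranks = (scores >= relevant[:, None]).sum(axis=1)
    return float((1.0 / ranks).mean())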
Advanced: Streaming Adaptation
NUDGE excels at incremental updates:
1. Start with NUDGE-optimized embeddings
2. New domain data arrives
3. Run NUDGE again on new pairs
4. Update γ and movement directions
5. Apply to production embeddings
Unlike full model fine-tuning, you can repeat this weekly or daily without expensive retraining.
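A sketch of one refresh iteration, reusing the helpers above; appending new pairs and re-running the cheap steps end to end is one reasonable policy, not the only one:

import numpy as np

# Embed the newly collected pairs with the same base model.
new_q_emb = embed(new_queries)    # new_queries/new_documents: fresh pairs
new_d_emb = embed(new_documents)

# Fold them into the training pool and redo the cheap steps.
query_embeddings = np.vstack([query_embeddings, new_q_emb])
doc_embeddings = np.vstack([doc_embeddings, new_d_emb])
# ...then re-run the split, direction, and gamma-search sketches above
# and upsert the refreshed vectors into the vector store.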
Apertis AI Integration
Through Apertis AI, you can:
- Access embedding models (e.g., OpenAI's text-embedding models)
- Build retrieval-augmented generation systems with optimized embeddings
- Combine NUDGE fine-tuning with Apertis's unified API for complete RAG pipelines
This gives you flexibility: use Apertis for production embeddings while applying NUDGE techniques to boost domain-specific retrieval.