Text Embeddings
Models
DS1 currently provides one embedding model with the following specifications:
| Model Name | Dimension | Context Length | Tokenizer | Description |
|---|---|---|---|---|
| DS1-EN-V1 | 512 (L2 normalized) | 512 tokens | 30k WordPiece | High-performance English text retrieval model |
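Because the table lists the output as a 512-dimensional, L2-normalized vector, cosine similarity between two embeddings reduces to a plain dot product. The sketch below illustrates that property for ranking documents against a query. It does not call any DS1 API: the vectors are random placeholders standing in for real embeddings, and the helper names (`l2_normalize`, `dot`, `rank_by_similarity`) are hypothetical.

```python
import math
import random

EMBEDDING_DIM = 512  # dimension listed for DS1-EN-V1 in the table above


def l2_normalize(vec):
    """Scale a vector to unit length (the form the table says the model returns)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def rank_by_similarity(query, docs):
    """Return (doc_index, score) pairs sorted from most to least similar."""
    scores = [(i, dot(query, d)) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Placeholder vectors only: real embeddings would come from the model.
rng = random.Random(0)
query = l2_normalize([rng.gauss(0, 1) for _ in range(EMBEDDING_DIM)])
near = l2_normalize([q + 0.01 * rng.gauss(0, 1) for q in query])  # slightly perturbed copy
far = l2_normalize([rng.gauss(0, 1) for _ in range(EMBEDDING_DIM)])  # unrelated vector

ranking = rank_by_similarity(query, [far, near])
print(ranking[0][0])  # index of the best match; the perturbed copy (index 1) should win
```

If vectors are stored or transformed in a way that might break unit length (for example, after averaging several embeddings), re-normalizing before taking the dot product keeps the scores comparable.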
Tokenization
DS1-EN-V1 uses a WordPiece tokenizer with the following characteristics:
| Tokenizer Type | Vocabulary Size | Special Tokens |
|---|---|---|
| WordPiece | 30,000 | [PAD], [UNK], [CLS] |
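To give a feel for how WordPiece splits text, here is a minimal sketch of greedy longest-match-first segmentation, the matching strategy commonly used by WordPiece tokenizers. Several things here are assumptions rather than documented DS1 behavior: the `##` continuation prefix (the BERT-style convention), lowercasing and whitespace splitting as pre-processing, and the tiny `TOY_VOCAB`, which stands in for the real 30,000-entry vocabulary. Only the `[UNK]` token comes from the table above.

```python
UNK = "[UNK]"  # unknown-token marker listed in the table above

# Toy vocabulary for illustration only; the real vocabulary has 30,000 entries.
TOY_VOCAB = {"text", "search", "embed", "##ding", "##s", "re", "##trie", "##val"}


def wordpiece_tokenize(word, vocab, max_chars=100):
    """Split one word into pieces by repeatedly taking the longest vocabulary match."""
    if len(word) > max_chars:
        return [UNK]
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is found in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # assumed continuation prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [UNK]  # no piece fits, so the whole word maps to [UNK]
        pieces.append(match)
        start = end
    return pieces


def tokenize(text, vocab):
    """Lowercase, split on whitespace, then apply WordPiece to each word."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(wordpiece_tokenize(word, vocab))
    return tokens


print(tokenize("Text embeddings", TOY_VOCAB))
# ['text', 'embed', '##ding', '##s']
print(tokenize("quantum retrieval", TOY_VOCAB))
# ['[UNK]', 're', '##trie', '##val']
```

The practical consequence is that token counts can exceed word counts, which matters when estimating whether an input fits within the 512-token context length.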
Modality
DS1-EN-V1 is a text-only embedding model optimized for English-language text retrieval and semantic search.
Want to know about additional models? Check out our FAQ.