
DS1 Text Embeddings Inference


Text Embedding Webserver

License: Proprietary - Takara.AI

Text Embeddings Inference

decode

POST /decode

Decode input ids

Body parameter

json
{
  "ids": [
    0
  ],
  "skip_special_tokens": true
}

Parameters

| Name | In | Type | Required | Description |
|---|---|---|---|---|
| body | body | DecodeRequest | true | none |

Example responses

200 Response

json
[
  "test"
]

400 Response

json
{
  "error": "Batch is empty",
  "error_type": "empty"
}

413 Response

json
{
  "error": "Batch size error",
  "error_type": "validation"
}

422 Response

json
{
  "error": "Tokenization error",
  "error_type": "tokenizer"
}

Responses

| Status | Meaning | Description | Schema |
|---|---|---|---|
| 200 | OK | Decoded ids | DecodeResponse |
| 400 | Bad Request | Batch is empty | ErrorResponse |
| 413 | Payload Too Large | Batch size error | ErrorResponse |
| 422 | Unprocessable Entity | Tokenization error | ErrorResponse |
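
As a minimal sketch, the route can be called from Python's standard library. The base URL `http://localhost:8080` is an assumption; adjust it to your deployment. The request body mirrors the DecodeRequest schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumed deployment URL, adjust as needed

def build_decode_request(ids, skip_special_tokens=True):
    """Build a DecodeRequest body: a list of token ids plus the skip flag."""
    return {"ids": ids, "skip_special_tokens": skip_special_tokens}

def decode(ids):
    """POST /decode; returns the DecodeResponse, a list of decoded strings."""
    body = json.dumps(build_decode_request(ids)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/decode",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

An empty ids list triggers the 400 "Batch is empty" response shown above.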

embed

POST /embed

Get Embeddings. Returns a 424 status code if the model is not an embedding model.

Body parameter

json
{
  "dimensions": 0,
  "inputs": "string",
  "normalize": true,
  "prompt_name": "string",
  "truncate": false,
  "truncation_direction": "Left"
}

Parameters

| Name | In | Type | Required | Description |
|---|---|---|---|---|
| body | body | EmbedRequest | true | none |

Example responses

200 Response

json
[
  [
    0,
    1,
    2
  ]
]

400 Response

json
{
  "error": "Batch is empty",
  "error_type": "empty"
}

413 Response

json
{
  "error": "Batch size error",
  "error_type": "validation"
}

422 Response

json
{
  "error": "Tokenization error",
  "error_type": "tokenizer"
}

424 Response

json
{
  "error": "Inference failed",
  "error_type": "backend"
}

429 Response

json
{
  "error": "Model is overloaded",
  "error_type": "overloaded"
}

Responses

| Status | Meaning | Description | Schema |
|---|---|---|---|
| 200 | OK | Embeddings | EmbedResponse |
| 400 | Bad Request | Batch is empty | ErrorResponse |
| 413 | Payload Too Large | Batch size error | ErrorResponse |
| 422 | Unprocessable Entity | Tokenization error | ErrorResponse |
| 424 | Failed Dependency | Embedding Error | ErrorResponse |
| 429 | Too Many Requests | Model is overloaded | ErrorResponse |
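
A sketch of calling the route with the error handling implied by the table above. The base URL is an assumption, and only documented request fields are used; every non-200 status carries an ErrorResponse body.

```python
import json
import urllib.request
from urllib.error import HTTPError

BASE_URL = "http://localhost:8080"  # assumed deployment URL, adjust as needed

def build_embed_request(inputs, normalize=True, truncate=False):
    """Build an EmbedRequest body; inputs may be one string or a list of strings."""
    return {"inputs": inputs, "normalize": normalize, "truncate": truncate}

def embed(inputs):
    """POST /embed; returns the EmbedResponse (one vector per input), or raises
    with the ErrorResponse fields on any of the error statuses above."""
    body = json.dumps(build_embed_request(inputs)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    except HTTPError as exc:
        err = json.load(exc)  # ErrorResponse: {"error": ..., "error_type": ...}
        raise RuntimeError(f"{exc.code} {err['error_type']}: {err['error']}") from exc
```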

health

GET /health

Health check method

Example responses

503 Response

json
{
  "error": "unhealthy",
  "error_type": "unhealthy"
}

Responses

| Status | Meaning | Description | Schema |
|---|---|---|---|
| 200 | OK | Everything is working fine | None |
| 503 | Service Unavailable | Text Embeddings Inference is down | ErrorResponse |
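
A liveness probe can treat 200 as healthy and the documented 503 as unhealthy; any other error is unexpected. A sketch, with the base URL assumed:

```python
import json
import urllib.request
from urllib.error import HTTPError

BASE_URL = "http://localhost:8080"  # assumed deployment URL, adjust as needed

def parse_error(body):
    """Extract the error_type field from an ErrorResponse body."""
    return json.loads(body)["error_type"]

def is_healthy():
    """GET /health; True on 200, False on the documented 503."""
    try:
        with urllib.request.urlopen(f"{BASE_URL}/health") as resp:
            return resp.status == 200
    except HTTPError as exc:
        if exc.code == 503:
            return False  # body: {"error": "unhealthy", "error_type": "unhealthy"}
        raise
```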

get_model_info

GET /info

Text Embeddings Inference endpoint info

Example responses

200 Response

json
{
  "auto_truncate": true,
  "docker_label": null,
  "max_batch_requests": 0,
  "max_batch_tokens": 2048,
  "max_client_batch_size": 32,
  "max_concurrent_requests": 128,
  "max_input_length": 512,
  "model_dtype": "float16",
  "model_id": "thenlper/gte-base",
  "model_sha": "fca14538aa9956a46526bd1d0d11d69e19b5a101",
  "model_type": {
    "classifier": {
      "id2label": {
        "0": "LABEL"
      },
      "label2id": {
        "LABEL": 0
      }
    }
  },
  "sha": null,
  "tokenization_workers": 4,
  "version": "0.5.0"
}

Responses

| Status | Meaning | Description | Schema |
|---|---|---|---|
| 200 | OK | Served model info | Info |
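
The model_type field is a tagged union (see the ModelType schema below): exactly one of classifier, embedding, or reranker is present. A small sketch of inspecting a response, using an abbreviated example payload:

```python
import json

def model_kind(info):
    """Return which ModelType variant the server reports:
    'classifier', 'embedding', or 'reranker'."""
    (kind,) = info["model_type"].keys()  # exactly one key is present
    return kind

# Abbreviated Info payload for illustration.
example = json.loads("""
{
  "model_id": "thenlper/gte-base",
  "model_dtype": "float16",
  "max_input_length": 512,
  "model_type": {"embedding": {"pooling": "cls"}}
}
""")
```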

metrics

GET /metrics

Prometheus metrics scrape endpoint

Example responses

200 Response

"string"

Responses

| Status | Meaning | Description | Schema |
|---|---|---|---|
| 200 | OK | Prometheus Metrics | string |

similarity

POST /similarity

Get Sentence Similarity. Returns a 424 status code if the model is not an embedding model.

Body parameter

json
{
  "inputs": {
    "sentences": [
      "What is Machine Learning?"
    ],
    "source_sentence": "What is Deep Learning?"
  },
  "parameters": {
    "prompt_name": "string",
    "truncate": false,
    "truncation_direction": "Left"
  }
}

Parameters

| Name | In | Type | Required | Description |
|---|---|---|---|---|
| body | body | SimilarityRequest | true | none |

Example responses

200 Response

json
[
  0,
  1,
  0.5
]

400 Response

json
{
  "error": "Batch is empty",
  "error_type": "empty"
}

413 Response

json
{
  "error": "Batch size error",
  "error_type": "validation"
}

422 Response

json
{
  "error": "Tokenization error",
  "error_type": "tokenizer"
}

424 Response

json
{
  "error": "Inference failed",
  "error_type": "backend"
}

429 Response

json
{
  "error": "Model is overloaded",
  "error_type": "overloaded"
}

Responses

| Status | Meaning | Description | Schema |
|---|---|---|---|
| 200 | OK | Sentence Similarity | SimilarityResponse |
| 400 | Bad Request | Batch is empty | ErrorResponse |
| 413 | Payload Too Large | Batch size error | ErrorResponse |
| 422 | Unprocessable Entity | Tokenization error | ErrorResponse |
| 424 | Failed Dependency | Embedding Error | ErrorResponse |
| 429 | Too Many Requests | Model is overloaded | ErrorResponse |
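
The response contains one score per entry in sentences, in the same order, so candidates can be ranked by pairing them with their scores. A sketch of the request builder and ranking step (sending the request mirrors the /embed example):

```python
def build_similarity_request(source_sentence, sentences, prompt_name=None):
    """Build a SimilarityRequest body for POST /similarity."""
    request = {
        "inputs": {
            "source_sentence": source_sentence,
            "sentences": sentences,
        }
    }
    if prompt_name is not None:
        request["parameters"] = {"prompt_name": prompt_name}
    return request

def best_match(sentences, scores):
    """Pair each candidate with its similarity score; return the top pair."""
    return max(zip(sentences, scores), key=lambda pair: pair[1])
```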

tokenize

POST /tokenize

Tokenize inputs

Body parameter

json
{
  "add_special_tokens": true,
  "inputs": "string",
  "prompt_name": "string"
}

Parameters

| Name | In | Type | Required | Description |
|---|---|---|---|---|
| body | body | TokenizeRequest | true | none |

Example responses

200 Response

json
[
  [
    {
      "id": 0,
      "special": false,
      "start": 0,
      "stop": 2,
      "text": "test"
    }
  ]
]

400 Response

json
{
  "error": "Batch is empty",
  "error_type": "empty"
}

413 Response

json
{
  "error": "Batch size error",
  "error_type": "validation"
}

422 Response

json
{
  "error": "Tokenization error",
  "error_type": "tokenizer"
}

Responses

| Status | Meaning | Description | Schema |
|---|---|---|---|
| 200 | OK | Tokenized ids | TokenizeResponse |
| 400 | Bad Request | Batch is empty | ErrorResponse |
| 413 | Payload Too Large | Batch size error | ErrorResponse |
| 422 | Unprocessable Entity | Tokenization error | ErrorResponse |
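
Each SimpleToken in the response carries start/stop offsets into the input text (null for special tokens, per the SimpleToken schema below). A sketch of recovering the surface form of each token from those offsets:

```python
def token_surfaces(tokens, text):
    """Recover each token's surface form from its start/stop offsets,
    skipping special tokens, whose offsets are null."""
    return [
        text[t["start"]:t["stop"]]
        for t in tokens
        if not t["special"] and t["start"] is not None
    ]
```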

openai_embed

POST /v1/embeddings

OpenAI compatible route. Returns a 424 status code if the model is not an embedding model.

Body parameter

json
{
  "dimensions": 0,
  "encoding_format": "float",
  "input": "string",
  "model": "string",
  "user": "string"
}

Parameters

| Name | In | Type | Required | Description |
|---|---|---|---|---|
| body | body | OpenAICompatRequest | true | none |

Example responses

200 Response

json
{
  "data": [
    {
      "embedding": [
        0.1
      ],
      "index": 0,
      "object": "embedding"
    }
  ],
  "model": "thenlper/gte-base",
  "object": "list",
  "usage": {
    "prompt_tokens": 512,
    "total_tokens": 512
  }
}

400 Response

json
{
  "message": "Batch is empty",
  "type": "empty"
}

413 Response

json
{
  "message": "Batch size error",
  "type": "validation"
}

422 Response

json
{
  "message": "Tokenization error",
  "type": "tokenizer"
}

424 Response

json
{
  "message": "Inference failed",
  "type": "backend"
}

429 Response

json
{
  "message": "Model is overloaded",
  "type": "overloaded"
}

Responses

| Status | Meaning | Description | Schema |
|---|---|---|---|
| 200 | OK | Embeddings | OpenAICompatResponse |
| 400 | Bad Request | Batch is empty | OpenAICompatErrorResponse |
| 413 | Payload Too Large | Batch size error | OpenAICompatErrorResponse |
| 422 | Unprocessable Entity | Tokenization error | OpenAICompatErrorResponse |
| 424 | Failed Dependency | Embedding Error | OpenAICompatErrorResponse |
| 429 | Too Many Requests | Model is overloaded | OpenAICompatErrorResponse |
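
Because this route mirrors the OpenAI embeddings API, existing OpenAI-style clients can generally be pointed at it by overriding their base URL. A sketch of unpacking an OpenAICompatResponse, returning the vectors in index order:

```python
def extract_embeddings(response):
    """Return the embedding vectors from an OpenAICompatResponse,
    sorted by each data item's index field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```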

Schemas

ClassifierModel

json
{
  "id2label": {
    "0": "LABEL"
  },
  "label2id": {
    "LABEL": 0
  }
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| id2label | object | true | none | none |
| » additionalProperties | string | false | none | none |
| label2id | object | true | none | none |
| » additionalProperties | integer | false | none | none |

DecodeRequest

json
{
  "ids": [
    0
  ],
  "skip_special_tokens": true
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| ids | InputIds | true | none | none |
| skip_special_tokens | boolean | false | none | Whether to skip special tokens (defaults to true if not provided) |

DecodeResponse

json
[
  "test"
]

Properties

None

EmbedRequest

json
{
  "dimensions": 0,
  "inputs": "string",
  "normalize": true,
  "prompt_name": "string",
  "truncate": false,
  "truncation_direction": "Left"
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| dimensions | integer¦null | false | none | The number of dimensions that the output embeddings should have. If not set, the original shape of the representation will be returned instead. |
| inputs | Input | true | none | none |
| normalize | boolean | false | none | Whether to normalize embeddings (defaults to true if not provided) |
| prompt_name | string¦null | false | none | The name of the prompt that should be used for encoding. If not set, no prompt will be applied. Must be a key in the sentence-transformers configuration prompts dictionary. For example, if prompt_name is "query" and the prompts dictionary is {"query": "query: ", ...}, then the sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because the prompt text is prepended to any text to encode. |
| truncate | boolean¦null | false | none | none |
| truncation_direction | TruncationDirection | false | none | none |

EmbedResponse

json
[
  [
    0,
    1,
    2
  ]
]

Properties

None

Embedding

json
[
  0.1
]

Properties

oneOf

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | [number] | false | none | none |

xor

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | string | false | none | none |

EmbeddingModel

json
{
  "pooling": "cls"
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| pooling | string | true | none | none |

EncodingFormat

json
"float"

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | string | false | none | none |

Enumerated Values

| Property | Value |
|---|---|
| anonymous | float |
| anonymous | base64 |

ErrorResponse

json
{
  "error": "string",
  "error_type": "Unhealthy"
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| error | string | true | none | none |
| error_type | ErrorType | true | none | none |

ErrorType

json
"Unhealthy"

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | string | false | none | none |

Enumerated Values

| Property | Value |
|---|---|
| anonymous | Unhealthy |
| anonymous | Backend |
| anonymous | Overloaded |
| anonymous | Validation |
| anonymous | Tokenizer |
| anonymous | Empty |

Info

json
{
  "auto_truncate": true,
  "docker_label": null,
  "max_batch_requests": 0,
  "max_batch_tokens": 2048,
  "max_client_batch_size": 32,
  "max_concurrent_requests": 128,
  "max_input_length": 512,
  "model_dtype": "float16",
  "model_id": "thenlper/gte-base",
  "model_sha": "fca14538aa9956a46526bd1d0d11d69e19b5a101",
  "model_type": {
    "classifier": {
      "id2label": {
        "0": "LABEL"
      },
      "label2id": {
        "LABEL": 0
      }
    }
  },
  "sha": null,
  "tokenization_workers": 4,
  "version": "0.5.0"
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| auto_truncate | boolean | true | none | none |
| docker_label | string¦null | false | none | none |
| max_batch_requests | integer¦null | false | none | none |
| max_batch_tokens | integer | true | none | none |
| max_client_batch_size | integer | true | none | none |
| max_concurrent_requests | integer | true | none | Router Parameters |
| max_input_length | integer | true | none | none |
| model_dtype | string | true | none | none |
| model_id | string | true | none | Model info |
| model_sha | string¦null | false | none | none |
| model_type | ModelType | true | none | none |
| sha | string¦null | false | none | none |
| tokenization_workers | integer | true | none | none |
| version | string | true | none | Router Info |

Input

json
"string"

Properties

oneOf

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | InputType | false | none | none |

xor

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | [InputType] | false | none | none |

InputIds

json
[
  0
]

Properties

oneOf

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | [integer] | false | none | none |

xor

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | [array] | false | none | none |

InputType

json
"string"

Properties

oneOf

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | string | false | none | none |

xor

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | [integer] | false | none | none |

ModelType

json
{
  "classifier": {
    "id2label": {
      "0": "LABEL"
    },
    "label2id": {
      "LABEL": 0
    }
  }
}

Properties

oneOf

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | object | false | none | none |
| » classifier | ClassifierModel | true | none | none |

xor

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | object | false | none | none |
| » embedding | EmbeddingModel | true | none | none |

xor

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | object | false | none | none |
| » reranker | ClassifierModel | true | none | none |

OpenAICompatEmbedding

json
{
  "embedding": [
    0.1
  ],
  "index": 0,
  "object": "embedding"
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| embedding | Embedding | true | none | none |
| index | integer | true | none | none |
| object | string | true | none | none |

OpenAICompatErrorResponse

json
{
  "code": 0,
  "error_type": "Unhealthy",
  "message": "string"
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| code | integer(int32) | true | none | none |
| error_type | ErrorType | true | none | none |
| message | string | true | none | none |

OpenAICompatRequest

json
{
  "dimensions": 0,
  "encoding_format": "float",
  "input": "string",
  "model": "string",
  "user": "string"
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| dimensions | integer¦null | false | none | none |
| encoding_format | EncodingFormat | false | none | none |
| input | Input | true | none | none |
| model | string¦null | false | none | none |
| user | string¦null | false | none | none |

OpenAICompatResponse

json
{
  "data": [
    {
      "embedding": [
        0.1
      ],
      "index": 0,
      "object": "embedding"
    }
  ],
  "model": "thenlper/gte-base",
  "object": "list",
  "usage": {
    "prompt_tokens": 512,
    "total_tokens": 512
  }
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| data | [OpenAICompatEmbedding] | true | none | none |
| model | string | true | none | none |
| object | string | true | none | none |
| usage | OpenAICompatUsage | true | none | none |

OpenAICompatUsage

json
{
  "prompt_tokens": 512,
  "total_tokens": 512
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| prompt_tokens | integer | true | none | none |
| total_tokens | integer | true | none | none |

SimilarityInput

json
{
  "sentences": [
    "What is Machine Learning?"
  ],
  "source_sentence": "What is Deep Learning?"
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| sentences | [string] | true | none | A list of strings which will be compared against the source_sentence. |
| source_sentence | string | true | none | The string that you wish to compare the other strings with. This can be a phrase, sentence, or longer passage, depending on the model being used. |

SimilarityParameters

json
{
  "prompt_name": "string",
  "truncate": false,
  "truncation_direction": "Left"
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| prompt_name | string¦null | false | none | The name of the prompt that should be used for encoding. If not set, no prompt will be applied. Must be a key in the sentence-transformers configuration prompts dictionary. For example, if prompt_name is "query" and the prompts dictionary is {"query": "query: ", ...}, then the sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because the prompt text is prepended to any text to encode. |
| truncate | boolean¦null | false | none | none |
| truncation_direction | TruncationDirection | false | none | none |

SimilarityRequest

json
{
  "inputs": {
    "sentences": [
      "What is Machine Learning?"
    ],
    "source_sentence": "What is Deep Learning?"
  },
  "parameters": {
    "prompt_name": "string",
    "truncate": false,
    "truncation_direction": "Left"
  }
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| inputs | SimilarityInput | true | none | none |
| parameters | SimilarityParameters¦null | false | none | none |

SimilarityResponse

json
[
  0,
  1,
  0.5
]

Properties

None

SimpleToken

json
{
  "id": 0,
  "special": false,
  "start": 0,
  "stop": 2,
  "text": "test"
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| id | integer(int32) | true | none | none |
| special | boolean | true | none | none |
| start | integer¦null | false | none | none |
| stop | integer¦null | false | none | none |
| text | string | true | none | none |

TokenizeInput

json
"string"

Properties

oneOf

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | string | false | none | none |

xor

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | [string] | false | none | none |

TokenizeRequest

json
{
  "add_special_tokens": true,
  "inputs": "string",
  "prompt_name": "string"
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| add_special_tokens | boolean | false | none | Whether to add special tokens (defaults to true if not provided) |
| inputs | TokenizeInput | true | none | none |
| prompt_name | string¦null | false | none | The name of the prompt that should be used for encoding. If not set, no prompt will be applied. Must be a key in the sentence-transformers configuration prompts dictionary. For example, if prompt_name is "query" and the prompts dictionary is {"query": "query: ", ...}, then the sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because the prompt text is prepended to any text to encode. |

TokenizeResponse

json
[
  [
    {
      "id": 0,
      "special": false,
      "start": 0,
      "stop": 2,
      "text": "test"
    }
  ]
]

Properties

None

TruncationDirection

json
"Left"

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | string | false | none | none |

Enumerated Values

| Property | Value |
|---|---|
| anonymous | Left |
| anonymous | Right |