
DS1 Text Embeddings Inference


Text Embedding Webserver

License: Proprietary - Takara.AI

Text Embeddings Inference

decode

POST /decode

Decode input ids

Body parameter

json
{
  "ids": [
    0
  ],
  "skip_special_tokens": true
}

Parameters

| Name | In | Type | Required | Description |
|---|---|---|---|---|
| body | body | DecodeRequest | true | none |

Example responses

200 Response

json
[
  "test"
]

400 Response

json
{
  "error": "Batch is empty",
  "error_type": "empty"
}

413 Response

json
{
  "error": "Batch size error",
  "error_type": "validation"
}

422 Response

json
{
  "error": "Tokenization error",
  "error_type": "tokenizer"
}

Responses

| Status | Meaning | Description | Schema |
|---|---|---|---|
| 200 | OK | Decoded ids | DecodeResponse |
| 400 | Bad Request | Batch is empty | ErrorResponse |
| 413 | Payload Too Large | Batch size error | ErrorResponse |
| 422 | Unprocessable Entity | Tokenization error | ErrorResponse |
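
As a minimal sketch, the route can be called from Python's standard library. The base URL `http://localhost:8080` is an assumption; adjust it to your deployment. The request body mirrors the DecodeRequest schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumed deployment URL, adjust as needed

def build_decode_request(ids, skip_special_tokens=True):
    """Build a DecodeRequest body: a list of token ids plus the skip flag."""
    return {"ids": ids, "skip_special_tokens": skip_special_tokens}

def decode(ids):
    """POST /decode; returns the DecodeResponse, a list of decoded strings."""
    body = json.dumps(build_decode_request(ids)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/decode",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

An empty ids list triggers the 400 "Batch is empty" response shown above.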

embed

POST /embed

Get Embeddings. Returns a 424 status code if the model is not an embedding model.

Body parameter

json
{
  "dimensions": 0,
  "inputs": "string",
  "normalize": true,
  "prompt_name": "string",
  "truncate": false,
  "truncation_direction": "Left"
}

Parameters

| Name | In | Type | Required | Description |
|---|---|---|---|---|
| body | body | EmbedRequest | true | none |

Example responses

200 Response

json
[
  [
    0,
    1,
    2
  ]
]

400 Response

json
{
  "error": "Batch is empty",
  "error_type": "empty"
}

413 Response

json
{
  "error": "Batch size error",
  "error_type": "validation"
}

422 Response

json
{
  "error": "Tokenization error",
  "error_type": "tokenizer"
}

424 Response

json
{
  "error": "Inference failed",
  "error_type": "backend"
}

429 Response

json
{
  "error": "Model is overloaded",
  "error_type": "overloaded"
}

Responses

| Status | Meaning | Description | Schema |
|---|---|---|---|
| 200 | OK | Embeddings | EmbedResponse |
| 400 | Bad Request | Batch is empty | ErrorResponse |
| 413 | Payload Too Large | Batch size error | ErrorResponse |
| 422 | Unprocessable Entity | Tokenization error | ErrorResponse |
| 424 | Failed Dependency | Embedding Error | ErrorResponse |
| 429 | Too Many Requests | Model is overloaded | ErrorResponse |
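
A sketch of calling the route with the error handling implied by the table above. The base URL is an assumption, and only documented request fields are used; every non-200 status carries an ErrorResponse body.

```python
import json
import urllib.request
from urllib.error import HTTPError

BASE_URL = "http://localhost:8080"  # assumed deployment URL, adjust as needed

def build_embed_request(inputs, normalize=True, truncate=False):
    """Build an EmbedRequest body; inputs may be one string or a list of strings."""
    return {"inputs": inputs, "normalize": normalize, "truncate": truncate}

def embed(inputs):
    """POST /embed; returns the EmbedResponse (one vector per input), or raises
    with the ErrorResponse fields on any of the error statuses above."""
    body = json.dumps(build_embed_request(inputs)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/embed",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    except HTTPError as exc:
        err = json.load(exc)  # ErrorResponse: {"error": ..., "error_type": ...}
        raise RuntimeError(f"{exc.code} {err['error_type']}: {err['error']}") from exc
```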

health

GET /health

Health check method

Example responses

503 Response

json
{
  "error": "unhealthy",
  "error_type": "unhealthy"
}

Responses

| Status | Meaning | Description | Schema |
|---|---|---|---|
| 200 | OK | Everything is working fine | None |
| 503 | Service Unavailable | Text Embeddings Inference is down | ErrorResponse |
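
A liveness probe can treat 200 as healthy and the documented 503 as unhealthy; any other error is unexpected. A sketch, with the base URL assumed:

```python
import json
import urllib.request
from urllib.error import HTTPError

BASE_URL = "http://localhost:8080"  # assumed deployment URL, adjust as needed

def parse_error(body):
    """Extract the error_type field from an ErrorResponse body."""
    return json.loads(body)["error_type"]

def is_healthy():
    """GET /health; True on 200, False on the documented 503."""
    try:
        with urllib.request.urlopen(f"{BASE_URL}/health") as resp:
            return resp.status == 200
    except HTTPError as exc:
        if exc.code == 503:
            return False  # body: {"error": "unhealthy", "error_type": "unhealthy"}
        raise
```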

get_model_info

GET /info

Text Embeddings Inference endpoint info

Example responses

200 Response

json
{
  "auto_truncate": true,
  "docker_label": null,
  "max_batch_requests": 0,
  "max_batch_tokens": 2048,
  "max_client_batch_size": 32,
  "max_concurrent_requests": 128,
  "max_input_length": 512,
  "model_dtype": "float16",
  "model_id": "thenlper/gte-base",
  "model_sha": "fca14538aa9956a46526bd1d0d11d69e19b5a101",
  "model_type": {
    "classifier": {
      "id2label": {
        "0": "LABEL"
      },
      "label2id": {
        "LABEL": 0
      }
    }
  },
  "sha": null,
  "tokenization_workers": 4,
  "version": "0.5.0"
}

Responses

| Status | Meaning | Description | Schema |
|---|---|---|---|
| 200 | OK | Served model info | Info |
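
The model_type field is a tagged union (see the ModelType schema below): exactly one of classifier, embedding, or reranker is present. A small sketch of inspecting a response, using an abbreviated example payload:

```python
import json

def model_kind(info):
    """Return which ModelType variant the server reports:
    'classifier', 'embedding', or 'reranker'."""
    (kind,) = info["model_type"].keys()  # exactly one key is present
    return kind

# Abbreviated Info payload for illustration.
example = json.loads("""
{
  "model_id": "thenlper/gte-base",
  "model_dtype": "float16",
  "max_input_length": 512,
  "model_type": {"embedding": {"pooling": "cls"}}
}
""")
```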

metrics

GET /metrics

Prometheus metrics scrape endpoint

Example responses

200 Response

"string"

Responses

| Status | Meaning | Description | Schema |
|---|---|---|---|
| 200 | OK | Prometheus Metrics | string |

similarity

POST /similarity

Get Sentence Similarity. Returns a 424 status code if the model is not an embedding model.

Body parameter

json
{
  "inputs": {
    "sentences": [
      "What is Machine Learning?"
    ],
    "source_sentence": "What is Deep Learning?"
  },
  "parameters": {
    "prompt_name": "string",
    "truncate": false,
    "truncation_direction": "Left"
  }
}

Parameters

| Name | In | Type | Required | Description |
|---|---|---|---|---|
| body | body | SimilarityRequest | true | none |

Example responses

200 Response

json
[
  0,
  1,
  0.5
]

400 Response

json
{
  "error": "Batch is empty",
  "error_type": "empty"
}

413 Response

json
{
  "error": "Batch size error",
  "error_type": "validation"
}

422 Response

json
{
  "error": "Tokenization error",
  "error_type": "tokenizer"
}

424 Response

json
{
  "error": "Inference failed",
  "error_type": "backend"
}

429 Response

json
{
  "error": "Model is overloaded",
  "error_type": "overloaded"
}

Responses

| Status | Meaning | Description | Schema |
|---|---|---|---|
| 200 | OK | Sentence Similarity | SimilarityResponse |
| 400 | Bad Request | Batch is empty | ErrorResponse |
| 413 | Payload Too Large | Batch size error | ErrorResponse |
| 422 | Unprocessable Entity | Tokenization error | ErrorResponse |
| 424 | Failed Dependency | Embedding Error | ErrorResponse |
| 429 | Too Many Requests | Model is overloaded | ErrorResponse |
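
The response contains one score per entry in sentences, in the same order, so candidates can be ranked by pairing them with their scores. A sketch of the request builder and ranking step (sending the request mirrors the /embed example):

```python
def build_similarity_request(source_sentence, sentences, prompt_name=None):
    """Build a SimilarityRequest body for POST /similarity."""
    request = {
        "inputs": {
            "source_sentence": source_sentence,
            "sentences": sentences,
        }
    }
    if prompt_name is not None:
        request["parameters"] = {"prompt_name": prompt_name}
    return request

def best_match(sentences, scores):
    """Pair each candidate with its similarity score; return the top pair."""
    return max(zip(sentences, scores), key=lambda pair: pair[1])
```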

tokenize

POST /tokenize

Tokenize inputs

Body parameter

json
{
  "add_special_tokens": true,
  "inputs": "string",
  "prompt_name": "string"
}

Parameters

| Name | In | Type | Required | Description |
|---|---|---|---|---|
| body | body | TokenizeRequest | true | none |

Example responses

200 Response

json
[
  [
    {
      "id": 0,
      "special": false,
      "start": 0,
      "stop": 2,
      "text": "test"
    }
  ]
]

400 Response

json
{
  "error": "Batch is empty",
  "error_type": "empty"
}

413 Response

json
{
  "error": "Batch size error",
  "error_type": "validation"
}

422 Response

json
{
  "error": "Tokenization error",
  "error_type": "tokenizer"
}

Responses

| Status | Meaning | Description | Schema |
|---|---|---|---|
| 200 | OK | Tokenized ids | TokenizeResponse |
| 400 | Bad Request | Batch is empty | ErrorResponse |
| 413 | Payload Too Large | Batch size error | ErrorResponse |
| 422 | Unprocessable Entity | Tokenization error | ErrorResponse |
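
Each SimpleToken in the response carries start/stop offsets into the input text (null for special tokens, per the SimpleToken schema below). A sketch of recovering the surface form of each token from those offsets:

```python
def token_surfaces(tokens, text):
    """Recover each token's surface form from its start/stop offsets,
    skipping special tokens, whose offsets are null."""
    return [
        text[t["start"]:t["stop"]]
        for t in tokens
        if not t["special"] and t["start"] is not None
    ]
```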

openai_embed

POST /v1/embeddings

OpenAI compatible route. Returns a 424 status code if the model is not an embedding model.

Body parameter

json
{
  "dimensions": 0,
  "encoding_format": "float",
  "input": "string",
  "model": "string",
  "user": "string"
}

Parameters

| Name | In | Type | Required | Description |
|---|---|---|---|---|
| body | body | OpenAICompatRequest | true | none |

Example responses

200 Response

json
{
  "data": [
    {
      "embedding": [
        0.1
      ],
      "index": 0,
      "object": "embedding"
    }
  ],
  "model": "thenlper/gte-base",
  "object": "list",
  "usage": {
    "prompt_tokens": 512,
    "total_tokens": 512
  }
}

400 Response

json
{
  "message": "Batch is empty",
  "type": "empty"
}

413 Response

json
{
  "message": "Batch size error",
  "type": "validation"
}

422 Response

json
{
  "message": "Tokenization error",
  "type": "tokenizer"
}

424 Response

json
{
  "message": "Inference failed",
  "type": "backend"
}

429 Response

json
{
  "message": "Model is overloaded",
  "type": "overloaded"
}

Responses

| Status | Meaning | Description | Schema |
|---|---|---|---|
| 200 | OK | Embeddings | OpenAICompatResponse |
| 400 | Bad Request | Batch is empty | OpenAICompatErrorResponse |
| 413 | Payload Too Large | Batch size error | OpenAICompatErrorResponse |
| 422 | Unprocessable Entity | Tokenization error | OpenAICompatErrorResponse |
| 424 | Failed Dependency | Embedding Error | OpenAICompatErrorResponse |
| 429 | Too Many Requests | Model is overloaded | OpenAICompatErrorResponse |
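
Because this route mirrors the OpenAI embeddings API, existing OpenAI-style clients can generally be pointed at it by overriding their base URL. A sketch of unpacking an OpenAICompatResponse, returning the vectors in index order:

```python
def extract_embeddings(response):
    """Return the embedding vectors from an OpenAICompatResponse,
    sorted by each data item's index field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```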

Schemas

ClassifierModel

json
{
  "id2label": {
    "0": "LABEL"
  },
  "label2id": {
    "LABEL": 0
  }
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| id2label | object | true | none | none |
| » additionalProperties | string | false | none | none |
| label2id | object | true | none | none |
| » additionalProperties | integer | false | none | none |

DecodeRequest

json
{
  "ids": [
    0
  ],
  "skip_special_tokens": true
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| ids | InputIds | true | none | none |
| skip_special_tokens | boolean | false | none | Whether to skip special tokens (defaults to true if not provided) |

DecodeResponse

json
[
  "test"
]

Properties

None

EmbedRequest

json
{
  "dimensions": 0,
  "inputs": "string",
  "normalize": true,
  "prompt_name": "string",
  "truncate": false,
  "truncation_direction": "Left"
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| dimensions | integer¦null | false | none | The number of dimensions that the output embeddings should have. If not set, the original shape of the representation will be returned instead. |
| inputs | Input | true | none | none |
| normalize | boolean | false | none | Whether to normalize embeddings (defaults to true if not provided) |
| prompt_name | string¦null | false | none | The name of the prompt that should be used for encoding. If not set, no prompt will be applied. Must be a key in the sentence-transformers configuration prompts dictionary. For example, if prompt_name is "query" and the prompts dictionary is {"query": "query: ", ...}, then the sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because the prompt text is prepended to any text to encode. |
| truncate | boolean¦null | false | none | none |
| truncation_direction | TruncationDirection | false | none | none |

EmbedResponse

json
[
  [
    0,
    1,
    2
  ]
]

Properties

None

Embedding

json
[
  0.1
]

Properties

oneOf

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | [number] | false | none | none |

xor

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | string | false | none | none |

EmbeddingModel

json
{
  "pooling": "cls"
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| pooling | string | true | none | none |

EncodingFormat

json
"float"

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | string | false | none | none |

Enumerated Values

| Property | Value |
|---|---|
| anonymous | float |
| anonymous | base64 |

ErrorResponse

json
{
  "error": "string",
  "error_type": "Unhealthy"
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| error | string | true | none | none |
| error_type | ErrorType | true | none | none |

ErrorType

json
"Unhealthy"

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | string | false | none | none |

Enumerated Values

| Property | Value |
|---|---|
| anonymous | Unhealthy |
| anonymous | Backend |
| anonymous | Overloaded |
| anonymous | Validation |
| anonymous | Tokenizer |
| anonymous | Empty |

Info

json
{
  "auto_truncate": true,
  "docker_label": null,
  "max_batch_requests": 0,
  "max_batch_tokens": 2048,
  "max_client_batch_size": 32,
  "max_concurrent_requests": 128,
  "max_input_length": 512,
  "model_dtype": "float16",
  "model_id": "thenlper/gte-base",
  "model_sha": "fca14538aa9956a46526bd1d0d11d69e19b5a101",
  "model_type": {
    "classifier": {
      "id2label": {
        "0": "LABEL"
      },
      "label2id": {
        "LABEL": 0
      }
    }
  },
  "sha": null,
  "tokenization_workers": 4,
  "version": "0.5.0"
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| auto_truncate | boolean | true | none | none |
| docker_label | string¦null | false | none | none |
| max_batch_requests | integer¦null | false | none | none |
| max_batch_tokens | integer | true | none | none |
| max_client_batch_size | integer | true | none | none |
| max_concurrent_requests | integer | true | none | Router Parameters |
| max_input_length | integer | true | none | none |
| model_dtype | string | true | none | none |
| model_id | string | true | none | Model info |
| model_sha | string¦null | false | none | none |
| model_type | ModelType | true | none | none |
| sha | string¦null | false | none | none |
| tokenization_workers | integer | true | none | none |
| version | string | true | none | Router Info |

Input

json
"string"

Properties

oneOf

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | InputType | false | none | none |

xor

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | [InputType] | false | none | none |

InputIds

json
[
  0
]

Properties

oneOf

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | [integer] | false | none | none |

xor

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | [array] | false | none | none |

InputType

json
"string"

Properties

oneOf

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | string | false | none | none |

xor

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | [integer] | false | none | none |

ModelType

json
{
  "classifier": {
    "id2label": {
      "0": "LABEL"
    },
    "label2id": {
      "LABEL": 0
    }
  }
}

Properties

oneOf

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | object | false | none | none |
| » classifier | ClassifierModel | true | none | none |

xor

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | object | false | none | none |
| » embedding | EmbeddingModel | true | none | none |

xor

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | object | false | none | none |
| » reranker | ClassifierModel | true | none | none |

OpenAICompatEmbedding

json
{
  "embedding": [
    0.1
  ],
  "index": 0,
  "object": "embedding"
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| embedding | Embedding | true | none | none |
| index | integer | true | none | none |
| object | string | true | none | none |

OpenAICompatErrorResponse

json
{
  "code": 0,
  "error_type": "Unhealthy",
  "message": "string"
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| code | integer(int32) | true | none | none |
| error_type | ErrorType | true | none | none |
| message | string | true | none | none |

OpenAICompatRequest

json
{
  "dimensions": 0,
  "encoding_format": "float",
  "input": "string",
  "model": "string",
  "user": "string"
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| dimensions | integer¦null | false | none | none |
| encoding_format | EncodingFormat | false | none | none |
| input | Input | true | none | none |
| model | string¦null | false | none | none |
| user | string¦null | false | none | none |

OpenAICompatResponse

json
{
  "data": [
    {
      "embedding": [
        0.1
      ],
      "index": 0,
      "object": "embedding"
    }
  ],
  "model": "thenlper/gte-base",
  "object": "list",
  "usage": {
    "prompt_tokens": 512,
    "total_tokens": 512
  }
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| data | [OpenAICompatEmbedding] | true | none | none |
| model | string | true | none | none |
| object | string | true | none | none |
| usage | OpenAICompatUsage | true | none | none |

OpenAICompatUsage

json
{
  "prompt_tokens": 512,
  "total_tokens": 512
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| prompt_tokens | integer | true | none | none |
| total_tokens | integer | true | none | none |

SimilarityInput

json
{
  "sentences": [
    "What is Machine Learning?"
  ],
  "source_sentence": "What is Deep Learning?"
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| sentences | [string] | true | none | A list of strings which will be compared against the source_sentence. |
| source_sentence | string | true | none | The string that you wish to compare the other strings with. This can be a phrase, sentence, or longer passage, depending on the model being used. |

SimilarityParameters

json
{
  "prompt_name": "string",
  "truncate": false,
  "truncation_direction": "Left"
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| prompt_name | string¦null | false | none | The name of the prompt that should be used for encoding. If not set, no prompt will be applied. Must be a key in the sentence-transformers configuration prompts dictionary. For example, if prompt_name is "query" and the prompts dictionary is {"query": "query: ", ...}, then the sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because the prompt text is prepended to any text to encode. |
| truncate | boolean¦null | false | none | none |
| truncation_direction | TruncationDirection | false | none | none |

SimilarityRequest

json
{
  "inputs": {
    "sentences": [
      "What is Machine Learning?"
    ],
    "source_sentence": "What is Deep Learning?"
  },
  "parameters": {
    "prompt_name": "string",
    "truncate": false,
    "truncation_direction": "Left"
  }
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| inputs | SimilarityInput | true | none | none |
| parameters | SimilarityParameters¦null | false | none | none |

SimilarityResponse

json
[
  0,
  1,
  0.5
]

Properties

None

SimpleToken

json
{
  "id": 0,
  "special": false,
  "start": 0,
  "stop": 2,
  "text": "test"
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| id | integer(int32) | true | none | none |
| special | boolean | true | none | none |
| start | integer¦null | false | none | none |
| stop | integer¦null | false | none | none |
| text | string | true | none | none |

TokenizeInput

json
"string"

Properties

oneOf

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | string | false | none | none |

xor

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | [string] | false | none | none |

TokenizeRequest

json
{
  "add_special_tokens": true,
  "inputs": "string",
  "prompt_name": "string"
}

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| add_special_tokens | boolean | false | none | Whether to add special tokens (defaults to true if not provided) |
| inputs | TokenizeInput | true | none | none |
| prompt_name | string¦null | false | none | The name of the prompt that should be used for encoding. If not set, no prompt will be applied. Must be a key in the sentence-transformers configuration prompts dictionary. For example, if prompt_name is "query" and the prompts dictionary is {"query": "query: ", ...}, then the sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because the prompt text is prepended to any text to encode. |

TokenizeResponse

json
[
  [
    {
      "id": 0,
      "special": false,
      "start": 0,
      "stop": 2,
      "text": "test"
    }
  ]
]

Properties

None

TruncationDirection

json
"Left"

Properties

| Name | Type | Required | Restrictions | Description |
|---|---|---|---|---|
| anonymous | string | false | none | none |

Enumerated Values

| Property | Value |
|---|---|
| anonymous | Left |
| anonymous | Right |