API Examples

This page demonstrates practical examples of using the DS1 embedding model through different interfaces and languages.

Exploring the API

Viewing the OpenAPI Schema

DS1 provides a complete OpenAPI specification at /api-doc/openapi.json on your deployed endpoint. You can retrieve and explore it using standard HTTP requests.

Using wget:

bash
wget https://your-sagemaker-endpoint-url/api-doc/openapi.json

Using curl:

bash
curl https://your-sagemaker-endpoint-url/api-doc/openapi.json | jq

Using Python:

python
import requests

endpoint_url = "https://your-sagemaker-endpoint-url"
schema = requests.get(f"{endpoint_url}/api-doc/openapi.json").json()

print(schema)

This schema documents all available endpoints, request formats, and response structures for DS1.
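Once retrieved, the parsed schema can be walked to list the available routes. The helper below assumes only the standard OpenAPI layout (a top-level "paths" object mapping paths to HTTP methods); the sample fragment is illustrative, not DS1's actual schema:

```python
def list_endpoints(schema):
    """Return (METHOD, path, summary) tuples from a parsed OpenAPI schema dict."""
    endpoints = []
    for path, methods in schema.get("paths", {}).items():
        for method, spec in methods.items():
            endpoints.append((method.upper(), path, spec.get("summary", "")))
    return endpoints

# Minimal illustrative fragment of an OpenAPI schema
sample_schema = {
    "paths": {
        "/embed": {"post": {"summary": "Generate embeddings"}},
        "/v1/embeddings": {"post": {"summary": "OpenAI-compatible embeddings"}},
    }
}

for method, path, summary in list_endpoints(sample_schema):
    print(f"{method} {path} - {summary}")
```

In practice you would pass the `schema` dict fetched in the snippet above instead of `sample_schema`.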

Python with Boto3 (AWS SageMaker)

The most common way to interact with DS1 is through AWS SageMaker endpoints using the Boto3 library.

Basic Usage

python
import boto3
import json

# Initialize SageMaker runtime client
sagemaker_runtime = boto3.client('sagemaker-runtime', region_name='eu-west-2')

# Your DS1 endpoint name
ENDPOINT_NAME = 'your-ds1-endpoint-name'

# Sample text to embed
texts = ["What is machine learning?", "How does deep learning work?"]

# Create request payload
payload = {
    "inputs": texts
}

# Invoke the endpoint
response = sagemaker_runtime.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType='application/json',
    Body=json.dumps(payload)
)

# Parse the response
result = json.loads(response['Body'].read().decode())
embeddings = result if isinstance(result, list) else result.get("embeddings", [])

print(f"Generated {len(embeddings)} embeddings")
print(f"Embedding dimension: {len(embeddings[0])}")

Batch Processing

python
def embed_texts_batch(texts, batch_size=32):
    """
    Embed multiple texts in batches for optimal performance
    """
    all_embeddings = []
    
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        payload = {"inputs": batch}
        
        response = sagemaker_runtime.invoke_endpoint(
            EndpointName=ENDPOINT_NAME,
            ContentType='application/json',
            Body=json.dumps(payload)
        )
        
        result = json.loads(response['Body'].read().decode())
        embeddings = result if isinstance(result, list) else result.get("embeddings", [])
        all_embeddings.extend(embeddings)
        
        print(f"Processed batch {i//batch_size + 1}/{(len(texts)-1)//batch_size + 1}")
    
    return all_embeddings

# Example usage
documents = [
    "Document 1 content here...",
    "Document 2 content here...",
    "Document 3 content here...",
]

embeddings = embed_texts_batch(documents, batch_size=32)

OpenAI Compatible API

DS1 provides an OpenAI-compatible endpoint at POST /v1/embeddings for seamless integration with existing OpenAI clients.

Using OpenAI Python Library

python
from openai import OpenAI

# Initialize client pointing to your SageMaker endpoint
client = OpenAI(
    api_key="your-api-key",
    base_url="https://your-sagemaker-endpoint-url"
)

# Create embeddings using OpenAI-compatible interface
response = client.embeddings.create(
    model="DS1-EN-V1",
    input=["What is artificial intelligence?", "Explain machine learning"]
)

# Extract embeddings
embeddings = [item.embedding for item in response.data]
print(f"Generated {len(embeddings)} embeddings")

Direct HTTP Requests

For language-agnostic integration, you can make direct HTTP requests to the SageMaker endpoint.

cURL Example

bash
curl -X POST https://your-sagemaker-endpoint-url/embed \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": ["Hello world", "How are you?"]
  }'

Response Format

json
[
  [0.123, -0.456, 0.789, ...],
  [0.234, -0.567, 0.890, ...]
]

Error Handling

DS1 returns standard HTTP error codes with descriptive error messages:

Status Code | Meaning              | Example
----------- | -------------------- | ---------------------------
200         | Success              | Valid embeddings returned
400         | Bad Request          | Empty batch or invalid input
413         | Payload Too Large    | Batch size exceeds limits
422         | Unprocessable Entity | Tokenization error
424         | Failed Dependency    | Model inference failed
429         | Too Many Requests    | Service overloaded
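Transient 429 (overload) responses are usually handled with retries and exponential backoff. The sketch below is generic: `flaky` is a hypothetical stand-in for your real invoke call, and `RuntimeError` stands in for whatever exception your client raises on overload:

```python
import time

def invoke_with_retry(invoke_fn, max_retries=3, base_delay=1.0):
    """Retry a zero-argument callable on overload errors, doubling the delay each attempt."""
    for attempt in range(max_retries + 1):
        try:
            return invoke_fn()
        except RuntimeError:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Toy demonstration: fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("429: Too Many Requests")
    return "ok"

print(invoke_with_retry(flaky, base_delay=0.01))  # "ok" after two retries
```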

Handling Errors in Python

python
import json
from botocore.exceptions import ClientError

try:
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType='application/json',
        Body=json.dumps({"inputs": texts})
    )
    result = json.loads(response['Body'].read().decode())
except ClientError as e:
    error_code = e.response['Error']['Code']
    error_message = e.response['Error']['Message']
    print(f"Error {error_code}: {error_message}")
except json.JSONDecodeError:
    print("Failed to parse response")

Performance Tips

  • Batch Requests: Group multiple texts into a single request rather than calling the endpoint once per text
  • Connection Reuse: Reuse a single SageMaker client instance across requests rather than creating a new client per call
  • Batch Size: Test batch sizes between 16 and 32 texts per request for optimal throughput
  • Caching: Store computed embeddings to avoid redundant API calls
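The caching tip above can be sketched as a small in-memory layer keyed by a hash of the input text. `EmbeddingCache` and `fake_embed` are hypothetical names for illustration; a production system would typically back this with Redis or a database:

```python
import hashlib

class EmbeddingCache:
    """In-memory embedding cache; only uncached texts hit the embed function."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn  # callable: list[str] -> list[list[float]]
        self.store = {}

    def get(self, texts):
        missing = [t for t in texts if self._key(t) not in self.store]
        if missing:
            for text, vector in zip(missing, self.embed_fn(missing)):
                self.store[self._key(text)] = vector
        return [self.store[self._key(t)] for t in texts]

    @staticmethod
    def _key(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Toy embedder that records how many texts each call receives
calls = []
def fake_embed(texts):
    calls.append(len(texts))
    return [[float(len(t))] for t in texts]

cache = EmbeddingCache(fake_embed)
cache.get(["hello", "world"])
cache.get(["hello", "again"])  # only "again" reaches the embedder
print(calls)  # [2, 1]
```

In a real deployment, `embed_fn` would wrap the batched SageMaker call shown earlier.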

For detailed API specifications, see the OpenAPI Reference.