
SageMaker Notebook: DS1 Model Deployment Guide

Overview

This Jupyter notebook provides a comprehensive guide for deploying DS1 model packages from the AWS Marketplace using Amazon SageMaker. It demonstrates two primary deployment methods for machine learning inference workloads.

Purpose

The notebook walks users through the complete process of subscribing to, configuring, and deploying DS1 embedding models on AWS infrastructure, with support for both on-demand and batch processing scenarios.

Major Topics Covered

1. Deployment Methods

Real-Time Inference Endpoints

  • Persistent, fully managed APIs for on-demand inference
  • Handles individual requests with low latency
  • Suitable for production applications requiring immediate responses

Batch Transform Jobs

  • Finite execution runs for processing large datasets
  • Outputs predictions to files in bulk
  • Cost-effective for processing data at scale

2. Prerequisites and Setup

  • AWS Marketplace subscription process for DS1 model packages
  • AWS CLI SSO configuration for secure authentication
  • SageMaker quota verification for target regions
  • Instance type selection and compatibility checking

3. Model Configuration

  • Model ID specification (DS1-EN-V1)
  • Instance type selection for different deployment methods
  • Compute-optimised instances (the ml.c5 family) recommended, since DS1 does not require a GPU
  • Amazon Resource Name (ARN) management across multiple AWS regions
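Per-region ARN management might look like the sketch below. The account ID and package paths are placeholders, not real Marketplace ARNs:

```python
MODEL_ID = "DS1-EN-V1"

# Marketplace model packages are registered per region; the entries below
# are illustrative placeholders only.
MODEL_PACKAGE_ARNS = {
    "us-east-1": "arn:aws:sagemaker:us-east-1:123456789012:model-package/ds1-en-v1",
    "eu-west-1": "arn:aws:sagemaker:eu-west-1:123456789012:model-package/ds1-en-v1",
    "ap-southeast-1": "arn:aws:sagemaker:ap-southeast-1:123456789012:model-package/ds1-en-v1",
}

def model_package_arn(region: str) -> str:
    """Return the model-package ARN for a region, failing fast if unsupported."""
    try:
        return MODEL_PACKAGE_ARNS[region]
    except KeyError:
        raise ValueError(f"{MODEL_ID} is not available in region {region!r}") from None
```

Keeping the mapping in one place makes it easy to fail fast with a clear error when a region is not supported.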

4. AWS Integration

  • Boto3 session initialisation with SSO authentication
  • SageMaker runtime client setup
  • IAM role and permission management
  • Multi-region support configuration

5. Real-Time Endpoint Deployment

  • Model deployment to SageMaker endpoints
  • Endpoint configuration and initialisation
  • Testing with sample documents
  • Endpoint invocation and response handling
  • Endpoint lifecycle management (creation and deletion)
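The endpoint lifecycle above can be sketched with boto3 (the notebook itself may use the higher-level SageMaker Python SDK instead). The request/response schema (`{"text": ...}`) and the resource names are assumptions:

```python
import json

def create_endpoint(sm, endpoint_name, model_package_arn, role_arn,
                    instance_type="ml.c5.xlarge"):
    """Create a model, endpoint config, and endpoint from a Marketplace package."""
    sm.create_model(
        ModelName=endpoint_name,
        ExecutionRoleArn=role_arn,
        Containers=[{"ModelPackageName": model_package_arn}],
        EnableNetworkIsolation=True,  # required for Marketplace model packages
    )
    sm.create_endpoint_config(
        EndpointConfigName=endpoint_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": endpoint_name,
            "InitialInstanceCount": 1,
            "InstanceType": instance_type,
        }],
    )
    sm.create_endpoint(EndpointName=endpoint_name,
                       EndpointConfigName=endpoint_name)
    sm.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)

def embed(runtime, endpoint_name, text):
    """Invoke the endpoint with one document; the payload schema is assumed."""
    resp = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"text": text}),
    )
    return json.loads(resp["Body"].read())

def delete_endpoint(sm, endpoint_name):
    """Tear down the endpoint and its supporting resources to stop billing."""
    sm.delete_endpoint(EndpointName=endpoint_name)
    sm.delete_endpoint_config(EndpointConfigName=endpoint_name)
    sm.delete_model(ModelName=endpoint_name)
```

Deleting the endpoint, its config, and the model together avoids leaving orphaned resources (and charges) behind.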

6. Batch Transform Jobs

  • S3 bucket setup for input/output data
  • JSONL input file preparation, where each record contains the document text and a unique batch request ID
  • Transformer initialisation with instance configuration
  • Batch job execution and monitoring
  • Output matching with source documents using request IDs
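The JSONL preparation and ID-based output matching can be sketched with the standard library alone. The `batch_request_id` field name is an assumption about the notebook's record schema:

```python
import json
import uuid

def build_jsonl(documents):
    """Build JSONL records, each carrying the text and a unique request ID."""
    records = [{"text": text, "batch_request_id": str(uuid.uuid4())}
               for text in documents]
    payload = "\n".join(json.dumps(r) for r in records)
    return records, payload

def match_outputs(records, output_lines):
    """Join model output lines back to their source documents via the ID."""
    text_by_id = {r["batch_request_id"]: r["text"] for r in records}
    matched = {}
    for line in output_lines:
        if not line.strip():
            continue
        out = json.loads(line)
        matched[text_by_id[out["batch_request_id"]]] = out
    return matched
```

Because batch transform does not guarantee output order, carrying a request ID through each record is what makes the join back to source documents reliable.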

7. Data Handling

  • Sample document preparation for testing
  • Input data formatting for both deployment methods
  • Response parsing and embedding extraction
  • Statistical analysis of embedding outputs (mean, standard deviation)
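Response parsing and the summary statistics can be sketched as follows; the `embedding` response key is an assumption about the model's output schema:

```python
import json
import statistics

def parse_embedding(response_body):
    """Extract the embedding vector from a JSON response body.

    The 'embedding' key is an assumed field name; adjust to the
    actual response schema of the deployed model.
    """
    return json.loads(response_body)["embedding"]

def embedding_stats(vector):
    """Summary statistics of an embedding: dimension, mean, population std."""
    return {
        "dim": len(vector),
        "mean": statistics.fmean(vector),
        "std": statistics.pstdev(vector),
    }
```

These statistics are a quick sanity check that the endpoint is returning well-formed, non-degenerate vectors rather than, say, all zeros.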

Technical Requirements

  • Python environment with boto3 and sagemaker libraries
  • AWS credentials configured via SSO or environment variables
  • Sufficient SageMaker quota in target AWS region
  • S3 bucket access for batch transform operations

Key Features

  • Support for multiple AWS regions across US, EU, APAC, and SA
  • Detailed error handling and authentication validation
  • Comprehensive output formatting and analysis
  • Cost-optimised instance recommendations
  • Production-ready deployment patterns

Use Cases

This notebook is ideal for teams looking to:

  • Deploy text embedding models for semantic search applications
  • Process large document collections in batch mode
  • Build real-time embedding services for production applications
  • Integrate ML models from AWS Marketplace into existing workflows