News

SageMaker adds OpenAI-compatible APIs for self-hosted inference

Marc KarpMay 21, 2026 · about 2 months ago

Amazon SageMaker AI now supports OpenAI-compatible APIs for real-time inference endpoints, allowing developers to invoke models by simply changing the endpoint URL without custom clients or code rewrites. The feature exposes a /openai/v1 path that accepts Chat Completions requests and works with OpenAI SDK, LangChain, and Strands Agents. SageMaker routes requests based on endpoint name and supports time-limited bearer tokens, enabling multi-model hosting, agentic workflows on owned infrastructure, and deployment of fine-tuned models without application changes.

TL;DR

SageMaker AI endpoints now expose OpenAI-compatible /openai/v1 API paths that work with standard OpenAI clients and SDKs
Developers can invoke models by changing only the endpoint URL, eliminating need for custom SigV4 signing or client libraries
Bearer token authentication enables time-limited access and integration with LLM gateways and standard OpenAI tooling
Multi-model deployments via inference components allow hosting multiple models under a single interface with independent resource allocation

Why It Matters

This move reduces friction for teams running inference on owned infrastructure by eliminating the need to maintain separate API clients or custom authentication wrappers. It enables broader adoption of SageMaker for agentic workflows and multi-model deployments by making the platform compatible with the de facto standard OpenAI API contract that most AI frameworks and tools already support.

Business Impact

For operators and founders, this lowers the operational burden of self-hosted inference by removing code rewrites and custom integrations when migrating from OpenAI to SageMaker. It also enables cost optimization and compliance benefits of running models on dedicated infrastructure while maintaining compatibility with existing applications and frameworks built around OpenAI APIs.

Key Implications

SageMaker becomes a more viable drop-in replacement for OpenAI API calls, reducing vendor lock-in and enabling cost arbitrage between cloud providers
Multi-model hosting on a single endpoint with independent resource allocation simplifies infrastructure management for teams running diverse model portfolios
Bearer token support enables secure, time-limited access patterns suitable for LLM gateways and multi-tenant applications without AWS credential management overhead
Agentic workflows can now run entirely on owned infrastructure using standard frameworks like LangChain and Strands Agents without custom adapters

What to Watch

Monitor adoption patterns among teams currently using LangChain and Strands Agents to see if this accelerates migration from OpenAI to self-hosted inference. Watch for ecosystem tooling around SageMaker OpenAI compatibility, including whether other LLM gateway projects and frameworks add native SageMaker support. Track pricing and performance comparisons between SageMaker and OpenAI API to understand when the economics favor self-hosting.

AI Agents Infrastructure Generative AI AWS

Subscribe to the newsletter

The latest stories and analysis, delivered to your inbox.

Free. No spam. Unsubscribe any time.

Alibaba cuts agent token use 99% with smarter tool routing

Alibaba researchers developed SkillWeaver, a framework that reduces token consumption by over 99% when routing AI agents to the correct tools from large libraries. The system uses a three-stage process (decompose, retrieve, compose) combined with Skill-Aware Decomposition to iteratively fetch and evaluate relevant tools rather than exposing agents to entire tool catalogs. This addresses a core challenge in enterprise AI systems where agents must orchestrate multiple tools to complete complex, multi-step workflows.

by bendee983@gmail.com (Ben Dickson)2 days ago· VentureBeat AI

AI AgentsTrendingNews

Meta Launches Pocket App for AI-Generated Interactive Experiences

Meta has launched a new app called Pocket that lets users create and share interactive AI-generated experiences called 'gizmos' built from prompts. The app shares only a name with Mozilla's defunct read-it-later service Pocket, which shut down last year. The launch reflects CEO Mark Zuckerberg's stated vision of AI as the next evolution of social media, where users can build and distribute interactive AI-powered content.

by Jay Peters2 days ago· The Verge AI

AI AgentsTrendingNews

Zuckerberg: Meta's AI agents developing slower than expected

Mark Zuckerberg told Meta staff at an internal meeting that the company's AI development efforts, particularly around AI agents, are progressing slower than he had anticipated. The statement signals a recalibration of expectations around a technology area Meta has invested heavily in. The disclosure comes as the AI industry broadly grapples with the gap between near-term capabilities and longer-term ambitions.

by Lucas Ropek2 days ago· TechCrunch AI

AI AgentsTrendingNews

Z.ai launches ZCode to undercut Cursor and Claude Code

Z.ai, a Beijing-based AI lab, launched ZCode, a free desktop application designed as an agent-first development environment for its GLM-5.2 model. The tool competes directly with Cursor, Claude Code, GitHub Copilot, and Google's Antigravity in the AI coding market. ZCode's pricing undercuts competitors significantly, with plans starting at $16.20 per month, and includes features like remote control via WeChat and Feishu, reflecting the company's focus on the Chinese developer market.

by michael.nunez@venturebeat.com (Michael Nuñez)3 days ago· VentureBeat AI

TL;DR

Why It Matters

Business Impact

Key Implications

What to Watch

Subscribe to the newsletter

Related stories

Alibaba cuts agent token use 99% with smarter tool routing

Meta Launches Pocket App for AI-Generated Interactive Experiences

Zuckerberg: Meta's AI agents developing slower than expected

Z.ai launches ZCode to undercut Cursor and Claude Code