AI Client
Overview
The AI Client package is a multi-provider AI integration system for the Token Ring ecosystem. It provides a unified interface for accessing various AI models across different providers, including chat models, image generation, embeddings, speech synthesis, speech recognition, and reranking.
Key Features
- Multi-Provider Support: Integrates with 16 AI providers: Anthropic, OpenAI, Google, Groq, Cerebras, DeepSeek, ElevenLabs, Fal, xAI, xAI Responses, OpenRouter, Perplexity, Azure, Ollama, llama.cpp, and Qwen
- Model Registry: Centralized model management with availability checking and pricing information
- Auto-Configuration: Automatic provider setup from environment variables
- Model Type Specialization: Separate registries for chat, image generation, embeddings, speech, transcription, and reranking models
- RPC API: JSON-RPC endpoints for model listing and management
- Model Filtering: Query models by requirements including cost, context length, and capabilities
- Online Status Monitoring: Real-time availability checking for models
- Feature System: Rich feature specification system supporting boolean, number, string, enum, and array types with validation
- Streaming Support: Real-time streaming responses with delta handling
- Agent Integration: Seamless integration with Token Ring agent system through services
Core Components
Model Registries
The AI Client provides specialized registries for different AI model types:
- ChatModelRegistry: Manages chat completion models with reasoning, tools, and streaming support
- ImageGenerationModelRegistry: Manages image generation models
- EmbeddingModelRegistry: Manages text embedding models
- SpeechModelRegistry: Manages text-to-speech models
- TranscriptionModelRegistry: Manages speech-to-text models
- RerankingModelRegistry: Manages text reranking models
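All six registries are resolved through the application's service container. As a minimal sketch (assuming the registry classes are exported from @tokenring-ai/ai-client, and with app being a configured TokenRingApp as shown in the Usage Examples below):
// Hypothetical import path; adjust to your package layout
import { ChatModelRegistry, EmbeddingModelRegistry, SpeechModelRegistry } from "@tokenring-ai/ai-client";
const chatRegistry = app.requireService(ChatModelRegistry);
const embeddingRegistry = app.requireService(EmbeddingModelRegistry);
const speechRegistry = app.requireService(SpeechModelRegistry);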
Providers
The package supports integration with the following AI providers:
- Anthropic - Claude models with reasoning and web search capabilities
- OpenAI - GPT models, DALL-E, Whisper, TTS with multimodal support
- Azure - Azure OpenAI services for enterprise deployment
- Google - Google Generative AI models including Gemini and Imagen
- Groq - Fast inference models based on LLaMA
- Cerebras - High-performance LLaMA-based inference
- DeepSeek - DeepSeek models with advanced reasoning
- ElevenLabs - Professional text-to-speech synthesis
- Fal - Fast image generation with Fal.ai
- OpenRouter - Multi-provider router for aggregated access
- Perplexity - Perplexity models with web search integration
- xAI - xAI models (Grok) with reasoning capabilities
- xAI Responses - xAI responses API for advanced reasoning and search
- Ollama - Self-hosted models via Ollama integration
- llama.cpp - Local inference via llama.cpp API
- Qwen - Alibaba Qwen models with Chinese language support
API Reference
Model Requirements
ChatModelRequirements
interface ChatModelRequirements {
nameLike?: string;
contextLength?: number;
maxCompletionTokens?: number;
research?: number;
reasoningText?: number;
intelligence?: number;
speed?: number;
webSearch?: number;
}
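For illustration, a hedged sketch of filtering with these fields (assuming the numeric fields act as minimum capability scores, consistent with the requirement strings used by getCheapestModelByRequirements later in this document):
const chatRegistry = app.requireService(ChatModelRegistry);
// All fields are optional; omitted fields do not constrain the match
const candidates = chatRegistry.getModelSpecsByRequirements({
  nameLike: "claude",    // match models whose name contains "claude"
  contextLength: 100000, // require at least 100k tokens of context
  reasoningText: 5,      // require a reasoning score of at least 5
  speed: 3               // require a speed score of at least 3
});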
EmbeddingModelRequirements
interface EmbeddingModelRequirements {
nameLike?: string;
contextLength?: number;
}
ImageModelRequirements
interface ImageModelRequirements {
nameLike?: string;
contextLength?: number;
}
RerankingModelRequirements
interface RerankingModelRequirements {
nameLike?: string;
}
SpeechModelRequirements
interface SpeechModelRequirements {
nameLike?: string;
}
TranscriptionModelRequirements
interface TranscriptionModelRequirements {
nameLike?: string;
}
Registry Methods
ChatModelRegistry
class ChatModelRegistry extends ModelTypeRegistry<ChatModelSpec, AIChatClient, ChatModelRequirements>
Methods:
- registerAllModelSpecs(specs: ChatModelSpec[]): Register multiple chat model specifications
- getModelSpecsByRequirements(requirements: ChatModelRequirements): Get models matching specific requirements
- getModelsByProvider(): Get all registered models grouped by provider
- getAllModelsWithOnlineStatus(): Get all models with their online status
- getClient(name: string): Get a client instance matching the model name
- getCheapestModelByRequirements(requirements: string, estimatedContextLength = 10000): Find the cheapest model matching requirements
Model Specification:
Each model specification includes:
- modelId: Unique identifier for the model
- providerDisplayName: Display name of the provider
- impl: Model implementation interface
- costPerMillionInputTokens: Cost per million input tokens (default: 600)
- costPerMillionOutputTokens: Cost per million output tokens (default: 600)
- costPerMillionCachedInputTokens: Cost per million cached input tokens (optional)
- costPerMillionReasoningTokens: Cost per million reasoning tokens (optional)
- contextLength: Maximum context length in tokens
- isAvailable(): Async function to check model availability
- isHot(): Async function to check if model is warmed up
- mangleRequest(): Optional function to modify the request before sending
- features: Optional feature specifications for query parameters
- speed: Speed capability score (0-infinity)
- research: Research ability (0-infinity)
- reasoningText: Reasoning capability score (0-infinity)
- tools: Tools capability score (0-infinity)
- intelligence: Intelligence capability score (0-infinity)
- maxCompletionTokens: Maximum output tokens (optional)
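For illustration only, a hedged registration sketch (the impl value is provider-specific and shown here as a hypothetical placeholder):
chatRegistry.registerAllModelSpecs([
  {
    modelId: "example-model",             // hypothetical model id
    providerDisplayName: "ExampleProvider",
    impl: exampleImpl,                    // provider SDK implementation, not shown here
    costPerMillionInputTokens: 300,
    costPerMillionOutputTokens: 900,
    contextLength: 128000,
    isAvailable: async () => true,
    isHot: async () => true,
    speed: 4,
    research: 0,
    reasoningText: 3,
    tools: 5,
    intelligence: 5
  }
]);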
Example:
const registry = app.requireService(ChatModelRegistry);
const models = registry.getModelSpecsByRequirements({
nameLike: "gpt-4.1"
});
const client = await registry.getClient("OpenAI:gpt-5");
const [text, response] = await client.textChat(
{
messages: [
{ role: "user", content: "Hello" }
]
},
agent
);
Other Registries
The other registries (Embedding, ImageGeneration, Speech, Transcription, Reranking) extend the base ModelTypeRegistry and provide similar model management functionality.
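Because they share the base class, lookups follow the same pattern. A brief sketch, assuming the requirements-based filtering shown above applies uniformly:
const embeddingRegistry = app.requireService(EmbeddingModelRegistry);
// EmbeddingModelRequirements supports only nameLike and contextLength
const embeddingSpecs = embeddingRegistry.getModelSpecsByRequirements({
  nameLike: "text-embedding",
  contextLength: 8000
});
const rerankingRegistry = app.requireService(RerankingModelRegistry);
// RerankingModelRequirements supports only nameLike
const rerankers = rerankingRegistry.getModelSpecsByRequirements({ nameLike: "rerank" });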
Configuration
Plugin Configuration Schema
const pluginConfigSchema = z.object({
ai: AIClientConfigSchema
});
interface AIClientConfigSchema {
autoConfigure?: boolean;
providers?: Record<string, AIProviderConfig>;
}
Auto-Configuration
When autoConfigure is set to true or providers is not specified, the system automatically loads providers from environment variables:
- ANTHROPIC_API_KEY - Anthropic API key
- OPENAI_API_KEY - OpenAI API key
- GOOGLE_GENERATIVE_AI_API_KEY - Google Generative AI key
- GROQ_API_KEY - Groq API key
- CEREBRAS_API_KEY - Cerebras API key
- DEEPSEEK_API_KEY - DeepSeek API key
- ELEVENLABS_API_KEY - ElevenLabs API key
- FAL_API_KEY - Fal.ai API key
- XAI_API_KEY - xAI API key
- XAI_RESPONSES_API_KEY - xAI Responses API key
- OPENROUTER_API_KEY - OpenRouter API key
- PERPLEXITY_API_KEY - Perplexity API key
- AZURE_API_KEY - Azure OpenAI API key
- AZURE_API_ENDPOINT - Azure OpenAI endpoint (optional)
- OLLAMA_BASE_URL - Ollama base URL (default: http://127.0.0.1:11434/v1)
- OLLAMA_API_KEY - Ollama API key (optional)
- LLAMA_BASE_URL - llama.cpp base URL (default: http://127.0.0.1:8080/v1)
- LLAMA_API_KEY - llama.cpp API key (optional)
- DASHSCOPE_API_KEY - Alibaba Qwen API key
- ZAI_API_KEY - Zhipu AI API key
Provider Configuration
Each provider has specific configuration requirements:
Anthropic
{
provider: "anthropic",
apiKey: string
}
OpenAI
{
provider: "openai",
apiKey: string
}
Azure
{
provider: "azure",
apiKey: string,
baseURL: string
}
OpenAI-Compatible Providers
{
provider: "openaiCompatible",
apiKey?: string,
baseURL: string,
defaultContextLength?: number
}
Ollama
{
provider: "ollama",
baseURL?: string,
apiKey?: string
}
llama.cpp
{
provider: "llama",
baseURL?: string,
apiKey?: string
}
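Putting these schemas together, a hedged multi-provider setup (the provider entry names and URLs are illustrative) might look like:
import { TokenRingApp } from "@tokenring-ai/app";
const app = new TokenRingApp({
  plugins: {
    "@tokenring-ai/ai-client": {
      ai: {
        providers: {
          // OpenAI-compatible server; baseURL is required for this provider type
          LocalServer: {
            provider: "openaiCompatible",
            baseURL: "http://127.0.0.1:8000/v1",
            defaultContextLength: 32768
          },
          // Ollama and llama.cpp default to their local URLs when baseURL is omitted
          Ollama: { provider: "ollama" },
          Llama: { provider: "llama" }
        }
      }
    }
  }
});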
RPC Endpoints
The AI Client provides the following JSON-RPC endpoints:
Model Listing Endpoints
| Method | Input | Result |
|---|---|---|
| listChatModels | {} | { models: Record<string, {status, available, hot, modelSpec}> } |
| listChatModelsByProvider | {} | { modelsByProvider: Record<string, Record<string, {status, available, hot}>> } |
| listEmbeddingModels | {} | { models: Record<string, {status, available, hot, modelSpec}> } |
| listEmbeddingModelsByProvider | {} | { modelsByProvider: Record<string, Record<string, {status, available, hot}>> } |
| listImageGenerationModels | {} | { models: Record<string, {status, available, hot, modelSpec}> } |
| listImageGenerationModelsByProvider | {} | { modelsByProvider: Record<string, Record<string, {status, available, hot}>> } |
| listSpeechModels | {} | { models: Record<string, {status, available, hot, modelSpec}> } |
| listSpeechModelsByProvider | {} | { modelsByProvider: Record<string, Record<string, {status, available, hot}>> } |
| listTranscriptionModels | {} | { models: Record<string, {status, available, hot, modelSpec}> } |
| listTranscriptionModelsByProvider | {} | { modelsByProvider: Record<string, Record<string, {status, available, hot}>> } |
| listRerankingModels | {} | { models: Record<string, {status, available, hot, modelSpec}> } |
| listRerankingModelsByProvider | {} | { modelsByProvider: Record<string, Record<string, {status, available, hot}>> } |
Response Structure:
{
models: {
"provider:model": {
status: "online" | "cold" | "offline",
available: boolean,
hot: boolean,
modelSpec: ModelSpec
}
}
}
ByProvider Response Structure:
{
modelsByProvider: {
"Provider Name": {
"provider:model": {
status: "online" | "cold" | "offline",
available: boolean,
hot: boolean
}
}
}
}
Example Usage:
// Call via RPC
const result = await rpcService.call("listChatModels", {});
console.log(result.models);
Usage Examples
Basic Setup with Auto-Configuration
import { TokenRingApp } from "@tokenring-ai/app";
const app = new TokenRingApp({
plugins: {
"@tokenring-ai/ai-client": {
ai: {
autoConfigure: true
}
}
}
});
Manual Provider Configuration
const app = new TokenRingApp({
plugins: {
"@tokenring-ai/ai-client": {
ai: {
providers: {
OpenAI: {
provider: "openai",
apiKey: "sk-..."
},
Anthropic: {
provider: "anthropic",
apiKey: "sk-ant-..."
}
}
}
}
}
});
Finding Cheapest Model
const chatRegistry = app.requireService(ChatModelRegistry);
// Find cheapest model with reasoning and intelligence >= 5
const modelId = chatRegistry.getCheapestModelByRequirements(
"reasoningText>=5,intelligence>=5",
80000 // Estimated context length
);
console.log(`Recommended model: ${modelId}`);
Accessing Model Specs
const chatRegistry = app.requireService(ChatModelRegistry);
const models = await chatRegistry.getAllModelsWithOnlineStatus();
for (const [modelId, spec] of Object.entries(models)) {
console.log(`Model: ${modelId}`);
console.log(` Status: ${spec.status}`);
console.log(` Available: ${spec.available}`);
console.log(` Hot: ${spec.hot}`);
console.log(` Context Length: ${spec.modelSpec.contextLength}`);
console.log(` Cost per Million Input: $${spec.modelSpec.costPerMillionInputTokens}`);
console.log(` Cost per Million Output: $${spec.modelSpec.costPerMillionOutputTokens}`);
}
Using RPC to List Models
// Get all chat models via RPC
const result = await rpcService.call("listChatModels", {});
const models = result.models;
for (const [modelId, info] of Object.entries(models)) {
if (info.available) {
console.log(`Available: ${modelId}`);
}
}
Provider-Specific Model Listing
// Get models by provider
const result = await rpcService.call("listChatModelsByProvider", {});
const modelsByProvider = result.modelsByProvider;
for (const [provider, providerModels] of Object.entries(modelsByProvider)) {
console.log(`Provider: ${provider}`);
for (const [modelId, info] of Object.entries(providerModels)) {
console.log(` ${modelId}: ${info.status}`);
}
}
Chat Model Usage
const chatRegistry = app.requireService(ChatModelRegistry);
const client = await chatRegistry.getClient("OpenAI:gpt-5");
// Text chat
const [text, response] = await client.textChat(
{
messages: [
{ role: "user", content: "Hello" }
]
},
agent
);
// Streaming chat
const streamResponse = await client.streamChat(
{
messages: [
{ role: "user", content: "Tell me a story" }
]
},
agent
);
// Structured output
const [result, structuredResponse] = await client.generateObject(
{
messages: [
{ role: "user", content: "Extract the following information" }
],
schema: z.object({
name: z.string(),
age: z.number(),
email: z.string().email()
})
},
agent
);
// Rerank documents
const rankings = await client.rerank({
query: "What is machine learning?",
documents: [
"Machine learning is a subset of AI...",
"AI is a broad field...",
"Deep learning is a type of ML..."
],
topN: 3
}, agent);
// Calculate cost
const cost = client.calculateCost({
inputTokens: 100,
outputTokens: 50
});
// Calculate timing
const timing = client.calculateTiming(1500, {
inputTokens: 100,
outputTokens: 50
});
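As a sanity check on calculateCost, the per-million pricing arithmetic is easy to reproduce by hand. This assumes cost scales linearly with token counts, which is an inference from the pricing fields rather than a documented formula:
// Manual cross-check using the default rates from the model spec
const costPerMillionInputTokens = 600;
const costPerMillionOutputTokens = 600;
const inputTokens = 100;
const outputTokens = 50;
const expectedCost =
  (inputTokens / 1_000_000) * costPerMillionInputTokens +
  (outputTokens / 1_000_000) * costPerMillionOutputTokens;
console.log(`Expected cost: $${expectedCost.toFixed(4)}`); // $0.0900 at the default rates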
Embedding Model Usage
const embeddingRegistry = app.requireService(EmbeddingModelRegistry);
const client = await embeddingRegistry.getClient("OpenAI:text-embedding-3-small");
const embeddings = await client.getEmbeddings([
"Hello world",
"Machine learning is great"
]);
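Embeddings are typically compared with cosine similarity for semantic search. A self-contained sketch, assuming getEmbeddings returns one numeric vector per input string:
// Cosine similarity: dot product divided by the product of vector magnitudes
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
const [first, second] = embeddings;
console.log(`Similarity: ${cosineSimilarity(first, second).toFixed(4)}`);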
Image Generation Model Usage
const imageRegistry = app.requireService(ImageGenerationModelRegistry);
const client = await imageRegistry.getClient("OpenAI:dall-e-3");
const [image, result] = await client.generateImage({
prompt: "A beautiful sunset over the ocean",
size: "1024x1024",
quality: "high"
}, agent);
Speech Model Usage
const speechRegistry = app.requireService(SpeechModelRegistry);
const client = await speechRegistry.getClient("ElevenLabs:text");
const [audio, result] = await client.generateSpeech({
text: "Hello, world!",
voice: "alloy",
speed: 1.0
}, agent);
Transcription Model Usage
const transcriptionRegistry = app.requireService(TranscriptionModelRegistry);
const client = await transcriptionRegistry.getClient("OpenAI:whisper-1");
const [text, result] = await client.transcribe({
audio: audioFile,
language: "en",
prompt: "Transcribe this audio"
}, agent);
Using Feature Queries
// Get model with specific configuration
const client = await chatRegistry.getClient("OpenAI:gpt-5?websearch=1");
// Use the client
const [result, response] = await client.textChat(
{
messages: [
{ role: "user", content: "Search the web for the latest AI news" }
]
},
agent
);
Using Feature System
// Get model with multiple features
const client = await chatRegistry.getClient("OpenAI:gpt-5?websearch=1&reasoningEffort=high&serviceTier=priority");
// Set features on client instance
client.setFeatures({
websearch: true,
reasoningEffort: "high",
serviceTier: "priority"
});
// Get current features
const features = client.getFeatures();
Integration with Agent System
The AI Client integrates seamlessly with the Token Ring agent system:
import { TokenRingAgent } from "@tokenring-ai/agent";
const agent = new TokenRingAgent({
name: "Research Agent",
tools: [/* ... */],
config: {
chatModel: "gpt-4.1", // Model name from registry
}
});
// Agent automatically uses the configured chat model
const response = await agent.chat("Analyze this data...");
Model Capabilities
Chat Models
Chat models support various capabilities:
- Reasoning: Text-based reasoning capabilities (0-8 scale)
- Intelligence: Overall intelligence score (0-8 scale)
- Speed: Response speed (0-5 scale)
- Tools: Tool use capability (0-7 scale)
- Context Length: Maximum tokens in context
- Cost: Input/output token costs
- Streaming: Real-time streaming responses
- Structured Output: JSON schema-based output generation
- Reranking: Document relevance ranking
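These scores can drive selection directly. A short sketch, reusing the requirement-string format from the Finding Cheapest Model example above:
const chatRegistry = app.requireService(ChatModelRegistry);
// Prefer fast, tool-capable models; the score thresholds are illustrative
const fastModelId = chatRegistry.getCheapestModelByRequirements(
  "speed>=4,tools>=5",
  20000 // estimated context length used for the cost comparison
);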
Image Generation Models
Image models support:
- Multiple Quality Levels: High, medium, low quality variants
- Cost Calculation: Dynamic pricing based on image size
- Variant Models: Different quality options for same base model
Speech Models
Speech models include:
- TTS Models: Text-to-speech (e.g., tts-1, tts-1-hd)
- Transcription Models: Speech-to-text (e.g., whisper-1)
- Character/Minute Pricing: Cost based on usage
Embedding Models
Embedding models provide:
- Text Vectorization: Convert text to embeddings
- Context Length: Maximum input token length
- Semantic Search: Support for similarity-based search
Best Practices
- Use Appropriate Models: Choose models based on your specific use case (reasoning, speed, cost)
- Monitor Costs: Use getCheapestModelByRequirements to optimize costs
- Check Availability: Always check available status before using models
- Auto-Configure: Use environment variables for easy deployment
- Provider Diversity: Configure multiple providers for redundancy
- Cache Model Lists: Model lists are cached; refresh when needed
- Use Feature Queries: Leverage query parameters for flexible model selection
- Reuse Clients: Create client instances once and reuse for multiple requests
- Check Model Hot Status: Use isHot() to determine if a model needs to be warmed up
- Calculate Costs: Use calculateCost() to estimate expenses before making requests
- Use Streaming for Long Responses: Use streamChat() for better user experience with long responses
- Set Features: Use setFeatures() on client instances to enable specific features
- Monitor Model Status: Check model status before expensive operations to avoid failed requests (a combined sketch follows this list)
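A hedged sketch combining several of these practices (cheapest-model selection, availability check, client reuse):
const chatRegistry = app.requireService(ChatModelRegistry);
// Pick the cheapest model that clears the capability bar for this workload
const modelId = chatRegistry.getCheapestModelByRequirements("intelligence>=5", 50000);
// Confirm the model is reachable before committing to it
const models = await chatRegistry.getAllModelsWithOnlineStatus();
if (modelId && models[modelId]?.available) {
  const client = await chatRegistry.getClient(modelId);
  // Reuse this client for subsequent requests instead of re-resolving it
  const [text] = await client.textChat(
    { messages: [{ role: "user", content: "Hello" }] },
    agent
  );
  console.log(text);
}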
Testing
The package includes unit tests using Vitest:
# Run all tests
bun run test
# Run tests in watch mode
bun run test:watch
# Run tests with coverage
bun run test:coverage
Dependencies
Runtime Dependencies
- @tokenring-ai/app: Core application framework with service and plugin system
- @tokenring-ai/agent: Agent framework for tool execution
- @tokenring-ai/utility: Shared utilities and registry functionality
- @tokenring-ai/rpc: RPC service for remote procedure calls
- zod: Runtime schema validation
- ai: Vercel AI SDK for streaming and client functionality
- axios: HTTP client for API requests
AI SDK Dependencies
- @ai-sdk/anthropic: Anthropic AI SDK for Claude models
- @ai-sdk/azure: Azure OpenAI SDK for Azure hosting
- @ai-sdk/cerebras: Cerebras AI SDK for LLaMA models
- @ai-sdk/deepseek: DeepSeek AI SDK for DeepSeek models
- @ai-sdk/elevenlabs: ElevenLabs SDK for speech synthesis
- @ai-sdk/fal: Fal AI SDK for image generation
- @ai-sdk/google: Google Generative AI SDK for Gemini models
- @ai-sdk/groq: Groq AI SDK for LLaMA inference
- @ai-sdk/openai: OpenAI AI SDK for GPT, Whisper, TTS models
- @ai-sdk/openai-compatible: OpenAI-compatible API SDK
- @ai-sdk/perplexity: Perplexity AI SDK for Perplexity models
- @ai-sdk/xai: xAI SDK for Grok models
- @ai-sdk/provider: Core AI SDK provider interface
- @openrouter/ai-sdk-provider: OpenRouter SDK for provider aggregation
- ollama-ai-provider-v2: Ollama SDK for local model hosting
Development Dependencies
- @vitest/coverage-v8: Code coverage
- typescript: TypeScript compiler
- vitest: Unit testing framework
Development
The package follows the Token Ring plugin pattern:
- Install Phase: Registers six service instances (registries) and optionally registers the RPC endpoints
- Start Phase: Initializes providers and registers models through the provider initialization chain
The package does not include streaming client implementations directly. Streaming clients are provided by the individual provider SDKs and accessed through the registries.
License
MIT License - see LICENSE file for details.
Related Components
- @tokenring-ai/agent: Agent system for using AI models
- @tokenring-ai/app: Application framework
- @tokenring-ai/rpc: RPC service integration