Documentation Index

Fetch the complete documentation index at: https://mintlify.com/thinkex-oss/thinkex/llms.txt

Use this file to discover all available pages before exploring further.

ThinkEx uses Google’s Gemini models via the Vercel AI SDK Gateway for intelligent assistance and content processing.

Primary Models

Gemini 2.5 Flash

Model ID: google/gemini-2.5-flash (default)

Google’s latest fast model, optimized for speed and efficiency.

Characteristics:
  • Speed: Very fast response times
  • Context Window: 1M tokens
  • Multimodal: Text, images, video, audio
  • Thinking: Dynamic reasoning budget
  • Grounding: Google Search integration
Best For:
  • General chat conversations
  • Quick content analysis
  • Real-time assistance
  • Web search synthesis
Configuration:
const result = await streamText({
  model: gateway("google/gemini-2.5-flash"),
  temperature: 1.0,
  providerOptions: {
    google: {
      thinkingConfig: {
        includeThoughts: true,
      },
    },
  },
});

Gemini 2.5 Flash Lite

Model ID: google/gemini-2.5-flash-lite

A lightweight version optimized for simple tasks.

Characteristics:
  • Speed: Fastest response times
  • Context Window: 1M tokens
  • Cost: Most economical
  • Multimodal: Text, images, video, audio
Best For:
  • File processing and analysis
  • Web search queries
  • Simple content extraction
  • Background processing tasks
Usage in ThinkEx:
// Web search tool
const { text } = await generateText({
  model: google('gemini-2.5-flash-lite'),
  tools: {
    googleSearch: google.tools.googleSearch({ mode: 'MODE_UNSPECIFIED' }),
  },
  prompt: `Search for: ${query}`,
});

// File analysis
const { text } = await generateText({
  model: google("gemini-2.5-flash-lite"),
  messages: [{
    role: "user",
    content: [
      { type: "text", text: "Analyze this file..." },
      { type: "file", data: fileUrl, mediaType: "application/pdf" },
    ],
  }],
});

Gemini 3 Flash Preview

Model ID: google/gemini-3-flash-preview

Next-generation Gemini model with enhanced reasoning.

Characteristics:
  • Thinking: Explicit thinking levels (minimal, standard, deep)
  • Context Window: 1M+ tokens
  • Reasoning: Enhanced multi-step reasoning
  • Multimodal: Advanced vision and audio understanding
Best For:
  • Complex problem solving
  • Multi-step reasoning tasks
  • Advanced content analysis
  • Research and synthesis
Configuration:
providerOptions: {
  google: {
    thinkingConfig: {
      includeThoughts: true,
      thinkingLevel: "minimal", // "minimal" | "standard" | "deep"
    },
  },
}
Thinking Levels:
  • minimal: Quick reasoning for simple tasks
  • standard: Balanced reasoning for most tasks
  • deep: Extended reasoning for complex problems

Model Selection

The chat API accepts a modelId parameter:
POST /api/chat
{
  "modelId": "google/gemini-2.5-flash",
  "messages": [...],
  ...
}
Auto-Prefixing: If you provide a model ID without a provider prefix (e.g., gemini-2.5-flash), it’s automatically prefixed with google/:
// These are equivalent:
"gemini-2.5-flash""google/gemini-2.5-flash"
Default Model: If no modelId is specified, the default is google/gemini-2.5-flash.
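The prefixing and default rules above can be sketched as a small helper. This is illustrative only; the name normalizeModelId is an assumption, not ThinkEx’s actual code:

```typescript
// Hypothetical helper mirroring the auto-prefixing and default-model rules.
const DEFAULT_MODEL_ID = "google/gemini-2.5-flash";

function normalizeModelId(modelId?: string): string {
  if (!modelId) return DEFAULT_MODEL_ID;      // no modelId → use the default
  if (modelId.includes("/")) return modelId;  // already provider-prefixed
  return `google/${modelId}`;                 // bare ID → add the google/ prefix
}
```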

Model Capabilities

Multimodal Support

All Gemini models support multiple content types: Text:
{ type: "text", text: "Analyze this content..." }
Images:
{
  type: "file",
  data: imageUrl, // or base64 data URL
  mediaType: "image/jpeg",
  filename: "photo.jpg",
}
Videos:
{
  type: "file",
  data: "https://youtube.com/watch?v=...",
  mediaType: "video/mp4",
}
PDFs:
{
  type: "file",
  data: pdfUrl,
  mediaType: "application/pdf",
  filename: "document.pdf",
}
Audio:
{
  type: "file",
  data: audioUrl,
  mediaType: "audio/mpeg",
  filename: "audio.mp3",
}

Tool Calling

All models support function calling:
tools: {
  createNote: tool({
    description: "Create a note card",
    inputSchema: z.object({
      title: z.string(),
      content: z.string(),
    }),
    execute: async ({ title, content }) => {
      // Implementation
    },
  }),
}

Grounding

Gemini models support web grounding:
providerOptions: {
  google: {
    grounding: {
      // Google Search integration
    },
  },
}
ThinkEx uses an explicit webSearch tool instead of automatic grounding, for better control and source attribution.

Provider Configuration

Google AI Studio

Setup:
  1. Get API key from Google AI Studio
  2. Add to environment:
GOOGLE_GENERATIVE_AI_API_KEY=AIza...
Rate Limits:
  • Free tier: 15 requests/minute
  • Paid tier: Higher limits based on plan

AI Gateway

Optional: Use Vercel AI Gateway for enhanced routing:
AI_GATEWAY_API_KEY=your-gateway-key
Benefits:
  • Automatic failover between providers
  • Load balancing across models
  • Centralized logging and monitoring
  • Cost optimization
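Opting into the gateway looks roughly like the following. This is a minimal sketch assuming the @ai-sdk/gateway package’s createGateway export; adapt it to your setup:

```typescript
import { streamText } from "ai";
import { createGateway } from "@ai-sdk/gateway";

// Route model calls through the Vercel AI Gateway instead of
// calling Google AI Studio directly.
const gateway = createGateway({
  apiKey: process.env.AI_GATEWAY_API_KEY,
});

const result = streamText({
  model: gateway("google/gemini-2.5-flash"),
  prompt: "Hello",
});
```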

Model Usage in Tools

// src/lib/ai/tools/web-search.ts
const { text } = await generateText({
  model: google('gemini-2.5-flash-lite'),
  tools: {
    googleSearch: google.tools.googleSearch({ mode: 'MODE_UNSPECIFIED' }),
  },
  prompt: query,
});

File Processing

// src/lib/ai/tools/process-files.ts
const { text } = await generateText({
  model: google("gemini-2.5-flash-lite"),
  messages: [{
    role: "user",
    content: [
      { type: "text", text: batchPrompt },
      ...fileInfos.map(f => ({
        type: "file",
        data: f.fileUrl,
        mediaType: f.mediaType,
        filename: f.filename,
      })),
    ],
  }],
});

URL Processing

// src/lib/ai/tools/process-urls.ts
const { text } = await generateText({
  model: google("gemini-2.5-flash"),
  prompt: `Analyze content from: ${url}...`,
});

Performance Optimization

Context Caching

Long context is cached automatically by the provider; cache hits are reported in the usage object:
onFinish: ({ usage }) => {
  console.log({
    cachedInputTokens: usage?.cachedInputTokens,
    inputTokens: usage?.inputTokens,
  });
}

Message Pruning

Reduce token usage by pruning old messages:
const prunedMessages = pruneMessages({
  messages: convertedMessages,
  reasoning: "before-last-message",
  toolCalls: "before-last-5-messages",
  emptyMessages: "remove",
});

Streaming

Use streaming for better perceived performance:
const result = streamText({
  model,
  messages,
  experimental_transform: smoothStream({
    chunking: "word",
    delayInMs: 15,
  }),
});

Token Usage Tracking

Per-Step Tracking

onStepFinish: (result) => {
  const { usage, finishReason } = result;
  console.log({
    stepType: result.stepType,
    inputTokens: usage?.inputTokens,
    outputTokens: usage?.outputTokens,
    reasoningTokens: usage?.reasoningTokens,
  });
}

Final Usage

onFinish: ({ usage, finishReason }) => {
  console.log({
    totalTokens: usage?.totalTokens,
    cachedInputTokens: usage?.cachedInputTokens,
    finishReason,
  });
}

Experimental Features

Claude Support (Experimental)

ThinkEx has experimental support for Anthropic’s Claude:
// Special mapping: Claude Sonnet 4.5 → Gemini 3 Flash Preview
if (modelId === "anthropic/claude-sonnet-4.5") {
  modelId = "google/gemini-3-flash-preview";
}
Claude support is experimental and not fully tested. Stick with Gemini models for production use.

Cost Optimization

Model Selection Strategy

  1. Simple tasks → gemini-2.5-flash-lite
    • File analysis
    • Web search
    • Content extraction
  2. General chat → gemini-2.5-flash
    • User conversations
    • Content generation
    • Tool orchestration
  3. Complex reasoning → gemini-3-flash-preview
    • Multi-step problems
    • Research synthesis
    • Advanced analysis
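The strategy above can be expressed as a simple routing table. The TaskKind categories here are assumptions for illustration, not ThinkEx’s actual task taxonomy:

```typescript
// Sketch of the model-selection strategy: cheap model for background
// work, the default for chat, the preview model for heavy reasoning.
type TaskKind = "file-analysis" | "web-search" | "extraction" | "chat" | "reasoning";

function pickModel(task: TaskKind): string {
  switch (task) {
    case "file-analysis":
    case "web-search":
    case "extraction":
      return "google/gemini-2.5-flash-lite";
    case "chat":
      return "google/gemini-2.5-flash";
    case "reasoning":
      return "google/gemini-3-flash-preview";
  }
}
```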

Caching Strategy

  • PDFs: Cache OCR results after first extraction
  • Messages: Use context caching for long conversations
  • Files: Store processed results in database
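The “store processed results” idea can be sketched with a cache keyed by file URL. This uses an in-memory Map for brevity; a real implementation would persist to the database, and analyzeFileCached is a hypothetical name:

```typescript
// Cache processed file results so each file is analyzed by the model
// at most once. The analyze callback stands in for a generateText call.
const processedFileCache = new Map<string, string>();

async function analyzeFileCached(
  fileUrl: string,
  analyze: (url: string) => Promise<string>,
): Promise<string> {
  const cached = processedFileCache.get(fileUrl);
  if (cached !== undefined) return cached; // cache hit: skip the model call
  const result = await analyze(fileUrl);
  processedFileCache.set(fileUrl, result);
  return result;
}
```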

Error Handling

Rate Limit Errors

try {
  const result = await streamText({ model, ... });
} catch (error) {
  if (error.status === 429) {
    // Rate limit exceeded
    // Implement exponential backoff
  }
}
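One way to implement the backoff suggested in the comment above. These helpers (backoffDelayMs, withRetry) are hypothetical, not part of ThinkEx:

```typescript
// Exponential backoff: double the delay on each retry, up to a cap.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Retry a model call on HTTP 429, waiting between attempts;
// rethrow any other error immediately.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  baseMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      if (error?.status !== 429 || attempt + 1 >= maxAttempts) throw error;
      await new Promise((r) => setTimeout(r, backoffDelayMs(attempt, baseMs)));
    }
  }
}
```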

Loop Protection

const result = await streamText({
  model,
  messages,
  stopWhen: stepCountIs(25), // Prevent infinite loops
});

Next Steps

AI Overview

Learn about AI architecture and features

AI Tools

Explore available AI tools