Documentation Index

Fetch the complete documentation index at: https://mintlify.com/thinkex-oss/thinkex/llms.txt

Use this file to discover all available pages before exploring further.

ThinkEx uses Google’s Gemini models via the Vercel AI SDK Gateway for intelligent assistance and content processing.

Primary Models

Gemini 2.5 Flash

Model ID: google/gemini-2.5-flash (default)

Google’s latest fast model, optimized for speed and efficiency.

Characteristics:
  • Speed: Very fast response times
  • Context Window: 1M tokens
  • Multimodal: Text, images, video, audio
  • Thinking: Dynamic reasoning budget
  • Grounding: Google Search integration
Best For:
  • General chat conversations
  • Quick content analysis
  • Real-time assistance
  • Web search synthesis
Configuration:
const result = await streamText({
  model: gateway("google/gemini-2.5-flash"),
  temperature: 1.0,
  providerOptions: {
    google: {
      thinkingConfig: {
        includeThoughts: true,
      },
    },
  },
});

Gemini 2.5 Flash Lite

Model ID: google/gemini-2.5-flash-lite

A lightweight version optimized for simple tasks.

Characteristics:
  • Speed: Fastest response times
  • Context Window: 1M tokens
  • Cost: Most economical
  • Multimodal: Text, images, video, audio
Best For:
  • File processing and analysis
  • Web search queries
  • Simple content extraction
  • Background processing tasks
Usage in ThinkEx:
// Web search tool
const { text } = await generateText({
  model: google('gemini-2.5-flash-lite'),
  tools: {
    googleSearch: google.tools.googleSearch({ mode: 'MODE_UNSPECIFIED' }),
  },
  prompt: `Search for: ${query}`,
});

// File analysis
const { text } = await generateText({
  model: google("gemini-2.5-flash-lite"),
  messages: [{
    role: "user",
    content: [
      { type: "text", text: "Analyze this file..." },
      { type: "file", data: fileUrl, mediaType: "application/pdf" },
    ],
  }],
});

Gemini 3 Flash Preview

Model ID: google/gemini-3-flash-preview

Next-generation Gemini model with enhanced reasoning.

Characteristics:
  • Thinking: Explicit thinking levels (minimal, standard, deep)
  • Context Window: 1M+ tokens
  • Reasoning: Enhanced multi-step reasoning
  • Multimodal: Advanced vision and audio understanding
Best For:
  • Complex problem solving
  • Multi-step reasoning tasks
  • Advanced content analysis
  • Research and synthesis
Configuration:
providerOptions: {
  google: {
    thinkingConfig: {
      includeThoughts: true,
      thinkingLevel: "minimal", // "minimal" | "standard" | "deep"
    },
  },
}
Thinking Levels:
  • minimal: Quick reasoning for simple tasks
  • standard: Balanced reasoning for most tasks
  • deep: Extended reasoning for complex problems

Model Selection

The chat API accepts a modelId parameter:
POST /api/chat
{
  "modelId": "google/gemini-2.5-flash",
  "messages": [...],
  ...
}
Auto-Prefixing: If you provide a model ID without a provider prefix (e.g., gemini-2.5-flash), it’s automatically prefixed with google/:
// These are equivalent:
"gemini-2.5-flash""google/gemini-2.5-flash"
Default Model: If no modelId is specified, the default is google/gemini-2.5-flash.
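The prefixing and default rules above can be sketched as a small helper. This is illustrative only; the name normalizeModelId is an assumption, not ThinkEx’s actual code:

```typescript
// Hypothetical helper mirroring the auto-prefixing and default-model rules.
const DEFAULT_MODEL_ID = "google/gemini-2.5-flash";

function normalizeModelId(modelId?: string): string {
  if (!modelId) return DEFAULT_MODEL_ID;      // no modelId → use the default
  if (modelId.includes("/")) return modelId;  // already provider-prefixed
  return `google/${modelId}`;                 // bare ID → add the google/ prefix
}
```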

Model Capabilities

Multimodal Support

All Gemini models support multiple content types: Text:
{ type: "text", text: "Analyze this content..." }
Images:
{
  type: "file",
  data: imageUrl, // or base64 data URL
  mediaType: "image/jpeg",
  filename: "photo.jpg",
}
Videos:
{
  type: "file",
  data: "https://youtube.com/watch?v=...",
  mediaType: "video/mp4",
}
PDFs:
{
  type: "file",
  data: pdfUrl,
  mediaType: "application/pdf",
  filename: "document.pdf",
}
Audio:
{
  type: "file",
  data: audioUrl,
  mediaType: "audio/mpeg",
  filename: "audio.mp3",
}

Tool Calling

All models support function calling:
tools: {
  createNote: tool({
    description: "Create a note card",
    inputSchema: z.object({
      title: z.string(),
      content: z.string(),
    }),
    execute: async ({ title, content }) => {
      // Implementation
    },
  }),
}

Grounding

Gemini models support web grounding:
providerOptions: {
  google: {
    grounding: {
      // Google Search integration
    },
  },
}
ThinkEx uses an explicit webSearch tool instead of automatic grounding, for better control and source attribution.

Provider Configuration

Google AI Studio

Setup:
  1. Get API key from Google AI Studio
  2. Add to environment:
GOOGLE_GENERATIVE_AI_API_KEY=AIza...
Rate Limits:
  • Free tier: 15 requests/minute
  • Paid tier: Higher limits based on plan

AI Gateway

Optional: Use Vercel AI Gateway for enhanced routing:
AI_GATEWAY_API_KEY=your-gateway-key
Benefits:
  • Automatic failover between providers
  • Load balancing across models
  • Centralized logging and monitoring
  • Cost optimization
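Opting into the gateway looks roughly like the following. This is a minimal sketch assuming the @ai-sdk/gateway package’s createGateway export; adapt it to your setup:

```typescript
import { streamText } from "ai";
import { createGateway } from "@ai-sdk/gateway";

// Route model calls through the Vercel AI Gateway instead of
// calling Google AI Studio directly.
const gateway = createGateway({
  apiKey: process.env.AI_GATEWAY_API_KEY,
});

const result = streamText({
  model: gateway("google/gemini-2.5-flash"),
  prompt: "Hello",
});
```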

Model Usage in Tools

// src/lib/ai/tools/web-search.ts
const { text } = await generateText({
  model: google('gemini-2.5-flash-lite'),
  tools: {
    googleSearch: google.tools.googleSearch({ mode: 'MODE_UNSPECIFIED' }),
  },
  prompt: query,
});

File Processing

// src/lib/ai/tools/process-files.ts
const { text } = await generateText({
  model: google("gemini-2.5-flash-lite"),
  messages: [{
    role: "user",
    content: [
      { type: "text", text: batchPrompt },
      ...fileInfos.map(f => ({
        type: "file",
        data: f.fileUrl,
        mediaType: f.mediaType,
        filename: f.filename,
      })),
    ],
  }],
});

URL Processing

// src/lib/ai/tools/process-urls.ts
const { text } = await generateText({
  model: google("gemini-2.5-flash"),
  prompt: `Analyze content from: ${url}...`,
});

Performance Optimization

Context Caching

Long context is cached automatically by the provider; cache hits are reported in the usage object:
onFinish: ({ usage }) => {
  console.log({
    cachedInputTokens: usage?.cachedInputTokens,
    inputTokens: usage?.inputTokens,
  });
}

Message Pruning

Reduce token usage by pruning old messages:
const prunedMessages = pruneMessages({
  messages: convertedMessages,
  reasoning: "before-last-message",
  toolCalls: "before-last-5-messages",
  emptyMessages: "remove",
});

Streaming

Use streaming for better perceived performance:
const result = streamText({
  model,
  messages,
  experimental_transform: smoothStream({
    chunking: "word",
    delayInMs: 15,
  }),
});

Token Usage Tracking

Per-Step Tracking

onStepFinish: (result) => {
  const { usage, finishReason } = result;
  console.log({
    stepType: result.stepType,
    inputTokens: usage?.inputTokens,
    outputTokens: usage?.outputTokens,
    reasoningTokens: usage?.reasoningTokens,
  });
}

Final Usage

onFinish: ({ usage, finishReason }) => {
  console.log({
    totalTokens: usage?.totalTokens,
    cachedInputTokens: usage?.cachedInputTokens,
    finishReason,
  });
}

Experimental Features

Claude Support (Experimental)

ThinkEx has experimental support for Anthropic’s Claude:
// Special mapping: Claude Sonnet 4.5 → Gemini 3 Flash Preview
if (modelId === "anthropic/claude-sonnet-4.5") {
  modelId = "google/gemini-3-flash-preview";
}
Claude support is experimental and not fully tested. Stick with Gemini models for production use.

Cost Optimization

Model Selection Strategy

  1. Simple tasks → gemini-2.5-flash-lite
    • File analysis
    • Web search
    • Content extraction
  2. General chat → gemini-2.5-flash
    • User conversations
    • Content generation
    • Tool orchestration
  3. Complex reasoning → gemini-3-flash-preview
    • Multi-step problems
    • Research synthesis
    • Advanced analysis
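The strategy above can be expressed as a simple routing table. The TaskKind categories here are assumptions for illustration, not ThinkEx’s actual task taxonomy:

```typescript
// Sketch of the model-selection strategy: cheap model for background
// work, the default for chat, the preview model for heavy reasoning.
type TaskKind = "file-analysis" | "web-search" | "extraction" | "chat" | "reasoning";

function pickModel(task: TaskKind): string {
  switch (task) {
    case "file-analysis":
    case "web-search":
    case "extraction":
      return "google/gemini-2.5-flash-lite";
    case "chat":
      return "google/gemini-2.5-flash";
    case "reasoning":
      return "google/gemini-3-flash-preview";
  }
}
```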

Caching Strategy

  • PDFs: Cache OCR results after first extraction
  • Messages: Use context caching for long conversations
  • Files: Store processed results in database
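The “store processed results” idea can be sketched with a cache keyed by file URL. This uses an in-memory Map for brevity; a real implementation would persist to the database, and analyzeFileCached is a hypothetical name:

```typescript
// Cache processed file results so each file is analyzed by the model
// at most once. The analyze callback stands in for a generateText call.
const processedFileCache = new Map<string, string>();

async function analyzeFileCached(
  fileUrl: string,
  analyze: (url: string) => Promise<string>,
): Promise<string> {
  const cached = processedFileCache.get(fileUrl);
  if (cached !== undefined) return cached; // cache hit: skip the model call
  const result = await analyze(fileUrl);
  processedFileCache.set(fileUrl, result);
  return result;
}
```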

Error Handling

Rate Limit Errors

try {
  const result = await streamText({ model, ... });
} catch (error) {
  if (error.status === 429) {
    // Rate limit exceeded
    // Implement exponential backoff
  }
}
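One way to implement the backoff suggested in the comment above. These helpers (backoffDelayMs, withRetry) are hypothetical, not part of ThinkEx:

```typescript
// Exponential backoff: double the delay on each retry, up to a cap.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Retry a model call on HTTP 429, waiting between attempts;
// rethrow any other error immediately.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  baseMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      if (error?.status !== 429 || attempt + 1 >= maxAttempts) throw error;
      await new Promise((r) => setTimeout(r, backoffDelayMs(attempt, baseMs)));
    }
  }
}
```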

Loop Protection

const result = await streamText({
  model,
  messages,
  stopWhen: stepCountIs(25), // Prevent infinite loops
});

Next Steps

AI Overview

Learn about AI architecture and features

AI Tools

Explore available AI tools