Chatbot — Gemini 2.5 Flash

Vercel AI SDK v5+ with the @ai-sdk/google provider, preferred over calling @google/genai (the unified Google GenAI SDK) directly. Streaming SSE. Provider-agnostic: can swap to Anthropic/OpenAI without rewriting tool calls. The hardcoded PO whitelist becomes a feature flag.

Phase 5

Corrected by audit (May 26):

@google/generative-ai is DEPRECATED (Nov 30, 2025). SDK releases after June 24, 2026 strip deprecated modules. Use @google/genai.
Pin Vercel AI SDK ≥ v5 (shipped July 2025). v5 splits messages into UIMessage and ModelMessage, streams over native SSE, and defines tools with inputSchema/outputSchema (not parameters/result). v6 followed in late 2025.
Plan for Gemini retry/fallback (Gemini 3 Pro INVALID_ARGUMENT reports in production).
OpenRouter as fallback proxy is cheap insurance — multi-provider hedging recommended.

Library pick

Vercel AI SDK (ai + @ai-sdk/google) over direct use of @google/genai (and its deprecated predecessor @google/generative-ai):

Provider-agnostic streamText / tool() — fallback to Anthropic on Gemini rate-limit.
First-class Next 15 App Router streaming via useChat (from @ai-sdk/react in v5) + toUIMessageStreamResponse(), which replaces the pre-v5 toDataStreamResponse().
Known issue (vercel/ai #6589): Gemini 2.5 thinking-token shape sometimes breaks streaming function calls. Pin a known-good minor version and test on every bump. If it bites, drop the affected tool down to direct @google/genai calls (not the deprecated @google/generative-ai).

# Vercel AI SDK v5+ + the unified Google GenAI SDK
# (NOT @google/generative-ai — that is deprecated as of Nov 30, 2025)
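# Provider and React packages version separately from core: ai@5 pairs with the 2.x line
pnpm --filter @spacehub/web add "ai@^5" "@ai-sdk/google@^2" "@ai-sdk/react@^2" "@google/genai" zod
pnpm --filter @spacehub/api add "ai@^5" "@ai-sdk/google@^2" "@google/genai" zod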

Architecture

Chat lives in two places:

Next.js route handler /app/api/chat/route.ts — handles streaming for web users (best DX with useChat).
Hono /v2/chat — same logic, for Flutter mobile (SSE). Shares tool definitions via packages/chatbot.

Tool implementations import @spacehub/db directly and run inside RLS context (set app.user_id + app.owner_id from the session before query).
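
The RLS step described above could be sketched roughly as follows. This is a minimal illustration, not the actual @spacehub/db/rls implementation: the Tx and Db interfaces are hypothetical stand-ins for the real client, and it assumes the policies read app.user_id and app.owner_id through Postgres settings set with set_config(name, value, true).

```typescript
// Hypothetical minimal interfaces; the real @spacehub/db client will differ.
interface Tx {
  execute(sql: string, params: string[]): Promise<void>;
}
interface Db {
  transaction<T>(fn: (tx: Tx) => Promise<T>): Promise<T>;
}
interface RlsContext {
  userId: string;
  ownerId: string;
}

// Run `fn` inside a transaction whose RLS settings are scoped to that transaction.
async function withRls<T>(
  db: Db,
  rls: RlsContext,
  fn: (tx: Tx) => Promise<T>,
): Promise<T> {
  return db.transaction(async (tx) => {
    // Third argument true = transaction-local: the setting is reset at
    // commit/rollback, so a pooled connection never carries one user's
    // context into another user's request.
    await tx.execute("select set_config('app.user_id', $1, true)", [rls.userId]);
    await tx.execute("select set_config('app.owner_id', $1, true)", [rls.ownerId]);
    return fn(tx);
  });
}
```

Setting the values inside the same transaction as the query is the important part: a session-level SET on a pooled connection could outlive the request that issued it.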

Route handler — Next.js

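// apps/web/src/app/api/chat/route.ts
import { convertToModelMessages, stepCountIs, streamText, type UIMessage } from "ai";
import { google } from "@ai-sdk/google";
import { getSession } from "@/lib/session";
import { tools } from "@spacehub/chatbot/tools";
import { persistTurn } from "@/lib/chat-persist";

export async function POST(req: Request) {
  const session = await getSession();
  if (!session) return new Response("Unauthorized", { status: 401 });

  // useChat (v5) posts UIMessage[]; streamText expects ModelMessage[].
  const { messages, conversationId }: { messages: UIMessage[]; conversationId: string } =
    await req.json();

  const result = streamText({
    model: google("gemini-2.5-flash"),
    system: `You are a Spacehub assistant for ${session.user.name},
             operating on behalf of property owner ${session.org.name}.
             Use the provided tools to answer with real data. Reply in Mongolian by default,
             switch to English if the user does.`,
    messages: convertToModelMessages(messages),
    tools: tools(session),       // tools closed over the session for RLS scoping
    // v5 stops after a single step by default, so the model would never see tool results.
    // The cap of 5 is an arbitrary starting point, not a measured value.
    stopWhen: stepCountIs(5),
    maxOutputTokens: 2048,       // renamed from maxTokens in v5
    temperature: 0.3,
    // usage covers the final step only; multi-step turns may need summing across steps.
    onFinish: ({ text, usage }) => persistTurn(conversationId, session.user.id, text, usage),
  });

  // v5 replacement for toDataStreamResponse(); pairs with useChat on the client.
  return result.toUIMessageStreamResponse();
}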

Tools (port the 10 from v1)

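// packages/chatbot/src/tools.ts
import { tool } from "ai";
import { z } from "zod";
import { and, eq, gte, lte, sql } from "drizzle-orm";
// Assumed export locations for the client, tables, and Session type; adjust to the real packages.
import { db } from "@spacehub/db";
import { contracts, invoices } from "@spacehub/db/schema";
import { withRls } from "@spacehub/db/rls";
import type { Session } from "@spacehub/auth";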

export function tools(session: Session) {
  const rls = { userId: session.user.id, ownerId: session.org.id, role: session.user.role };

  return {
    getActiveContracts: tool({
      description: "List currently active contracts (rentals) for the user's organization. Filter by property optional.",
      inputSchema: z.object({   // v5 name; was `parameters` before v5
        propertyId: z.string().optional(),
        limit: z.number().min(1).max(50).default(20),
      }),
      execute: async ({ propertyId, limit }) => withRls(db, rls, (tx) =>
        tx.query.contracts.findMany({
          where: and(
            eq(contracts.status, "active"),
            propertyId ? eq(contracts.propertyId, propertyId) : undefined,
          ),
          limit,
        })),
    }),

    checkRentPayments: tool({
      description: "Summarize rent payment status: paid, overdue, total amounts for a period.",
      inputSchema: z.object({   // v5 name; was `parameters` before v5
        periodStart: z.string().describe("YYYY-MM-DD"),
        periodEnd: z.string().describe("YYYY-MM-DD"),
      }),
      execute: async ({ periodStart, periodEnd }) => withRls(db, rls, async (tx) => {
        const rows = await tx.select({
          status: invoices.status,
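          // count(*) is a bigint, which Postgres drivers typically return as a string; cast so the type annotation holds
          count: sql<number>`cast(count(*) as int)`,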
          total: sql<string>`sum(total)`,
          paid: sql<string>`sum(paid_amount)`,
        }).from(invoices)
          .where(and(
            eq(invoices.kind, "rent"),
            gte(invoices.periodStart, periodStart),
            lte(invoices.periodEnd, periodEnd),
          ))
          .groupBy(invoices.status);
        return rows;
      }),
    }),

    getOccupancyInfo: tool({ /* property occupancy stats */ }),
    listOverdueInvoices: tool({ /* overdue list */ }),
    getContractDetail: tool({ /* by id */ }),
    getCustomerHistory: tool({ /* by customer id */ }),
    listPendingEbarimtPushes: tool({ /* outbox status */ }),
    getMonthlyRevenue: tool({ /* aggregated */ }),
    findRoom: tool({ /* search by unit number or feature */ }),
    summarizeBankRecon: tool({ /* unmatched txn count */ }),
  };
}

Persist history

// packages/db/src/schema/chat.ts
import { integer, jsonb, pgTable, text, timestamp, uuid } from "drizzle-orm/pg-core";

export const chatConversations = pgTable("chat_conversations", {
  id: uuid().primaryKey().defaultRandom(),
  userId: uuid("user_id").notNull(),
  ownerId: uuid("owner_id").notNull(),
  title: text(),
  createdAt: timestamp("created_at", { withTimezone: true }).notNull().defaultNow(),
  lastMessageAt: timestamp("last_message_at", { withTimezone: true }).notNull().defaultNow(),
});

export const chatMessages = pgTable("chat_messages", {
  id: uuid().primaryKey().defaultRandom(),
  conversationId: uuid("conversation_id").notNull().references(() => chatConversations.id, { onDelete: "cascade" }),
  role: text({ enum: ["user","assistant","tool"] }).notNull(),
  content: jsonb().notNull(),       // text or tool-call/tool-result payloads
  tokensIn: integer("tokens_in"),
  tokensOut: integer("tokens_out"),
  modelVersion: text("model_version"),
  createdAt: timestamp("created_at", { withTimezone: true }).notNull().defaultNow(),
});
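
The route handler calls persistTurn on stream finish, but the helper itself is not shown. One piece of it, mapping a finished assistant turn onto the chat_messages columns, might look like the sketch below. The Usage shape follows the v5 usage object as I understand it (inputTokens/outputTokens, possibly undefined); toAssistantRow and ChatMessageRow are hypothetical names for illustration, and the actual insert is left out.

```typescript
// Token usage as reported on stream finish; counts may be missing.
interface Usage {
  inputTokens?: number;
  outputTokens?: number;
}

// Mirrors the insertable columns of chat_messages (id and createdAt are defaulted).
interface ChatMessageRow {
  conversationId: string;
  role: "user" | "assistant" | "tool";
  content: { type: "text"; text: string };
  tokensIn: number | null;
  tokensOut: number | null;
  modelVersion: string;
}

// Build the row for a plain-text assistant reply.
function toAssistantRow(
  conversationId: string,
  text: string,
  usage: Usage,
  modelVersion: string,
): ChatMessageRow {
  return {
    conversationId,
    role: "assistant",
    content: { type: "text", text },
    // Store null rather than 0 when the provider reported nothing,
    // so cost dashboards can tell "unknown" from "free".
    tokensIn: usage.inputTokens ?? null,
    tokensOut: usage.outputTokens ?? null,
    modelVersion,
  };
}
```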

Cost control

Implicit caching (automatic in Gemini 2.5, no extra charge): keep the system prompt and tool definitions at the start of every request and the user input at the end, so consecutive requests share a cacheable prefix. Cache hits cut input cost by 90%. Minimum cacheable prefix: 1024 tokens.
Explicit caching for >5k-token preambles (when context includes "tenant + property + contract" snapshot). ~1h TTL, cache-read = 10% base input price.
maxOutputTokens (maxTokens before v5): 2048 for chat, 4096 for summarization tools.
Fallback chain: gemini-2.5-flash → gemini-2.5-flash-lite on rate-limit → scripted reply on 429-after-retry.
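
The fallback chain could be sketched as a small generic helper like the one below. It is an illustration under assumptions, not the production design: withFallback, Attempt and isRateLimit are hypothetical names, the statusCode check is a guess at how a 429 surfaces, and per-model retries with backoff are omitted.

```typescript
// One step in the chain: a label for logging plus the call itself.
interface Attempt<T> {
  label: string;
  run: () => Promise<T>;
}

// Hypothetical check; real provider errors may expose the status differently.
function isRateLimit(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { statusCode?: number }).statusCode === 429
  );
}

// Try each attempt in order; fall through only on rate limits.
async function withFallback<T>(
  attempts: Attempt<T>[],
  scripted: T,
): Promise<{ servedBy: string; value: T }> {
  for (const attempt of attempts) {
    try {
      return { servedBy: attempt.label, value: await attempt.run() };
    } catch (err) {
      // Anything other than a rate limit is a real failure: surface it.
      if (!isRateLimit(err)) throw err;
    }
  }
  // Every model was rate-limited: answer with the canned reply.
  return { servedBy: "scripted", value: scripted };
}
```

In use, the attempts would be gemini-2.5-flash followed by gemini-2.5-flash-lite, with the scripted reply as the last argument. With streaming, a rate limit can also surface after the response has started, so this wrapper only covers failures raised before the first chunk.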

Whitelist → feature flag

v1 hardcodes a PO whitelist for the chatbot. v2 replaces it with a feature_flags table plus Postgres-cached env overrides: a per-owner toggle, off by default, surfaced in the admin UI.
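
A minimal sketch of how the flag lookup might resolve, assuming rows from the feature_flags table have already been loaded or cached. The row shape, the isEnabled name, and the precedence (env override first, then the per-owner row, then off) are all assumptions for illustration; the source only fixes "per-owner toggle, default off".

```typescript
// Assumed shape of a feature_flags row.
interface FlagRow {
  ownerId: string;
  flag: string;
  enabled: boolean;
}

// Resolve a flag for one owner.
// Assumed precedence: env override > per-owner row > default off.
function isEnabled(
  flag: string,
  ownerId: string,
  rows: FlagRow[],
  envOverrides: Record<string, boolean | undefined>,
): boolean {
  const override = envOverrides[flag];
  if (override !== undefined) return override;

  const row = rows.find((r) => r.flag === flag && r.ownerId === ownerId);
  return row?.enabled ?? false; // no row means the owner never opted in
}
```

For example, with a single row enabling "chatbot" for owner o1 and no overrides, isEnabled("chatbot", "o1", rows, {}) is true while any other owner gets false.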

API surface

POST /v2/chat                     # streaming SSE (Hono); { messages, conversationId? }
GET  /v2/chat/conversations       # user's history
GET  /v2/chat/conversations/{id}/messages
DELETE /v2/chat/conversations/{id}
# Next.js web uses /api/chat (App Router route handler consumed by the useChat hook)

Build steps

Schemas + RLS (chat_conversations, chat_messages).
packages/chatbot with shared tool definitions.
Hono /v2/chat SSE handler (port for Flutter).
Next.js /api/chat route handler + <Chatbot/> client component using useChat.
Persist on stream finish (onFinish callback).
Feature flag for owner allow-list.
Cost monitoring: dashboard panel per owner (tokens in/out, $ estimate).
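
For the cost-monitoring panel, the dollar estimate could be derived from stored token counts roughly as below. This is a sketch: the Pricing fields and estimateCostUsd are hypothetical names, and any prices passed in are placeholders to be replaced with current published rates. The cached-input rate follows the cache-read = 10% of base input price figure from the cost-control notes.

```typescript
// Per-million-token prices in USD. Values are supplied by the caller;
// treat any numbers used with this sketch as placeholders, not quoted prices.
interface Pricing {
  inputPerM: number;
  outputPerM: number;
  cachedInputRate: number; // fraction of the input price charged on cache reads, e.g. 0.1
}

interface TokenUsage {
  inputTokens: number;       // total input tokens, including cached ones
  cachedInputTokens: number; // subset of inputTokens served from cache
  outputTokens: number;
}

// Estimate the spend for one owner over some period.
function estimateCostUsd(u: TokenUsage, p: Pricing): number {
  const uncached = u.inputTokens - u.cachedInputTokens;
  const inputCost =
    (uncached * p.inputPerM + u.cachedInputTokens * p.inputPerM * p.cachedInputRate) /
    1_000_000;
  const outputCost = (u.outputTokens * p.outputPerM) / 1_000_000;
  return inputCost + outputCost;
}
```

Summing chat_messages.tokens_in and tokens_out per owner and feeding the totals through a function like this keeps pricing in one place, which helps when rates or models change.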

Open questions

Flutter SSE vs polling? SSE works on Flutter via eventsource; easier than WS. Use SSE.
Tool execution scope: any user can ask "show overdue invoices" — RLS already prevents cross-owner data. Confirm no extra guardrail needed.
System prompt customization per owner? Defer. Single prompt for v2 launch.
Cost cap per owner per month? Hard cutoff or soft alert? Recommend alert at 80%, soft cap at 100% with admin override.
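
The recommended cap policy (alert at 80%, soft cap at 100% with admin override) could be expressed as a small pure function like this sketch. budgetStatus and the status names are hypothetical, and treating an overridden cap as "alert" rather than "ok" is a design assumption so that overspend stays visible.

```typescript
type BudgetStatus = "ok" | "alert" | "capped";

// Classify an owner's month-to-date spend against their cap.
function budgetStatus(
  spentUsd: number,
  capUsd: number,
  adminOverride = false,
): BudgetStatus {
  if (capUsd <= 0) throw new RangeError("capUsd must be positive");

  const ratio = spentUsd / capUsd;
  if (ratio >= 1) {
    // Soft cap: an admin override keeps the chatbot available but still alerting.
    return adminOverride ? "alert" : "capped";
  }
  if (ratio >= 0.8) return "alert";
  return "ok";
}
```

A "capped" result would map to the scripted reply rather than a model call, which keeps the cutoff soft from the user's point of view.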