Bank reconciliation — Khan / Golomt / TDB

Pull bank transactions, match to outstanding invoices by amount + date + memo. Indexed in-memory lookup kills v1's O(N·M) leak.

Phase 4

Library picks

ClusterPickWhy
Excel/CSV parseexceljs ^4SheetJS (xlsx) has unresolved prototype-pollution CVEs through 0.20.x in npm registry
Fuzzy text matchfastest-levenshteinfuse.js indexes per-query (slow at batch); fuzzysort is command-palette-optimized; string-similarity unmaintained
Khan APIfetch + OAuth2 client-credentialsPublic dev portal (developer.khanbank.com) — sandbox is gated, fall back to XLSX
Golomt OpenBankingfetch + OAuth2Public dev portal (dev.golomtbank.com); contract-gated
TDBXLSX upload onlyNo public Open Banking API in 2026
pnpm --filter @spacehub/integrations add exceljs fastest-levenshtein

Data model

// packages/db/src/schema/bank-statements.ts
export const bankAccounts = pgTable("bank_accounts", {
  id: uuid().primaryKey().defaultRandom(),
  ownerId: uuid("owner_id").notNull(),
  bank: text({ enum: ["khan","golomt","tdb","other"] }).notNull(),
  accountNumber: text("account_number").notNull(),
  accountName: text("account_name").notNull(),
  currency: text().notNull().default("MNT"),
  importMode: text("import_mode", { enum: ["api","upload"] }).notNull(),
  apiCredentialsId: uuid("api_credentials_id"),   // ref to encrypted secrets
});

export const bankStatements = pgTable("bank_statements", {
  id: uuid().primaryKey().defaultRandom(),
  ownerId: uuid("owner_id").notNull(),
  bankAccountId: uuid("bank_account_id").notNull().references(() => bankAccounts.id),
  periodStart: date("period_start").notNull(),
  periodEnd: date("period_end").notNull(),
  source: text({ enum: ["api","upload"] }).notNull(),
  uploadedFileUrl: text("uploaded_file_url"),
  importedAt: timestamp({ withTimezone: true }).notNull().defaultNow(),
  importedByUserId: uuid("imported_by_user_id"),
  txnCount: integer("txn_count"),
});

export const bankTransactions = pgTable("bank_transactions", {
  id: uuid().primaryKey().defaultRandom(),
  ownerId: uuid("owner_id").notNull(),
  bankAccountId: uuid("bank_account_id").notNull(),
  statementId: uuid("statement_id").references(() => bankStatements.id),
  externalId: text("external_id"),               // bank's txn id when available
  postedAt: timestamp("posted_at", { withTimezone: true }).notNull(),
  amount: bigint({ mode: "bigint" }).notNull(),  // positive = credit/inflow
  currency: text().notNull().default("MNT"),
  counterpartyName: text("counterparty_name"),
  counterpartyAccount: text("counterparty_account"),
  memo: text(),
  rawJson: jsonb("raw_json"),                    // full original row for audit
  matchStatus: text("match_status", { enum: ["unmatched","auto_matched","manually_matched","ignored","ambiguous"] }).notNull().default("unmatched"),
  matchedInvoiceId: uuid("matched_invoice_id").references(() => invoices.id),
  matchedAt: timestamp("matched_at", { withTimezone: true }),
  matchScore: numeric("match_score", { precision: 5, scale: 4 }),  // 0..1
  matchedByUserId: uuid("matched_by_user_id"),
}, (t) => [
  unique("bt_external_unique").on(t.bankAccountId, t.externalId).where(sql`external_id is not null`),
  index("bt_owner_unmatched").on(t.ownerId, t.matchStatus).where(sql`match_status = 'unmatched'`),
  index("bt_amount_date").on(t.amount, t.postedAt),
]);

The matching algorithm (fixes v1's O(N·M))

v1 scanned 749K weak refs linearly per txn. v2 pre-indexes once per batch.

// packages/integrations/src/recon/match.ts
import { distance } from "fastest-levenshtein";

interface Candidate {
  invoiceId: string;
  amount: bigint;
  dueDate: Date;
  searchableMemo: string;  // pre-normalized
}

interface MatchResult { invoiceId: string | null; score: number; ambiguous: boolean; }

export function matchTransaction(
  txn: { amount: bigint; postedAt: Date; memo: string | null; counterpartyName: string | null },
  byAmount: Map<bigint, Candidate[]>   // pre-built once per batch
): MatchResult {
  const candidates = byAmount.get(txn.amount) ?? [];
  if (candidates.length === 0) return { invoiceId: null, score: 0, ambiguous: false };

  const memo = normalize(`${txn.memo ?? ""} ${txn.counterpartyName ?? ""}`);
  const scored = candidates.map(c => ({
    candidate: c,
    score: scoreCandidate(memo, c, txn.postedAt),
  })).sort((a, b) => b.score - a.score);

  const best = scored[0];
  const second = scored[1];
  const ambiguous = !!second && (best.score - second.score < 0.15);

  if (best.score < 0.6) return { invoiceId: null, score: best.score, ambiguous: false };
  if (ambiguous) return { invoiceId: best.candidate.invoiceId, score: best.score, ambiguous: true };
  return { invoiceId: best.candidate.invoiceId, score: best.score, ambiguous: false };
}

function scoreCandidate(memo: string, c: Candidate, txnDate: Date): number {
  // 3-factor: amount (already matched), date proximity, memo similarity
  const daysDiff = Math.abs((txnDate.getTime() - c.dueDate.getTime()) / 86_400_000);
  const dateScore = Math.exp(-daysDiff / 7);                       // gaussian falloff, sigma ~7 days
  const memoScore = 1 - distance(memo, c.searchableMemo) / Math.max(memo.length, c.searchableMemo.length, 1);
  return 0.6 * memoScore + 0.4 * dateScore;
}

function normalize(s: string): string {
  return s.toLowerCase()
    .normalize("NFD").replace(/[̀-ͯ]/g, "")  // strip diacritics
    .replace(/\s+/g, " ").trim();
}

// Caller (in worker):
async function reconcileBatch(statementId: string) {
  const openInvoices = await db.query.invoices.findMany({
    where: inArray(invoices.status, ["issued","partially_paid","overdue"]),
  });

  // Pre-build amount index — single O(M) pass, memoize normalized memo per invoice
  const byAmount = new Map<bigint, Candidate[]>();
  for (const inv of openInvoices) {
    const remaining = inv.total - inv.paidAmount;
    const list = byAmount.get(remaining) ?? [];
    list.push({
      invoiceId: inv.id, amount: remaining, dueDate: inv.dueDate,
      searchableMemo: normalize(`${inv.code} ${inv.customerNameSnapshot} ${inv.contractId ?? ""}`),
    });
    byAmount.set(remaining, list);
  }

  const txns = await db.query.bankTransactions.findMany({
    where: and(eq(bankTransactions.statementId, statementId), eq(bankTransactions.matchStatus, "unmatched")),
  });

  for (const t of txns) {
    if (t.amount <= 0n) continue;     // outflows can't match incoming invoices
    const r = matchTransaction(t, byAmount);
    await db.transaction(async (tx) => {
      if (r.invoiceId && !r.ambiguous && r.score > 0.75) {
        await tx.update(bankTransactions).set({
          matchStatus: "auto_matched", matchedInvoiceId: r.invoiceId, matchScore: r.score.toFixed(4),
          matchedAt: new Date(),
        }).where(eq(bankTransactions.id, t.id));
        await applyPayment(tx, r.invoiceId, t.amount, "bank_recon");
      } else if (r.invoiceId && r.ambiguous) {
        await tx.update(bankTransactions).set({
          matchStatus: "ambiguous", matchScore: r.score.toFixed(4),
        }).where(eq(bankTransactions.id, t.id));
        // queue for human review
      }
    });
  }
}

Bank-specific adapters

Khan Bank

Has public developer portal (developer.khanbank.com) with OAuth2 + Transaction APIs. Sandbox is contract-gated. Fallback path: download XLSX from Khan's web banking, upload via UI, parse with exceljs. Recommend supporting both via importMode on bank_accounts.

Golomt

Public dev portal at dev.golomtbank.com. "Open Banking since 2020" — OAuth2 client-credentials, sandbox available. More developer-friendly than Khan. Same interface as Khan adapter.

TDB

No public Open Banking. Upload-only. Excel parser identical pattern.

API surface

GET    /v2/bank-accounts                              # list owner's accounts
POST   /v2/bank-accounts                              # add (admin/owner)
PATCH  /v2/bank-accounts/{id}
DELETE /v2/bank-accounts/{id}

POST   /v2/bank-statements/upload                     # multipart: file + bank-account-id
POST   /v2/bank-accounts/{id}/poll                    # trigger API import (cron also fires)
GET    /v2/bank-statements?account={id}&period=
GET    /v2/bank-statements/{id}/transactions

GET    /v2/bank-transactions?status=ambiguous&cursor=&limit=
POST   /v2/bank-transactions/{id}/match                # body: { invoiceId }
POST   /v2/bank-transactions/{id}/ignore               # mark non-payment (refund, salary, etc.)
POST   /v2/bank-transactions/{id}/unmatch              # undo auto-match

Build steps

  1. Schemas + RLS (account, statement, transaction).
  2. Excel parser (exceljs) — one normalizer per bank (Khan, Golomt, TDB export formats differ).
  3. OAuth2 client + REST adapters for Khan + Golomt; gracefully fall back if credentials missing.
  4. Reconciliation worker (BullMQ chained: import → recon).
  5. BullMQ scheduler: hourly poll per account with importMode: "api".
  6. Web: bank accounts CRUD, statement upload UI, transaction queue with manual-match modal (search invoices by code/customer).
  7. Run-as-untrusted-input safety: parser inside worker with per-job memory limit; never render parsed cells unescaped.

Open questions