Bank reconciliation — Khan / Golomt / TDB
Pull bank transactions, match to outstanding invoices by amount + date + memo. Indexed in-memory lookup kills v1's O(N·M) leak.
Library picks
| Cluster | Pick | Why |
|---|---|---|
| Excel/CSV parse | exceljs ^4 | SheetJS (xlsx) has unresolved prototype-pollution CVEs through 0.20.x in npm registry |
| Fuzzy text match | fastest-levenshtein | fuse.js indexes per-query (slow at batch); fuzzysort is command-palette-optimized; string-similarity unmaintained |
| Khan API | fetch + OAuth2 client-credentials | Public dev portal (developer.khanbank.com) — sandbox is gated, fall back to XLSX |
| Golomt OpenBanking | fetch + OAuth2 | Public dev portal (dev.golomtbank.com); contract-gated |
| TDB | XLSX upload only | No public Open Banking API in 2026 |
pnpm --filter @spacehub/integrations add exceljs fastest-levenshtein
Data model
// packages/db/src/schema/bank-statements.ts
export const bankAccounts = pgTable("bank_accounts", {
id: uuid().primaryKey().defaultRandom(),
ownerId: uuid("owner_id").notNull(),
bank: text({ enum: ["khan","golomt","tdb","other"] }).notNull(),
accountNumber: text("account_number").notNull(),
accountName: text("account_name").notNull(),
currency: text().notNull().default("MNT"),
importMode: text("import_mode", { enum: ["api","upload"] }).notNull(),
apiCredentialsId: uuid("api_credentials_id"), // ref to encrypted secrets
});
export const bankStatements = pgTable("bank_statements", {
id: uuid().primaryKey().defaultRandom(),
ownerId: uuid("owner_id").notNull(),
bankAccountId: uuid("bank_account_id").notNull().references(() => bankAccounts.id),
periodStart: date("period_start").notNull(),
periodEnd: date("period_end").notNull(),
source: text({ enum: ["api","upload"] }).notNull(),
uploadedFileUrl: text("uploaded_file_url"),
importedAt: timestamp({ withTimezone: true }).notNull().defaultNow(),
importedByUserId: uuid("imported_by_user_id"),
txnCount: integer("txn_count"),
});
export const bankTransactions = pgTable("bank_transactions", {
id: uuid().primaryKey().defaultRandom(),
ownerId: uuid("owner_id").notNull(),
bankAccountId: uuid("bank_account_id").notNull(),
statementId: uuid("statement_id").references(() => bankStatements.id),
externalId: text("external_id"), // bank's txn id when available
postedAt: timestamp("posted_at", { withTimezone: true }).notNull(),
amount: bigint({ mode: "bigint" }).notNull(), // positive = credit/inflow
currency: text().notNull().default("MNT"),
counterpartyName: text("counterparty_name"),
counterpartyAccount: text("counterparty_account"),
memo: text(),
rawJson: jsonb("raw_json"), // full original row for audit
matchStatus: text("match_status", { enum: ["unmatched","auto_matched","manually_matched","ignored","ambiguous"] }).notNull().default("unmatched"),
matchedInvoiceId: uuid("matched_invoice_id").references(() => invoices.id),
matchedAt: timestamp("matched_at", { withTimezone: true }),
matchScore: numeric("match_score", { precision: 5, scale: 4 }), // 0..1
matchedByUserId: uuid("matched_by_user_id"),
}, (t) => [
unique("bt_external_unique").on(t.bankAccountId, t.externalId).where(sql`external_id is not null`),
index("bt_owner_unmatched").on(t.ownerId, t.matchStatus).where(sql`match_status = 'unmatched'`),
index("bt_amount_date").on(t.amount, t.postedAt),
]);
The matching algorithm (fixes v1's O(N·M))
v1 scanned 749K weak refs linearly per txn. v2 pre-indexes once per batch.
// packages/integrations/src/recon/match.ts
import { distance } from "fastest-levenshtein";
interface Candidate {
invoiceId: string;
amount: bigint;
dueDate: Date;
searchableMemo: string; // pre-normalized
}
interface MatchResult { invoiceId: string | null; score: number; ambiguous: boolean; }
export function matchTransaction(
txn: { amount: bigint; postedAt: Date; memo: string | null; counterpartyName: string | null },
byAmount: Map<bigint, Candidate[]> // pre-built once per batch
): MatchResult {
const candidates = byAmount.get(txn.amount) ?? [];
if (candidates.length === 0) return { invoiceId: null, score: 0, ambiguous: false };
const memo = normalize(`${txn.memo ?? ""} ${txn.counterpartyName ?? ""}`);
const scored = candidates.map(c => ({
candidate: c,
score: scoreCandidate(memo, c, txn.postedAt),
})).sort((a, b) => b.score - a.score);
const best = scored[0];
const second = scored[1];
const ambiguous = !!second && (best.score - second.score < 0.15);
if (best.score < 0.6) return { invoiceId: null, score: best.score, ambiguous: false };
if (ambiguous) return { invoiceId: best.candidate.invoiceId, score: best.score, ambiguous: true };
return { invoiceId: best.candidate.invoiceId, score: best.score, ambiguous: false };
}
function scoreCandidate(memo: string, c: Candidate, txnDate: Date): number {
// 3-factor: amount (already matched), date proximity, memo similarity
const daysDiff = Math.abs((txnDate.getTime() - c.dueDate.getTime()) / 86_400_000);
const dateScore = Math.exp(-daysDiff / 7); // gaussian falloff, sigma ~7 days
const memoScore = 1 - distance(memo, c.searchableMemo) / Math.max(memo.length, c.searchableMemo.length, 1);
return 0.6 * memoScore + 0.4 * dateScore;
}
function normalize(s: string): string {
return s.toLowerCase()
.normalize("NFD").replace(/[̀-ͯ]/g, "") // strip diacritics
.replace(/\s+/g, " ").trim();
}
// Caller (in worker):
async function reconcileBatch(statementId: string) {
const openInvoices = await db.query.invoices.findMany({
where: inArray(invoices.status, ["issued","partially_paid","overdue"]),
});
// Pre-build amount index — single O(M) pass, memoize normalized memo per invoice
const byAmount = new Map<bigint, Candidate[]>();
for (const inv of openInvoices) {
const remaining = inv.total - inv.paidAmount;
const list = byAmount.get(remaining) ?? [];
list.push({
invoiceId: inv.id, amount: remaining, dueDate: inv.dueDate,
searchableMemo: normalize(`${inv.code} ${inv.customerNameSnapshot} ${inv.contractId ?? ""}`),
});
byAmount.set(remaining, list);
}
const txns = await db.query.bankTransactions.findMany({
where: and(eq(bankTransactions.statementId, statementId), eq(bankTransactions.matchStatus, "unmatched")),
});
for (const t of txns) {
if (t.amount <= 0n) continue; // outflows can't match incoming invoices
const r = matchTransaction(t, byAmount);
await db.transaction(async (tx) => {
if (r.invoiceId && !r.ambiguous && r.score > 0.75) {
await tx.update(bankTransactions).set({
matchStatus: "auto_matched", matchedInvoiceId: r.invoiceId, matchScore: r.score.toFixed(4),
matchedAt: new Date(),
}).where(eq(bankTransactions.id, t.id));
await applyPayment(tx, r.invoiceId, t.amount, "bank_recon");
} else if (r.invoiceId && r.ambiguous) {
await tx.update(bankTransactions).set({
matchStatus: "ambiguous", matchScore: r.score.toFixed(4),
}).where(eq(bankTransactions.id, t.id));
// queue for human review
}
});
}
}
Bank-specific adapters
Khan Bank
Has public developer portal (developer.khanbank.com) with OAuth2 + Transaction APIs. Sandbox is contract-gated. Fallback path: download XLSX from Khan's web banking, upload via UI, parse with exceljs. Recommend supporting both via importMode on bank_accounts.
Golomt
Public dev portal at dev.golomtbank.com. "Open Banking since 2020" — OAuth2 client-credentials, sandbox available. More developer-friendly than Khan. Same interface as Khan adapter.
TDB
No public Open Banking. Upload-only. Excel parser identical pattern.
API surface
GET /v2/bank-accounts # list owner's accounts
POST /v2/bank-accounts # add (admin/owner)
PATCH /v2/bank-accounts/{id}
DELETE /v2/bank-accounts/{id}
POST /v2/bank-statements/upload # multipart: file + bank-account-id
POST /v2/bank-accounts/{id}/poll # trigger API import (cron also fires)
GET /v2/bank-statements?account={id}&period=
GET /v2/bank-statements/{id}/transactions
GET /v2/bank-transactions?status=ambiguous&cursor=&limit=
POST /v2/bank-transactions/{id}/match # body: { invoiceId }
POST /v2/bank-transactions/{id}/ignore # mark non-payment (refund, salary, etc.)
POST /v2/bank-transactions/{id}/unmatch # undo auto-match
Build steps
- Schemas + RLS (account, statement, transaction).
- Excel parser (exceljs) — one normalizer per bank (Khan, Golomt, TDB export formats differ).
- OAuth2 client + REST adapters for Khan + Golomt; gracefully fall back if credentials missing.
- Reconciliation worker (BullMQ chained: import → recon).
- BullMQ scheduler: hourly poll per account with
importMode: "api". - Web: bank accounts CRUD, statement upload UI, transaction queue with manual-match modal (search invoices by code/customer).
- Run-as-untrusted-input safety: parser inside worker with per-job memory limit; never render parsed cells unescaped.
Open questions
- Tolerance window: ±7 days default; per-owner configurable?
- Partial payments: if txn amount < invoice remaining by a small %, count as partial or queue for review? Recommend ≤5% → auto-partial, >5% → review.
- Memo translation: tenant types "rent for July 101" → normalize Mongolian + English variants. Build small synonym table or rely on Levenshtein alone?