Decision log
Architectural decisions — locked vs pending. Pending ones need your call before the relevant feature slice can start.
Decisions from audit (pending your call)
A. Web framework — Next.js 15 vs TanStack Start
Audit recommendation: swap to TanStack Start.
Reasons: Next.js 16 is already shipping (15 is mid-life); App Router gotchas have burned production users (async params, caching flip, RSC boundary errors); TanStack Start is currently the fastest SSR React framework (per the Platformatic 2026 benchmark); end-to-end type-safe routes without codegen; no Vercel lock-in; better fit for SaaS dashboards (explicit LogRocket recommendation).
Keep Next.js 15 only if you specifically need: PPR, Vercel-integrated ISR, marketing site SEO surface — Spacehub doesn't.
B. Job queue — pg-boss+BullMQ hybrid vs graphile-worker
Audit recommendation: drop the hybrid; use graphile-worker (or Hatchet).
Reasons: the hybrid was the wrong shape (two delivery semantics, two dashboards); BullMQ has recurring Redis OOM + stalled-job production footguns; pg-boss is single-maintainer; graphile-worker is battle-tested at GitHub, Postgres-native, outbox + cron in one. Hatchet is a newer alternative with durable workflows.
Drops Redis from queue requirement. Redis stays only for OTP + rate-limit cache.
C. PDF library — @react-pdf/renderer vs Playwright PDF
Audit recommendation: swap to Playwright PDF.
Reasons: React-PDF has unresolved memory leaks in long-running batch generation (open issues #2217, #2848, #378). Cyrillic font rendering is an open bug (#1366), which is exactly our case. Playwright PDF is faster (147ms→42ms cold), and HTML + CSS Paged Media handles Cyrillic trivially via web fonts.
Gotenberg as alternative if you want infrastructure separation.
D. Hosting region — Hetzner Helsinki/Falkenstein vs Singapore
Audit recommendation: Hetzner Singapore (live since August 2024).
Mongolia→Singapore is ~80-100 ms; Helsinki is ~200-260 ms. Singapore uses colocation, not Hetzner's own DC — for our scale, this doesn't matter.
E. Gemini SDK — already obsolete
@google/generative-ai was deprecated November 30, 2025. Use @google/genai (unified SDK) + Vercel AI SDK v5+. Not a decision, just a correction.
F. Logs backend — Grafana Loki vs Axiom
Audit recommends Axiom for logs (500GB free vs Loki's 50GB), keeping Sentry + OTel + Grafana Cloud for traces/metrics. Loki has cardinality footguns at scale.
Locked
| Decision | Choice | Why |
|---|---|---|
| Language | TypeScript end-to-end | One language, shared types, faster iteration |
| Mobile | Flutter (existing app, deferred priority) | Stays on Dart; consumes shared OpenAPI |
| API framework | Hono 4 | Lightweight, runs anywhere, OpenAPI-first, Zod-native |
| Web framework | Next.js 15 App Router + React 19 | SSR for SEO + perf, server components fetch API via hc |
| ORM | Drizzle | SQL-transparent, RLS first-class, no codegen daemon, BigInt support |
| DB | Postgres 16 | RLS, jsonb, listen/notify, modern. Migrate from SQL Server. |
| Monorepo | Turborepo + pnpm workspaces | Best-in-class task caching + isolation |
| Strategy | Strangler fig per domain | Low risk, demoable slice each phase, no big-bang flag day |
| Aggression | Aggressive simplification | Cut dead enums + dual hierarchies + post/community where buy-in exists |
| API + Mobile contract | Single API serves both, OpenAPI spec shared | One source of truth, no per-client divergence |
| Money | BigInt minor units (mungo for MNT) | Kills KNOWN_ISSUES #2 decimal equality bug by construction |
| Side effects | Outbox pattern (table + worker) | Retryable, observable, no fire-and-forget races |
| IDs | UUID v7 | Sortable, no composite-key dance, mobile/offline-friendly |
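The "Side effects" row in the table above names the outbox pattern (table + worker). As a rough illustration of the shape, here is an in-memory sketch: `Outbox`, `OutboxRow`, `enqueue`, and `drain` are hypothetical names, the array stands in for a Postgres table, and a real implementation would insert the outbox row in the same transaction as the business write.

```typescript
// Hypothetical in-memory stand-in for an outbox table plus a worker pass.
// Not the project's implementation; names and fields are assumptions.
interface OutboxRow {
  id: number;
  topic: string;
  payload: string; // JSON-encoded event body
  attempts: number;
  processedAt: Date | null; // null means "still pending"
}

class Outbox {
  private rows: OutboxRow[] = [];
  private nextId = 1;

  // In a real system this insert shares a transaction with the business write,
  // so the side effect is recorded if and only if the write commits.
  enqueue(topic: string, payload: unknown): OutboxRow {
    const row: OutboxRow = {
      id: this.nextId++,
      topic,
      payload: JSON.stringify(payload),
      attempts: 0,
      processedAt: null,
    };
    this.rows.push(row);
    return row;
  }

  pending(): OutboxRow[] {
    return this.rows.filter((r) => r.processedAt === null);
  }

  // One worker pass: try every pending row once. Rows whose handler throws
  // stay pending, so the next pass retries them (at-least-once delivery).
  async drain(handler: (row: OutboxRow) => Promise<void>): Promise<number> {
    let delivered = 0;
    for (const row of this.pending()) {
      row.attempts += 1;
      try {
        await handler(row);
        row.processedAt = new Date();
        delivered += 1;
      } catch {
        // Leave processedAt null so this row is retried on the next pass.
      }
    }
    return delivered;
  }
}
```

Because delivery is at-least-once, handlers in a design like this would need to be idempotent; the `attempts` counter is what a dashboard or dead-letter rule could key on.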
Pending (you decide)
1. Auth library
Recommended: better-auth v1.4+.
Framework-agnostic, Drizzle adapter, plugins for JWT + bearer + organization (= PropertyOwner) + phone-number OTP + multi-session + admin. Lucia is deprecated (March 2025). Auth.js is Next-shaped, awkward for mobile.
Alternative: roll-your-own with jose + argon2. Replicates what better-auth gives but you own the bugs.
Detail: auth page.
2. RLS bridging pattern
Recommended: SET LOCAL + set_config('app.user_id', ...) inside a per-request transaction.
Pooler-safe (works with PgBouncer transaction mode), one extra round-trip per request (sub-millisecond locally). Alternative is per-tenant Postgres roles, which is operationally impossible past ~50 tenants.
Detail: identity page.
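A minimal sketch of the per-request transaction described above, assuming any Postgres client that exposes a `query(text, values)` method returning a promise. `SqlClient` and `withUserContext` are hypothetical names; `set_config(name, value, is_local)` is the standard Postgres function, and passing `true` as the third argument scopes the setting to the current transaction, which is what makes it safe under transaction-mode pooling.

```typescript
// Minimal structural type for a Postgres client (an assumption for this sketch).
interface SqlClient {
  query(text: string, values?: unknown[]): Promise<unknown>;
}

// Runs `work` inside a transaction with app.user_id set for that transaction only.
async function withUserContext<T>(
  client: SqlClient,
  userId: string,
  work: (client: SqlClient) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    // is_local = true: the value is discarded at COMMIT/ROLLBACK, like SET LOCAL.
    await client.query("SELECT set_config('app.user_id', $1, true)", [userId]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    // Roll back so neither the setting nor partial writes leak to the pool.
    await client.query("ROLLBACK");
    throw err;
  }
}
```

RLS policies would then read the value with `current_setting('app.user_id')`; the exact policy definitions belong on the identity page and are not assumed here.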
3. Money type
Recommended: keep the 50-LOC custom @spacehub/shared/money impl.
BigInt + Currency tag is enough. dinero.js v2 adds allocation/distribution helpers we might never need. Re-evaluate if we hit a real use case.
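For orientation, a BigInt + Currency tag type can be as small as the sketch below. This is not the actual @spacehub/shared/money code: `Money`, `money`, `add`, and `equals` are illustrative names, and the currency list is an assumption.

```typescript
// Hypothetical sketch of a BigInt + currency-tag money type.
type Currency = "MNT" | "USD";

interface Money {
  readonly minor: bigint; // amount in minor units, never a float
  readonly currency: Currency;
}

function money(minor: bigint, currency: Currency): Money {
  return { minor, currency };
}

// Arithmetic across currencies is a programming error, so fail loudly.
function assertSameCurrency(a: Money, b: Money): void {
  if (a.currency !== b.currency) {
    throw new Error(`Currency mismatch: ${a.currency} vs ${b.currency}`);
  }
}

function add(a: Money, b: Money): Money {
  assertSameCurrency(a, b);
  return money(a.minor + b.minor, a.currency);
}

// Exact integer comparison: no epsilon, no decimal rounding.
function equals(a: Money, b: Money): boolean {
  return a.currency === b.currency && a.minor === b.minor;
}
```

Usage would look like `add(money(BigInt(1050), "MNT"), money(BigInt(250), "MNT"))`. Allocation and distribution helpers are deliberately absent, which is the gap dinero.js v2 would fill if a real use case appears.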
4. Job queue
Recommended: BullMQ + Redis (current default).
Best-in-class TS support, repeatable jobs, priority queues, dashboard via bull-board. Alternative pg-boss drops the Redis dependency — attractive for ops simplicity. Pick BullMQ unless you want one less moving part.
Detail: jobs page.
5. PDF library
Recommended: @react-pdf/renderer for invoices and statements.
Declarative, server-side, embeds custom fonts (needs a font that covers Mongolian Cyrillic, such as Inter or Noto Sans; Noto Sans Mongolian targets the traditional script, not Cyrillic). Puppeteer wins on visual fidelity but adds a ~200MB Chromium dependency and slower startup. Pick Puppeteer only if accountants reject React-PDF output.
Detail: reports page.
6. File storage
Recommended: Cloudflare R2 ($0.015/GB-month, zero egress).
S3-compatible, predictable pricing, no egress lock-in. MinIO self-host is fine if you already have ops capacity, but R2 saves ~3 hours/month of babysitting.
Detail: files page.
7. Hosting
Recommended: Hetzner CCX13 + Coolify for API + Redis + Postgres (~€13/mo), Vercel for Next.js (Hobby/Pro).
Mongolia latency: Tokyo region preferable (Fly.io nrt or Neon Tokyo) but Hetzner Helsinki/Falkenstein (~200-260ms) is acceptable for non-realtime ops. Migrate Postgres to Neon Tokyo if latency complaints arrive.
Detail: devops page.
8. Strangler approach: CDC vs dual-write
Recommended: start with dual-write only; add Debezium CDC only if XAF UI must show data from a migrated domain.
Dual-write is one less moving part. CDC adds operational complexity (Debezium connectors, Kafka Connect or Redpanda) for a few months of dual-system runtime. Skip unless required.
Detail: strangler page.
9. Feature cuts
Need stakeholder input:
- Post / community / UserPost / Rating / ImageReportRecord (26 BO files in v1). Reportedly low-use. Cut entirely?
- 85 enums. Confirm we can collapse to ~25 enums + reference tables for Bank/Region/ServiceType.
- Chatbot whitelist. Currently hardcoded PO list — replace with feature flag or roll out to all owners?
10. Auth bridge during strangler
Need decision: SSO between XAF (cookie-based) and v2 (better-auth)?
- Option A: shared session table; both apps read/write Postgres sessions. Requires XAF code changes.
- Option B: JWT bridge; XAF issues a one-time token on logout, v2 accepts. Less coupling.
- Option C: users re-login on v2 first time. Simplest, slight friction. Recommended for small user base.
How to decide
Skim each card, pick. When you've decided, I'll move the item to "Locked" and start the build slice on the relevant feature page.