Decision log

Architectural decisions — locked vs pending. Pending ones need your call before the relevant feature slice can start.

Updated after the May 26 audit (see the audit findings for the changelog). Five picks were overturned. Resolve the new "Decisions from audit" section below before Phase 1 starts.

Decisions from audit (pending your call)

A. Web framework — Next.js 15 vs TanStack Start

Audit recommendation: swap to TanStack Start.

Reasons: Next.js 16 is already shipping, so 15 is mid-life; App Router gotchas (async params, the caching flip, RSC boundary errors) have burned production users; TanStack Start is now the fastest SSR React framework (per the Platformatic 2026 benchmark); it gives end-to-end type-safe routes without codegen and no Vercel lock-in; and it is a better fit for SaaS dashboards (an explicit LogRocket recommendation).

Keep Next.js 15 only if you specifically need PPR, Vercel-integrated ISR, or a marketing-site SEO surface. Spacehub needs none of these.

B. Job queue — pg-boss+BullMQ hybrid vs graphile-worker

Audit recommendation: drop the hybrid; use graphile-worker (or Hatchet).

Reasons: the hybrid was the wrong shape (two delivery semantics, two dashboards); BullMQ has recurring production footguns (Redis OOM, stalled jobs); pg-boss is single-maintainer; graphile-worker is battle-tested at GitHub, Postgres-native, and covers outbox + cron in one. Hatchet is a newer alternative with durable workflows.

This removes Redis as a queue dependency. Redis stays only for the OTP and rate-limit cache.
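Whichever queue is picked, it ends up running the outbox worker from the Locked table, and the semantics we want are one claim, retry with backoff, and a visible failure record. A minimal in-memory sketch of one polling pass, assuming a hypothetical `OutboxRow` shape, synchronous handlers, and made-up retry limits; in production the rows live in Postgres and the queue library owns claiming, locking and retries.

```typescript
// Hypothetical outbox row, written in the same transaction as the business change.
interface OutboxRow {
  id: string;
  kind: string; // event name, e.g. "invoice.created" (hypothetical)
  payload: unknown;
  attempts: number;
  runAfter: number; // epoch ms; the row is not eligible before this
  doneAt?: number;
  lastError?: string;
}

// Real handlers would be async; kept sync so the sketch stays small.
type Handler = (payload: unknown) => void;

const MAX_ATTEMPTS = 5; // assumption; tune per event kind
const BASE_DELAY_MS = 1_000; // assumption

// Exponential backoff after the n-th failed attempt: 1s, 2s, 4s, 8s, ...
function backoffMs(attempts: number): number {
  return BASE_DELAY_MS * 2 ** (attempts - 1);
}

// One polling pass: run every due, unfinished row once; failures are rescheduled.
function processDue(rows: OutboxRow[], handlers: Record<string, Handler>, now: number): number {
  let processed = 0;
  for (const row of rows) {
    if (row.doneAt !== undefined || row.runAfter > now || row.attempts >= MAX_ATTEMPTS) continue;
    row.attempts += 1;
    try {
      const handler = handlers[row.kind];
      if (!handler) throw new Error(`no handler for ${row.kind}`);
      handler(row.payload);
      row.doneAt = now;
      processed += 1;
    } catch (err) {
      // Keep the row: it stays observable and is retried after the backoff window.
      row.lastError = err instanceof Error ? err.message : String(err);
      row.runAfter = now + backoffMs(row.attempts);
    }
  }
  return processed;
}
```

Rows that hit `MAX_ATTEMPTS` stay in the table with their `lastError`, which is what makes failures observable instead of fire-and-forget.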

C. PDF library — @react-pdf/renderer vs Playwright PDF

Audit recommendation: swap to Playwright PDF.

Reasons: React-PDF has unresolved memory leaks in long-running batch generation (open issues #2217, #2848, #378), and Cyrillic font rendering is a literal open bug (#1366), which is exactly our case. Playwright PDF is faster (147 ms → 42 ms cold), and HTML + CSS Paged Media handles Cyrillic trivially via web fonts.

Gotenberg is the alternative if you want infrastructure separation (PDF rendering as its own service).
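A minimal sketch of the Playwright route, assuming a hypothetical `InvoiceLine` shape and a placeholder font URL: we build plain HTML with CSS Paged Media rules and hand the string to Chromium. Only the HTML builder is runnable here; the Playwright calls in the trailing comment (`page.setContent`, `page.pdf`) are the standard API, while browser lifecycle and pooling for batch runs are left out.

```typescript
// Hypothetical invoice line shape for illustration.
interface InvoiceLine {
  description: string; // free text, may contain Mongolian Cyrillic (Ө, Ү)
  amountMinor: bigint; // mungo, per the locked Money decision
}

// Escape user-provided text before interpolating it into HTML.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Format non-negative minor units as tögrög with two decimals (negatives not handled).
function formatMnt(minor: bigint): string {
  const whole = minor / 100n;
  const frac = (minor % 100n).toString().padStart(2, "0");
  return `${whole}.${frac} ₮`;
}

function renderInvoiceHtml(title: string, lines: InvoiceLine[]): string {
  const rows = lines
    .map((l) => `<tr><td>${escapeHtml(l.description)}</td><td class="amt">${formatMnt(l.amountMinor)}</td></tr>`)
    .join("");
  // Placeholder absolute font URL (relative URLs do not resolve under setContent).
  return `<!doctype html>
<html lang="mn">
<head>
<meta charset="utf-8">
<style>
  @font-face { font-family: "Inter"; src: url("https://assets.example.com/fonts/Inter.woff2") format("woff2"); }
  @page { size: A4; margin: 20mm 15mm; }
  body { font-family: "Inter", sans-serif; }
  thead { display: table-header-group; } /* repeat the header row on every page */
  tr { break-inside: avoid; } /* ask the engine not to split a line item across pages */
  .amt { text-align: right; }
</style>
</head>
<body>
<h1>${escapeHtml(title)}</h1>
<table>
<thead><tr><th>Description</th><th>Amount</th></tr></thead>
<tbody>${rows}</tbody>
</table>
</body>
</html>`;
}

// Rendering with Playwright (not executed here):
//   const page = await browser.newPage();
//   await page.setContent(html, { waitUntil: "networkidle" });
//   const pdf = await page.pdf({ format: "A4", printBackground: true, preferCSSPageSize: true });
```

Cyrillic support comes from the web font, not from the PDF engine, so swapping fonts is a CSS change rather than a library upgrade.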

D. Hosting region — Hetzner Helsinki/Falkenstein vs Singapore

Audit recommendation: Hetzner Singapore (live since August 2024).

Mongolia→Singapore is ~80-100 ms; Helsinki is ~200-260 ms. Singapore runs in a colocation facility rather than a Hetzner-owned DC; at our scale this doesn't matter.
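The gap compounds because a dashboard page rarely makes a single round-trip. A back-of-envelope sketch, assuming the midpoints of the quoted ranges (90 ms and 230 ms) and a hypothetical page that needs 4 sequential API calls; TLS handshakes and server time are ignored.

```typescript
// Midpoints of the quoted Mongolia RTT ranges (assumption for illustration).
const RTT_MS = { singapore: 90, helsinki: 230 } as const;

// Network wait for a page whose API calls must run one after another.
function sequentialWaitMs(rttMs: number, calls: number): number {
  return rttMs * calls;
}

const calls = 4; // hypothetical: session check, list, detail, permissions
for (const [region, rtt] of Object.entries(RTT_MS)) {
  console.log(`${region}: ${sequentialWaitMs(rtt, calls)} ms`);
}
// singapore: 360 ms
// helsinki: 920 ms
```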

E. Gemini SDK — already obsolete

@google/generative-ai was deprecated November 30, 2025. Use @google/genai (the unified SDK) + Vercel AI SDK v5+. Not a decision; just a correction.

F. Logs backend — Grafana Loki vs Axiom

The audit recommends Axiom for logs (500 GB free tier vs Loki's 50 GB), keeping Sentry + OTel + Grafana Cloud for traces and metrics. Loki has cardinality footguns at scale.
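The Loki footgun is that every unique combination of label values becomes its own stream, so the worst-case stream count is the product of each label's distinct values. A quick sketch with hypothetical label counts shows why a per-user value must stay out of labels (it belongs in the log line).

```typescript
// Distinct values per label (hypothetical numbers for illustration).
type LabelCardinality = Record<string, number>;

// Worst-case stream count = product of distinct values across all labels.
function worstCaseStreams(labels: LabelCardinality): number {
  return Object.values(labels).reduce((acc, n) => acc * n, 1);
}

const safe = { service: 5, env: 2, level: 4 };
const risky = { ...safe, user_id: 2_000 };

console.log(worstCaseStreams(safe)); // 40
console.log(worstCaseStreams(risky)); // 80000
```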

Locked

| Decision | Choice | Why |
|---|---|---|
| Language | TypeScript end-to-end | One language, shared types, faster iteration |
| Mobile | Flutter (existing app, deferred priority) | Stays on Dart; consumes shared OpenAPI |
| API framework | Hono 4 | Lightweight, runs anywhere, OpenAPI-first, Zod-native |
| Web framework | Next.js 15 App Router + React 19 | SSR for SEO + perf; server components fetch the API via hc |
| ORM | Drizzle | SQL-transparent, RLS first-class, no codegen daemon, BigInt support |
| DB | Postgres 16 | RLS, jsonb, listen/notify, modern. Migrate from SQL Server. |
| Monorepo | Turborepo + pnpm workspaces | Best-in-class task caching + isolation |
| Strategy | Strangler fig per domain | Low risk, demoable slice each phase, no big-bang flag day |
| Aggression | Aggressive simplification | Cut dead enums + dual hierarchies + post/community where buy-in exists |
| API + Mobile contract | Single API serves both, OpenAPI spec shared | One source of truth, no per-client divergence |
| Money | BigInt minor units (mungo for MNT) | Kills KNOWN_ISSUES #2 decimal equality bug by construction |
| Side effects | Outbox pattern (table + worker) | Retryable, observable, no fire-and-forget races |
| IDs | UUID v7 | Sortable, no composite-key dance, mobile/offline-friendly |
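UUID v7 (RFC 9562) puts a 48-bit Unix-millisecond timestamp in the high bits, which is what makes the IDs sortable and safe to mint on a mobile client while offline. A minimal sketch using only `node:crypto` to show the layout; in practice we would more likely use a maintained library, so treat this as an illustration rather than the implementation.

```typescript
import { randomBytes } from "node:crypto";

// Layout (RFC 9562): 48-bit unix_ts_ms | 4-bit version (0111) | 12 bits rand_a
//                  | 2-bit variant (10) | 62 bits rand_b
function uuidv7(nowMs: number = Date.now()): string {
  const bytes = randomBytes(16);
  const ts = BigInt(nowMs);
  // Big-endian 48-bit timestamp in bytes 0..5.
  for (let i = 0; i < 6; i++) {
    bytes[i] = Number((ts >> BigInt(8 * (5 - i))) & 0xffn);
  }
  bytes[6] = (bytes[6] & 0x0f) | 0x70; // version 7
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // variant 10xx
  const hex = bytes.toString("hex");
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}

// Recover the embedded timestamp (handy when debugging ordering).
function uuidv7Timestamp(id: string): number {
  return parseInt(id.replace(/-/g, "").slice(0, 12), 16);
}
```

IDs minted in different milliseconds sort by time as plain strings. Within the same millisecond the random bits decide the order; RFC 9562 describes optional counter-based methods if strict monotonicity is ever needed.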

Pending (you decide)

1. Auth library

Recommended: better-auth v1.4+.

Framework-agnostic, Drizzle adapter, plugins for JWT + bearer + organization (= PropertyOwner) + phone-number OTP + multi-session + admin. Lucia is deprecated (March 2025). Auth.js is Next-shaped, awkward for mobile.

Alternative: roll-your-own with jose + argon2. Replicates what better-auth gives but you own the bugs.

Detail: auth page.

2. RLS bridging pattern

Recommended: SET LOCAL app.user_id (equivalently, set_config('app.user_id', ..., true)) inside a per-request transaction.

Pooler-safe (works with PgBouncer transaction mode) at the cost of one extra round-trip per request (sub-millisecond locally). The alternative, per-tenant Postgres roles, is operationally impossible past ~50 tenants.
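A minimal sketch of the per-request wrapper, assuming a hypothetical `SqlClient` interface declared locally (node-postgres's `PoolClient.query(text, values)` has the same shape) so the example runs without a database. The user id is set with `is_local = true`, so it vanishes at COMMIT or ROLLBACK and cannot leak to the next request that reuses the pooled connection. The policy in the comment uses a hypothetical `invoices.owner_id` column.

```typescript
// Minimal client shape (assumption); node-postgres's PoolClient satisfies it.
interface SqlClient {
  query(text: string, params?: unknown[]): Promise<unknown>;
}

// Policy this relies on (hypothetical table/column), created once in a migration:
//   ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;
//   CREATE POLICY owner_only ON invoices
//     USING (owner_id = current_setting('app.user_id')::uuid);
// The app role must not own the table (owners bypass RLS unless FORCE ROW LEVEL SECURITY).

async function withUserContext<T>(
  client: SqlClient,
  userId: string,
  work: (tx: SqlClient) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    // is_local = true has the same effect as SET LOCAL, scoped to this transaction.
    // set_config is used because SET cannot take a bind parameter.
    await client.query("SELECT set_config('app.user_id', $1, true)", [userId]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```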

Detail: identity page.

3. Money type

Recommended: keep the 50-LOC custom @spacehub/shared/money impl.

BigInt + Currency tag is enough. dinero.js v2 adds allocation/distribution helpers we might never need. Re-evaluate if we hit a real use case.
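For reference, a sketch of the shape such a module could take; the names (`Money`, `money`, `add`, `equals`, `parseMajor`) and the second currency are illustrative assumptions and may not match the existing @spacehub/shared/money API. It shows why the KNOWN_ISSUES #2 decimal equality bug disappears: amounts are integers in minor units, so equality is exact, and mixing currencies throws instead of silently summing.

```typescript
type Currency = "MNT" | "USD"; // USD is a hypothetical second currency

interface Money {
  readonly amount: bigint; // minor units (mungo for MNT)
  readonly currency: Currency;
}

const MINOR_DIGITS: Record<Currency, number> = { MNT: 2, USD: 2 };

function money(amount: bigint, currency: Currency): Money {
  return { amount, currency };
}

function add(a: Money, b: Money): Money {
  if (a.currency !== b.currency) {
    throw new Error(`currency mismatch: ${a.currency} vs ${b.currency}`);
  }
  return money(a.amount + b.amount, a.currency);
}

// Exact comparison: no float rounding can make equal amounts differ.
function equals(a: Money, b: Money): boolean {
  return a.currency === b.currency && a.amount === b.amount;
}

// Parse a decimal string like "1500.5" without ever going through a float.
function parseMajor(input: string, currency: Currency): Money {
  const digits = MINOR_DIGITS[currency];
  const match = /^(-?)(\d+)(?:\.(\d+))?$/.exec(input.trim());
  if (!match) throw new Error(`invalid amount: ${input}`);
  const [, sign, whole, frac = ""] = match;
  if (frac.length > digits) throw new Error(`too many decimals: ${input}`);
  const minor = BigInt(whole) * 10n ** BigInt(digits) + BigInt(frac.padEnd(digits, "0") || "0");
  return money(sign === "-" ? -minor : minor, currency);
}
```

`parseMajor("0.1", "MNT")` plus `parseMajor("0.2", "MNT")` equals `parseMajor("0.3", "MNT")` exactly, which is the case that breaks with floating-point decimals.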

4. Job queue

Recommended: BullMQ + Redis (current default).

Best-in-class TS support, repeatable jobs, priority queues, dashboard via bull-board. Alternative pg-boss drops the Redis dependency — attractive for ops simplicity. Pick BullMQ unless you want one less moving part.

Detail: jobs page.

5. PDF library

Recommended: @react-pdf/renderer for invoices and statements.

Declarative, server-side, embeds custom fonts (needs a font covering Mongolian Cyrillic, including Ө/ө and Ү/ү, such as Inter or Noto Sans; Noto Sans Mongolian covers the traditional Mongolian script, not Cyrillic). Puppeteer wins on visual fidelity but adds a 200 MB Chromium dependency and slower startup. Pick Puppeteer only if accountants reject the React-PDF output.

Detail: reports page.

6. File storage

Recommended: Cloudflare R2 ($0.015/GB-month, zero egress).

S3-compatible, predictable pricing, no egress lock-in. MinIO self-host is fine if you already have ops capacity, but R2 saves ~3 hours/month of babysitting.

Detail: files page.

7. Hosting

Recommended: Hetzner CCX13 + Coolify for API + Redis + Postgres (~€13/mo), Vercel for Next.js (Hobby/Pro).

Mongolia latency: Tokyo region preferable (Fly.io nrt or Neon Tokyo) but Hetzner Helsinki/Falkenstein (~200-260ms) is acceptable for non-realtime ops. Migrate Postgres to Neon Tokyo if latency complaints arrive.

Detail: devops page.

8. Strangler approach: CDC vs dual-write

Recommended: start with dual-write only; add Debezium CDC only if XAF UI must show data from a migrated domain.

Dual-write is one less moving part. CDC adds operational complexity (Debezium connectors, Kafka Connect or Redpanda) for a few months of dual-system runtime. Skip unless required.
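A minimal dual-write sketch, assuming hypothetical `Store` interfaces for the new Postgres side and the legacy SQL Server side. The failure policy is an assumption the card does not specify: v2 Postgres is treated as authoritative for a migrated domain, so its write must succeed, while a failed legacy write is queued for reconciliation (in production that queue could be the outbox) instead of failing the user's request.

```typescript
// Hypothetical write-side abstraction over each database.
interface Store<T> {
  write(record: T): Promise<void>;
}

interface ReconcileEntry<T> {
  record: T;
  error: string;
  at: number;
}

// Postgres first (authoritative, by assumption), then legacy SQL Server.
async function dualWrite<T>(
  primary: Store<T>,
  legacy: Store<T>,
  record: T,
  reconcile: ReconcileEntry<T>[],
  now: () => number = Date.now,
): Promise<void> {
  // If the primary write fails, nothing reaches legacy and the caller sees the error.
  await primary.write(record);
  try {
    await legacy.write(record);
  } catch (err) {
    // Keep the request successful; a hypothetical job replays these into SQL Server later.
    reconcile.push({ record, error: err instanceof Error ? err.message : String(err), at: now() });
  }
}
```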

Detail: strangler page.

9. Feature cuts

Need stakeholder input:

  • Post / community / UserPost / Rating / ImageReportRecord (26 BO files in v1). Reportedly low-use. Cut entirely?
  • 85 enums. Confirm we can collapse to ~25 enums + reference tables for Bank/Region/ServiceType.
  • Chatbot whitelist. Currently hardcoded PO list — replace with feature flag or roll out to all owners?

10. Auth bridge during strangler

Need decision: SSO between XAF (cookie-based) and v2 (better-auth)?

  • Option A: shared session table; both apps read/write Postgres sessions. Requires XAF code changes.
  • Option B: JWT bridge; XAF issues a short-lived one-time token when handing the user off to v2, and v2 exchanges it for its own session. Less coupling.
  • Option C: users re-login on v2 first time. Simplest, slight friction. Recommended for small user base.
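If Option B is chosen, the bridge token needs three properties: signed, short-lived, and single-use. A minimal sketch, assuming a shared HMAC secret between XAF and v2 and a hypothetical `payload.signature` format; the mint side is shown in TypeScript only for symmetry (XAF would do the same in .NET), and a real build would more likely use a standard JWT library and keep used token ids in Redis or Postgres rather than an in-memory `Set`.

```typescript
import { createHmac, randomUUID, timingSafeEqual } from "node:crypto";

interface BridgeClaims {
  sub: string; // user id shared by both systems (assumption)
  exp: number; // expiry, epoch seconds
  jti: string; // unique token id, used to enforce single use
}

const b64url = (buf: Buffer): string => buf.toString("base64url");

function sign(data: string, secret: string): string {
  return b64url(createHmac("sha256", secret).update(data).digest());
}

// XAF side: mint a token valid for ttlSeconds.
function mintBridgeToken(userId: string, secret: string, nowSec: number, ttlSeconds = 60): string {
  const claims: BridgeClaims = { sub: userId, exp: nowSec + ttlSeconds, jti: randomUUID() };
  const payload = b64url(Buffer.from(JSON.stringify(claims)));
  return `${payload}.${sign(payload, secret)}`;
}

// v2 side: check signature, expiry and single use; return the user id or null.
function redeemBridgeToken(token: string, secret: string, nowSec: number, usedJtis: Set<string>): string | null {
  const [payload, signature] = token.split(".");
  if (!payload || !signature) return null;
  const expected = Buffer.from(sign(payload, secret));
  const given = Buffer.from(signature);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (expected.length !== given.length || !timingSafeEqual(expected, given)) return null;
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as BridgeClaims;
  if (claims.exp <= nowSec || usedJtis.has(claims.jti)) return null;
  usedJtis.add(claims.jti);
  return claims.sub;
}
```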

How to decide

Skim each card and pick. Once you've decided, I'll move the item to "Locked" and start the build slice on the relevant feature page.