E-commerce microservices in Go. Now with traces, metrics, mTLS, real auth, an outbox, and a Helm chart.
make up # generates dev certs, builds + runs the compose stack
make smoke # full end-to-end check (login → saga → outbox → mTLS → metrics → traces)
- Public API at http://localhost:8080 (POST /api/v1/auth/login first, then /api/v1/orders)
- Jaeger UI at http://localhost:16686
- Prometheus scrape at http://localhost:8889/metrics (via the OTel Collector)
- NATS monitoring at http://localhost:8222
                            +---------------+
                            |    user/UI    |
                            +-------+-------+
                                    | HTTPS + user JWT (HS256, aud=shopapi.user)
                            +-------v-------+
                            |    gateway    | :8080
                            +-------+-------+
                                    | gRPC over mTLS + service JWT + ACL + trace_ctx
                            +-------v-------+
              +---publish-->|    orders     | :9001 → ordersdb (+ outbox)
              |             +-------+-------+
              |                     | gRPC over mTLS + service JWT + ACL + trace_ctx
              |                     v
              |             +---------------+
              |             |   inventory   | :9002 → inventorydb (+ processed_releases)
              |             +---------------+
              |
+-------------+-------+       +-----------------+     +----------+
|   NATS JetStream    |       | OTel Collector  |---> |  Jaeger  | :16686
|  ORDERS, PAYMENTS   |       |   :4317/:4318   |     +----------+
+----+-----------+----+       |      :8889      | → Prometheus
     |           ^            +-------+---------+
     v           |                    ^
+---------+   publish                 | OTLP traces + metrics from every service
| payments| ---/ ---------------------'
+---------+ → paymentsdb (+ outbox)
Every span across HTTP, gRPC, NATS, and pgx is exported to the Collector and fanned out to Jaeger (traces) and Prometheus (metrics). A single POST /orders is one trace, end to end.
- gRPC contracts, code generation, status code propagation
- Database per service, no cross-DB FKs, atomic stock with CHECK constraints
- Event-driven payments with NATS JetStream, durable consumers, idempotent handlers
- Saga compensation: failed payment → release stock + cancel order
- Inter-service JWT auth via gRPC unary interceptors
- Distributed tracing: HTTP, gRPC, pgx, NATS — all stitched into one trace
- Transactional outbox (internal/outbox). Events are written to a per-service outbox table inside the same DB transaction as the domain change; a drainer goroutine publishes them to NATS. Trace context is preserved across the outbox so consumer spans link to the original request.
- Idempotent inventory.Release (internal/inventory/store.go). Caller-supplied release_id + a processed_releases dedupe table. Repeated saga compensation no longer double-credits stock.
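
To make the release_id dedupe idea concrete, here is a minimal in-memory sketch. It is an assumption-laden stand-in, not the code in internal/inventory/store.go: the Store type, its fields, and the method signatures are hypothetical, and a mutex plays the role that a single SQL transaction over the stock row and the processed_releases row would play in the real service.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Store is an in-memory stand-in for the inventory database (hypothetical).
type Store struct {
	mu        sync.Mutex
	stock     map[string]int      // product ID -> units on hand
	processed map[string]struct{} // release IDs already applied
}

// NewStore copies the initial stock levels into a fresh Store.
func NewStore(initial map[string]int) *Store {
	stock := make(map[string]int, len(initial))
	for id, qty := range initial {
		stock[id] = qty
	}
	return &Store{stock: stock, processed: make(map[string]struct{})}
}

// Release credits qty units back to productID at most once per releaseID.
// It reports whether this call applied the credit (true) or was a replay of
// an already-processed release (false).
func (s *Store) Release(releaseID, productID string, qty int) (bool, error) {
	if releaseID == "" {
		return false, errors.New("release_id is required")
	}
	if qty <= 0 {
		return false, errors.New("qty must be positive")
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, seen := s.processed[releaseID]; seen {
		return false, nil // duplicate compensation: no second credit
	}
	s.processed[releaseID] = struct{}{}
	s.stock[productID] += qty
	return true, nil
}

// Stock returns the current units on hand for productID.
func (s *Store) Stock(productID string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.stock[productID]
}

func main() {
	s := NewStore(map[string]int{"sku-1": 3})
	for i := 0; i < 2; i++ {
		applied, err := s.Release("order-42-release", "sku-1", 2)
		fmt.Println(applied, err, s.Stock("sku-1"))
	}
}
```

The second call with the same release ID reports applied=false and leaves the stock at 5, which is the property the dedupe table is there to guarantee.
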
- Method-level ACL (internal/auth/acl.go). The gRPC server interceptor now also checks a per-method allow-list. Inventory will only accept calls from orders, orders only from gateway.
- Real gateway authentication (internal/auth/userjwt.go, internal/gateway/auth_middleware.go). Users log in at POST /api/v1/auth/login, get a shopapi.user-audienced JWT, and use it on every /orders call. The gateway derives customer_id from the token subject and rejects mismatched client-supplied values (closes a tenant-mixing hole).
- mTLS between services (internal/tlsutil, scripts/gen-certs.sh). All gRPC links use TLS 1.3 with RequireAndVerifyClientCert. Dev CA + per-service leaf certs are generated by an idempotent openssl script.
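
The two authorization checks described above, the per-method allow-list and the customer_id derivation, might look roughly like this. Everything here is a sketch under assumptions: the gRPC method names in the table are hypothetical placeholders, and callerAllowed and resolveCustomerID are illustrative names rather than the functions in internal/auth or internal/gateway. JWT verification itself is out of scope; the functions take an already-verified caller identity and token subject.

```go
package main

import (
	"errors"
	"fmt"
)

// methodACL maps a full gRPC method name to the service identities allowed
// to call it. The method names below are hypothetical placeholders.
var methodACL = map[string][]string{
	"/inventory.v1.Inventory/Reserve": {"orders"},
	"/inventory.v1.Inventory/Release": {"orders"},
	"/orders.v1.Orders/CreateOrder":   {"gateway"},
}

// callerAllowed reports whether caller may invoke fullMethod. Methods with
// no ACL entry are denied, so a newly added RPC is closed by default.
func callerAllowed(fullMethod, caller string) bool {
	for _, allowed := range methodACL[fullMethod] {
		if allowed == caller {
			return true
		}
	}
	return false
}

var errCustomerMismatch = errors.New("customer_id does not match token subject")

// resolveCustomerID returns the customer ID to use for a request. The token
// subject always wins; a client-supplied value is tolerated only when it is
// empty or identical to the subject.
func resolveCustomerID(tokenSubject, clientSupplied string) (string, error) {
	if tokenSubject == "" {
		return "", errors.New("token has no subject")
	}
	if clientSupplied != "" && clientSupplied != tokenSubject {
		return "", errCustomerMismatch
	}
	return tokenSubject, nil
}

func main() {
	fmt.Println(callerAllowed("/inventory.v1.Inventory/Reserve", "orders"))
	fmt.Println(callerAllowed("/inventory.v1.Inventory/Reserve", "gateway"))
	id, err := resolveCustomerID("cust-7", "cust-9")
	fmt.Println(id, err)
}
```

Denying methods that have no ACL entry is a choice made for this sketch; it keeps a forgotten entry from silently exposing a new RPC.
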
- OTel Collector + per-service metrics (deploy/otel-collector-config.yaml, internal/{gateway,orders,inventory,payments}/metrics.go). Each service emits OTLP metrics; the Collector fans out traces to Jaeger and metrics to a Prometheus scrape endpoint on :8889. Counters: http_requests_total, orders_created_total{result}, inventory_reserve_total{result}, inventory_release_total{result}, payments_total{result}, plus pgx-emitted DB latency histograms.
- Production trace sampling (internal/telemetry/tracer.go). Replaced AlwaysSample with ParentBased(TraceIDRatioBased(env(OTEL_TRACES_SAMPLER_ARG, 1.0))). Default keeps all traces for dev/smoke; production overlays set the ratio (e.g. 0.05) and turn on the Collector's tail_sampling processor for richer policies.
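
The env(OTEL_TRACES_SAMPLER_ARG, 1.0) part of the sampler setup could be implemented along these lines. This is a stdlib-only sketch: samplerRatio is a hypothetical helper name, and the clamping and fallback behaviour are assumptions about what a reasonable implementation would do, not a description of internal/telemetry/tracer.go.

```go
package main

import (
	"fmt"
	"math"
	"os"
	"strconv"
)

// samplerRatio parses a sampling ratio, falling back to def when raw is
// empty or not a usable number, and clamping the result to [0, 1].
func samplerRatio(raw string, def float64) float64 {
	if raw == "" {
		return def
	}
	r, err := strconv.ParseFloat(raw, 64)
	if err != nil || math.IsNaN(r) {
		return def
	}
	if r < 0 {
		return 0
	}
	if r > 1 {
		return 1
	}
	return r
}

func main() {
	ratio := samplerRatio(os.Getenv("OTEL_TRACES_SAMPLER_ARG"), 1.0)
	// With the OTel Go SDK the ratio would then feed
	// sdktrace.ParentBased(sdktrace.TraceIDRatioBased(ratio)).
	fmt.Printf("sampling ratio: %.2f\n", ratio)
}
```

Because the root sampler is wrapped in ParentBased, the ratio only affects traces that start in this service; spans that arrive with a sampled parent context keep the parent's decision, which is what keeps a single request as one complete trace.
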
- Helm chart (deploy/helm/shopapi). Full stack: Postgres-per-service StatefulSets, NATS JetStream, Jaeger, OTel Collector, four service Deployments, three pre-install migrate hooks, cert-manager Issuer + Certificates for mTLS Secrets. make k8s-up brings up a kind cluster and installs the chart end-to-end; make smoke-k8s runs the K8s equivalent of the smoke test.
cmd/{gateway,orders,inventory,payments}       service main packages
internal/{gateway,orders,inventory,payments}  service code
internal/auth                                 service JWT + user JWT + gRPC interceptors + ACL
internal/events                               event types, NATS bus, envelope helpers (carrier inject/extract)
internal/outbox                               transactional outbox writer + drainer (raw pgx, schema-identical across services)
internal/tlsutil                              mTLS server/client TransportCredentials
internal/telemetry                            OTel SDK setup (traces + metrics, configurable sampler)
internal/dbpool                               pgx pool with otelpgx wired in
internal/pb/...                               generated gRPC code
proto/                                        service contracts
migrations/{orders,inventory,payments}        golang-migrate files (incl. outbox + processed_releases)
sql/{orders,inventory,payments}/queries.sql   sqlc input
scripts/smoke.sh                              docker-compose smoke
scripts/smoke-k8s.sh                          K8s smoke (port-forward + kubectl exec)
scripts/gen-certs.sh                          idempotent dev CA + per-service certs
deploy/otel-collector-config.yaml             collector config (used by compose AND embedded in the chart)
deploy/helm/shopapi/                          full Helm chart
- Open Jaeger at http://localhost:16686 → Service: gateway → Find Traces.
- Click any trace to see the flame graph across all services.
- Search by tags: order.id=N to find every span for a given order.
- Click any span → Tags pane shows DB query text, gRPC method, NATS subject.
- Span events (the small markers on the timeline) capture interesting branches like already_paid_skipped and concurrent_duplicate.
curl -s http://localhost:8889/metrics | grep -E '^shopapi_(orders|inventory|payments|http)' | head
Counters arrive ~5s after each request (the OTel SDK's PeriodicReader interval). For real-time observability, point Prometheus / Grafana at the Collector's :8889 and build dashboards from the per-service counters.
- Local (compose): make up && make smoke. Single host, hot iteration.
- Local kind: make k8s-up && make smoke-k8s. Same chart you'd ship to staging/prod, running on a single-node cluster.
- Cluster: helm upgrade --install shopapi deploy/helm/shopapi --values prod.yaml --namespace shopapi. Production overlay should swap demo Secrets for ExternalSecrets / SOPS / Vault, lower the sampler ratio, point Postgres at an operator-managed cluster, and wire an Ingress in front of shopapi-gateway.
- Tail sampling: the Collector config has the tail_sampling processor block written but commented out (head sampling is fine for the demo). Prod-grade: keep all error traces, sample 1% of OK traces.
- Logs as traces: slog is wired but logs aren't shipped through OTLP. Add otlploghttp and a logs pipeline at the Collector.
- Postgres / NATS / Jaeger operators: the chart ships its own bare StatefulSets for self-containment. Real prod uses CloudNativePG / the NATS Helm chart / Jaeger Operator.
- Multi-replica payments: currently replicas: 1 because the JetStream consumer config has no queue group. Outbox + UNIQUE(order_id) make duplicate deliveries safe, but rebalancing semantics need a closer look before scaling out.