shopapi

E-commerce microservices in Go. Now with traces, metrics, mTLS, real auth, an outbox, and a Helm chart.

Quick start

make up      # generates dev certs, builds + runs the compose stack
make smoke   # full end-to-end check (login → saga → outbox → mTLS → metrics → traces)
  • Public API at http://localhost:8080 (POST /api/v1/auth/login first, then /api/v1/orders)
  • Jaeger UI at http://localhost:16686
  • Prometheus scrape at http://localhost:8889/metrics (via the OTel Collector)
  • NATS monitoring at http://localhost:8222

Architecture

                            +---------------+
                            |   user/UI     |
                            +-------+-------+
                                    | HTTPS + user JWT (HS256, aud=shopapi.user)
                            +-------v-------+
                            |   gateway     |  :8080
                            +-------+-------+
                                    | gRPC over mTLS + service JWT + ACL + trace_ctx
                            +-------v-------+
              +---publish-->|    orders     |  :9001  → ordersdb (+ outbox)
              |             +-------+-------+
              |                     | gRPC over mTLS + service JWT + ACL + trace_ctx
              |                     v
              |             +---------------+
              |             |   inventory   |  :9002  → inventorydb (+ processed_releases)
              |             +---------------+
              |
+-------------+-------+      +-----------------+      +----------+
|   NATS JetStream    |      |  OTel Collector |--->  |  Jaeger  |  :16686
|  ORDERS, PAYMENTS   |      |  :4317/:4318    |      +----------+
+----+-----------+----+      |     :8889       |  →   Prometheus
     |           ^           +-------+---------+
     v           |                   ^
+---------+    publish               | OTLP traces + metrics from every service
| payments| ---/  ---------------------'
+---------+    → paymentsdb (+ outbox)

Every span across HTTP, gRPC, NATS, and pgx is exported to the Collector and fanned out to Jaeger (traces) and Prometheus (metrics). A single POST /orders is one trace, end to end.

What's done

Parts 1–3 (foundation)

  • gRPC contracts, code generation, status code propagation
  • Database per service, no cross-DB FKs, atomic stock with CHECK constraints
  • Event-driven payments with NATS JetStream, durable consumers, idempotent handlers
  • Saga compensation: failed payment → release stock + cancel order
  • Inter-service JWT auth via gRPC unary interceptors
  • Distributed tracing: HTTP, gRPC, pgx, NATS — all stitched into one trace
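The saga compensation in the list above can be sketched as plain Go. This is a conceptual stand-in, not the repo's API: the `releaseStock` and `cancelOrder` function fields are hypothetical hooks for what are really gRPC/NATS calls into inventory and orders.

```go
package main

import "fmt"

// saga holds the compensation steps for a failed payment. In the real
// system these are gRPC/NATS calls; here they are injectable functions.
type saga struct {
	releaseStock func(orderID int64) error
	cancelOrder  func(orderID int64) error
}

// onPaymentFailed runs the compensation in order: release the reserved
// stock first, then cancel the order.
func (s saga) onPaymentFailed(orderID int64) error {
	if err := s.releaseStock(orderID); err != nil {
		return fmt.Errorf("release stock for order %d: %w", orderID, err)
	}
	if err := s.cancelOrder(orderID); err != nil {
		return fmt.Errorf("cancel order %d: %w", orderID, err)
	}
	return nil
}

func main() {
	var steps []string
	s := saga{
		releaseStock: func(id int64) error { steps = append(steps, "release"); return nil },
		cancelOrder:  func(id int64) error { steps = append(steps, "cancel"); return nil },
	}
	if err := s.onPaymentFailed(42); err != nil {
		panic(err)
	}
	fmt.Println(steps) // release runs before cancel
}
```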

Phase 1 — Correctness

  • Transactional outbox (internal/outbox). Events are written to a per-service outbox table inside the same DB transaction as the domain change; a drainer goroutine publishes them to NATS. Trace context is preserved across the outbox so consumer spans link to the original request.
  • Idempotent inventory.Release (internal/inventory/store.go). Caller-supplied release_id + a processed_releases dedupe table. Repeated saga compensation no longer double-credits stock.
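The dedupe mechanics behind the idempotent Release can be sketched in a few lines. The real implementation does this with a `processed_releases` table inside one DB transaction; in this hedged sketch a map stands in for the table, so a replayed compensation with the same `release_id` credits stock exactly once.

```go
package main

import "fmt"

// inventory is an in-memory stand-in for inventorydb: stock per SKU,
// plus a set that plays the role of the processed_releases table.
type inventory struct {
	stock     map[string]int
	processed map[string]bool
}

// release credits qty back to sku unless releaseID was already handled.
// Returns false on a duplicate delivery so callers can see the no-op.
func (inv *inventory) release(releaseID, sku string, qty int) bool {
	if inv.processed[releaseID] {
		return false // duplicate: no double-credit
	}
	inv.processed[releaseID] = true
	inv.stock[sku] += qty
	return true
}

func main() {
	inv := &inventory{stock: map[string]int{"widget": 3}, processed: map[string]bool{}}
	inv.release("rel-1", "widget", 2)
	inv.release("rel-1", "widget", 2) // replayed compensation is a no-op
	fmt.Println(inv.stock["widget"])  // credited exactly once
}
```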

Phase 2 — Security

  • Method-level ACL (internal/auth/acl.go). The gRPC server interceptor now also checks a per-method allow-list. Inventory will only accept calls from orders, orders only from gateway.
  • Real gateway authentication (internal/auth/userjwt.go, internal/gateway/auth_middleware.go). Users log in at POST /api/v1/auth/login, get a shopapi.user-audienced JWT, and use it on every /orders call. The gateway derives customer_id from the token subject and rejects mismatched client-supplied values (closes a tenant-mixing hole).
  • mTLS between services (internal/tlsutil, scripts/gen-certs.sh). All gRPC links use TLS 1.3 with RequireAndVerifyClientCert. Dev CA + per-service leaf certs are generated by an idempotent openssl script.
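The method-level ACL above boils down to a per-method allow-list keyed by the caller identity from the service JWT. A minimal sketch of that decision, assuming illustrative gRPC method names (the real table lives in internal/auth/acl.go inside the server interceptor):

```go
package main

import "fmt"

// acl maps full gRPC method names to the service identities allowed to
// call them. Method names here are illustrative, not the repo's exact
// proto paths.
var acl = map[string][]string{
	"/inventory.Inventory/Reserve": {"orders"},
	"/inventory.Inventory/Release": {"orders"},
	"/orders.Orders/CreateOrder":   {"gateway"},
}

// allowed reports whether caller may invoke method. Unknown methods and
// unknown callers are denied by default.
func allowed(method, caller string) bool {
	for _, svc := range acl[method] {
		if svc == caller {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(allowed("/inventory.Inventory/Reserve", "orders"))
	fmt.Println(allowed("/inventory.Inventory/Reserve", "gateway"))
}
```

Deny-by-default matters here: a new RPC is unreachable until it is explicitly added to the table.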

Phase 3 — Observability

  • OTel Collector + per-service metrics (deploy/otel-collector-config.yaml, internal/{gateway,orders,inventory,payments}/metrics.go). Each service emits OTLP metrics; the Collector fans out traces to Jaeger and metrics to a Prometheus scrape endpoint on :8889. Counters: http_requests_total, orders_created_total{result}, inventory_reserve_total{result}, inventory_release_total{result}, payments_total{result}, plus pgx-emitted DB latency histograms.
  • Production trace sampling (internal/telemetry/tracer.go). Replaced AlwaysSample with ParentBased(TraceIDRatioBased(env(OTEL_TRACES_SAMPLER_ARG, 1.0))). Default keeps all traces for dev/smoke; production overlays set the ratio (e.g. 0.05) and turn on the Collector's tail_sampling processor for richer policies.

Phase 4 — Deployment

  • Helm chart (deploy/helm/shopapi). Full stack — Postgres-per-service StatefulSets, NATS JetStream, Jaeger, OTel Collector, four service Deployments, three pre-install migrate hooks, cert-manager Issuer+Certificates for mTLS Secrets. make k8s-up brings up a kind cluster and installs the chart end-to-end; make smoke-k8s runs the K8s equivalent of the smoke test.

Layout

cmd/{gateway,orders,inventory,payments}   service main packages
internal/{gateway,orders,inventory,payments}   service code
internal/auth        service JWT + user JWT + gRPC interceptors + ACL
internal/events      event types, NATS bus, envelope helpers (carrier inject/extract)
internal/outbox      transactional outbox writer + drainer (raw pgx, schema-identical across services)
internal/tlsutil     mTLS server/client TransportCredentials
internal/telemetry   OTel SDK setup (traces + metrics, configurable sampler)
internal/dbpool      pgx pool with otelpgx wired in
internal/pb/...      generated gRPC code
proto/               service contracts
migrations/{orders,inventory,payments}   golang-migrate files (incl. outbox + processed_releases)
sql/{orders,inventory,payments}/queries.sql   sqlc input
scripts/smoke.sh           docker-compose smoke
scripts/smoke-k8s.sh       K8s smoke (port-forward + kubectl exec)
scripts/gen-certs.sh       idempotent dev CA + per-service certs
deploy/otel-collector-config.yaml   collector config (used by compose AND embedded in the chart)
deploy/helm/shopapi/                 full Helm chart

Tracing tips

  1. Open Jaeger at http://localhost:16686 → Service: gateway → Find Traces.
  2. Click any trace to see the flame graph across all services.
  3. Search by tags: order.id=N to find every span for a given order.
  4. Click any span → Tags pane shows DB query text, gRPC method, NATS subject.
  5. Span events (the small markers on the timeline) capture interesting branches like already_paid_skipped and concurrent_duplicate.

Metrics tips

curl -s http://localhost:8889/metrics | grep -E '^shopapi_(orders|inventory|payments|http)' | head

Counters arrive ~5s after each request (the OTel SDK's PeriodicReader interval). For real-time observability, point Prometheus / Grafana at the Collector's :8889 and build dashboards from the per-service counters.

Choose your runtime

  • Local (compose): make up && make smoke. Single host, hot iteration.
  • Local kind: make k8s-up && make smoke-k8s. Same chart you'd ship to staging/prod, running on a single-node cluster.
  • Cluster: helm upgrade --install shopapi deploy/helm/shopapi --values prod.yaml --namespace shopapi. Production overlay should swap demo Secrets for ExternalSecrets / SOPS / Vault, lower the sampler ratio, point Postgres at an operator-managed cluster, and wire an Ingress in front of shopapi-gateway.

Still on the radar (not in scope here)

  • Tail sampling — the Collector config has the tail_sampling processor block written but commented out (head sampling is fine for the demo). Prod-grade: keep all error traces, sample 1% of OK traces.
  • Logs as traces — slog is wired but logs aren't shipped through OTLP. Add otlploghttp and a logs pipeline at the Collector.
  • Postgres / NATS / Jaeger operators — the chart ships its own bare StatefulSets for self-containment. Real prod uses CloudNativePG / the NATS Helm chart / Jaeger Operator.
  • Multi-replica payments — currently replicas: 1 because the JetStream consumer config has no queue group. Outbox + UNIQUE(order_id) make duplicate deliveries safe, but rebalancing semantics need a closer look before scaling out.

About

Learning microservices using Go.
