Software Design and architecture Docs


I’m not a prodigy. I didn’t start at 13. I learned the hard way — error by error, late night by late night. I still Google basic stuff, still mess up, still doubt myself. But I show up daily, build real things, and document the process without filters. No polished aesthetics. No fake “10x dev” talk. Just one guy trying to master his craft — publicly, consistently, and without shortcuts. If you relate to the grind more than the glory, welcome to my corner of the internet.

1. Introduction

1.1 Purpose

The purpose of this Software Design & Architecture (SDA) document is to translate the requirements defined in the Software Requirements Specification (SRS) into a concrete design blueprint for the authentication system.

This document provides the structural and technical foundation for implementation, covering architecture, data models, modules, and interaction flows. The authentication system will serve as a core service, ensuring that users can securely register, log in, and manage their identities.

By defining the architecture and design choices upfront, this document ensures that the system is secure, scalable, and extensible enough to evolve into enterprise-grade identity and access management (IAM) in the future.

1.2 Scope

This document focuses on the authentication subsystem, detailing its architecture, data flow, and integration with supporting services such as caching, external identity providers, and monitoring. Authorization beyond basic role-based access control (RBAC) is out of scope for this version but considered for future extensions.

1.3 References

  • Software Requirements Specification (SRS) – Authentication System

  • RFC 7519 – JSON Web Tokens (JWT)

  • OAuth 2.0 & OpenID Connect specifications

  • MongoDB documentation

  • Redis documentation

1.4 Definitions & Abbreviations

  • IAM: Identity and Access Management

  • MFA: Multi-Factor Authentication

  • SSO: Single Sign-On

  • JWT: JSON Web Token

2. Architecture Goals & Principles

2.1 Architecture Goals

The authentication system is designed with the following overarching goals:

  • Balance security and usability: Strong security controls are implemented without creating unnecessary friction for end users.

  • Scale to thousands of users: The system is optimized for small-to-medium scale (portfolio or demo use), but follows patterns that can extend to larger user bases if required.

  • High availability: Authentication is mission-critical; the design minimizes single points of failure and ensures redundancy where possible.

  • Extensibility for learning: While this implementation is intended for portfolio demonstration, the architecture models patterns used in enterprise-grade identity systems.

  • Enterprise readiness: Even at small scale, the design follows enterprise-level principles (RBAC, observability, federation, testing).

  • Portability: The system can be deployed on any hosting platform (e.g., Render, AWS, GCP) without being locked into proprietary services.

2.2 Architecture Principles

The following principles guide the design and implementation:

  • Security by Design

    • Security is treated as a first-class concern, but always balanced with usability.

    • Sensitive data is protected at rest and in transit: passwords are stored only as salted hashes, while tokens and secrets are encrypted.

    • Authentication follows least privilege by enforcing fine-grained RBAC from the start.

  • API-first, UI-enabled

    • All features are exposed via APIs (REST/GraphQL), ensuring integration with other systems.

    • A user-facing UI is layered on top for usability and demonstration purposes.

  • Stateful session management (with flexibility)

    • Sessions are managed with a database (MongoDB) and cache (Redis) for performance.

    • While not stateless, the design allows migration to stateless token-based flows if required.

  • Observability built-in

    • Logging, monitoring, and auditing are integral parts of the architecture.

    • Security events (e.g., failed logins, privilege escalations) are tracked from day one.

  • High availability & fault tolerance

    • Components are designed to recover gracefully from failure.

    • Database and cache are configured to support redundancy and replication.

  • Extensible Identity Federation

    • From the initial release, support for external providers (Google, GitHub) is built in.

    • Future extension to other providers (e.g., Facebook, SAML-based enterprise IdPs) is supported by the modular architecture.

  • Performance-oriented development

    • Design favors rapid development using proven libraries, but with an eye on efficiency (e.g., caching tokens, minimizing DB queries).

  • Testability as a principle

    • The system is built with automated unit and integration testing in mind.

    • Authentication flows (registration, login, MFA, RBAC enforcement) are validated systematically.

  • Enterprise complexity with portfolio clarity

    • The design intentionally mirrors enterprise-grade IAM systems (MFA-ready, federated logins, RBAC, monitoring).

    • At the same time, the documentation emphasizes clarity and learning value for portfolio purposes.

3. High-Level Architecture

This section describes the major building blocks of the authentication system, how they interact, and how the system should be deployed to meet the goals defined in Section 2 (balance security/usability, scale to thousands of users, high availability, portability, observability, and enterprise-ready practices).

3.1 Context Diagram

[Client Apps]
     │
     ▼
[API Gateway] ──(authn, throttling, TLS)──► [Auth Service / Auth Core]
                                         ├─► [Token & Session Store (Redis)]
                                         ├─► [Primary DB (MongoDB): users, roles, audit]
                                         ├─► [External IdPs (OAuth/OIDC)]
                                         └─► [Security Services: hashing, KMS, rate-limiter]
                                               │
                                               └─► [Monitoring & Logging: Prometheus/Grafana, ELK]

Notes & Next Steps

  • Define token lifetimes (access token TTL, refresh token TTL) and refresh rotation policy in the Session/Token Module (LLD).

  • Specify which responsibilities live in the API Gateway vs Auth Core (e.g., CSP enforcement, early 401/429 responses).

  • Add deployment considerations: single-region start with plan for multi-region session replication if needed.

3.2 Container / Component Diagram

Mermaid (Component) — paste into any Mermaid-capable renderer

Components & responsibilities (summary)

  • API Gateway / Auth Proxy: TLS termination, request routing, rate limiting, basic auth protection; passes requests to Auth API. Can handle initial JWT verification for static routes.

  • Auth API Service: Primary HTTP API (register, login, logout, refresh, password reset, role management). Implements business logic and validation.

  • Token Service: Issues and validates access tokens; handles refresh logic, rotation, revocation. Consults Redis for refresh token state.

  • User Service (DB): Stores user profiles, roles, permissions, account status. MongoDB with indexes for unique constraints.

  • MFA Service: TOTP generation/verification, backup codes, SMS/Email OTP orchestration (via Notification adapter).

  • OAuth Adapter: Encapsulates federation logic for Google/GitHub; normalizes provider identities to local user accounts.

  • Redis: In-memory store for refresh tokens, session indices, ephemeral locks, rate-limiting counters, and short-lived state.

  • Audit & Logging: Append-only audit log storage (also stored in Mongo or shipped to ELK). Critical for non-repudiation and incident response.

  • Monitoring / Observability: Metrics (Prometheus), tracing (OpenTelemetry), logs (ELK or Loki), alerting (Alertmanager).

  • WAF / CDN: Optional edge protection for brute-force, DDoS mitigation, and static assets.

3.3 Deployment View (Logical cloud infra)

Mermaid code (Deployment): paste into any Mermaid-capable renderer

Deployment notes

  • Kubernetes or managed containers (Render, AWS ECS, GKE) recommended for portability and scaling.

  • Load Balancer + CDN: LB for traffic distribution; CDN for static UI assets and to terminate TLS at edge.

  • MongoDB Replica Set: Primary + secondaries, periodic backups, and automated failover.

  • Redis with Sentinel, Redis Cluster, or managed Redis: Use Redis for ephemeral session/refresh token storage and rate-limiter counters. Sentinel and Cluster are alternative high-availability modes, so pick one.

  • Workers: Background job processors (email OTPs, token cleanup, auditing sync).

  • Observability stack: Prometheus, Grafana, Loki/ELK, OpenTelemetry traces.

  • Secrets management: Use cloud secret manager (or Kubernetes secrets backed by vault) for signing keys and provider secrets.

3.4 Data & Control Flow (step-by-step)

  1. Login / Register

    • Client → API Gateway → Auth API

    • Auth API validates credentials → UserService (Mongo)

    • On success: Token Service issues Access Token (short-lived) + Refresh Token (persistent)

    • Refresh token stored in Redis with device_id, token_id, expiry TTL

    • Access token returned to client (in-memory or HttpOnly cookie depending on app type); refresh token delivered via HttpOnly cookie or cookie+DB depending on design

  2. API Request

    • Client sends request with access token (Authorization header or HttpOnly cookie)

    • API Gateway verifies signature (or delegates to Token Service) → route to Auth API or backend services

  3. Token Refresh

    • Access token expired → client hits /refresh

    • Server validates refresh token (Redis) → issues a new refresh token, invalidates the old one, and issues a new access token

  4. Logout / Revoke

    • Client requests logout → Auth API deletes refresh token entry from Redis and expires cookies

    • Audit log entry written

  5. Federation Login

    • Client redirected to OAuth Adapter → external provider → callback → OAuthAdapter normalizes identity → link or create local user → issue tokens as above

3.5 How this maps to Architecture Goals & Principles

  • Balance security & usability: short-lived access tokens + revocable refresh tokens; MFA and federation optional but built-in.

  • Scale to thousands: stateless-ish APIs + Redis for ephemeral session state; services horizontally scalable.

  • High availability: LB, DB replica set, Redis cluster, multiple pod replicas.

  • Portability: containerized services, managed DB/Redis or self-hosted; works on Render/AWS/GCP.

  • Observability: Telemetry, logging, and alerting integrated; security events are first-class metrics.

  • Extensibility & enterprise parity: modular services (MFA, OAuth adapter, Token Service) allow future growth into SSO/federation.

3.6 Security & Operational considerations (actionable)

  • TLS everywhere; terminate TLS at LB or CDN, but enforce end-to-end where possible.

  • Secrets stored in Secret Manager / Vault; signing keys rotated periodically.

  • Cookie flags: HttpOnly, Secure, SameSite=Strict/Lax depending on UX.

  • CSRF protection when using cookies (SameSite + CSRF tokens for non-GET stateful endpoints).

  • Token rotation & revocation: implement refresh token rotation, store token metadata in Redis (device_id, issued_at, ip_hash).

  • Rate limiting: per-IP and per-account; counters in Redis.

  • WAF & Bot protection for brute-force mitigation.

  • Audit trail: Immutable audit log for security actions (store centrally, retention policy).

  • Backups: Automated backup schedule for Mongo; snapshot retention and test restores.

  • Disaster recovery: Cross-region replicas for Mongo if required; plan RTO/RPO.
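
The per-IP and per-account rate limiting listed above could be sketched as a fixed-window counter. This is a minimal illustration, not the system's implementation: a plain dict stands in for the Redis counters, and the class name `FixedWindowLimiter`, the helper `login_allowed`, and the `rl:ip:` / `rl:acct:` key prefixes are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Fixed-window counter; a dict stands in for Redis counters with a TTL."""

    def __init__(self, limit: int, window_seconds: int,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        # key -> (window index, hits seen in that window)
        self._counters: Dict[str, Tuple[int, int]] = {}

    def allow(self, key: str) -> bool:
        """Record one hit for `key`; return False once the limit is exceeded."""
        window_index = int(self.clock() // self.window)
        stored_index, hits = self._counters.get(key, (window_index, 0))
        if stored_index != window_index:
            hits = 0  # a new window started, so the old count no longer applies
        hits += 1
        self._counters[key] = (window_index, hits)
        return hits <= self.limit


def login_allowed(ip_limiter: FixedWindowLimiter,
                  account_limiter: FixedWindowLimiter,
                  ip: str, email: str) -> bool:
    # Count the attempt against both limiters (no short-circuit),
    # so a blocked IP still consumes the account's budget and vice versa.
    ip_ok = ip_limiter.allow(f"rl:ip:{ip}")
    account_ok = account_limiter.allow(f"rl:acct:{email.lower()}")
    return ip_ok and account_ok
```

A fixed window is the simplest option and allows short bursts at window boundaries; a sliding window or token bucket would smooth that out at the cost of more state per key.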

3.7 Metrics & Alerts (examples you should include)

  • Auth Success Rate (successful requests / total requests): alert if the error rate exceeds 1% of traffic

  • Login Latency (p95) — alert if > 200ms (your SRS perf target)

  • Token Refresh Latency

  • Failed Login Attempts per IP / Account — alert and auto-throttle

  • Redis Memory Usage — alert if > 75%

  • Mongo Primary Election / Replica Lag — alert when lag > threshold

  • High number of revoked tokens — possible breach indicator
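
To make the thresholds above concrete, here is a small sketch of how the p95 latency and error-rate checks might be evaluated over a batch of samples. It assumes raw latency samples are available in memory; in practice Prometheus would compute these from histograms. The function names and alert labels are hypothetical.

```python
import math
from typing import List, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample set."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate_alerts(latencies_ms: Sequence[float],
                    total_requests: int,
                    failed_requests: int,
                    p95_limit_ms: float = 200.0,
                    error_rate_limit: float = 0.01) -> List[str]:
    """Return the names of the alerts that should fire for this batch."""
    alerts: List[str] = []
    if latencies_ms and percentile(latencies_ms, 95) > p95_limit_ms:
        alerts.append("login_latency_p95")
    if total_requests and failed_requests / total_requests > error_rate_limit:
        alerts.append("auth_error_rate")
    return alerts
```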

3.8 Deliverables / Artifacts to attach

  • Mermaid/PlantUML component & deployment diagrams (this section)

  • OpenAPI/Swagger spec for Auth API

  • Terraform/Kubernetes manifests or Docker Compose (deployment examples)

  • Prometheus/Grafana dashboards and alert rules

  • Threat model (STRIDE), and an audit log retention + access policy

4. Detailed Component Design — 7 Services

Service 1 — Auth API Service (Orchestrator / Gateway-facing)

Responsibilities

  • Primary external entry point for clients (web/mobile/3rd-party).

  • Validate request schemas, throttle/rate-limit, anti-bruteforce.

  • Orchestrate flows: login, register, refresh, logout, password reset, federation callbacks, MFA challenges.

  • Enforce API-level auth (check that an access token is present in the cookie or Authorization header) and forward to backend services.

  • Sanitize and return errors in consistent format.

Public Interfaces (REST)

POST /v1/auth/login
POST /v1/auth/register
POST /v1/auth/refresh
POST /v1/auth/logout
POST /v1/auth/password/forgot
POST /v1/auth/password/reset
GET  /v1/auth/oauth/callback
POST /v1/auth/mfa/verify

Example /login (req/resp)

POST /v1/auth/login
{
  "email":"user@example.com",
  "password":"hunter2",
  "device_id":"chrome-2025-10-01",
  "client_info":{ "ip":"1.2.3.4", "ua":"…" }
}
200 OK
{
  "accessToken": "<jwt>",
  "expiresIn": 900
}

(Refresh token is issued/stored server-side and returned via HttpOnly cookie.)

Internal Interfaces / Calls

  • UserService.verifyCredentials(email, password, deviceInfo)

  • MFAService.checkRequired(userId) → maybe challenge

  • TokenService.issueTokens(userId, deviceId, scope)

  • AuditService.logEvent(eventType, meta)

  • OAuthService.handleCallback(params)

Data/Storage

  • No own DB (stateless). Relies on other services.

  • Short-lived in-memory caches for throttling counters (or Redis).

Security

  • Validate input with strict schema (JSON schema).

  • Rate-limit by IP and account.

  • WAF at edge recommended.

  • All responses use generic failure messages to avoid user enumeration.
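
As an illustration of strict input validation and generic failure messages, the sketch below checks a login payload by hand. It is a stand-in for a real JSON Schema validator, and the field spec (including treating `device_id` as required), the error wording, and the function names are assumptions made for the example.

```python
from typing import Any, Dict, List

# Hypothetical field spec for POST /v1/auth/login: name -> (type, required)
LOGIN_FIELDS = {
    "email": (str, True),
    "password": (str, True),
    "device_id": (str, True),
    "client_info": (dict, False),
}


def validate_login(body: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors; an empty list means the body is valid."""
    errors: List[str] = []
    for name in body:
        if name not in LOGIN_FIELDS:
            errors.append(f"unexpected field: {name}")  # strict: no extras
    for name, (expected_type, required) in LOGIN_FIELDS.items():
        if name not in body:
            if required:
                errors.append(f"missing field: {name}")
        elif not isinstance(body[name], expected_type):
            errors.append(f"wrong type for field: {name}")
    email = body.get("email")
    if isinstance(email, str) and ("@" not in email or len(email) > 254):
        errors.append("malformed email")
    return errors


def login_failure_response() -> Dict[str, str]:
    # Same body for "unknown user" and "wrong password" to avoid user enumeration
    return {"error": "invalid_credentials",
            "message": "Invalid email or password."}
```

Validation errors describe the shape of the request, so they can be specific; credential failures should all map to the single generic response.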

Scaling & HA

  • Stateless pods; scale horizontally behind LB.

  • Use shared Redis for rate-limits.

  • No session affinity (sticky sessions) needed, because pods hold no per-client state.

Failure modes & mitigation

  • Downstream user DB or Redis failure → return 503 + circuit breaker.

  • Use exponential backoff and retry for transient ops.
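
The retry guidance above might look like the following sketch: exponential backoff with full jitter around a downstream call. `TransientError` and `retry_with_backoff` are hypothetical names, and the default delays are placeholders rather than tuned values. A circuit breaker would sit around this and is not shown.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, dropped connections)."""


def retry_with_backoff(op: Callable[[], T],
                       attempts: int = 4,
                       base_delay: float = 0.05,
                       max_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `op`, retrying on TransientError with exponentially growing, jittered delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return op()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries; the caller can map this to a 503
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))  # full jitter avoids retry storms
    raise AssertionError("unreachable")
```

Only idempotent or safely repeatable operations should be retried this way; a login attempt that already incremented a failure counter, for example, should not be replayed blindly.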

Observability

  • Metrics: request rate, error rate, latency (p95,p99), auth failures per account/IP.

  • Logs: structured JSON with trace id and minimal PII (never log raw passwords).

  • Events: emit audit events for login success/failure, token refresh, password changes.

Testing

  • Unit tests for controllers & validators.

  • Integration tests mocking UserService/TokenService.

  • E2E tests for full flows.

Service 2 — User Management Service

Responsibilities

  • Persistent user records (create/read/update/delete).

  • Password hashing + verification (Argon2 / bcrypt).

  • Email/phone verification lifecycle.

  • Link/unlink external provider identities.

  • Expose CRUD + admin operations for roles & attributes.

Interfaces (internal API)

verifyCredentials(email, password) -> {user, status}
createUser(userPayload) -> user
findUserByEmail(email) -> user
updateUser(userId, patch) -> user
linkExternalIdentity(userId, provider, providerUserId, attrs)
incrementFailedAttempt(userId)
resetFailedAttempts(userId)

Data model (Mongo collections)

  • users (id, email, password_hash, name, phone, mfa_enabled, status, createdAt, lastLogin, failedAttempts, device_metadata[])

  • user_oauth (userId, provider, providerUserId, linkedAt)

  • roles & role_permissions or stored separately in Policy Service

Security

  • Passwords hashed with Argon2 (recommended) or bcrypt with strong params.

  • Enforce unique email/phone indexes.

  • Use per-user salt (inherent to Argon2).

  • PII at rest encrypted (field-level or DB-level).
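
The hash-and-verify cycle can be sketched with the standard library alone. Note the swap: this example uses PBKDF2-HMAC-SHA256 from `hashlib` as a stand-in because Argon2 and bcrypt need third-party packages; the design above still recommends Argon2. The storage format, function names, and iteration count are assumptions for illustration.

```python
import hashlib
import hmac
import os

DEFAULT_ITERATIONS = 600_000  # assumption; tune to your hardware and latency budget


def hash_password(password: str, iterations: int = DEFAULT_ITERATIONS) -> str:
    """Hash with a fresh random salt; returns 'scheme$iterations$salt$digest'."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    try:
        scheme, iterations, salt_hex, digest_hex = stored.split("$")
        if scheme != "pbkdf2_sha256":
            return False
        salt = bytes.fromhex(salt_hex)
        expected = bytes.fromhex(digest_hex)
        candidate = hashlib.pbkdf2_hmac(
            "sha256", password.encode("utf-8"), salt, int(iterations))
    except ValueError:
        return False  # malformed record
    return hmac.compare_digest(candidate, expected)
```

Storing the scheme and parameters next to the digest lets parameters be raised later: records with old parameters still verify and can be rehashed on the next successful login.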

Interactions

  • Called by Auth API for verify/create.

  • Emits user.created, user.updated, user.failed_login, user.locked events to AuditService.

Scaling

  • Scale vertically, or add read replicas if read traffic dominates.

  • Use indexes for auth lookups by email.

  • If user volume grows huge, consider sharding.

Failure

  • DB replica lag → stale reads for lastLogin. Handle eventual consistency.

Observability

  • Metrics: user create rate, failed_logins per minute, account locks.

  • Logs: events for user lifecycle.

Testing

  • Hash/verify unit tests, migration tests, unique constraint tests.

Service 3 — Token & Session Service ( Token Service + Session Store)

Responsibilities

  • Issue/validate JWT access tokens.

  • Create, rotate, revoke refresh tokens.

  • Persistent session index in Redis (fast lookups) and optional DB backup.

  • Device binding: tie refresh token to device_id and metadata (IP hash, UA).

  • Enforce token revocation and deny-list/allow-list logic.

Interfaces

issueTokens(userId, deviceId, clientInfo) -> { accessToken, refreshTokenId }
validateAccessToken(accessToken) -> { valid, claims }
validateRefreshToken(refreshToken, deviceId) -> { valid, tokenId }
rotateRefreshToken(oldTokenId) -> newTokenId
revokeRefreshToken(tokenId)
revokeAllForUser(userId)
listActiveSessions(userId) -> [session]

Data storage & key schema

  • Redis hash per refresh token: refresh:<tokenId> → {userId, deviceId, issuedAt, expiresAt, nonce, rotated: false}

  • Optionally: Reverse index: user_sessions:<userId> → sorted set tokenIds by issuedAt (for listing)

  • JWT signing keys stored in Secrets Manager; rotate and support key IDs (kid) in token header.
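
To show how the `kid` header supports key rotation, here is a standard-library sketch of signing and verifying a compact JWT. It uses HS256 (a shared secret) purely as a stand-in, since the standard library has no RSA or ECDSA signing; a real deployment would more likely use an asymmetric algorithm through a maintained JWT library. The function names and the `keys` mapping (kid to secret) are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Dict, Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, kid: str, keys: Dict[str, bytes]) -> str:
    """Sign `claims` with the key named by `kid` and record the kid in the header."""
    header = {"alg": "HS256", "typ": "JWT", "kid": kid}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims))
    signature = hmac.new(keys[kid], signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, keys: Dict[str, bytes],
                 now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature and `exp` are valid, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        key = keys.get(header.get("kid"))  # old keys stay here until tokens expire
        if key is None or header.get("alg") != "HS256":
            return None
        expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode("ascii"),
                            hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
        expires_at = claims.get("exp")
    except (ValueError, AttributeError, TypeError):
        return None  # malformed token
    current = time.time() if now is None else now
    if not isinstance(expires_at, (int, float)) or expires_at <= current:
        return None
    return claims
```

Rotation then amounts to adding a new entry to `keys`, signing new tokens with it, and removing the old entry once every token signed with it has expired.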

Security & rotation

  • Access token short-lived (e.g., 10–15m).

  • Refresh token backed by Redis and rotated on each use. On rotation, mark old token revoked.

  • Enforce deviceId + refresh token mapping.

  • Implement detection: if rotated refresh token used twice => suspect token theft => revoke all sessions & force MFA.
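
Rotation with reuse detection, as described in the points above, can be sketched as follows. A dict stands in for the Redis hashes, and the class and method names are hypothetical. The key detail is that a rotated token's record is kept (flagged, not deleted) so that a second use can be recognized.

```python
import secrets
from typing import Dict, Optional


class RefreshStore:
    """In-memory stand-in for the Redis hashes keyed refresh:<tokenId>."""

    def __init__(self) -> None:
        self.tokens: Dict[str, dict] = {}

    def issue(self, user_id: str, device_id: str) -> str:
        token_id = secrets.token_urlsafe(32)
        self.tokens[token_id] = {
            "userId": user_id, "deviceId": device_id, "rotated": False}
        return token_id

    def revoke_all_for_user(self, user_id: str) -> None:
        self.tokens = {tid: rec for tid, rec in self.tokens.items()
                       if rec["userId"] != user_id}

    def rotate(self, token_id: str, device_id: str) -> Optional[str]:
        """Exchange a refresh token for a new one; None means the refresh is rejected."""
        record = self.tokens.get(token_id)
        if record is None or record["deviceId"] != device_id:
            return None  # unknown, expired, or bound to another device
        if record["rotated"]:
            # An already-rotated token came back: treat as theft, kill every session
            self.revoke_all_for_user(record["userId"])
            return None
        record["rotated"] = True  # keep the record so reuse can be detected
        return self.issue(record["userId"], device_id)
```

In Redis the rotated record would keep its TTL, so the detection window lasts as long as the original token would have lived. Forcing MFA after a mass revoke, as suggested above, is left to the caller.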

Token Storage Flow Diagram (Hybrid Approach)

                ┌────────────────────────┐
                │        Browser         │
                │                        │
                │ HttpOnly Secure Cookie │
                │  • Access Token        │
                │  • Refresh Token (opt) │
                └─────────┬──────────────┘
                          │
                          ▼
                ┌────────────────────────┐
                │       API Server       │
                │                        │
                │ 1. Validates Access    │
                │    Token from cookie   │
                │ 2. If expired →        │
                │    uses Refresh Token  │
                └─────────┬────────┬─────┘
                          │        │
                          │        ▼
                          │   ┌──────────────────┐
                          │   │    Redis DB      │
                          │   │ (Refresh Tokens) │
                          │   │ • user_id        │
                          │   │ • token_id       │
                          │   │ • device_id      │
                          │   │ • expiry (TTL)   │
                          │   └──────────────────┘
                          │
                          ▼
                ┌────────────────────────┐
                │     MongoDB (Core)     │
                │                        │
                │ • Users                │
                │ • Roles & Permissions  │
                │ • Audit Logs           │
                └────────────────────────┘

How it works

  1. User logs in →

    • Server issues Access Token (short-lived) in cookie.

    • Server also issues Refresh Token (long-lived) and stores it in Redis (linked to user & device).

  2. On each request →

    • Browser automatically sends Access Token via cookie.

    • If Access Token expired → server checks Redis for refresh token validity.

    • If valid → new Access Token issued + refresh rotation.

  3. Logout →

    • Access Token cookie cleared.

    • Refresh Token removed from Redis.

Scaling

  • Redis cluster with replication. Use TTLs to auto-expire sessions.

  • TokenService itself stateless: replicate across nodes.

Failure modes

  • Redis OOM or unavailability: fail closed. If Redis is down, reject refresh requests rather than issue tokens that cannot be tracked or revoked.

  • Key rotation mismatch: keep previous signing keys for validation.

Observability

  • Metrics: token issuance rate, refresh attempt rate, refresh failures, revocations.

  • Audit events for token rotation and mass-revoke.

Testing

  • Exhaustive token rotation tests, replay detection tests, TTL expiry tests.

Service 4 — MFA Service

Responsibilities

  • Manage TOTP secrets, backup codes, SMS/Email OTPs.

  • Provide enrollment/setup flows & challenge verification.

  • Rate-limit OTP requests and coordinate with Notification service.

Interfaces

setupTOTP(userId) -> { secret, qrCodeUri }   // One-time return
verifyTOTP(userId, code) -> boolean
generateBackupCodes(userId) -> [codes]
sendOTPViaSMS(userId, phone) -> otpId
verifyOTP(otpId, code) -> boolean

Data & secrets

  • mfa_credentials collection (userId, type, secret_hash, createdAt, lastUsed).

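  • TOTP secrets must stay recoverable, because the server needs the raw secret to compute the expected code. Store them encrypted (with the key held in the secret manager) rather than hashed, and show the cleartext to the user only once at setup. Despite its name, the secret_hash field therefore holds ciphertext or a Vault reference for TOTP entries; hashing is appropriate only for backup codes and one-time OTPs.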

Security

  • Use a vetted TOTP library (RFC 6238, which builds on HMAC-based one-time passwords); prevent brute force by counting and capping failed attempts.

  • Store backup codes hashed (not plaintext).

  • Limit OTP send rates (per user, per IP).
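
For reference, TOTP verification as specified in RFC 6238 fits in a few lines of standard-library Python. This is a sketch to show what the MFA Service computes, not a replacement for a vetted library; the function names and the one-step drift window are assumptions, and replay protection (rejecting a code already used in its time step) is not shown.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP with HMAC-SHA1: HOTP over the current time-step counter."""
    counter = int((time.time() if at is None else at) // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, code: str, at: Optional[float] = None,
                window: int = 1, step: int = 30) -> bool:
    """Accept codes from the current step and `window` steps either side (clock drift)."""
    now = time.time() if at is None else at
    for drift in range(-window, window + 1):
        expected = totp(secret, now + drift * step, step)
        if hmac.compare_digest(expected.encode("utf-8"), code.encode("utf-8")):
            return True
    return False
```

The function needs the raw secret bytes on every call, which is why the secret has to be stored in a recoverable (encrypted) form rather than as a one-way hash.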

Interactions

  • Called by Auth API during login if mfa_enabled.

  • Emits mfa.challenge, mfa.success, mfa.failed to Audit.

Scaling

  • Stateless verification service; scale horizontally.

  • Use worker/queue for SMS/email sending.

Failure

  • SMS provider outage → fallback to email or show error.

  • If verification store is unavailable, deny MFA-challenged auth attempts (fail secure).

Observability

  • Metrics: OTP sent, OTP verify success/fail, TOTP verify latency.

Testing

  • TOTP golden tests with fixed seeds, replay attacks, rate limit tests.

Service 5 — Audit & Telemetry Service

Responsibilities

  • Collect, store, and index audit events (login success/failure, token rotation, role changes).

  • Expose querying for compliance and admin UIs.

  • Ship metrics to Prometheus/Grafana and logs to ELK/Loki.

Interfaces

  • POST /v1/audit/events (internal)

  • GET /v1/audit/user/:id (admin)

  • Metrics: Prometheus scrape endpoints

Data model

  • audit_logs (logId, timestamp, eventType, userId, tokenId, ip, deviceInfo, meta JSON)

  • Immutable append-only. Retention policy (e.g., 90 days in hot store, cold archive thereafter).

Storage

  • Elasticsearch or Mongo (append-only), backed by object storage for archived logs.

Security

  • Strong RBAC for audit querying endpoints.

  • Ensure tamper-resistance: sign logs or write to append-only store.
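
One common way to make an append-only log tamper-evident is to chain entries by hash, so that changing any past event invalidates every later hash. The sketch below assumes this approach; the document itself only says "sign logs or write to append-only store", and the function and field names here are hypothetical.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes the same way
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: List[dict], event: dict) -> dict:
    """Append `event`, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"event": event, "prev": prev_hash,
             "hash": _entry_hash(prev_hash, event)}
    log.append(entry)
    return entry


def verify_chain(log: List[dict]) -> bool:
    """Recompute every hash; False if any entry was altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain alone does not stop someone who can rewrite the whole log; the latest hash should also be signed or periodically published to a separate system.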

Interactions

  • Subscribes to events from all services via event bus.

  • Receives direct writes from Auth API for immediate-critical events.

Scaling

  • Indexing pipeline for logs; use partitioning by time.

  • Archival jobs.

Failure

  • Logging delays acceptable but loss of logs is not; use local persistence & retry.

Observability

  • Provide dashboards, alerts for anomalies (spike in failed logins).

Testing

  • Ensure event semantics, retention, query performance.

Service 6 — OAuth / Federation Service

Responsibilities

  • Encapsulate provider-specific logic for OAuth/OIDC (Google, GitHub etc.).

  • Normalize provider identity to canonical internal user profile.

  • Handle redirect/callback flows and account linking/unlinking.

Interfaces

  • GET /v1/oauth/{provider}/authorize → redirect URL

  • GET /v1/oauth/{provider}/callback?code=... → handle callback, exchange code for token, normalize user info

Interactions

  • Calls third-party provider endpoints (token exchange, userinfo).

  • Calls UserService.linkExternalIdentity or UserService.createUser with normalized data.

  • Calls TokenService.issueTokens.

Security

  • Protect redirect URIs, support PKCE for public clients.

  • Keep client secrets in Secret Manager.

  • Validate provider certs and token signatures.

Scaling

  • Stateless; scale horizontally.

  • Cache provider metadata.

Failure

  • Provider downtime → fallback message and retry.

  • Inconsistent provider data → require manual review.

Observability

  • Metrics: federation success rate, provider error rates.

  • Logs: provider responses, mapping decisions.

Testing

  • Contract tests with provider mocks; end-to-end with test OAuth clients.

Service 7 — Policy / RBAC Service (Auth Z)

Responsibilities

  • Store roles, permissions, and policies. Evaluate authorization requests (PDP — policy decision point).

  • Support both RBAC and ABAC (attribute-based) evaluation for fine-grained permissions.

  • Expose APIs for admin to manage roles & permissions.

Interfaces

isAllowed(userId, resource, action, context) -> {allowed, reason}
getPermissions(userId)
assignRole(userId, role)
createRole(roleName, permissions)

Data model

  • roles (roleId, name, description)

  • permissions (permId, name, description)

  • role_permissions mapping

  • user_roles mapping (or refer to UserService storing role list)
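
The role and permission mappings above resolve into an `isAllowed` decision roughly as follows. This sketch covers plain RBAC only (no ABAC context), uses in-memory dicts in place of the collections, and assumes a `resource:action` permission naming scheme that the document does not specify.

```python
from typing import Dict, Set, Tuple

# Hypothetical sample data standing in for role_permissions and user_roles
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "admin": {"user:read", "user:write", "role:assign"},
    "user": {"user:read"},
}
USER_ROLES: Dict[str, Set[str]] = {
    "u1": {"admin"},
    "u2": {"user"},
}


def get_permissions(user_id: str) -> Set[str]:
    """Union of the permissions granted by every role the user holds."""
    permissions: Set[str] = set()
    for role in USER_ROLES.get(user_id, set()):
        permissions |= ROLE_PERMISSIONS.get(role, set())
    return permissions


def is_allowed(user_id: str, resource: str, action: str) -> Tuple[bool, str]:
    """Default-deny check: allowed only if some role grants resource:action."""
    needed = f"{resource}:{action}"
    if needed in get_permissions(user_id):
        return True, "granted by role"
    return False, f"missing permission {needed}"
```

Unknown users and unknown roles resolve to an empty permission set, which keeps the check fail-closed.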

Interactions

  • Auth API consults PolicyService.isAllowed(...) before returning data for sensitive endpoints.

  • Admin UI calls this service for role management.

Security

  • Strict admin auth for role changes; all changes emitted to AuditService.

  • Cache decisions for short time (TTL) for performance; cache invalidation on role change.

Scaling

  • Read-heavy; use caching (Redis). Use eventual consistency for changes (invalidate caches).

Failure

  • If Policy service down, choose fallback: fail-closed (deny) or fail-open (risky). For security, prefer fail-closed for critical checks.

Observability

  • Metrics: authorization decision latency, cache hit rate, denials per minute.

Testing

  • Policy unit tests for edge cases; property-based tests for combinatorial policies.

Cross-Service Design Patterns & Extras

Eventing & Contracts

  • Use an event bus to decouple Audit/Telemetry & asynchronous tasks. Examples: user.created, login.success, login.failed, token.rotated, mfa.failed.

  • Keep event schema versioned.

Secrets & Keys

  • JWT signing keys stored in Secret Manager / Vault. Support key rotation and kid header. Keep old public keys for validation until expiry.

Token rotation & replay detection

  • Rotation: issue new refresh token with every refresh. Mark previous as rotated; if an old token appears later, treat as theft -> revoke all sessions for that user and force re-authentication + alert.

Session listing & device management

  • TokenService/SessionService maintains user_sessions:<userId> sorted set. Auth UI can show active sessions and allow per-device logout.

  • Refresh tokens in HttpOnly cookie, SameSite=strict/lax depending on cross-site needs. Use an anti-CSRF token for state-changing endpoints if cookie-based.
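
Session listing and per-device logout could work along these lines. The sketch uses dicts in place of the Redis session hashes and the `user_sessions:<userId>` sorted set, and emulates TTL expiry with an injectable clock; the class and method names are hypothetical.

```python
import time
from typing import Callable, Dict, List


class SessionIndex:
    """Dicts stand in for the per-token hashes and the per-user sorted set."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self.clock = clock
        self.sessions: Dict[str, dict] = {}             # tokenId -> session record
        self.by_user: Dict[str, Dict[str, float]] = {}  # userId -> {tokenId: issuedAt}

    def add(self, token_id: str, user_id: str, device_id: str,
            ttl_seconds: float) -> None:
        now = self.clock()
        self.sessions[token_id] = {"userId": user_id, "deviceId": device_id,
                                   "issuedAt": now, "expiresAt": now + ttl_seconds}
        self.by_user.setdefault(user_id, {})[token_id] = now

    def list_active(self, user_id: str) -> List[dict]:
        """Sessions ordered by issue time; expired entries are dropped lazily."""
        now = self.clock()
        index = self.by_user.get(user_id, {})
        active: List[dict] = []
        for token_id in sorted(index, key=index.get):
            session = self.sessions.get(token_id)
            if session is None or session["expiresAt"] <= now:
                self.sessions.pop(token_id, None)
                del index[token_id]  # the index can outlive the TTL'd record
                continue
            active.append({"tokenId": token_id, **session})
        return active

    def logout_device(self, user_id: str, device_id: str) -> int:
        """Remove every active session for one device; returns how many were removed."""
        removed = 0
        for session in self.list_active(user_id):
            if session["deviceId"] == device_id:
                self.sessions.pop(session["tokenId"], None)
                self.by_user[user_id].pop(session["tokenId"], None)
                removed += 1
        return removed
```

The lazy cleanup in `list_active` matters with real Redis too: a token hash can expire by TTL while its id is still in the sorted set, so the index has to be pruned when it is read.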

Practical artifacts

  • For each service: OpenAPI spec (external APIs) + internal RPC spec (gRPC/HTTP).

  • Data model ERD & Redis key schema.

  • Sequence diagrams for Login, Refresh, Register, ForgotPassword, Federation (Mermaid).

  • Deployment manifests (k8s) with readiness/liveness probes & resource limits.

  • Prometheus metrics list + Grafana dashboard templates.

  • Threat model (STRIDE) + mitigation mapping to components.

  • Testing matrix (unit, integration, contract, E2E).

5. Data Design

5.1 Purpose

The Data Design section defines how all data within the authentication and authorization system is physically structured, stored, and managed.
It translates the logical entities identified in the SRS and high-level Data Design into concrete MongoDB collections and Redis key structures, ensuring optimal performance and integrity.

This section establishes the foundation for data consistency, security, and scalability across all modules that handle user identities, authentication tokens, sessions, roles, permissions, and telemetry data.
It also defines relationships between collections, indexing strategies, encryption and hashing requirements, and data lifecycle policies — covering how data is created, updated, retained, archived, and eventually purged.

Ultimately, this section ensures that every component interacting with the data layer adheres to unified design principles that support high availability, fault tolerance, and compliance with privacy standards.


5.2 Data Design Overview

The authentication system’s data layer is composed of:

| Layer | Technology | Purpose |
| --- | --- | --- |
| Primary Database | MongoDB | Persistent storage for user profiles, roles, permissions, audit logs |
| Ephemeral Store | Redis | Fast storage for sessions, refresh tokens, rate limits, OTPs |
| Long-term Logs | ELK / Loki | Append-only audit and security events for monitoring and forensics |
| Secrets Storage | Secret Manager / Vault | Signing keys, OAuth client secrets, MFA seed encryption keys |

Database Decision

Step 1: Identify the Nature of Each Data Type

We’ll classify every entity by data lifetime, volatility, access pattern, and sensitivity:

| Entity | Data Type | Lifetime | Access Pattern | Security Sensitivity |
| --- | --- | --- | --- | --- |
| users | Core identity | Long-term | Moderate read/write | 🔥 Very High |
| last_devices | Ephemeral behavior log | Medium-term | Frequent writes, occasional reads | Medium |
| mfa_credential | Secret data | Long-term | Rare writes, critical reads | 🔥 Extreme |
| role, permission, user_role, role_permission | Access control definitions | Long-term | Read-heavy | High |
| audit_log | Immutable security logs | Long-term | Write-heavy, rare reads | High |
| oauth_provider | Token/linked account data | Medium-term | Rare writes, occasional reads | 🔥 High |
| telemetry_event | Behavior data | Short to medium-term | Write-heavy | Medium |
| session_store | Temporary session tokens | Short-lived | Constant read/write | 🔥 High |

Step 2: Match Each Type to the Right Database

MongoDB (Primary Persistent Store)

For:

  • users

  • role, permission, user_role, role_permission

  • oauth_provider

Why:

  • Structured but flexible (perfect for identity & role data).

  • Easy JSON-based querying for user & role relations.

  • You can easily embed or reference relationships (1:N, N:M).

Encryption & Indexing:

  • Encrypt sensitive fields (password_hash, tokens).

  • Index on email, username, user_id.


Redis (Ephemeral, Fast Store)

For:

  • session_store

  • Possibly last_devices (if used for recent login cache)

Why:

  • Blazing fast, built for TTL (time-to-live) sessions.

  • Perfect for session invalidation, token rotation, and device caching.

  • Native expiration = no cleanup cron jobs.

Vault / KMS (Secret Store)

For:

  • mfa_credential.secret_hash

  • Signing keys for JWTs, and encryption keys for refresh tokens and OAuth tokens

Why:

  • Redis/Mongo can be compromised; Vault isolates and encrypts secrets at rest.

  • You never expose raw secrets to your DB.

(Note: in your schema, secret should be a reference or encrypted placeholder, not the actual secret.)


Elasticsearch / PostgreSQL (Log Store)

For:

  • audit_log

  • telemetry_event

Why:

  • These are append-only, massive, and often queried by time.

  • Elasticsearch gives you full-text and fast time-based search (ideal for login analysis, fraud detection).

  • If you prefer structured relational logs, PostgreSQL with partitioned tables works fine too.


My Approach

Start with MongoDB + Redis, and later scale into a hybrid:

MongoDB → Users, Roles, OAuth, Permissions
Redis → Session Store, Last Devices
Vault → Secrets (MFA, keys)
Elasticsearch → Audit, Telemetry

Step 3: Visualize (Simple Overview)

                ┌──────────────────────────┐
                │   Authentication API     │
                └────────────┬─────────────┘
                             │
         ┌───────────────────┼───────────────────┐
         ▼                   ▼                   ▼
 ┌────────────┐       ┌──────────────┐     ┌────────────┐
 │  MongoDB   │       │    Redis     │     │  Vault/KMS │
 │ users      │       │ sessions     │     │ mfa secrets│
 │ roles      │       │ last_devices │     └────────────┘
 │ oauth_prov │       └──────────────┘
 │ permissions│
 └────────────┘
         │
         ▼
   ┌──────────────┐
   │ Elasticsearch│
   │ audit_logs   │
   │ telemetry    │
   └──────────────┘

Summary

| Table | Best DB | Reason |
| --- | --- | --- |
| users | MongoDB | Long-term core data |
| last_devices | Redis / MongoDB | Depends on whether you want a cache or a history |
| mfa_credential | Vault + MongoDB ref | Secrets must be isolated |
| role / permission / user_role | MongoDB | Stable structure |
| audit_log | Elasticsearch | Time-based search and analytics |
| oauth_provider | MongoDB | Rarely updated identity link |
| telemetry_event | Elasticsearch | Heavy event ingestion |
| session_store | Redis | Fast token/session management |

5.3 Logical Data Model

Core Collections (MongoDB)

| Collection | Key Fields | Description |
| --- | --- | --- |
| users | id (PK), email, username, password_hash, mfa_enabled, email_verified, phone_verified, is_locked, created_at, updated_at | Core user identity record. Primary lookup by email. |
| user_oauth | provider_id (PK), user_id (FK), provider_name, provider_user_id, access_token (encrypted), refresh_token (encrypted), expires_at | Stores external IdP linkages for federation (Google, GitHub). |
| roles | role_id (PK), name, description | Defines named roles such as admin, user, etc. |
| permissions | permission_id (PK), name, description | Granular access rights that can be assigned to roles. |
| role_permissions | role_id (FK), permission_id (FK) | Many-to-many mapping of roles to permissions. |
| user_roles | user_id (FK), role_id (FK) | Many-to-many mapping of users to roles. |
| audit_logs | log_id (PK), timestamp, user_id, event_type, ip, device, metadata | Immutable append-only logs for compliance, breach detection, and analytics. |
| mfa_credentials | user_id (FK), type, secret_hash, created_at, last_used | TOTP and OTP configuration per user. |

Relationships

  • users 1:N user_oauth (one user can link multiple providers)

  • users N:M roles, through user_roles

  • roles N:M permissions, through role_permissions

  • users 1:N audit_logs

  • users 1:N mfa_credentials (one entry per enrolled MFA method, present only if MFA is enabled)
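
To make the many-to-many mappings concrete, here is a small sketch that resolves a user's effective permissions by walking user_roles and then role_permissions. The sample rows, the ID values, and the `PROFILE_EDIT` permission are invented for illustration; plain Python lists stand in for the collections.

```python
from typing import Dict, List, Set

# Hypothetical sample rows mirroring the join collections described above.
user_roles: List[Dict[str, str]] = [
    {"user_id": "u1", "role_id": "r_admin"},
    {"user_id": "u1", "role_id": "r_user"},
    {"user_id": "u2", "role_id": "r_user"},
]
role_permissions: List[Dict[str, str]] = [
    {"role_id": "r_admin", "permission_id": "p_view_audit"},
    {"role_id": "r_admin", "permission_id": "p_user_create"},
    {"role_id": "r_user", "permission_id": "p_profile_edit"},
]
permissions: Dict[str, str] = {
    "p_view_audit": "VIEW_AUDIT",
    "p_user_create": "USER_CREATE",
    "p_profile_edit": "PROFILE_EDIT",
}


def effective_permissions(user_id: str) -> Set[str]:
    """Follow users -> user_roles -> role_permissions -> permissions."""
    role_ids = {row["role_id"] for row in user_roles if row["user_id"] == user_id}
    perm_ids = {
        row["permission_id"] for row in role_permissions if row["role_id"] in role_ids
    }
    return {permissions[pid] for pid in perm_ids}
```

Because MongoDB does not join these collections automatically, this two-hop lookup is typically done in the application layer (or cached per session).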


5.4 Physical Data Model (Redis + MongoDB Keys)

| Data Type | Storage | Key Pattern / Collection | Purpose | TTL |
| --- | --- | --- | --- | --- |
| Access Tokens | Client memory / cookie | JWT (not stored server-side) | Short-lived access tokens for API auth | 15 min |
| Refresh Tokens | Redis | refresh:<tokenId> → {user_id, device_id, issued_at, expires_at} | Long-lived tokens for session refresh | 7–30 days |
| User Session Index | Redis | user_sessions:<userId> → [tokenIds] | Track all devices/sessions per user | Same as refresh token |
| OTP Codes | Redis | otp:<userId> → {code_hash, expires_at} | Short-lived OTPs for MFA/forgot password | 2–5 min |
| Rate-Limit Counters | Redis | rate:<ip> | Track login attempts per IP | 1 min |
| Audit Logs (Hot) | MongoDB | audit_logs | Critical user actions | 90 days |
| Audit Logs (Cold) | Object Storage | /archive/audit/yyyy-mm-dd.log | Archived logs beyond 90 days | 1 year |
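
The key patterns in this table can be kept in one place with small helper functions. This is only a sketch: the helper names are hypothetical, and the 7-day TTL is an assumed choice from the lower end of the 7 to 30 day range.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Tuple

REFRESH_TTL = timedelta(days=7)  # assumed: lower bound of the 7 to 30 day range


def refresh_key(token_id: str) -> str:
    return f"refresh:{token_id}"


def user_sessions_key(user_id: str) -> str:
    return f"user_sessions:{user_id}"


def build_refresh_record(
    token_id: str, user_id: str, device_id: str, now: datetime
) -> Tuple[str, Dict[str, str], int]:
    """Return (key, value, ttl_seconds) ready to hand to a store client."""
    value = {
        "user_id": user_id,
        "device_id": device_id,
        "issued_at": now.isoformat(),
        "expires_at": (now + REFRESH_TTL).isoformat(),
    }
    return refresh_key(token_id), value, int(REFRESH_TTL.total_seconds())


if __name__ == "__main__":
    key, value, ttl = build_refresh_record(
        "t1", "u1", "d1", datetime(2025, 1, 1, tzinfo=timezone.utc)
    )
    print(key, ttl, value["expires_at"])
```

Centralizing key construction keeps the Redis key space consistent across services that read and write session state.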

5.5 Data Entities and Fields

5.5.1 users

| Field | Type | Description |
| --- | --- | --- |
| id | String (PK) | Unique identifier (UUID). |
| name | String | Display name. |
| username | String | Unique system username. |
| email | String | Unique email, used for login. |
| phone_number | String | Optional verified phone number. |
| bio | String | Optional user bio. |
| password_hash | String | Argon2/bcrypt hash of password. |
| profile_image | String (URL) | Optional avatar path. |
| email_verified | Boolean | Email verification status. |
| phone_verified | Boolean | Phone verification status. |
| is_locked | Boolean | Indicates if account is temporarily locked due to failed logins. |
| mfa_enabled | Boolean | Whether user has MFA enabled. |
| is_logged_in | Boolean | Real-time login status (for analytics or session visualization). |
| forgot_password_token | String | Token for password reset (hashed, short TTL). |
| verification_token | String | Token for email/phone verification. |
| lock_until | DateTime | Account lock expiry time. |
| forgot_password_expiry | DateTime | Token expiry. |
| created_at | DateTime | Account creation timestamp. |
| updated_at | DateTime | Last profile update. |
| last_login_at | DateTime | Last successful login time. |
| last_failed_at | DateTime | Last failed login attempt. |

Indexes:

  • email (unique)

  • username (unique)

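  • lock_until (regular index, optional). Auto-unlock is done by comparing lock_until with the current time at login; a MongoDB TTL index is not suitable here because it deletes the whole user document when the timestamp passes.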
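
One way to get auto-unlock behavior from lock_until is to evaluate it at login time rather than relying on the database to clear it. The sketch below assumes the is_locked and lock_until fields from the table above; the `UserLockState` type, the function names, and the 15-minute default are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, TypedDict


class UserLockState(TypedDict):
    is_locked: bool
    lock_until: Optional[datetime]


def lock_account(user: UserLockState, now: datetime, minutes: int = 15) -> None:
    """Lock for a fixed window (15 minutes is an assumed default)."""
    user["is_locked"] = True
    user["lock_until"] = now + timedelta(minutes=minutes)


def is_currently_locked(user: UserLockState, now: datetime) -> bool:
    """A lock only counts while lock_until is still in the future."""
    if not user["is_locked"]:
        return False
    lock_until = user["lock_until"]
    if lock_until is None:
        return True  # assumed: no expiry means locked until an admin unlocks
    return now < lock_until


if __name__ == "__main__":
    user: UserLockState = {"is_locked": False, "lock_until": None}
    lock_account(user, datetime.now(timezone.utc))
    print(is_currently_locked(user, datetime.now(timezone.utc)))
```

The stored flag never has to be reset by a background job; an expired lock is simply ignored the next time the user logs in.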


5.5.2 oauth_providers

| Field | Type | Description |
| --- | --- | --- |
| provider_id | String (PK) | Unique ID for this provider entry. |
| user_id | String (FK → users.id) | Reference to local user. |
| provider_name | String | e.g., google, github. |
| provider_user_id | String | Provider-side user ID. |
| access_token | String (encrypted) | OAuth access token. |
| refresh_token | String (encrypted) | OAuth refresh token. |
| expires_at | DateTime | Token expiration time. |

Indexes:

  • Compound (provider_name, provider_user_id) unique.

  • user_id indexed for reverse lookup.


5.5.3 roles, permissions, role_permissions

roles

| Field | Type | Description |
| --- | --- | --- |
| role_id | String (PK) | Unique role ID. |
| name | String | Role name (admin, user, etc.). |
| description | String | Human-readable description. |

permissions

| Field | Type | Description |
| --- | --- | --- |
| permission_id | String (PK) | Unique permission ID. |
| name | String | Permission keyword (e.g., USER_CREATE, VIEW_AUDIT). |
| description | String | Human-readable permission label. |

role_permissions

| Field | Type | Description |
| --- | --- | --- |
| role_id | String (FK → roles.role_id) | Linked role. |
| permission_id | String (FK → permissions.permission_id) | Linked permission. |

5.5.4 mfa_credentials

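| Field | Type | Description |
| --- | --- | --- |
| id | String (PK) | Unique ID. |
| user_id | String (FK → users.id) | Owner user. |
| type | Enum(TOTP, SMS, EMAIL) | MFA method. |
| secret_hash | String | Encrypted secret or Vault reference. A TOTP secret must be recoverable to compute codes, so it is encrypted rather than hashed. |
| backup_codes | Array[String] | Hashed backup codes. |
| created_at | DateTime | Creation timestamp. |
| last_used | DateTime | Last time MFA verified. |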

5.5.5 audit_logs

| Field | Type | Description |
| --- | --- | --- |
| log_id | String (PK) | Unique log entry ID. |
| timestamp | DateTime | Event timestamp. |
| event_type | String | Type (e.g., LOGIN_SUCCESS, PASSWORD_RESET). |
| user_id | String (FK → users.id) | User related to event. |
| token_id | String (optional) | Related token, if applicable. |
| ip_address | String | Origin IP. |
| device_info | Object | User-agent or device metadata. |
| metadata | JSON | Arbitrary contextual details. |

5.6 Redis Schema (Ephemeral Data)

| Key Pattern | Type | Description | TTL |
| --- | --- | --- | --- |
| refresh:{tokenId} | Hash | { userId, deviceId, issuedAt, expiresAt, rotated } | 30d |
| user_sessions:{userId} | Sorted Set | Active sessions ordered by issuedAt | 30d |
| rate:{ip} | Counter | Request throttling per IP | 1 min |
| otp:{userId} | Hash | Temporary OTPs for password reset / MFA | 5m |

Redis stores no permanent user data, only volatile session and token state.
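
The rate-limit counter can be sketched as a fixed-window limiter: increment a per-IP counter and let it expire with the window. The version below is an in-memory stand-in for the Redis counter, under stated assumptions: the `FixedWindowRateLimiter` class is hypothetical, the key prefix follows the `rate:<ip>` pattern from 5.4, and the limit and window are example values.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowRateLimiter:
    """Counter-per-key limiter; mirrors a counter with a TTL set on its first hit."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        self._counters: Dict[str, Tuple[int, float]] = {}  # key -> (count, window_end)

    def allow(self, ip: str) -> bool:
        key = f"rate:{ip}"
        now = self._clock()
        count, window_end = self._counters.get(key, (0, 0.0))
        if now >= window_end:
            # Window expired (or first request): start a fresh counter.
            count, window_end = 0, now + self.window_seconds
        count += 1
        self._counters[key] = (count, window_end)
        return count <= self.limit
```

A fixed window is the simplest scheme and allows short bursts at window boundaries; a sliding window or token bucket would smooth that out at the cost of more state.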


5.7 Index & Performance Design

| Collection | Index | Type | Purpose |
| --- | --- | --- | --- |
| users | email (unique) | B-tree | Fast login lookups |
| users | phone_number (unique, sparse because the field is optional) | B-tree | Account linking |
| user_oauth | (provider_name, provider_user_id) | Composite | Fast OAuth lookups |
| audit_logs | (user_id, timestamp) | Compound | Time-series queries |
| roles | name (unique) | B-tree | Role lookup |
| role_permissions | (role_id, permission_id) | Composite | Access check joins |
| user_roles | (user_id, role_id) | Composite | Role assignment checks |

All indexes are optimized for read-heavy workloads (login, token refresh, session validation).


5.8 Relationships Diagram (Conceptual)

erDiagram
    USERS ||--o{ OAUTH_PROVIDERS : "has"
    USERS ||--o{ MFA_CREDENTIALS : "has"
    USERS ||--o{ AUDIT_LOGS : "generates"
    USERS ||--o{ USER_ROLES : "assigned"
    ROLES ||--o{ USER_ROLES : "granted_via"
    ROLES ||--o{ ROLE_PERMISSIONS : "defines"
    PERMISSIONS ||--o{ ROLE_PERMISSIONS : "belongs_to"

5.9 Data Integrity & Constraints

  • Uniqueness Constraints: Enforced on email, username, and the pair (provider_name, provider_user_id).

  • Foreign Key Consistency: Enforced at application layer since MongoDB is non-relational.

  • Soft Deletion: Users and roles can be soft-deleted via status: "inactive".

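  • TTL Indexes: Use only on collections whose documents are disposable (for example, a dedicated token collection), because a MongoDB TTL index deletes the whole document. Expiry fields on users, such as lock_until and forgot_password_expiry, are checked against the current time instead.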

  • Encryption:

    • Passwords → Argon2 hash

    • Tokens & secrets → AES-256 (field-level encryption)

    • Sensitive configs → environment variables or Vault
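
The password-hashing pattern (random salt, slow memory-hard hash, constant-time verification) can be sketched with the standard library alone. Note the substitution: Argon2 is not in the Python standard library, so this sketch uses scrypt via `hashlib.scrypt` as a stand-in. The cost parameters and the "salt$hash" storage format are assumptions for illustration; a production system would use a maintained Argon2 library.

```python
import hashlib
import hmac
import os

# scrypt cost parameters (assumed values; tune for your hardware).
_N, _R, _P, _DKLEN = 2**14, 8, 1, 32


def hash_password(password: str) -> str:
    salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=_N, r=_R, p=_P, dklen=_DKLEN
    )
    return f"{salt.hex()}${digest.hex()}"  # hypothetical "salt$hash" storage format


def verify_password(password: str, stored: str) -> bool:
    salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.scrypt(
        password.encode("utf-8"), salt=bytes.fromhex(salt_hex),
        n=_N, r=_R, p=_P, dklen=_DKLEN,
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

Because the salt is random, hashing the same password twice yields different stored values, which is what defeats precomputed lookup tables.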


5.10 Data Security Model

| Area | Control |
| --- | --- |
| Passwords | Argon2 hash with salt (bcrypt as fallback) |
| Tokens | Encrypted in Redis (AES-GCM via app layer if needed) |
| Secrets (MFA, OAuth) | Encrypted using KMS / Vault; never stored in plaintext |
| PII Fields (email, phone) | Field-level encryption (FLE) or DB-level encryption |
| Transport Layer | TLS 1.2+ enforced end-to-end |
| Data at Rest | MongoDB encryption-at-rest (EBS or Atlas-managed) |

5.11 Data Lifecycle & Retention

| Data Type | Retention Policy | Notes |
| --- | --- | --- |
| Users | Until account deletion + 30 days grace period | GDPR-style retention |
| Sessions / Refresh Tokens | Auto-expire via TTL | 7–30 days configurable |
| Audit Logs | 90 days hot storage, 1 year cold archive | For compliance |
| OTP / MFA Codes | Auto-delete after expiry | Never stored in plaintext |
| Rate Limit Counters | Auto-expire | Short TTLs (1–5 min) |

Data lifecycle management ensures minimal persistence of sensitive data and compliance readiness (GDPR-style erase-on-delete).
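
As a rough illustration of the audit-log retention policy, an archival job could classify each entry by age. This sketch assumes one reading of the table, namely that the cold archive is kept until an entry is one year old; the function name and tier labels are hypothetical.

```python
from datetime import datetime, timedelta, timezone

HOT_RETENTION = timedelta(days=90)
# Assumed reading of the policy: the cold archive is kept until the entry is one year old.
COLD_RETENTION = timedelta(days=365)


def audit_log_tier(logged_at: datetime, now: datetime) -> str:
    """Classify an audit entry as 'hot', 'cold', or 'expired' by its age."""
    age = now - logged_at
    if age <= HOT_RETENTION:
        return "hot"
    if age <= COLD_RETENTION:
        return "cold"
    return "expired"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(audit_log_tier(now - timedelta(days=120), now))
```

If the intended policy is one year of cold storage after the 90 hot days, only the `COLD_RETENTION` constant needs to change.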


5.12 Data Flow Summary

Example: Login + Token Lifecycle

1. Client submits credentials
2. Auth API → UserService (Mongo lookup + hash verify)
3. On success → TokenService issues access+refresh
4. Access token sent to client, refresh token saved in Redis
5. Refresh rotation on use; old token invalidated
6. Logout clears cookies + deletes Redis entry
7. AuditService logs login success/failure in Mongo
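
Steps 4 to 6 (store the refresh token, rotate it on use, delete it on logout) can be sketched with plain dictionaries standing in for the Redis keys. The `TokenService` class shape and its method names are assumptions for illustration, not the document's actual service interface.

```python
import secrets
from typing import Dict, Optional, Set


class TokenService:
    """In-memory sketch of refresh rotation (dicts stand in for the Redis keys)."""

    def __init__(self) -> None:
        self.refresh: Dict[str, Dict[str, str]] = {}  # refresh:<tokenId> -> record
        self.user_sessions: Dict[str, Set[str]] = {}  # user_sessions:<userId> -> token ids

    def issue(self, user_id: str, device_id: str) -> str:
        token_id = secrets.token_urlsafe(32)
        self.refresh[token_id] = {"user_id": user_id, "device_id": device_id}
        self.user_sessions.setdefault(user_id, set()).add(token_id)
        return token_id

    def revoke(self, token_id: str) -> None:
        """Step 6: logout removes the stored entry and its session index."""
        record = self.refresh.pop(token_id, None)
        if record is not None:
            self.user_sessions[record["user_id"]].discard(token_id)

    def rotate(self, token_id: str) -> Optional[str]:
        """Step 5: exchange a valid refresh token for a new one; the old one dies."""
        record = self.refresh.get(token_id)
        if record is None:
            return None  # unknown, expired, or already rotated token
        self.revoke(token_id)
        return self.issue(record["user_id"], record["device_id"])
```

Presenting an already-rotated token returns `None` here; a stricter design might treat that as a sign of token theft and revoke all of the user's sessions.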

Example: MFA Enrollment

1. User enables MFA
2. MFAService generates secret → encrypted + stored in mfa_credentials
3. On login, MFAService verifies the TOTP code, or checks the SMS/Email OTP against the hash stored in Redis
4. Audit entry written for challenge success/failure
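
The TOTP check in step 3 follows RFC 6238 (built on HOTP from RFC 4226) and fits in a few lines of standard-library Python. This is a sketch assuming the common defaults of HMAC-SHA1, 30-second steps, and 6 digits; the function names and the one-step drift window are illustrative choices.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over the counter, then dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)


def totp(secret: bytes, unix_time: float, step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP with the counter derived from the time step."""
    counter = max(0, int(unix_time // step))  # clamp so early timestamps stay valid
    return hotp(secret, counter, digits)


def verify_totp(secret: bytes, code: str, unix_time: float,
                step: int = 30, window: int = 1) -> bool:
    """Accept codes from adjacent time steps to tolerate small clock drift."""
    return any(
        hmac.compare_digest(totp(secret, unix_time + drift * step, step), code)
        for drift in range(-window, window + 1)
    )
```

In the flow above, `secret` would be the decrypted value fetched via the mfa_credentials record, and `unix_time` would come from `time.time()` at verification.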

5.13 Data Consistency & Integrity

  • MongoDB uses document-level atomic operations for updates.

  • Redis data is ephemeral by design, safe for cache/session state but not source-of-truth.

  • Referential integrity between collections maintained at application level (e.g., user deletion triggers cascade cleanup of sessions and MFA).

  • Audit logs are append-only to prevent tampering.
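
Application-level referential integrity can look like the sketch below: deleting a user removes dependent MFA records and sessions, while the audit log only gains an entry. The function name, the in-memory containers, and the `USER_DELETED` event type are hypothetical.

```python
from typing import Dict, List


def delete_user_cascade(
    user_id: str,
    users: Dict[str, dict],
    mfa_credentials: List[dict],
    refresh_tokens: Dict[str, dict],
    audit_logs: List[dict],
) -> None:
    """Application-level cascade: the database will not do this on its own."""
    users.pop(user_id, None)
    # Remove dependent MFA records.
    mfa_credentials[:] = [m for m in mfa_credentials if m["user_id"] != user_id]
    # Drop every session that belongs to the user.
    stale = [t for t, rec in refresh_tokens.items() if rec["user_id"] == user_id]
    for token_id in stale:
        del refresh_tokens[token_id]
    # Audit logs are append-only: record the deletion instead of removing history.
    audit_logs.append({"event_type": "USER_DELETED", "user_id": user_id})
```

With real stores these steps span MongoDB and Redis, so they are not atomic as a group; running the cleanup idempotently (safe to retry) is one way to cope with partial failures.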


5.14 Backup & Recovery Strategy

Backup and retention:

| Data Type | Backup Frequency | Retention | Notes |
| --- | --- | --- | --- |
| MongoDB (users, roles) | Daily | 30 days | Encrypted backups |
| Redis (sessions) | None | N/A | Volatile data only |
| Audit Logs | Weekly archival | 1 year | Archived to cold storage (S3/MinIO) |

Recovery objectives:

| Component | Backup Frequency | Recovery Time Objective (RTO) | Notes |
| --- | --- | --- | --- |
| MongoDB | Daily snapshot | < 1 hour | Automated Atlas/Replica backup |
| Redis | Optional (RDB/AOF) | < 15 min | Only critical keys persisted |
| Audit Logs | Archived weekly | < 24 hours | Cold storage retrieval |
| Secrets | Managed by Vault / KMS | N/A | Versioned rotation |

5.15 Data Design Summary

The data design balances security, performance, and clarity:

  • MongoDB provides durable, structured persistence for identities and logs.

  • Redis ensures low-latency session management.

  • Encryption and TTLs protect sensitive data.

  • Modular schema design supports future IAM expansion (SSO, ABAC, SCIM).

  • Event-based updates (via audit service) ensure traceability and compliance.

5.16 Future Data Extensions

  1. Device Fingerprinting Table — for anomaly detection and session monitoring.

  2. Login History — separated from audit logs for faster analytics.

  3. Session Geo-Analytics — user’s login country, device risk scoring.

  4. Federation Metadata Table — store provider discovery endpoints and JWKS caching.

6. Detailed Workflows (UML Diagrams)

This is the heart of SDA.

  • Sequence Diagrams:

    • User Login Flow.

    • Token Refresh.

    • Password Reset.

    • MFA Verification.

  • Activity Diagrams:

    • Account lifecycle (active → locked → deleted).

  • State Machine:

    • Session lifecycle.

Example: Detailed Login Sequence (Mermaid), tying together all 7 services

Notes on the diagram:

  • TokenService stores refresh tokens in Redis under keys like refresh:<tokenId> and indexes them under user_sessions:<userId>.

  • AuditService receives events either synchronously or via an event bus, so that they are recorded promptly.

7. Security Architecture

  • Password hashing (Argon2/bcrypt).

  • Transport security (HTTPS, TLS 1.3).

  • Rate limiting strategies.

  • Threat model summary (brute force, replay attacks, session hijacking).

8. Scalability & Performance

  • Horizontal scaling of Auth API (stateless).

  • Session store in Redis cluster.

  • Token revocation lists (Bloom filter / DB).

  • Expected throughput (10k logins/sec).
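
For the Bloom-filter option on the revocation list, a minimal sketch looks like this. The sizes, the hash count, and the salted SHA-256 scheme are illustrative assumptions; the point is that a negative answer is definitive, so most requests can skip the slower revocation lookup.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Probabilistic set: no false negatives, tunable false-positive rate."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self._bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(
            self._bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )
```

A `might_contain` result of `True` can be a false positive, so it should trigger a check against the authoritative store (Redis or the database) before a token is rejected.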

9. Availability & Reliability

  • Redundancy (multi-region DB, load balancing).

  • Failover strategies (e.g., DB replica promotion).

  • Session persistence during failures.

10. External Integrations

  • OAuth providers (Google, GitHub, etc).

  • Monitoring tools (Prometheus, Grafana, ELK).

  • Notification services (email/SMS for OTPs).

11. Technology Stack

Example stack with rationale:

  • Backend: Node.js/Express (fast, async).

  • Database: MongoDB for identity and role data (Elasticsearch or PostgreSQL as an optional log store later).

  • Cache: Redis for sessions.

  • Security: JWT for access tokens, Argon2 (bcrypt as fallback) for passwords.

  • Infra: Kubernetes + Docker for scaling.

12. Risks & Mitigations

  • DB bottlenecks → use read replicas.

  • Redis memory exhaustion → eviction policy.

  • OAuth provider downtime → fallback login.

13. Future Enhancements

  • Add SSO (SAML, OIDC federation).

  • Add adaptive MFA.

  • Introduce risk-based authentication.

14. Appendices

  • API Specs (OpenAPI/Swagger snippet).

  • Glossary (carry over from SRS).

  • References (OAuth RFC, OWASP ASVS, NIST guidelines).