Software Design and Architecture Document

1. Introduction

1.1 Purpose

The purpose of this Software Design & Architecture (SDA) document is to translate the requirements defined in the Software Requirements Specification (SRS) into a concrete design blueprint for the authentication system.

This document provides the structural and technical foundation for implementation, covering architecture, data models, modules, and interaction flows. The authentication system will serve as a core service, ensuring that users can securely register, log in, and manage their identities.

By defining the architecture and design choices upfront, this document ensures that the system is secure, scalable, and extensible enough to evolve into enterprise-grade identity and access management (IAM) in the future.

1.2 Scope

This document focuses on the authentication subsystem, detailing its architecture, data flow, and integration with supporting services such as caching, external identity providers, and monitoring. Authorization beyond basic role-based access control (RBAC) is out of scope for this version but considered for future extensions.

1.3 References

Software Requirements Specification (SRS) – Authentication System
RFC 7519 – JSON Web Tokens (JWT)
OAuth 2.0 & OpenID Connect specifications
MongoDB documentation
Redis documentation

1.4 Definitions & Abbreviations

IAM: Identity and Access Management
MFA: Multi-Factor Authentication
SSO: Single Sign-On
JWT: JSON Web Token

2. Architecture Goals & Principles

2.1 Architecture Goals

The authentication system is designed with the following overarching goals:

Balance security and usability: Strong security controls are implemented without creating unnecessary friction for end users.
Scale to thousands of users: The system is optimized for small-to-medium scale (portfolio or demo use), but follows patterns that can extend to larger user bases if required.
High availability: Authentication is mission-critical; the design minimizes single points of failure and ensures redundancy where possible.
Extensibility for learning: While this implementation is intended for portfolio demonstration, the architecture models patterns used in enterprise-grade identity systems.
Enterprise readiness: Even at small scale, the design follows enterprise-level principles (RBAC, observability, federation, testing).
Portability: The system can be deployed on any hosting platform (e.g., Render, AWS, GCP) without being locked into proprietary services.

2.2 Architecture Principles

The following principles guide the design and implementation:

Security by Design
- Security is treated as a first-class concern, but always balanced with usability.
- Passwords are stored only as salted hashes; other sensitive data (tokens, secrets) is encrypted at rest, and all traffic is encrypted in transit.
- Authentication follows least privilege by enforcing fine-grained RBAC from the start.
API-first, UI-enabled
- All features are exposed via APIs (REST/GraphQL), ensuring integration with other systems.
- A user-facing UI is layered on top for usability and demonstration purposes.
Stateful session management (with flexibility)
- Sessions are managed with a database (MongoDB) and cache (Redis) for performance.
- While not stateless, the design allows migration to stateless token-based flows if required.
Observability built-in
- Logging, monitoring, and auditing are integral parts of the architecture.
- Security events (e.g., failed logins, privilege escalations) are tracked from day one.
High availability & fault tolerance
- Components are designed to recover gracefully from failure.
- Database and cache are configured to support redundancy and replication.
Extensible Identity Federation
- From the initial release, support for external providers (Google, GitHub) is built in.
- Future extension to other providers (e.g., Facebook, SAML-based enterprise IdPs) is supported by the modular architecture.
Performance-oriented development
- Design favors rapid development using proven libraries, but with an eye on efficiency (e.g., caching tokens, minimizing DB queries).
Testability as a principle
- The system is built with automated unit and integration testing in mind.
- Authentication flows (registration, login, MFA, RBAC enforcement) are validated systematically.
Enterprise complexity with portfolio clarity
- The design intentionally mirrors enterprise-grade IAM systems (MFA-ready, federated logins, RBAC, monitoring).
- At the same time, the documentation emphasizes clarity and learning value for portfolio purposes.

3. High-Level Architecture

This section describes the major building blocks of the authentication system, how they interact, and how the system should be deployed to meet the goals defined in Section 2 (balance security/usability, scale to thousands of users, high availability, portability, observability, and enterprise-ready practices).

3.1 Context Diagram

[Client Apps]
     │
     ▼
[API Gateway] ──(authn, throttling, TLS)──► [Auth Service / Auth Core]
                                         ├─► [Token & Session Store (Redis)]
                                         ├─► [Primary DB (MongoDB): users, roles, audit]
                                         ├─► [External IdPs (OAuth/OIDC)]
                                         └─► [Security Services: hashing, KMS, rate-limiter]
                                               │
                                               └─► [Monitoring & Logging: Prometheus/Grafana, ELK]

Notes & Next Steps

Define token lifetimes (access token TTL, refresh token TTL) and refresh rotation policy in the Session/Token Module (LLD).
Specify which responsibilities live in the API Gateway vs Auth Core (e.g., CSP enforcement, early 401/429 responses).
Add deployment considerations: single-region start with plan for multi-region session replication if needed.

3.2 Container / Component Diagram

Mermaid (Component) — paste into any Mermaid-capable renderer

Components & responsibilities (summary)

API Gateway / Auth Proxy: TLS termination, request routing, rate limiting, basic auth protection; passes requests to Auth API. Can handle initial JWT verification for static routes.
Auth API Service: Primary HTTP API (register, login, logout, refresh, password reset, role management). Implements business logic and validation.
Token Service: Issues and validates access tokens; handles refresh logic, rotation, revocation. Consults Redis for refresh token state.
User Service (DB): Stores user profiles, roles, permissions, account status. MongoDB with indexes for unique constraints.
MFA Service: TOTP generation/verification, backup codes, SMS/Email OTP orchestration (via Notification adapter).
OAuth Adapter: Encapsulates federation logic for Google/GitHub; normalizes provider identities to local user accounts.
Redis: In-memory store for refresh tokens, session indices, ephemeral locks, rate-limiting counters, and short-lived state.
Audit & Logging: Append-only audit log storage (also stored in Mongo or shipped to ELK). Critical for non-repudiation and incident response.
Monitoring / Observability: Metrics (Prometheus), tracing (OpenTelemetry), logs (ELK or Loki), alerting (Alertmanager).
WAF / CDN: Optional edge protection for brute-force, DDoS mitigation, and static assets.

3.3 Deployment View (Logical cloud infra)

Mermaid (Deployment): paste into any Mermaid-capable renderer

Deployment notes

Kubernetes or managed containers (Render, AWS ECS, GKE) recommended for portability and scaling.
Load Balancer + CDN: LB for traffic distribution; CDN for static UI assets and to terminate TLS at edge.
MongoDB Replica Set: Primary + secondaries, periodic backups, and automated failover.
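Redis (Redis Cluster, Sentinel-managed replication, or a managed Redis offering): used for ephemeral session/refresh token storage and rate-limiter counters. Cluster and Sentinel are alternative high-availability modes, not a combination.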
Workers: Background job processors (email OTPs, token cleanup, auditing sync).
Observability stack: Prometheus, Grafana, Loki/ELK, OpenTelemetry traces.
Secrets management: Use cloud secret manager (or Kubernetes secrets backed by vault) for signing keys and provider secrets.

3.4 Data & Control Flow (step-by-step)

Login / Register
- Client → API Gateway → Auth API
- Auth API validates credentials → UserService (Mongo)
- On success: Token Service issues Access Token (short-lived) + Refresh Token (persistent)
- Refresh token stored in Redis with device_id, token_id, expiry TTL
- Access token returned to client (in-memory or HttpOnly cookie depending on app type); refresh token delivered via HttpOnly cookie or cookie+DB depending on design
API Request
- Client sends request with access token (Authorization header or HttpOnly cookie)
- API Gateway verifies signature (or delegates to Token Service) → route to Auth API or backend services
Token Refresh
- Access token expired → client hits /refresh
- Server validates refresh token (Redis) → rotates new refresh token and issues new access token
Logout / Revoke
- Client requests logout → Auth API deletes refresh token entry from Redis and expires cookies
- Audit log entry written
Federation Login
- Client redirected to OAuth Adapter → external provider → callback → OAuth Adapter normalizes identity → link or create local user → issue tokens as above

3.5 How this maps to Architecture Goals & Principles

Balance security & usability: short-lived access tokens + revocable refresh tokens; MFA and federation optional but built-in.
Scale to thousands: stateless-ish APIs + Redis for ephemeral session state; services horizontally scalable.
High availability: LB, DB replica set, Redis cluster, multiple pod replicas.
Portability: containerized services, managed DB/Redis or self-hosted; works on Render/AWS/GCP.
Observability: Telemetry, logging, and alerting integrated; security events are first-class metrics.
Extensibility & enterprise parity: modular services (MFA, OAuth adapter, Token Service) allow future growth into SSO/federation.

3.6 Security & Operational considerations (actionable)

TLS everywhere; terminate TLS at LB or CDN, but enforce end-to-end where possible.
Secrets stored in Secret Manager / Vault; signing keys rotated periodically.
Cookie flags: HttpOnly, Secure, SameSite=Strict/Lax depending on UX.
CSRF protection when using cookies (SameSite + CSRF tokens for non-GET stateful endpoints).
Token rotation & revocation: implement refresh token rotation, store token metadata in Redis (device_id, issued_at, ip_hash).
Rate limiting: per-IP and per-account; counters in Redis.
WAF & Bot protection for brute-force mitigation.
Audit trail: Immutable audit log for security actions (store centrally, retention policy).
Backups: Automated backup schedule for Mongo; snapshot retention and test restores.
Disaster recovery: Cross-region replicas for Mongo if required; plan RTO/RPO.

3.7 Metrics & Alerts (examples you should include)

Auth Error Rate (errors / second): alert if the error rate exceeds 1% of traffic
Login Latency (p95): alert if > 200ms (the SRS performance target)
Token Refresh Latency
Failed Login Attempts per IP / Account: alert and auto-throttle
Redis Memory Usage: alert if > 75%
Mongo Primary Election / Replica Lag: alert when lag > threshold
High number of revoked tokens: possible breach indicator

3.8 Deliverables / Artifacts to attach

Mermaid/PlantUML component & deployment diagrams (this section)
OpenAPI/Swagger spec for Auth API
Terraform/Kubernetes manifests or Docker Compose (deployment examples)
Prometheus/Grafana dashboards and alert rules
Threat model (STRIDE), and an audit log retention + access policy

4. Detailed Component Design — 7 Services

Service 1 — Auth API Service (Orchestrator / Gateway-facing)

Responsibilities

Primary external entry point for clients (web/mobile/3rd-party).
Validate request schemas, throttle/rate-limit, anti-bruteforce.
Orchestrate flows: login, register, refresh, logout, password reset, federation callbacks, MFA challenges.
Enforce API-level auth (check that an access token is present in the cookie or Authorization header) and forward to backend services.
Sanitize and return errors in consistent format.

Public Interfaces (REST)

POST /v1/auth/login
POST /v1/auth/register
POST /v1/auth/refresh
POST /v1/auth/logout
POST /v1/auth/password/forgot
POST /v1/auth/password/reset
GET  /v1/auth/oauth/callback
POST /v1/auth/mfa/verify

Example /login (req/resp)

POST /v1/auth/login
{
  "email":"user@example.com",
  "password":"hunter2",
  "device_id":"chrome-2025-10-01",
  "client_info":{ "ip":"1.2.3.4", "ua":"…" }
}
200 OK
{
  "accessToken": "<jwt>",
  "expiresIn": 900
}

(Refresh token is issued/stored server-side and returned via HttpOnly cookie.)

Internal Interfaces / Calls

UserService.verifyCredentials(email, password, deviceInfo)
MFAService.checkRequired(userId) → maybe challenge
TokenService.issueTokens(userId, deviceId, scope)
AuditService.logEvent(eventType, meta)
OAuthService.handleCallback(params)

Data/Storage

No own DB (stateless). Relies on other services.
Short-lived in-memory caches for throttling counters (or Redis).

Security

Validate input with strict schema (JSON schema).
Rate-limit by IP and account.
WAF at edge recommended.
All responses use generic failure messages to avoid user enumeration.
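
The two rules above (strict input validation and generic failure messages) can be sketched as follows. This is a minimal illustration, not the service's actual validator: the field set mirrors the example /login request, while the length limits, the error body, and the function names are assumptions.

```python
import re

# Deliberately loose email shape check; real validation would use a JSON schema.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
ALLOWED_FIELDS = {"email", "password", "device_id", "client_info"}

# Hypothetical error body: identical for every credential failure.
GENERIC_FAILURE = {"error": "invalid_credentials",
                   "message": "Invalid email or password."}


def validate_login_payload(payload: dict) -> list:
    """Return a list of schema problems; an empty list means well-formed."""
    problems = [f"unexpected field: {key}"
                for key in payload if key not in ALLOWED_FIELDS]
    email = payload.get("email")
    if not isinstance(email, str) or not EMAIL_RE.match(email):
        problems.append("email must be a valid address")
    password = payload.get("password")
    if not isinstance(password, str) or not 1 <= len(password) <= 256:
        problems.append("password must be a non-empty string")
    device_id = payload.get("device_id")
    if not isinstance(device_id, str) or not device_id:
        problems.append("device_id must be a non-empty string")
    return problems


def login_failure_response(internal_reason: str):
    """Map every credential failure to the same 401 body.

    internal_reason ("unknown_email", "wrong_password", ...) is meant for
    logs only, so callers cannot enumerate which accounts exist.
    """
    return 401, dict(GENERIC_FAILURE)
```

Schema problems can be reported to the client in detail (they reveal nothing about accounts); only credential failures need the uniform response.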

Scaling & HA

Stateless pods; scale horizontally behind LB.
Use shared Redis for rate-limits.
No session affinity (sticky sessions) is required.

Failure modes & mitigation

Downstream user DB or Redis failure → return 503 + circuit breaker.
Use exponential backoff and retry for transient ops.
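
A minimal sketch of the retry policy described above, assuming a hypothetical TransientError raised by downstream clients for timeouts and dropped connections. The attempt count and delays are illustrative defaults, and the circuit breaker is left out for brevity.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying."""


def retry_with_backoff(operation, max_attempts=4, base_delay=0.1,
                       max_delay=2.0, sleep=time.sleep):
    """Call operation(); on TransientError wait and try again.

    The wait before retry k is drawn uniformly from
    [0, min(max_delay, base_delay * 2**k)] ("full jitter"), which keeps
    many clients from retrying in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # retries exhausted: the caller maps this to a 503
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
```

The sleep parameter exists so tests can inject a no-op instead of actually waiting.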

Observability

Metrics: request rate, error rate, latency (p95,p99), auth failures per account/IP.
Logs: structured JSON with trace id and minimal PII (never log raw passwords).
Events: emit audit events for login success/failure, token refresh, password changes.
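
As an illustration of structured logging with redaction, the sketch below emits one JSON line per event and masks sensitive fields before serialization. The set of sensitive keys and the function name are assumptions, not part of the specified design.

```python
import json

# Assumed list of keys that must never reach the logs in cleartext.
SENSITIVE_KEYS = {"password", "accessToken", "refreshToken", "otp", "secret"}


def structured_log_line(event_type: str, trace_id: str, fields: dict) -> str:
    """Serialize one log event as JSON, masking sensitive values."""
    safe = {key: ("[REDACTED]" if key in SENSITIVE_KEYS else value)
            for key, value in fields.items()}
    return json.dumps({"event": event_type, "trace_id": trace_id, **safe},
                      sort_keys=True)
```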

Testing

Unit tests for controllers & validators.
Integration tests mocking UserService/TokenService.
E2E tests for full flows.

Service 2 — User Management Service

Responsibilities

Persistent user records (create/read/update/delete).
Password hashing + verification (Argon2 / bcrypt).
Email/phone verification lifecycle.
Link/unlink external provider identities.
Expose CRUD + admin operations for roles & attributes.

Interfaces (internal API)

verifyCredentials(email, password) -> {user, status}
createUser(userPayload) -> user
findUserByEmail(email) -> user
updateUser(userId, patch) -> user
linkExternalIdentity(userId, provider, providerUserId, attrs)
incrementFailedAttempt(userId)
resetFailedAttempts(userId)
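
The failed-attempt counters could drive account lockout roughly as sketched here. The threshold of five attempts is an assumption (the source does not state one), and the in-memory record stands in for the users document.

```python
from dataclasses import dataclass

MAX_FAILED_ATTEMPTS = 5  # assumption: the lockout threshold is not specified


@dataclass
class UserRecord:
    """Stand-in for the relevant fields of a users document."""
    user_id: str
    failed_attempts: int = 0
    is_locked: bool = False


def increment_failed_attempt(user: UserRecord) -> UserRecord:
    """Count one failed login and lock the account at the threshold."""
    user.failed_attempts += 1
    if user.failed_attempts >= MAX_FAILED_ATTEMPTS:
        user.is_locked = True
    return user


def reset_failed_attempts(user: UserRecord) -> UserRecord:
    """Clear the counter after a successful login.

    Unlocking is intentionally not done here; a locked account would be
    released by a separate step (timeout or admin action).
    """
    user.failed_attempts = 0
    return user
```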

Data model (Mongo collections)

users (id, email, password_hash, name, phone, mfa_enabled, status, createdAt, lastLogin, failedAttempts, device_metadata[])
user_oauth (userId, provider, providerUserId, linkedAt)
roles & role_permissions or stored separately in Policy Service

Security

Passwords hashed with Argon2 (recommended) or bcrypt with strong params.
Enforce unique email/phone indexes.
Use per-user salt (inherent to Argon2).
PII at rest encrypted (field-level or DB-level).
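
The hash-then-verify pattern looks roughly like the sketch below. Argon2 and bcrypt are not in the Python standard library, so this example substitutes scrypt (also a memory-hard function) purely to stay self-contained; the cost parameters and the stored string format are assumptions.

```python
import hashlib
import hmac
import os

# scrypt cost parameters: illustrative values, to be tuned per deployment.
SCRYPT_N, SCRYPT_R, SCRYPT_P = 2 ** 14, 8, 1


def hash_password(password: str) -> str:
    """Hash with a fresh random per-user salt; returns 'scrypt$salt$digest'."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt,
                            n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P)
    return f"scrypt${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    try:
        scheme, salt_hex, digest_hex = stored.split("$")
        salt, expected = bytes.fromhex(salt_hex), bytes.fromhex(digest_hex)
    except ValueError:
        return False  # malformed record
    if scheme != "scrypt":
        return False
    candidate = hashlib.scrypt(password.encode(), salt=salt,
                               n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P)
    return hmac.compare_digest(candidate, expected)
```

Storing the scheme name alongside the hash leaves room to migrate to Argon2 later by rehashing on the next successful login.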

Interactions

Called by Auth API for verify/create.
Emits user.created, user.updated, user.failed_login, user.locked events to AuditService.

Scaling

Scale vertically first; add read replicas if read volume grows.
Use indexes for auth lookups by email.
If user volume grows huge, consider sharding.

Failure

DB replica lag → stale reads for lastLogin. Handle eventual consistency.

Observability

Metrics: user create rate, failed_logins per minute, account locks.
Logs: events for user lifecycle.

Testing

Hash/verify unit tests, migration tests, unique constraint tests.

Service 3 — Token & Session Service (Token Service + Session Store)

Responsibilities

Issue/validate JWT access tokens.
Create, rotate, revoke refresh tokens.
Persistent session index in Redis (fast lookups) and optional DB backup.
Device binding: tie refresh token to device_id and metadata (IP hash, UA).
Enforce token revocation and deny-list/allow-list logic.

Interfaces

issueTokens(userId, deviceId, clientInfo) -> { accessToken, refreshTokenId }
validateAccessToken(accessToken) -> { valid, claims }
validateRefreshToken(refreshToken, deviceId) -> { valid, tokenId }
rotateRefreshToken(oldTokenId) -> newTokenId
revokeRefreshToken(tokenId)
revokeAllForUser(userId)
listActiveSessions(userId) -> [session]

Data storage & key schema

Redis hash per refresh token: refresh:<tokenId> → {userId, deviceId, issuedAt, expiresAt, nonce, rotated: false}
Optionally: Reverse index: user_sessions:<userId> → sorted set tokenIds by issuedAt (for listing)
JWT signing keys stored in Secrets Manager; rotate and support key IDs (kid) in token header.
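
To make the kid mechanism concrete, here is a standard-library sketch of issuing and validating a JWT access token against a key ring. It uses HS256 with shared secrets only because that needs no third-party package; a deployment that publishes public keys would use an asymmetric algorithm through a JWT library. The key ring layout and function names are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_access_token(user_id, keys, active_kid, ttl_seconds=900, now=None):
    """Sign claims with keys[active_kid]; the header records which key was used."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT", "kid": active_kid}
    claims = {"sub": user_id, "iat": now, "exp": now + ttl_seconds}
    signing_input = (_b64url(json.dumps(header).encode()) + "."
                     + _b64url(json.dumps(claims).encode()))
    signature = hmac.new(keys[active_kid], signing_input.encode(),
                         hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def validate_access_token(token, keys, now=None):
    """Look up the key by kid, check the signature, then check expiry."""
    now = int(time.time()) if now is None else now
    invalid = {"valid": False, "claims": None}
    try:
        header_b64, claims_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        key = keys.get(header.get("kid"))
        if key is None or header.get("alg") != "HS256":
            return invalid  # unknown or retired key, or unexpected algorithm
        expected = hmac.new(key, f"{header_b64}.{claims_b64}".encode(),
                            hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return invalid
        claims = json.loads(_b64url_decode(claims_b64))
        if claims.get("exp", 0) <= now:
            return invalid
    except (ValueError, TypeError, AttributeError):
        return invalid  # malformed token
    return {"valid": True, "claims": claims}
```

During rotation the ring holds both the old and the new key: new tokens are signed with the new kid, while tokens signed with the old one keep validating until the old key is removed.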

Security & rotation

Access token short-lived (e.g., 10–15m).
Refresh token backed by Redis and rotated on each use. On rotation, mark old token revoked.
Enforce deviceId + refresh token mapping.
Implement detection: if rotated refresh token used twice => suspect token theft => revoke all sessions & force MFA.
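
Rotation with reuse detection can be sketched as below. A plain dictionary stands in for the Redis keys refresh:<tokenId> and user_sessions:<userId>; the class and method names are hypothetical, the default TTL is an assumption, and the forced-MFA step is left to the caller.

```python
import secrets
import time


class RefreshTokenStore:
    """In-memory stand-in for the Redis refresh-token records."""

    def __init__(self):
        self.tokens = {}    # tokenId -> record (mirrors refresh:<tokenId>)
        self.sessions = {}  # userId -> set of tokenIds (user_sessions:<userId>)

    def issue(self, user_id, device_id, ttl_seconds=7 * 24 * 3600, now=None):
        now = time.time() if now is None else now
        token_id = secrets.token_urlsafe(32)
        self.tokens[token_id] = {
            "userId": user_id, "deviceId": device_id, "issuedAt": now,
            "expiresAt": now + ttl_seconds, "rotated": False,
        }
        self.sessions.setdefault(user_id, set()).add(token_id)
        return token_id

    def revoke_all_for_user(self, user_id):
        for token_id in self.sessions.pop(user_id, set()):
            self.tokens.pop(token_id, None)

    def rotate(self, old_token_id, device_id, now=None):
        """Return a new token id, or None if the refresh must be rejected."""
        now = time.time() if now is None else now
        record = self.tokens.get(old_token_id)
        if record is None or record["expiresAt"] <= now:
            return None  # unknown, revoked, or expired
        if record["deviceId"] != device_id:
            return None  # token is bound to a different device
        if record["rotated"]:
            # An already-rotated token was presented again: suspected theft,
            # so every session of this user is revoked.
            self.revoke_all_for_user(record["userId"])
            return None
        record["rotated"] = True  # kept (not deleted) so reuse is detectable
        return self.issue(record["userId"], device_id, now=now)
```

Rotated records are marked rather than deleted because detection depends on still recognizing the old token when it reappears.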

Token Storage Flow Diagram (Hybrid Approach)

                ┌────────────────────────┐
                │        Browser         │
                │                        │
                │ HttpOnly Secure Cookie │
                │ • Access Token         │
                │ • Refresh Token (opt.) │
                └─────────┬──────────────┘
                          │
                          ▼
                ┌────────────────────────┐
                │       API Server       │
                │                        │
                │ 1. Validates Access    │
                │    Token from cookie   │
                │ 2. If expired →        │
                │    uses Refresh Token  │
                └─────────┬────────┬─────┘
                          │        │
                          │        ▼
                          │   ┌──────────────────┐
                          │   │    Redis DB      │
                          │   │ (Refresh Tokens) │
                          │   │ • user_id        │
                          │   │ • token_id       │
                          │   │ • device_id      │
                          │   │ • expiry (TTL)   │
                          │   └──────────────────┘
                          │
                          ▼
                ┌────────────────────────┐
                │     MongoDB (Core)     │
                │                        │
                │ • Users                │
                │ • Roles & Permissions  │
                │ • Audit Logs           │
                └────────────────────────┘

How it works

User logs in →
- Server issues Access Token (short-lived) in cookie.
- Server also issues Refresh Token (long-lived) and stores it in Redis (linked to user & device).
On each request →
- Browser automatically sends Access Token via cookie.
- If Access Token expired → server checks Redis for refresh token validity.
- If valid → new Access Token issued + refresh rotation.
Logout →
- Access Token cookie cleared.
- Refresh Token removed from Redis.

Scaling

Redis cluster with replication. Use TTLs to auto-expire sessions.
TokenService itself stateless: replicate across nodes.

Failure modes

Redis OOM or unavailability: refresh tokens cannot be validated without Redis, so block refreshes while it is down (fail closed) rather than falling back to an unverified path.
Key rotation mismatch: keep previous signing keys for validation.

Observability

Metrics: token issuance rate, refresh attempt rate, refresh failures, revocations.
Audit events for token rotation and mass-revoke.

Testing

Exhaustive token rotation tests, replay detection tests, TTL expiry tests.

Service 4 — MFA Service

Responsibilities

Manage TOTP secrets, backup codes, SMS/Email OTPs.
Provide enrollment/setup flows & challenge verification.
Rate-limit OTP requests and coordinate with Notification service.

Interfaces

setupTOTP(userId) -> { secret, qrCodeUri }   // One-time return
verifyTOTP(userId, code) -> boolean
generateBackupCodes(userId) -> [codes]
sendOTPViaSMS(userId, phone) -> otpId
verifyOTP(otpId, code) -> boolean

Data & secrets

mfa_credentials collection (userId, type, secret_hash, createdAt, lastUsed).
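TOTP secrets must be stored encrypted (reversibly), not hashed, because verification needs the raw secret to recompute the expected code; keep the cleartext in memory only briefly at setup and verification time. Despite its name, the secret_hash field would then hold the ciphertext or a reference to it. Hashing is appropriate for backup codes and one-time OTP codes.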

Security

Use an established HMAC-based TOTP library, and prevent brute force by counting and limiting verification attempts.
Store backup codes hashed (not plaintext).
Limit OTP send rates (per user, per IP).
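
For reference, TOTP verification per RFC 6238 reduces to an HMAC over a time-step counter. The sketch below shows the mechanics with the standard library; production code would normally rely on a maintained library, and the one-step drift window is an assumption.

```python
import hashlib
import hmac
import struct
import time


def totp(secret: bytes, at_time: float, digits: int = 6, step: int = 30) -> str:
    """RFC 6238 TOTP (HMAC-SHA1) for the time step containing at_time."""
    counter = int(at_time) // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, code: str, at_time=None,
                digits: int = 6, step: int = 30, window: int = 1) -> bool:
    """Accept the current step plus `window` steps either side (clock drift)."""
    at_time = time.time() if at_time is None else at_time
    if len(code) != digits or not (code.isascii() and code.isdigit()):
        return False
    for drift in range(-window, window + 1):
        t = at_time + drift * step
        if t < 0:
            continue
        if hmac.compare_digest(totp(secret, t, digits, step), code):
            return True
    return False
```

With the RFC 6238 test secret b"12345678901234567890" and time 59, the 8-digit output is 94287082, which makes a convenient golden test.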

Interactions

Called by Auth API during login if mfa_enabled.
Emits mfa.challenge, mfa.success, mfa.failed to Audit.

Scaling

Stateless verification service; scale horizontally.
Use worker/queue for SMS/email sending.

Failure

SMS provider outage → fallback to email or show error.
If verification store is unavailable, deny MFA-challenged auth attempts (fail secure).

Observability

Metrics: OTP sent, OTP verify success/fail, TOTP verify latency.

Testing

TOTP golden tests with fixed seeds, replay attacks, rate limit tests.

Service 5 — Audit & Telemetry Service

Responsibilities

Collect, store, and index audit events (login success/failure, token rotation, role changes).
Expose querying for compliance and admin UIs.
Ship metrics to Prometheus/Grafana and logs to ELK/Loki.

Interfaces

POST /v1/audit/events (internal)
GET /v1/audit/user/:id (admin)
Metrics: Prometheus scrape endpoints

Data model

audit_logs (logId, timestamp, eventType, userId, tokenId, ip, deviceInfo, meta JSON)
Immutable append-only. Retention policy (e.g., 90 days in hot store, cold archive thereafter).

Storage

Elasticsearch or Mongo (append-only), backed by object storage for archived logs.

Security

Strong RBAC for audit querying endpoints.
Ensure tamper-resistance: sign logs or write to append-only store.
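
One way to realize "sign logs" is an HMAC chain, where each entry's MAC covers the previous entry's MAC, so edits and deletions in the middle of the log become detectable. This is a sketch under assumptions: the entry layout and function names are hypothetical, and the key would come from the secret store.

```python
import hashlib
import hmac
import json


def append_audit_event(chain: list, event: dict, key: bytes) -> None:
    """Append an event whose MAC also covers the previous entry's MAC."""
    prev_mac = chain[-1]["mac"] if chain else ""
    body = json.dumps(event, sort_keys=True)
    mac = hmac.new(key, (prev_mac + body).encode(), hashlib.sha256).hexdigest()
    chain.append({"event": event, "prev": prev_mac, "mac": mac})


def verify_audit_chain(chain: list, key: bytes) -> bool:
    """Recompute every MAC in order; any altered or removed entry breaks it."""
    prev_mac = ""
    for entry in chain:
        body = json.dumps(entry["event"], sort_keys=True)
        expected = hmac.new(key, (prev_mac + body).encode(),
                            hashlib.sha256).hexdigest()
        if entry["prev"] != prev_mac:
            return False
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev_mac = entry["mac"]
    return True
```

Truncation of the newest entries is not caught by the chain alone; periodically anchoring the latest MAC in a separate store would cover that case.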

Interactions

Subscribes to events from all services via event bus.
Receives direct writes from Auth API for immediate-critical events.

Scaling

Indexing pipeline for logs; use partitioning by time.
Archival jobs.

Failure

Logging delays acceptable but loss of logs is not; use local persistence & retry.

Observability

Provide dashboards, alerts for anomalies (spike in failed logins).

Testing

Ensure event semantics, retention, query performance.

Service 6 — OAuth / Federation Service

Responsibilities

Encapsulate provider-specific logic for OAuth/OIDC (Google, GitHub etc.).
Normalize provider identity to canonical internal user profile.
Handle redirect/callback flows and account linking/unlinking.

Interfaces

GET /v1/oauth/{provider}/authorize → redirect URL
GET /v1/oauth/{provider}/callback?code=... → handle callback, exchange code for token, normalize user info

Interactions

Calls third-party provider endpoints (token exchange, userinfo).
Calls UserService.linkExternalIdentity or UserService.createUser with normalized data.
Calls TokenService.issueTokens.

Security

Protect redirect URIs, support PKCE for public clients.
Keep client secrets in Secret Manager.
Validate provider certs and token signatures.

Scaling

Stateless; scale horizontally.
Cache provider metadata.

Failure

Provider downtime → fallback message and retry.
Inconsistent provider data → require manual review.

Observability

Metrics: federation success rate, provider error rates.
Logs: provider responses, mapping decisions.

Testing

Contract tests with provider mocks; end-to-end with test OAuth clients.

Service 7 — Policy / RBAC Service (AuthZ)

Responsibilities

Store roles, permissions, and policies. Evaluate authorization requests (PDP — policy decision point).
Support both RBAC and ABAC (attribute-based) evaluation for fine-grained permissions.
Expose APIs for admin to manage roles & permissions.

Interfaces

isAllowed(userId, resource, action, context) -> {allowed, reason}
getPermissions(userId)
assignRole(userId, role)
createRole(roleName, permissions)

Data model

roles (roleId, name, description)
permissions (permId, name, description)
role_permissions mapping
user_roles mapping (or refer to UserService storing role list)
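
The data model above resolves a decision by joining user_roles with role_permissions. The sketch below shows that evaluation over in-memory mappings; the sample roles, the "resource:action" permission naming, and the omission of the context argument are assumptions for illustration.

```python
# Stand-ins for the role_permissions and user_roles mappings.
ROLE_PERMISSIONS = {
    "admin": {"users:read", "users:write", "roles:assign"},
    "user": {"profile:read", "profile:write"},
}
USER_ROLES = {
    "u-admin": {"admin", "user"},
    "u-basic": {"user"},
}


def get_permissions(user_id: str) -> set:
    """Union of the permissions of every role assigned to the user."""
    permissions = set()
    for role in USER_ROLES.get(user_id, set()):
        permissions |= ROLE_PERMISSIONS.get(role, set())
    return permissions


def is_allowed(user_id: str, resource: str, action: str) -> dict:
    """RBAC decision: allowed only if some role grants resource:action."""
    permission = f"{resource}:{action}"
    if permission in get_permissions(user_id):
        return {"allowed": True, "reason": f"granted: {permission}"}
    return {"allowed": False, "reason": f"no role grants {permission}"}
```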

Interactions

Auth API consults PolicyService.isAllowed(...) before returning data for sensitive endpoints.
Admin UI calls this service for role management.

Security

Strict admin auth for role changes; all changes emitted to AuditService.
Cache decisions for short time (TTL) for performance; cache invalidation on role change.

Scaling

Read-heavy; use caching (Redis). Use eventual consistency for changes (invalidate caches).

Failure

If Policy service down, choose fallback: fail-closed (deny) or fail-open (risky). For security, prefer fail-closed for critical checks.
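
A caller-side wrapper makes the fallback choice explicit. In this sketch, PolicyServiceUnavailable is a hypothetical exception raised by the policy client, and the critical flag is an assumed way to mark which checks must fail closed.

```python
class PolicyServiceUnavailable(Exception):
    """Hypothetical error raised when the Policy Service cannot be reached."""


def check_access(is_allowed, user_id, resource, action, critical=True) -> bool:
    """Ask the policy service; on outage, deny critical checks (fail-closed).

    Non-critical checks fall back to allow (fail-open), which is the risky
    option and should be reserved for low-sensitivity resources.
    """
    try:
        return bool(is_allowed(user_id, resource, action))
    except PolicyServiceUnavailable:
        return not critical
```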

Observability

Metrics: authorization decision latency, cache hit rate, denials per minute.

Testing

Policy unit tests for edge cases; property-based tests for combinatorial policies.

Cross-Service Design Patterns & Extras

Eventing & Contracts

Use an event bus to decouple Audit/Telemetry & asynchronous tasks. Examples: user.created, login.success, login.failed, token.rotated, mfa.failed.
Keep event schema versioned.

Secrets & Keys

JWT signing keys stored in Secret Manager / Vault. Support key rotation and kid header. Keep old public keys for validation until expiry.

Token rotation & replay detection

Rotation: issue new refresh token with every refresh. Mark previous as rotated; if an old token appears later, treat as theft -> revoke all sessions for that user and force re-authentication + alert.

Session listing & device management

TokenService/SessionService maintains user_sessions:<userId> sorted set. Auth UI can show active sessions and allow per-device logout.

Cookies & CSRF

Refresh tokens are delivered in an HttpOnly cookie with SameSite=Strict or Lax, depending on cross-site needs. Use an anti-CSRF token for state-changing endpoints when authentication is cookie-based.
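
The cookie attributes can be assembled with the standard library as sketched below. The cookie name refresh_token and the Path value are assumptions; scoping the path to the auth endpoints keeps the refresh token off unrelated requests.

```python
from http.cookies import SimpleCookie


def refresh_cookie_header(token_id: str, max_age_seconds: int,
                          same_site: str = "Strict") -> str:
    """Build the value of a Set-Cookie header for the refresh token."""
    cookie = SimpleCookie()
    cookie["refresh_token"] = token_id
    morsel = cookie["refresh_token"]
    morsel["httponly"] = True        # not readable from JavaScript
    morsel["secure"] = True          # sent over HTTPS only
    morsel["samesite"] = same_site   # "Strict" or "Lax"
    morsel["path"] = "/v1/auth"
    morsel["max-age"] = max_age_seconds
    return morsel.OutputString()
```

Logout would send the same cookie with an empty value and a Max-Age of 0 so the browser discards it.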

Practical artifacts

For each service: OpenAPI spec (external APIs) + internal RPC spec (gRPC/HTTP).
Data model ERD & Redis key schema.
Sequence diagrams for Login, Refresh, Register, ForgotPassword, Federation (Mermaid).
Deployment manifests (k8s) with readiness/liveness probes & resource limits.
Prometheus metrics list + Grafana dashboard templates.
Threat model (STRIDE) + mitigation mapping to components.
Testing matrix (unit, integration, contract, E2E).

5. Data Design

5.1 Purpose

The Data Design section defines how all data within the authentication and authorization system is physically structured, stored, and managed.
It translates the logical entities identified in the SRS and high-level Data Design into concrete MongoDB collections and Redis key structures, ensuring optimal performance and integrity.

This section establishes the foundation for data consistency, security, and scalability across all modules that handle user identities, authentication tokens, sessions, roles, permissions, and telemetry data.
It also defines relationships between collections, indexing strategies, encryption and hashing requirements, and data lifecycle policies — covering how data is created, updated, retained, archived, and eventually purged.

Ultimately, this section ensures that every component interacting with the data layer adheres to unified design principles that support high availability, fault tolerance, and compliance with privacy standards.

5.2 Data Design Overview

The authentication system’s data layer is composed of:

Layer	Technology	Purpose
Primary Database	MongoDB	Persistent storage for user profiles, roles, permissions, audit logs
Ephemeral Store	Redis	Fast storage for sessions, refresh tokens, rate limits, OTPs
Long-term Logs	ELK / Loki	Append-only audit and security events for monitoring and forensics
Secrets Storage	Secret Manager / Vault	Signing keys, OAuth client secrets, MFA seed encryption keys

Database Decision

Step 1: Identify the Nature of Each Data Type

We’ll classify every entity by data lifetime, volatility, access pattern, and sensitivity:

Entity	Data Type	Lifetime	Access Pattern	Security Sensitivity
`users`	Core identity	Long-term	Moderate read/write	Very High
`last_devices`	Ephemeral behavior log	Medium-term	Frequent writes, occasional reads	Medium
`mfa_credential`	Secret data	Long-term	Rare writes, critical reads	Extreme
`role`, `permission`, `user_role`, `role_permission`	Access control definitions	Long-term	Read-heavy	High
`audit_log`	Immutable security logs	Long-term	Write-heavy, rare reads	High
`oauth_provider`	Token/linked account data	Medium-term	Rare writes, occasional reads	High
`telemetry_event`	Behavior data	Short to medium-term	Write-heavy	Medium
`session_store`	Temporary session tokens	Short-lived	Constant read/write	High

Step 2: Match Each Type to the Right Database

MongoDB (Primary Persistent Store)

For:

users
role, permission, user_role, role_permission
oauth_provider

Why:

Structured but flexible (perfect for identity & role data).
Easy JSON-based querying for user & role relations.
You can easily embed or reference relationships (1:N, N:M).

Encryption & Indexing:

Encrypt sensitive fields (password_hash, tokens).
Index on email, username, user_id.

Redis (Ephemeral, Fast Store)

For:

session_store
Possibly last_devices (if used for recent login cache)

Why:

Blazing fast, built for TTL (time-to-live) sessions.
Perfect for session invalidation, token rotation, and device caching.
Native expiration = no cleanup cron jobs.

Vault / KMS (Secret Store)

For:

mfa_credential.secret_hash
Encryption keys for JWT signing, refresh tokens, OAuth tokens

Why:

Redis/Mongo can be compromised; Vault isolates and encrypts secrets at rest.
You never expose raw secrets to your DB.

(Note: in your schema, secret should be a reference or encrypted placeholder, not the actual secret.)

Elasticsearch / PostgreSQL (Log Store)

For:

audit_log
telemetry_event

Why:

These are append-only, massive, and often queried by time.
Elasticsearch gives you full-text and fast time-based search (ideal for login analysis, fraud detection).
If you prefer structured relational logs, PostgreSQL with partitioned tables works fine too.

My Approach

Start with MongoDB + Redis, and later scale into a hybrid:

MongoDB → Users, Roles, OAuth, Permissions
Redis → Session Store, Last Devices
Vault → Secrets (MFA, keys)
Elasticsearch → Audit, Telemetry

Step 3: Visualize (Simple Overview)

                ┌──────────────────────────┐
                │   Authentication API     │
                └────────────┬─────────────┘
                             │
         ┌───────────────────┼───────────────────┐
         ▼                   ▼                   ▼
 ┌────────────┐       ┌──────────────┐     ┌────────────┐
 │  MongoDB   │       │    Redis     │     │  Vault/KMS │
 │ users      │       │ sessions     │     │ mfa secrets│
 │ roles      │       │ last_devices │     └────────────┘
 │ oauth_prov │       └──────────────┘
 │ permissions│
 └────────────┘
         │
         ▼
   ┌──────────────┐
   │ Elasticsearch│
   │ audit_logs   │
   │ telemetry    │
   └──────────────┘

Summary

Table	Best DB	Reason
users	MongoDB	Long-term core data
last_devices	Redis / MongoDB	Depends on if you want cache or history
mfa_credential	Vault + MongoDB ref	Secrets must be isolated
role / permission / user_role	MongoDB	Stable structure
audit_log	Elasticsearch	Time-based search and analytics
oauth_provider	MongoDB	Rarely updated identity link
telemetry_event	Elasticsearch	Heavy event ingestion
session_store	Redis	Fast token/session management

5.3 Logical Data Model

Core Collections (MongoDB)

Collection	Key Fields	Description
users	`id (PK)`, `email`, `username`, `password_hash`, `mfa_enabled`, `email_verified`, `phone_verified`, `is_locked`, `created_at`, `updated_at`	Core user identity record. Primary lookup by email.
user_oauth	`provider_id (PK)`, `user_id (FK)`, `provider_name`, `provider_user_id`, `access_token (encrypted)`, `refresh_token (encrypted)`, `expires_at`	Stores external IdP linkages for federation (Google, GitHub).
roles	`role_id (PK)`, `name`, `description`	Defines named roles such as `admin`, `user`, etc.
permissions	`permission_id (PK)`, `name`, `description`	Granular access rights that can be assigned to roles.
role_permissions	`role_id (FK)`, `permission_id (FK)`	Many-to-many mapping of roles to permissions.
user_roles	`user_id (FK)`, `role_id (FK)`	Many-to-many mapping of users to roles.
audit_logs	`log_id (PK)`, `timestamp`, `user_id`, `event_type`, `ip`, `device`, `metadata`	Immutable append-only logs for compliance, breach detection, and analytics.
mfa_credentials	`user_id (FK)`, `type`, `secret_hash`, `created_at`, `last_used`	TOTP and OTP configuration per user.

Relationships

users (1) — (M) user_oauth (one user can link multiple providers)
users (M) — (M) roles through user_roles
roles (M) — (M) permissions through role_permissions
users (1) — (M) audit_logs
users (1) — (1) mfa_credentials (if MFA enabled)

5.4 Physical Data Model (Redis + MongoDB Keys)

Data Type	Storage	Key Pattern / Collection	Purpose	TTL
Access Tokens	Client memory / cookie	JWT (not stored server-side)	Short-lived access tokens for API auth	15 min
Refresh Tokens	Redis	`refresh:<tokenId>` → `{user_id, device_id, issued_at, expires_at}`	Long-lived tokens for session refresh	7–30 days
User Session Index	Redis	`user_sessions:<userId>` → `[tokenIds]`	Track all devices/sessions per user	TTL = same as refresh
OTP Codes	Redis	`otp:<userId>` → `{code_hash, expires_at}`	Short-lived OTPs for MFA/forgot password	2–5 min
Rate-Limit Counters	Redis	`rate:<ip>`	Track login attempts per IP	1 min
Audit Logs (Hot)	MongoDB	`audit_logs`	Critical user actions	90 days
Audit Logs (Cold)	Object Storage	`/archive/audit/yyyy-mm-dd.log`	Archived logs beyond 90 days	1 year
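
The key patterns and TTLs in the table above could be kept in one small module so that services never assemble Redis keys by hand. The following is a minimal sketch under that assumption; the names `redisKeys`, `ttlSeconds`, and `newRefreshRecord` are hypothetical, and the 7-day default is simply the lower bound of the 7–30 day range.

```typescript
// Hypothetical helpers: names are illustrative; patterns and TTLs come from the table above.
const MINUTE = 60;
const DAY = 24 * 60 * MINUTE;

const redisKeys = {
  refresh: (tokenId: string): string => `refresh:${tokenId}`,
  userSessions: (userId: string): string => `user_sessions:${userId}`,
  otp: (userId: string): string => `otp:${userId}`,
  rate: (ip: string): string => `rate:${ip}`,
};

const ttlSeconds = {
  accessToken: 15 * MINUTE,
  refreshMin: 7 * DAY,
  refreshMax: 30 * DAY,
  otpMax: 5 * MINUTE,
  rateWindow: 1 * MINUTE,
};

interface RefreshRecord {
  user_id: string;
  device_id: string;
  issued_at: number; // Unix seconds
  expires_at: number; // Unix seconds
}

// Builds the value stored under refresh:<tokenId>, clamping the TTL to the 7-30 day range.
function newRefreshRecord(
  userId: string,
  deviceId: string,
  nowSec: number,
  ttl: number = ttlSeconds.refreshMin,
): RefreshRecord {
  const clamped = Math.min(Math.max(ttl, ttlSeconds.refreshMin), ttlSeconds.refreshMax);
  return { user_id: userId, device_id: deviceId, issued_at: nowSec, expires_at: nowSec + clamped };
}
```

Clamping in one place keeps a misconfigured TTL from producing refresh tokens that live shorter or longer than the documented range.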

5.5 Data Entities and Fields

5.5.1 users

Field	Type	Description
`id`	String (PK)	Unique identifier (UUID).
`name`	String	Display name.
`username`	String	Unique system username.
`email`	String	Unique email, used for login.
`phone_number`	String	Optional verified phone number.
`bio`	String	Optional user bio.
`password_hash`	String	Argon2/bcrypt hash of password.
`profile_image`	String (URL)	Optional avatar path.
`email_verified`	Boolean	Email verification status.
`phone_verified`	Boolean	Phone verification status.
`is_locked`	Boolean	Indicates if account is temporarily locked due to failed logins.
`mfa_enabled`	Boolean	Whether user has MFA enabled.
`is_logged_in`	Boolean	Real-time login status (for analytics or session visualization).
`forgot_password_token`	String	Token for password reset (hashed, short TTL).
`verification_token`	String	Token for email/phone verification.
`lock_until`	DateTime	Account lock expiry time.
`forgot_password_expiry`	DateTime	Token expiry.
`created_at`	DateTime	Account creation timestamp.
`updated_at`	DateTime	Last profile update.
`last_login_at`	DateTime	Last successful login time.
`last_failed_at`	DateTime	Last failed login attempt.

Indexes:

email (unique)
username (unique)
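lock_until (regular index, optional; auto-unlock is decided at login by comparing lock_until with the current time, since a MongoDB TTL index would delete the whole user document)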
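
The `is_locked` and `lock_until` fields can be evaluated at login time without any background job. The sketch below illustrates one way to do that; the function names, the 5-attempt threshold, and the 15-minute lock duration are assumptions, not values from this document.

```typescript
// Field names mirror the users table; function names and thresholds are assumptions.
interface LockState {
  is_locked: boolean;
  lock_until: Date | null;
}

// A lock with an expiry in the past no longer blocks login.
function isCurrentlyLocked(user: LockState, now: Date): boolean {
  if (!user.is_locked) return false;
  if (user.lock_until === null) return true; // no expiry: requires a manual unlock
  return now.getTime() < user.lock_until.getTime();
}

// Computes the lock fields to persist after a failed login attempt.
function lockStateAfterFailure(
  failedAttempts: number,
  now: Date,
  maxAttempts: number = 5,
  lockMinutes: number = 15,
): LockState {
  if (failedAttempts < maxAttempts) return { is_locked: false, lock_until: null };
  return { is_locked: true, lock_until: new Date(now.getTime() + lockMinutes * 60 * 1000) };
}
```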

5.5.2 oauth_providers

Field	Type	Description
`provider_id`	String (PK)	Unique ID for this provider entry.
`user_id`	String (FK → users.id)	Reference to local user.
`provider_name`	String	e.g., `google`, `github`.
`provider_user_id`	String	Provider-side user ID.
`access_token`	String (encrypted)	OAuth access token.
`refresh_token`	String (encrypted)	OAuth refresh token.
`expires_at`	DateTime	Token expiration time.

Indexes:

Compound (provider_name, provider_user_id) unique.
user_id indexed for reverse lookup.

5.5.3 roles, permissions, role_permissions

roles

Field	Type	Description
`role_id`	String (PK)	Unique role ID.
`name`	String	Role name (`admin`, `user`, etc.).
`description`	String	Human-readable description.

permissions

Field	Type	Description
`permission_id`	String (PK)	Unique permission ID.
`name`	String	Permission keyword (e.g., `USER_CREATE`, `VIEW_AUDIT`).
`description`	String	Human-readable permission label.

role_permissions

Field	Type	Description
`role_id`	String (FK → roles.role_id)	Linked role.
`permission_id`	String (FK → permissions.permission_id)	Linked permission.
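
Because MongoDB does not join these mapping collections automatically, a permission check has to walk user → roles → permissions in application code. The following sketch shows that resolution over in-memory arrays; the camelCase property names and the functions `permissionNamesForUser` and `hasPermission` are hypothetical stand-ins for the fields listed above.

```typescript
// In-memory stand-ins for the user_roles, role_permissions and permissions collections.
interface UserRole { userId: string; roleId: string; }
interface RolePermission { roleId: string; permissionId: string; }
interface Permission { permissionId: string; name: string; }

// Resolves the set of permission names a user holds through all assigned roles.
function permissionNamesForUser(
  userId: string,
  userRoles: UserRole[],
  rolePermissions: RolePermission[],
  permissions: Permission[],
): Set<string> {
  const roleIds = new Set(userRoles.filter((ur) => ur.userId === userId).map((ur) => ur.roleId));
  const permissionIds = new Set(
    rolePermissions.filter((rp) => roleIds.has(rp.roleId)).map((rp) => rp.permissionId),
  );
  return new Set(permissions.filter((p) => permissionIds.has(p.permissionId)).map((p) => p.name));
}

function hasPermission(
  userId: string,
  required: string,
  userRoles: UserRole[],
  rolePermissions: RolePermission[],
  permissions: Permission[],
): boolean {
  return permissionNamesForUser(userId, userRoles, rolePermissions, permissions).has(required);
}
```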

5.5.4 mfa_credentials

Field	Type	Description
`id`	String (PK)	Unique ID.
`user_id`	String (FK → users.id)	Owner user.
`type`	Enum(`TOTP`, `SMS`, `EMAIL`)	MFA method.
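`secret_hash`	String	Encrypted TOTP secret (it must be decryptable to verify codes, so it is encrypted rather than hashed), or a reference to the secret held in Vault.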
`backup_codes`	Array[String]	Hashed backup codes.
`created_at`	DateTime	Creation timestamp.
`last_used`	DateTime	Last time MFA verified.
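
The `backup_codes` field holds hashes only, so the plaintext codes are shown to the user once and never stored. A minimal sketch using Node's built-in `crypto` module follows; the function names, the code length, and the choice of SHA-256 (reasonable only because the codes are random and high-entropy) are assumptions.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

function hashBackupCode(code: string): string {
  return createHash("sha256").update(code, "utf8").digest("hex");
}

// Returns the plaintext codes (shown to the user once) and the hashes to persist.
function generateBackupCodes(count: number = 10): { plain: string[]; hashed: string[] } {
  const plain = Array.from({ length: count }, () => randomBytes(5).toString("hex"));
  return { plain, hashed: plain.map(hashBackupCode) };
}

// Returns the remaining hashes if the code matched (single use), or null if it did not.
function consumeBackupCode(candidate: string, hashed: string[]): string[] | null {
  const candidateHash = Buffer.from(hashBackupCode(candidate), "hex");
  const index = hashed.findIndex((h) => timingSafeEqual(Buffer.from(h, "hex"), candidateHash));
  if (index === -1) return null;
  return hashed.filter((_, i) => i !== index);
}
```

Removing the matched hash on success is what makes each backup code single-use.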

5.5.5 audit_logs

Field	Type	Description
`log_id`	String (PK)	Unique log entry ID.
`timestamp`	DateTime	Event timestamp.
`event_type`	String	Type (e.g., `LOGIN_SUCCESS`, `PASSWORD_RESET`).
`user_id`	String (FK → users.id)	User related to event.
`token_id`	String (optional)	Related token, if applicable.
`ip_address`	String	Origin IP.
`device_info`	Object	User-agent or device metadata.
`metadata`	JSON	Arbitrary contextual details.

5.6 Redis Schema (Ephemeral Data)

Key Pattern	Type	Description	TTL
`refresh:{tokenId}`	Hash	{ userId, deviceId, issuedAt, expiresAt, rotated }	30d
`user_sessions:{userId}`	Sorted Set	Active sessions ordered by issuedAt	30d
`rate:{ip}`	Counter	Request throttling per IP	1–5 min
`otp:{userId}`	Hash	Temporary OTPs for password reset / MFA	5m

Redis stores no permanent user data—only volatile session and token state.
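
The per-IP counter behaves like a fixed-window rate limiter: the first request in a window creates the counter with a TTL, and later requests increment it until it expires. The sketch below models that behaviour with an in-memory map standing in for Redis; the class name and the limit used in the usage note are assumptions.

```typescript
// In-memory stand-in for a Redis counter that is created with a TTL on first hit.
interface WindowCounter {
  count: number;
  expiresAtMs: number;
}

class FixedWindowLimiter {
  private counters = new Map<string, WindowCounter>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed; assumes limit >= 1.
  allow(ip: string, nowMs: number): boolean {
    const existing = this.counters.get(ip);
    if (existing === undefined || nowMs >= existing.expiresAtMs) {
      // First hit in a new window: start the counter and its expiry.
      this.counters.set(ip, { count: 1, expiresAtMs: nowMs + this.windowMs });
      return true;
    }
    existing.count += 1;
    return existing.count <= this.limit;
  }
}
```

For example, `new FixedWindowLimiter(5, 60000)` would allow five login attempts per IP per minute. Passing the clock in as `nowMs` keeps the logic testable without timers.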

5.7 Index & Performance Design

Collection	Index	Type	Purpose
`users`	`email (unique)`	B-tree	Fast login lookups
`users`	`phone_number (unique)`	B-tree	Account linking
`user_oauth`	`(provider_name, provider_user_id)`	Composite	Fast OAuth lookups
`audit_logs`	`user_id`, `timestamp`	Compound	Time-series queries
`roles`	`name (unique)`	B-tree	Role lookup
`role_permissions`	`(role_id, permission_id)`	Composite	Access check joins
`user_roles`	`(user_id, role_id)`	Composite	Role assignment checks

All indexes are optimized for read-heavy workloads (login, token refresh, session validation).

5.8 Relationships Diagram (Conceptual)

erDiagram
    USERS ||--o{ OAUTH_PROVIDERS : "has"
    USERS ||--o{ MFA_CREDENTIALS : "has"
    USERS ||--o{ AUDIT_LOGS : "generates"
    USERS ||--o{ USER_ROLES : "assigned"
    ROLES ||--o{ ROLE_PERMISSIONS : "defines"
    PERMISSIONS ||--o{ ROLE_PERMISSIONS : "belongs_to"

5.9 Data Integrity & Constraints

Uniqueness Constraints: Enforced on email, username, and the (provider_name, provider_user_id) pair.
Foreign Key Consistency: Enforced at application layer since MongoDB is non-relational.
Soft Deletion: Users and roles can be soft-deleted via status: "inactive".
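TTL Indexes: Apply only to collections of short-lived documents (for example, temporary tokens stored as their own documents), because a MongoDB TTL index deletes the whole document. Account locks and token fields on the users document expire by comparing lock_until and forgot_password_expiry with the current time.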
Encryption:
- Passwords → Argon2 hash
- Tokens & secrets → AES-256 (field-level encryption)
- Sensitive configs → environment variables or Vault
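
Field-level encryption with AES-256 can be done in the application layer before a value is written to the database. The following is a minimal sketch using AES-256-GCM from Node's built-in `crypto` module; the function names and the `iv.tag.ciphertext` storage format are assumptions, and in practice the 32-byte key would come from Vault or KMS rather than being generated locally.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypts one field value; the output packs IV, auth tag and ciphertext as base64 parts.
function encryptField(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // 96-bit nonce, fresh for every encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return [iv, tag, ciphertext].map((part) => part.toString("base64")).join(".");
}

// Decrypts a value produced by encryptField; throws if the data or tag was modified.
function decryptField(encoded: string, key: Buffer): string {
  const parts = encoded.split(".");
  if (parts.length !== 3) throw new Error("malformed encrypted field");
  const [iv, tag, ciphertext] = parts.map((p) => Buffer.from(p, "base64")) as [Buffer, Buffer, Buffer];
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

GCM is an authenticated mode, so a tampered ciphertext fails on decryption instead of yielding garbage.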

5.10 Data Security Model

Area	Control
Passwords	Argon2 hash with salt (bcrypt as fallback)
Tokens	Encrypted in Redis (AES-GCM via app layer if needed)
Secrets (MFA, OAuth)	Encrypted using KMS / Vault; never stored in plaintext
PII Fields (email, phone)	Field-level encryption or DB-level encryption (FLE)
Transport Layer	TLS 1.2+ enforced end-to-end
Data at Rest	MongoDB encryption-at-rest (EBS or Atlas-managed)

5.11 Data Lifecycle & Retention

Data Type	Retention Policy	Notes
Users	Until account deletion + 30 days grace period	GDPR-style retention
Sessions / Refresh Tokens	Auto-expire via TTL	7–30 days configurable
Audit Logs	90 days hot storage, 1 year cold archive	For compliance
OTP / MFA Codes	Auto-delete after expiry	Never stored in plaintext
Rate Limit Counters	Auto-expire	Short TTLs (1–5 min)

Data lifecycle management ensures minimal persistence of sensitive data and compliance readiness (GDPR-style erase-on-delete).

5.12 Data Flow Summary

Example: Login + Token Lifecycle

1. Client submits credentials
2. Auth API → UserService (Mongo lookup + hash verify)
3. On success → TokenService issues access+refresh
4. Access token sent to client, refresh token saved in Redis
5. Refresh rotation on use; old token invalidated
6. Logout clears cookies + deletes Redis entry
7. AuditService logs login success/failure in Mongo
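
Steps 4 to 6 above can be sketched as a small store in which every refresh token is single-use. This is an illustration under stated assumptions: the class `RefreshTokenStore`, its method names, and the injected `nextId` generator are hypothetical, and a `Map` stands in for Redis.

```typescript
interface RefreshEntry {
  userId: string;
  deviceId: string;
  expiresAtMs: number;
}

class RefreshTokenStore {
  private active = new Map<string, RefreshEntry>();
  private nextId: () => string;

  // nextId should return an unguessable random ID in a real deployment.
  constructor(nextId: () => string) {
    this.nextId = nextId;
  }

  // Step 4: the refresh token is saved server-side when it is issued.
  issue(userId: string, deviceId: string, nowMs: number, ttlMs: number): string {
    const tokenId = this.nextId();
    this.active.set(tokenId, { userId, deviceId, expiresAtMs: nowMs + ttlMs });
    return tokenId;
  }

  // Step 5: rotation invalidates the old token; an unknown, reused or expired token yields null.
  rotate(tokenId: string, nowMs: number, ttlMs: number): string | null {
    const entry = this.active.get(tokenId);
    if (entry === undefined) return null;
    this.active.delete(tokenId);
    if (nowMs >= entry.expiresAtMs) return null;
    return this.issue(entry.userId, entry.deviceId, nowMs, ttlMs);
  }

  // Step 6: logout deletes the server-side entry.
  revoke(tokenId: string): boolean {
    return this.active.delete(tokenId);
  }
}
```

Since the old entry is deleted before the new one is issued, presenting a refresh token a second time returns null, which a caller could treat as a sign of token theft.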

Example: MFA Enrollment

1. User enables MFA
2. MFAService generates secret → encrypted + stored in mfa_credentials
3. On login, MFAService verifies TOTP or OTP via Redis (for SMS/Email)
4. Audit entry written for challenge success/failure
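
The TOTP check in step 3 follows RFC 6238, which applies the HOTP algorithm from RFC 4226 to a time-based counter. The sketch below uses Node's built-in `crypto` module; the function names, the 30-second step, and the drift window of one step are assumptions rather than settings taken from this document.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// HOTP (RFC 4226): HMAC-SHA1 over an 8-byte big-endian counter, then dynamic truncation.
function hotp(secret: Buffer, counter: number, digits: number = 6): string {
  const message = Buffer.alloc(8);
  message.writeUInt32BE(Math.floor(counter / 0x100000000), 0);
  message.writeUInt32BE(counter >>> 0, 4);
  const mac = createHmac("sha1", secret).update(message).digest();
  const offset = mac.readUInt8(mac.length - 1) & 0x0f;
  const binary = mac.readUInt32BE(offset) & 0x7fffffff;
  return (binary % Math.pow(10, digits)).toString().padStart(digits, "0");
}

// TOTP (RFC 6238): the counter is the number of elapsed time steps.
// Accepts codes from neighbouring steps to tolerate small clock drift.
function verifyTotp(
  secret: Buffer,
  code: string,
  nowSec: number,
  stepSec: number = 30,
  window: number = 1,
): boolean {
  const current = Math.floor(nowSec / stepSec);
  const given = Buffer.from(code, "utf8");
  for (let drift = -window; drift <= window; drift++) {
    const counter = current + drift;
    if (counter < 0) continue;
    const expected = Buffer.from(hotp(secret, counter, 6), "utf8");
    if (expected.length === given.length && timingSafeEqual(expected, given)) return true;
  }
  return false;
}
```

With the RFC test secret `12345678901234567890`, `hotp` returns `755224` for counter 0 and `287082` for counter 1, matching the published RFC 4226 test vectors.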

5.13 Data Consistency & Integrity

MongoDB uses document-level atomic operations for updates.
Redis data is ephemeral by design: safe for cache and session state, but not a source of truth.
Referential integrity between collections is maintained at the application level (e.g., user deletion triggers cascade cleanup of sessions and MFA).
Audit logs are append-only to prevent tampering.

5.14 Backup & Recovery Strategy

Backup & Retention

Data Type	Backup Frequency	Retention	Notes
MongoDB (users, roles)	Daily	30 days	Encrypted backups
Redis (sessions)	None	N/A	Volatile data only
Audit Logs	Weekly archival	1 year	Archived to cold storage (S3/MinIO)

Component	Backup Frequency	Recovery Time Objective (RTO)	Notes
MongoDB	Daily snapshot	< 1 hour	Automated Atlas/Replica backup
Redis	Optional (RDB/AOF)	< 15 min	Only critical keys persisted
Audit Logs	Archived weekly	< 24 hours	Cold storage retrieval
Secrets	Managed by Vault / KMS	N/A	Versioned rotation

5.15 Data Design Summary

The data design balances security, performance, and clarity:

MongoDB provides durable, structured persistence for identities and logs.
Redis ensures low-latency session management.
Encryption and TTLs protect sensitive data.
Modular schema design supports future IAM expansion (SSO, ABAC, SCIM).
Event-based updates (via audit service) ensure traceability and compliance.

5.16 Future Data Extensions

Device Fingerprinting Table — for anomaly detection and session monitoring.
Login History — separated from audit logs for faster analytics.
Session Geo-Analytics — user’s login country, device risk scoring.
Federation Metadata Table — store provider discovery endpoints and JWKS caching.

6. Detailed Workflows (UML Diagrams)

This section is the core of the SDA.

Sequence Diagrams:
- User Login Flow.
- Token Refresh.
- Password Reset.
- MFA Verification.

Notes on the User Login Flow diagram:

TokenService stores refresh tokens in Redis with keys like refresh:<tokenId> and indexes them under user_sessions:<userId>.

AuditService receives events either synchronously or via the event bus, so they are recorded immediately.

Activity Diagrams:
- Account lifecycle (active → locked → deleted).

State Machine:
- Session lifecycle.

7. Security Architecture

Password hashing (Argon2/bcrypt).
Transport security (HTTPS, TLS 1.2+ with TLS 1.3 preferred).
Rate limiting strategies.
Threat model summary (brute force, replay attacks, session hijacking).

8. Scalability & Performance

Horizontal scaling of Auth API (stateless).
Session store in Redis cluster.
Token revocation lists (Bloom filter / DB).
Expected throughput (10k logins/sec).
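
A Bloom filter can front the revocation list: it never misses a revoked token, so a negative answer skips the database, while a positive answer still has to be confirmed against the database because false positives are possible. The sketch below is one possible shape; the class name, the FNV-1a-based double hashing, and the sizing in the usage note are assumptions.

```typescript
class BloomFilter {
  private bits: Uint8Array;
  private sizeBits: number;
  private hashCount: number;

  constructor(sizeBits: number, hashCount: number) {
    this.sizeBits = sizeBits;
    this.hashCount = hashCount;
    this.bits = new Uint8Array(Math.ceil(sizeBits / 8));
  }

  // 32-bit FNV-1a hash, varied by a seed so two independent hashes can be derived.
  private fnv1a(value: string, seed: number): number {
    let hash = (0x811c9dc5 ^ seed) >>> 0;
    for (let i = 0; i < value.length; i++) {
      hash ^= value.charCodeAt(i);
      hash = Math.imul(hash, 0x01000193) >>> 0;
    }
    return hash;
  }

  // Double hashing: position_i = (h1 + i * h2) mod sizeBits.
  private positions(value: string): number[] {
    const h1 = this.fnv1a(value, 0);
    const h2 = (this.fnv1a(value, 0x9e3779b9) | 1) >>> 0;
    const result: number[] = [];
    for (let i = 0; i < this.hashCount; i++) result.push((h1 + i * h2) % this.sizeBits);
    return result;
  }

  add(tokenId: string): void {
    for (const p of this.positions(tokenId)) {
      this.bits[p >> 3] = (this.bits[p >> 3] ?? 0) | (1 << (p & 7));
    }
  }

  // false means definitely not revoked; true means possibly revoked (confirm in the DB).
  mightContain(tokenId: string): boolean {
    return this.positions(tokenId).every((p) => ((this.bits[p >> 3] ?? 0) & (1 << (p & 7))) !== 0);
  }
}
```

For example, `new BloomFilter(8192, 4)` followed by `add(tokenId)` on each revocation lets the hot path call `mightContain(tokenId)` before touching the database.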

9. Availability & Reliability

Redundancy (multi-region DB, load balancing).
Failover strategies (e.g., DB replica promotion).
Session persistence during failures.

10. External Integrations

OAuth providers (Google, GitHub, etc).
Monitoring tools (Prometheus, Grafana, ELK).
Notification services (email/SMS for OTPs).

11. Technology Stack

List chosen stack with rationale. Example:

Backend: Node.js/Express (fast, async).
Database: MongoDB for identity data (users, roles, OAuth links).
Cache: Redis for sessions.
Security: JWT for tokens; Argon2 (bcrypt as fallback) for passwords.
Infra: Kubernetes + Docker for scaling.

12. Risks & Mitigations

DB bottlenecks → use read replicas.
Redis memory exhaustion → eviction policy.
OAuth provider downtime → fallback login.

13. Future Enhancements

Add SSO (SAML, OIDC federation).
Add adaptive MFA.
Introduce risk-based authentication.

14. Appendices

API Specs (OpenAPI/Swagger snippet).
Glossary (carry over from SRS).
References (OAuth RFC, OWASP ASVS, NIST guidelines).
