Software Design and Architecture Docs
1. Introduction
1.1 Purpose
The purpose of this Software Design & Architecture (SDA) document is to translate the requirements defined in the Software Requirements Specification (SRS) into a concrete design blueprint for the authentication system.
This document provides the structural and technical foundation for implementation, covering architecture, data models, modules, and interaction flows. The authentication system will serve as a core service, ensuring that users can securely register, log in, and manage their identities.
By defining the architecture and design choices upfront, this document ensures that the system is secure, scalable, and extensible enough to evolve into enterprise-grade identity and access management (IAM) in the future.
1.2 Scope
This document focuses on the authentication subsystem, detailing its architecture, data flow, and integration with supporting services such as caching, external identity providers, and monitoring. Authorization beyond basic role-based access control (RBAC) is out of scope for this version but considered for future extensions.
1.3 References
Software Requirements Specification (SRS) – Authentication System
RFC 7519 – JSON Web Tokens (JWT)
OAuth 2.0 & OpenID Connect specifications
MongoDB documentation
Redis documentation
1.4 Definitions & Abbreviations
IAM: Identity and Access Management
MFA: Multi-Factor Authentication
SSO: Single Sign-On
JWT: JSON Web Token
2. Architecture Goals & Principles
2.1 Architecture Goals
The authentication system is designed with the following overarching goals:
Balance security and usability: Strong security controls are implemented without creating unnecessary friction for end users.
Scale to thousands of users: The system is optimized for small-to-medium scale (portfolio or demo use), but follows patterns that can extend to larger user bases if required.
High availability: Authentication is mission-critical; the design minimizes single points of failure and ensures redundancy where possible.
Extensibility for learning: While this implementation is intended for portfolio demonstration, the architecture models patterns used in enterprise-grade identity systems.
Enterprise readiness: Even at small scale, the design follows enterprise-level principles (RBAC, observability, federation, testing).
Portability: The system can be deployed on any hosting platform (e.g., Render, AWS, GCP) without being locked into proprietary services.
2.2 Architecture Principles
The following principles guide the design and implementation:
Security by Design
Security is treated as a first-class concern, but always balanced with usability.
Secrets and tokens are encrypted at rest and in transit; passwords are never stored reversibly, only as salted hashes.
Authentication follows least privilege by enforcing fine-grained RBAC from the start.
API-first, UI-enabled
All features are exposed via APIs (REST/GraphQL), ensuring integration with other systems.
A user-facing UI is layered on top for usability and demonstration purposes.
Stateful session management (with flexibility)
Sessions are managed with a database (MongoDB) and cache (Redis) for performance.
While not stateless, the design allows migration to stateless token-based flows if required.
Observability built-in
Logging, monitoring, and auditing are integral parts of the architecture.
Security events (e.g., failed logins, privilege escalations) are tracked from day one.
High availability & fault tolerance
Components are designed to recover gracefully from failure.
Database and cache are configured to support redundancy and replication.
Extensible Identity Federation
From the initial release, support for external providers (Google, GitHub) is built in.
Future extension to other providers (e.g., Facebook, SAML-based enterprise IdPs) is supported by the modular architecture.
Performance-oriented development
Design favors rapid development using proven libraries, but with an eye on efficiency (e.g., caching tokens, minimizing DB queries).
Testability as a principle
The system is built with automated unit and integration testing in mind.
Authentication flows (registration, login, MFA, RBAC enforcement) are validated systematically.
Enterprise complexity with portfolio clarity
The design intentionally mirrors enterprise-grade IAM systems (MFA-ready, federated logins, RBAC, monitoring).
At the same time, the documentation emphasizes clarity and learning value for portfolio purposes.
3. High-Level Architecture
This section describes the major building blocks of the authentication system, how they interact, and how the system should be deployed to meet the goals defined in Section 2 (balance security/usability, scale to thousands of users, high availability, portability, observability, and enterprise-ready practices).
3.1 Context Diagram
[Client Apps]
│
▼
[API Gateway] ──(authn, throttling, TLS)──► [Auth Service / Auth Core]
├─► [Token & Session Store (Redis)]
├─► [Primary DB (MongoDB): users, roles, audit]
├─► [External IdPs (OAuth/OIDC)]
└─► [Security Services: hashing, KMS, rate-limiter]
│
└─► [Monitoring & Logging: Prometheus/Grafana, ELK]
Notes & Next Steps
Define token lifetimes (access token TTL, refresh token TTL) and refresh rotation policy in the Session/Token Module (LLD).
Specify which responsibilities live in the API Gateway vs Auth Core (e.g., CSP enforcement, early 401/429 responses).
Add deployment considerations: single-region start with plan for multi-region session replication if needed.
3.2 Container / Component Diagram
[Component diagram: Mermaid source not reproduced here]
Components & responsibilities (summary)
API Gateway / Auth Proxy: TLS termination, request routing, rate limiting, basic auth protection; passes requests to Auth API. Can handle initial JWT verification for static routes.
Auth API Service: Primary HTTP API (register, login, logout, refresh, password reset, role management). Implements business logic and validation.
Token Service: Issues and validates access tokens; handles refresh logic, rotation, revocation. Consults Redis for refresh token state.
User Service (DB): Stores user profiles, roles, permissions, account status. MongoDB with indexes for unique constraints.
MFA Service: TOTP generation/verification, backup codes, SMS/Email OTP orchestration (via Notification adapter).
OAuth Adapter: Encapsulates federation logic for Google/GitHub; normalizes provider identities to local user accounts.
Redis: In-memory store for refresh tokens, session indices, ephemeral locks, rate-limiting counters, and short-lived state.
Audit & Logging: Append-only audit log storage (also stored in Mongo or shipped to ELK). Critical for non-repudiation and incident response.
Monitoring / Observability: Metrics (Prometheus), tracing (OpenTelemetry), logs (ELK or Loki), alerting (Alertmanager).
WAF / CDN: Optional edge protection for brute-force, DDoS mitigation, and static assets.
3.3 Deployment View (Logical cloud infra)
[Deployment diagram: Mermaid source not reproduced here]
Deployment notes
Kubernetes or managed containers (Render, AWS ECS, GKE) recommended for portability and scaling.
Load Balancer + CDN: LB for traffic distribution; CDN for static UI assets and to terminate TLS at edge.
MongoDB Replica Set: Primary + secondaries, periodic backups, and automated failover.
Redis with Sentinel, Redis Cluster, or Managed Redis: Use Redis for ephemeral session/refresh token storage and rate-limiter counters.
Workers: Background job processors (email OTPs, token cleanup, auditing sync).
Observability stack: Prometheus, Grafana, Loki/ELK, OpenTelemetry traces.
Secrets management: Use cloud secret manager (or Kubernetes secrets backed by vault) for signing keys and provider secrets.
3.4 Data & Control Flow (step-by-step)
Login / Register
Client → API Gateway → Auth API
Auth API validates credentials → UserService (Mongo)
On success: Token Service issues Access Token (short-lived) + Refresh Token (persistent)
Refresh token stored in Redis with device_id, token_id, expiry TTL
Access token returned to client (in-memory or HttpOnly cookie depending on app type); refresh token delivered via HttpOnly cookie or cookie+DB depending on design
API Request
Client sends request with access token (Authorization header or HttpOnly cookie)
API Gateway verifies signature (or delegates to Token Service) → route to Auth API or backend services
Token Refresh
Access token expired → client hits /refresh
Server validates refresh token (Redis) → rotates new refresh token and issues new access token
Logout / Revoke
Client requests logout → Auth API deletes refresh token entry from Redis and expires cookies
Audit log entry written
Federation Login
Client redirected to OAuth Adapter → external provider → callback → OAuth Adapter normalizes identity → link or create local user → issue tokens as above
3.5 How this maps to Architecture Goals & Principles
Balance security & usability: short-lived access tokens + revocable refresh tokens; MFA and federation optional but built-in.
Scale to thousands: stateless-ish APIs + Redis for ephemeral session state; services horizontally scalable.
High availability: LB, DB replica set, Redis cluster, multiple pod replicas.
Portability: containerized services, managed DB/Redis or self-hosted; works on Render/AWS/GCP.
Observability: Telemetry, logging, and alerting integrated; security events are first-class metrics.
Extensibility & enterprise parity: modular services (MFA, OAuth adapter, Token Service) allow future growth into SSO/federation.
3.6 Security & Operational considerations (actionable)
TLS everywhere; terminate TLS at LB or CDN, but enforce end-to-end where possible.
Secrets stored in Secret Manager / Vault; signing keys rotated periodically.
Cookie flags: HttpOnly, Secure, SameSite=Strict/Lax depending on UX.
CSRF protection when using cookies (SameSite + CSRF tokens for non-GET stateful endpoints).
Token rotation & revocation: implement refresh token rotation, store token metadata in Redis (device_id, issued_at, ip_hash).
Rate limiting: per-IP and per-account; counters in Redis.
WAF & Bot protection for brute-force mitigation.
Audit trail: Immutable audit log for security actions (store centrally, retention policy).
Backups: Automated backup schedule for Mongo; snapshot retention and test restores.
Disaster recovery: Cross-region replicas for Mongo if required; plan RTO/RPO.
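The cookie flags listed above can be sketched with Python's standard library. This is only an illustration: the cookie name refresh_token, the /v1/auth path scope, and the 7-day lifetime are assumptions, not values fixed by this document.

```python
from http.cookies import SimpleCookie


def build_refresh_cookie(token_id: str, max_age_seconds: int,
                         same_site: str = "Strict") -> str:
    """Return a Set-Cookie header value carrying an opaque refresh token id."""
    cookie = SimpleCookie()
    cookie["refresh_token"] = token_id   # hypothetical cookie name
    morsel = cookie["refresh_token"]
    morsel["httponly"] = True            # not readable from JavaScript
    morsel["secure"] = True              # only sent over HTTPS
    morsel["samesite"] = same_site       # "Strict" or "Lax" depending on UX
    morsel["path"] = "/v1/auth"          # assumed scope: auth endpoints only
    morsel["max-age"] = max_age_seconds
    return morsel.OutputString()


header = build_refresh_cookie("tok_123", 7 * 24 * 3600)
print(header)
```

Scoping the cookie to the auth path keeps the refresh token off every other request, which reduces its exposure.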
3.7 Metrics & Alerts (examples you should include)
Auth Error Rate (failed requests / total requests): alert if error rate > 1% of traffic
Login Latency (p95) — alert if > 200ms (your SRS perf target)
Token Refresh Latency
Failed Login Attempts per IP / Account — alert and auto-throttle
Redis Memory Usage — alert if > 75%
Mongo Primary Election / Replica Lag — alert when lag > threshold
High number of revoked tokens — possible breach indicator
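As a rough sketch of how a few of these thresholds could be evaluated, the snippet below checks the 1% error rate, the 200 ms p95 login latency, and the 75% Redis memory limit. In practice these would be Prometheus alert rules; the function and alert names here are hypothetical.

```python
def p95(samples_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100   # ceil(0.95 * n) in integer math
    return ordered[rank - 1]


def evaluate_alerts(latencies_ms, total_requests, failed_requests,
                    redis_used_bytes, redis_max_bytes):
    """Return the names of the alerts that should fire."""
    alerts = []
    if total_requests and failed_requests / total_requests > 0.01:
        alerts.append("auth_error_rate")
    if latencies_ms and p95(latencies_ms) > 200:
        alerts.append("login_latency_p95")
    if redis_max_bytes and redis_used_bytes / redis_max_bytes > 0.75:
        alerts.append("redis_memory")
    return alerts
```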
3.8 Deliverables / Artifacts to attach
Mermaid/PlantUML component & deployment diagrams (this section)
OpenAPI/Swagger spec for Auth API
Terraform/Kubernetes manifests or Docker Compose (deployment examples)
Prometheus/Grafana dashboards and alert rules
Threat model (STRIDE), and an audit log retention + access policy
4. Detailed Component Design — 7 Services
Service 1 — Auth API Service (Orchestrator / Gateway-facing)
Responsibilities
Primary external entry point for clients (web/mobile/3rd-party).
Validate request schemas, throttle/rate-limit, anti-bruteforce.
Orchestrate flows: login, register, refresh, logout, password reset, federation callbacks, MFA challenges.
Enforce API-level auth (check that a session cookie or access token is present) and forward to backend services.
Sanitize and return errors in consistent format.
Public Interfaces (REST)
POST /v1/auth/login
POST /v1/auth/register
POST /v1/auth/refresh
POST /v1/auth/logout
POST /v1/auth/password/forgot
POST /v1/auth/password/reset
GET /v1/auth/oauth/callback
POST /v1/auth/mfa/verify
Example /login (req/resp)
POST /v1/auth/login
{
"email":"user@example.com",
"password":"hunter2",
"device_id":"chrome-2025-10-01",
"client_info":{ "ip":"1.2.3.4", "ua":"…" }
}
200 OK
{
"accessToken": "<jwt>",
"expiresIn": 900
}
(Refresh token is issued/stored server-side and returned via HttpOnly cookie.)
Internal Interfaces / Calls
UserService.verifyCredentials(email, password, deviceInfo)
MFAService.checkRequired(userId) → may trigger a challenge
TokenService.issueTokens(userId, deviceId, scope)
AuditService.logEvent(eventType, meta)
OAuthService.handleCallback(params)
Data/Storage
No own DB (stateless). Relies on other services.
Short-lived in-memory caches for throttling counters (or Redis).
Security
Validate input with strict schema (JSON schema).
Rate-limit by IP and account.
WAF at edge recommended.
All responses use generic failure messages to avoid user enumeration.
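Rate limiting by IP and account can be sketched with a fixed-window counter. This version keeps counters in an in-memory dict purely for illustration; in the design above they would live in Redis. The class name and the rate:account: key prefix are assumptions.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` hits per key within each window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._counters = {}  # key -> (window_start, count)

    def allow(self, key):
        now = self.clock()
        start, count = self._counters.get(key, (now, 0))
        if now - start >= self.window:   # window elapsed: start a new one
            start, count = now, 0
        count += 1
        self._counters[key] = (start, count)
        return count <= self.limit


def allow_login(ip_limiter, account_limiter, ip, email):
    """Count the attempt against both limits, then require both to pass."""
    ip_ok = ip_limiter.allow(f"rate:{ip}")
    account_ok = account_limiter.allow(f"rate:account:{email.lower()}")
    return ip_ok and account_ok
```

Both counters are incremented on every attempt, so an attacker who is already blocked by IP still consumes the per-account budget.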
Scaling & HA
Stateless pods; scale horizontally behind LB.
Use shared Redis for rate-limits.
Session affinity (sticky sessions) is not required.
Failure modes & mitigation
Downstream user DB or Redis failure → return 503 + circuit breaker.
Use exponential backoff and retry for transient ops.
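A minimal sketch of the circuit breaker and backoff idea follows. The thresholds, class names, and the half-open behavior (one trial call after the cool-down) are illustrative assumptions rather than decisions made in this document.

```python
import time


class CircuitOpenError(Exception):
    """Raised while the breaker is open; the API would map this to 503."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_seconds=30.0,
                 clock=time.monotonic):
        self.threshold = failure_threshold
        self.reset_after = reset_after_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("downstream unavailable")
            # Half-open: allow one trial call; a failure re-opens immediately.
            self.opened_at = None
            self.failures = self.threshold - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def backoff_delays(base=0.1, factor=2.0, attempts=4):
    """Delays (seconds) to wait before each retry of a transient failure."""
    return [base * factor ** i for i in range(attempts)]
```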
Observability
Metrics: request rate, error rate, latency (p95,p99), auth failures per account/IP.
Logs: structured JSON with trace id and minimal PII (never log raw passwords).
Events: emit audit events for login success/failure, token refresh, password changes.
Testing
Unit tests for controllers & validators.
Integration tests mocking UserService/TokenService.
E2E tests for full flows.
Service 2 — User Management Service
Responsibilities
Persistent user records (create/read/update/delete).
Password hashing + verification (Argon2 / bcrypt).
Email/phone verification lifecycle.
Link/unlink external provider identities.
Expose CRUD + admin operations for roles & attributes.
Interfaces (internal API)
verifyCredentials(email, password) -> {user, status}
createUser(userPayload) -> user
findUserByEmail(email) -> user
updateUser(userId, patch) -> user
linkExternalIdentity(userId, provider, providerUserId, attrs)
incrementFailedAttempt(userId)
resetFailedAttempts(userId)
Data model (Mongo collections)
users(id, email, password_hash, name, phone, mfa_enabled, status, createdAt, lastLogin, failedAttempts, device_metadata[])
user_oauth(userId, provider, providerUserId, linkedAt)
roles & role_permissions, or stored separately in Policy Service
Security
Passwords hashed with Argon2 (recommended) or bcrypt with strong params.
Enforce unique email/phone indexes.
Use a unique salt per password (Argon2 and bcrypt libraries generate one and embed it in the stored hash).
PII at rest encrypted (field-level or DB-level).
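The hash-and-verify contract can be sketched as below. Note the substitution: to stay within the standard library this uses PBKDF2-HMAC-SHA256 as a stand-in, not the Argon2 or bcrypt recommended above. The storage format and iteration count are assumptions for illustration.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; tune for your hardware


def hash_password(password: str) -> str:
    """Hash with a fresh random salt; the salt is stored alongside the digest."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, ITERATIONS)
    return f"pbkdf2_sha256${ITERATIONS}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the digest from the stored parameters, compare in constant time."""
    try:
        scheme, iterations, salt_hex, digest_hex = stored.split("$")
        if scheme != "pbkdf2_sha256":
            return False
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                        bytes.fromhex(salt_hex), int(iterations))
    except ValueError:
        return False  # malformed record
    return hmac.compare_digest(candidate.hex(), digest_hex)
```

Storing the parameters next to the digest lets the work factor be raised later without invalidating existing hashes.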
Interactions
Called by Auth API for verify/create.
Emits user.created, user.updated, user.failed_login, user.locked events to AuditService.
Scaling
Scale vertically, or add read replicas if read volume is high.
Use indexes for auth lookups by email.
If user volume grows huge, consider sharding.
Failure
DB replica lag → stale reads for lastLogin. Handle eventual consistency.
Observability
Metrics: user create rate, failed_logins per minute, account locks.
Logs: events for user lifecycle.
Testing
Hash/verify unit tests, migration tests, unique constraint tests.
Service 3 — Token & Session Service ( Token Service + Session Store)
Responsibilities
Issue/validate JWT access tokens.
Create, rotate, revoke refresh tokens.
Persistent session index in Redis (fast lookups) and optional DB backup.
Device binding: tie refresh token to device_id and metadata (IP hash, UA).
Enforce token revocation & blacklist/allow-list logic.
Interfaces
issueTokens(userId, deviceId, clientInfo) -> { accessToken, refreshTokenId }
validateAccessToken(accessToken) -> { valid, claims }
validateRefreshToken(refreshToken, deviceId) -> { valid, tokenId }
rotateRefreshToken(oldTokenId) -> newTokenId
revokeRefreshToken(tokenId)
revokeAllForUser(userId)
listActiveSessions(userId) -> [session]
Data storage & key schema
Redis hash per refresh token: refresh:<tokenId> → {userId, deviceId, issuedAt, expiresAt, nonce, rotated: false}
Optionally, a reverse index: user_sessions:<userId> → sorted set of tokenIds by issuedAt (for listing)
JWT signing keys stored in Secrets Manager; rotate and support key IDs (kid) in token header.
Security & rotation
Access token short-lived (e.g., 10–15m).
Refresh token backed by Redis and rotated on each use. On rotation, mark old token revoked.
Enforce deviceId + refresh token mapping.
Implement detection: if rotated refresh token used twice => suspect token theft => revoke all sessions & force MFA.
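The rotation and replay-detection rules above can be sketched as follows. Plain dicts stand in for the Redis refresh:<tokenId> hashes and the user_sessions:<userId> index. The class and method names are hypothetical, and the forced-MFA step is left out.

```python
import secrets
import time


class RefreshTokenStore:
    def __init__(self, ttl_seconds=7 * 24 * 3600, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock
        self.tokens = {}         # tokenId -> record (refresh:<tokenId>)
        self.user_sessions = {}  # userId -> set of tokenIds

    def issue(self, user_id, device_id):
        token_id = secrets.token_urlsafe(32)
        now = self.clock()
        self.tokens[token_id] = {
            "userId": user_id, "deviceId": device_id,
            "issuedAt": now, "expiresAt": now + self.ttl, "rotated": False,
        }
        self.user_sessions.setdefault(user_id, set()).add(token_id)
        return token_id

    def revoke_all_for_user(self, user_id):
        for token_id in self.user_sessions.pop(user_id, set()):
            self.tokens.pop(token_id, None)

    def rotate(self, token_id, device_id):
        """Return a new token id, or None if the refresh must be rejected."""
        record = self.tokens.get(token_id)
        if (record is None or record["expiresAt"] <= self.clock()
                or record["deviceId"] != device_id):
            return None
        if record["rotated"]:
            # A rotated token was presented again: treat as theft.
            self.revoke_all_for_user(record["userId"])
            return None
        record["rotated"] = True  # kept (not deleted) so reuse is detectable
        return self.issue(record["userId"], device_id)
```

Keeping the rotated record until it expires is what makes reuse detectable; deleting it immediately would make a stolen old token look merely unknown.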
Token Storage Flow Diagram (Hybrid Approach)
┌──────────────────────────┐
│         Browser          │
│                          │
│  HttpOnly Secure Cookie  │
│  • Access Token          │
│  • Refresh Token (opt.)  │
└────────────┬─────────────┘
             │
             ▼
┌──────────────────────────┐
│        API Server        │
│                          │
│ 1. Validates Access      │
│    Token from cookie     │
│ 2. If expired →          │
│    uses Refresh Token    │
└──────┬───────────┬───────┘
       │           │
       │           ▼
       │   ┌──────────────────┐
       │   │     Redis DB     │
       │   │ (Refresh Tokens) │
       │   │ • user_id        │
       │   │ • token_id       │
       │   │ • device_id      │
       │   │ • expiry (TTL)   │
       │   └──────────────────┘
       │
       ▼
┌──────────────────────────┐
│      MongoDB (Core)      │
│                          │
│ • Users                  │
│ • Roles & Permissions    │
│ • Audit Logs             │
└──────────────────────────┘
How it works
User logs in →
Server issues Access Token (short-lived) in cookie.
Server also issues Refresh Token (long-lived) and stores it in Redis (linked to user & device).
On each request →
Browser automatically sends Access Token via cookie.
If Access Token expired → server checks Redis for refresh token validity.
If valid → new Access Token issued + refresh rotation.
Logout →
Access Token cookie cleared.
Refresh Token removed from Redis.
Scaling
Redis cluster with replication. Use TTLs to auto-expire sessions.
TokenService itself stateless: replicate across nodes.
Failure modes
Redis OOM or unavailability: there is no safe fallback for refresh validation, so if Redis is down, block refreshes (fail safe) until it recovers.
Key rotation mismatch: keep previous signing keys for validation.
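Keeping previous signing keys for validation can be illustrated with a kid lookup. To stay within the standard library this sketch signs with HS256 (shared secret) rather than an asymmetric algorithm, so the "old public keys" become old secrets here. The function names and key ids are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, kid: str, keys: dict) -> str:
    """Sign with the key named by `kid` and record that id in the header."""
    header = {"alg": "HS256", "typ": "JWT", "kid": kid}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    sig = hmac.new(keys[kid], signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def verify_token(token: str, keys: dict, now=None):
    """Return the claims if signature and expiry check out, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(sig_b64)
    except (ValueError, TypeError):
        return None
    if not isinstance(header, dict) or not isinstance(claims, dict):
        return None
    if header.get("alg") != "HS256":
        return None
    kid = header.get("kid")
    key = keys.get(kid) if isinstance(kid, str) else None
    if key is None:  # unknown or retired key id
        return None
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, signature):
        return None
    current = time.time() if now is None else now
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= current:
        return None
    return claims
```

During rotation, new tokens are signed with the newest kid while the old entry stays in the key map until the last token signed with it has expired.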
Observability
Metrics: token issuance rate, refresh attempt rate, refresh failures, revocations.
Audit events for token rotation and mass-revoke.
Testing
Exhaustive token rotation tests, replay detection tests, TTL expiry tests.
Service 4 — MFA Service
Responsibilities
Manage TOTP secrets, backup codes, SMS/Email OTPs.
Provide enrollment/setup flows & challenge verification.
Rate-limit OTP requests and coordinate with Notification service.
Interfaces
setupTOTP(userId) -> { secret, qrCodeUri } // One-time return
verifyTOTP(userId, code) -> boolean
generateBackupCodes(userId) -> [codes]
sendOTPViaSMS(userId, phone) -> otpId
verifyOTP(otpId, code) -> boolean
Data & secrets
mfa_credentials collection (userId, type, secret_hash, createdAt, lastUsed).
When storing TOTP secrets, encrypt them rather than hash them: the server needs the original secret to verify codes, so a one-way hash cannot be used. Show the cleartext secret to the user only once, at setup.
Security
Use HMAC-based TOTP generation libraries, prevent brute force by counting attempts.
Store backup codes hashed (not plaintext).
Limit OTP send rates (per user, per IP).
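For reference, HMAC-based TOTP verification can be sketched with the standard library, following RFC 4226 (HOTP) and RFC 6238 (TOTP). A production system would more likely use a maintained library. The one-step drift window is an assumption, and attempt counting is left to the rate limiter.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226: HMAC-SHA1 over the counter, then dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: float, step: int = 30, digits: int = 6) -> str:
    """RFC 6238: HOTP with the counter derived from the current time step."""
    return hotp(secret, int(unix_time) // step, digits)


def verify_totp(secret: bytes, code: str, unix_time: float,
                window: int = 1, step: int = 30) -> bool:
    """Accept codes from the current step and `window` steps on either side."""
    if not (code.isascii() and code.isdigit()):
        return False
    counter = int(unix_time) // step
    return any(
        hmac.compare_digest(hotp(secret, counter + drift), code)
        for drift in range(-window, window + 1)
        if counter + drift >= 0
    )
```

The RFC test vectors make good golden tests: with the ASCII secret 12345678901234567890, counter 0 yields 755224.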
Interactions
Called by Auth API during login if mfa_enabled.
Emits mfa.challenge, mfa.success, mfa.failed to Audit.
Scaling
Stateless verification service; scale horizontally.
Use worker/queue for SMS/email sending.
Failure
SMS provider outage → fallback to email or show error.
If verification store is unavailable, deny MFA-challenged auth attempts (fail secure).
Observability
Metrics: OTP sent, OTP verify success/fail, TOTP verify latency.
Testing
TOTP golden tests with fixed seeds, replay attacks, rate limit tests.
Service 5 — Audit & Telemetry Service
Responsibilities
Collect, store, and index audit events (login success/failure, token rotation, role changes).
Expose querying for compliance and admin UIs.
Ship metrics to Prometheus/Grafana and logs to ELK/Loki.
Interfaces
POST /v1/audit/events (internal)
GET /v1/audit/user/:id (admin)
Metrics: Prometheus scrape endpoints
Data model
audit_logs(logId, timestamp, eventType, userId, tokenId, ip, deviceInfo, meta JSON)
Immutable append-only. Retention policy (e.g., 90 days in hot store, cold archive thereafter).
Storage
Elasticsearch or Mongo (append-only), backed by object storage for archived logs.
Security
Strong RBAC for audit querying endpoints.
Ensure tamper-resistance: sign logs or write to append-only store.
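One way to sign logs for tamper-resistance is an HMAC chain, where each entry's MAC covers the previous entry's MAC. This is a sketch of that one technique, not the design's chosen mechanism. The field names prev and mac are assumptions, and the key would come from the secret manager.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous MAC" for the first entry


def _mac(key: bytes, prev: str, event: dict) -> str:
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, (prev + body).encode("utf-8"), hashlib.sha256).hexdigest()


def append_event(log: list, key: bytes, event: dict) -> dict:
    """Append an event whose MAC also covers the previous entry's MAC."""
    prev = log[-1]["mac"] if log else GENESIS
    entry = {"event": event, "prev": prev, "mac": _mac(key, prev, event)}
    log.append(entry)
    return entry


def verify_chain(log: list, key: bytes) -> bool:
    """Detect edited, reordered, or removed entries (except tail truncation)."""
    prev = GENESIS
    for entry in log:
        expected = _mac(key, prev, entry["event"])
        if entry["prev"] != prev or not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev = entry["mac"]
    return True
```

Truncating the newest entries is not detectable from the chain alone; anchoring the latest MAC somewhere external closes that gap.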
Interactions
Subscribes to events from all services via event bus.
Receives direct writes from Auth API for immediate-critical events.
Scaling
Indexing pipeline for logs; use partitioning by time.
Archival jobs.
Failure
Logging delays acceptable but loss of logs is not; use local persistence & retry.
Observability
Provide dashboards, alerts for anomalies (spike in failed logins).
Testing
Ensure event semantics, retention, query performance.
Service 6 — OAuth / Federation Service
Responsibilities
Encapsulate provider-specific logic for OAuth/OIDC (Google, GitHub etc.).
Normalize provider identity to canonical internal user profile.
Handle redirect/callback flows and account linking/unlinking.
Interfaces
GET /v1/oauth/{provider}/authorize → redirect URL
GET /v1/oauth/{provider}/callback?code=... → handle callback, exchange code for token, normalize user info
Interactions
Calls third-party provider endpoints (token exchange, userinfo).
Calls UserService.linkExternalIdentity or UserService.createUser with normalized data.
Calls TokenService.issueTokens.
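Normalizing provider identities might look like the sketch below. The input fields follow the standard OIDC userinfo claims for Google (sub, email, email_verified, name) and the GitHub user API (id, login, name, email). The output shape and function name are assumptions made for illustration.

```python
def normalize_identity(provider: str, profile: dict) -> dict:
    """Map a provider-specific profile onto one canonical shape."""
    if provider == "google":
        return {
            "provider": "google",
            "provider_user_id": str(profile["sub"]),
            "email": profile.get("email"),
            "email_verified": bool(profile.get("email_verified", False)),
            "name": profile.get("name"),
        }
    if provider == "github":
        return {
            "provider": "github",
            "provider_user_id": str(profile["id"]),
            "email": profile.get("email"),  # may be None if kept private
            # Assumption: treat as unverified unless checked separately.
            "email_verified": False,
            "name": profile.get("name") or profile.get("login"),
        }
    raise ValueError(f"unsupported provider: {provider}")
```

The provider_user_id, not the email, is the stable key for linking, since emails can change or be missing.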
Security
Protect redirect URIs, support PKCE for public clients.
Keep client secrets in Secret Manager.
Validate provider certs and token signatures.
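PKCE support for public clients comes down to a verifier/challenge pair. The sketch below implements the S256 method from RFC 7636 with the standard library. The function names are hypothetical.

```python
import base64
import hashlib
import secrets


def make_code_verifier() -> str:
    """Random URL-safe verifier (86 chars, within the 43-128 allowed by RFC 7636)."""
    return secrets.token_urlsafe(64)


def code_challenge_s256(verifier: str) -> str:
    """S256 challenge: base64url(SHA-256(verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_pkce(verifier: str, stored_challenge: str) -> bool:
    """Server side: recompute the challenge from the verifier sent at token exchange."""
    return secrets.compare_digest(code_challenge_s256(verifier), stored_challenge)
```

The client sends the challenge with the authorization request and the verifier with the code exchange, so an intercepted authorization code is useless without the verifier.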
Scaling
Stateless; scale horizontally.
Cache provider metadata.
Failure
Provider downtime → fallback message and retry.
Inconsistent provider data → require manual review.
Observability
Metrics: federation success rate, provider error rates.
Logs: provider responses, mapping decisions.
Testing
Contract tests with provider mocks; end-to-end with test OAuth clients.
Service 7 — Policy / RBAC Service (Auth Z)
Responsibilities
Store roles, permissions, and policies. Evaluate authorization requests (PDP — policy decision point).
Support both RBAC and ABAC (attribute-based) evaluation for fine-grained permissions.
Expose APIs for admin to manage roles & permissions.
Interfaces
isAllowed(userId, resource, action, context) -> {allowed, reason}
getPermissions(userId)
assignRole(userId, role)
createRole(roleName, permissions)
Data model
roles(roleId, name, description)
permissions(permId, name, description)
role_permissions mapping
user_roles mapping (or refer to UserService storing role list)
Interactions
Auth API consults PolicyService.isAllowed(...) before returning data for sensitive endpoints.
Admin UI calls this service for role management.
Security
Strict admin auth for role changes; all changes emitted to AuditService.
Cache decisions for short time (TTL) for performance; cache invalidation on role change.
Scaling
Read-heavy; use caching (Redis). Use eventual consistency for changes (invalidate caches).
Failure
If Policy service down, choose fallback: fail-closed (deny) or fail-open (risky). For security, prefer fail-closed for critical checks.
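A fail-closed isAllowed check over the user_roles and role_permissions mappings could be sketched like this. The resource:action permission naming and the sample data are assumptions, and ABAC context evaluation is omitted.

```python
# Hypothetical sample data standing in for the user_roles / role_permissions mappings.
USER_ROLES = {"u1": {"admin"}, "u2": {"user"}}
ROLE_PERMISSIONS = {
    "admin": {"users:read", "users:write"},
    "user": {"profile:read"},
}


def get_permissions(user_id, user_roles=USER_ROLES, role_permissions=ROLE_PERMISSIONS):
    """Union of the permissions of every role assigned to the user."""
    perms = set()
    for role in user_roles.get(user_id, ()):
        perms |= role_permissions.get(role, set())
    return perms


def is_allowed(user_id, resource, action, lookup=get_permissions):
    """Return {allowed, reason}; any lookup failure results in a denial."""
    try:
        perms = lookup(user_id)
    except Exception as exc:  # policy store down, timeout, etc.
        return {"allowed": False,
                "reason": f"policy lookup failed: {type(exc).__name__}"}
    needed = f"{resource}:{action}"
    if needed in perms:
        return {"allowed": True, "reason": "granted"}
    return {"allowed": False, "reason": f"missing permission {needed}"}
```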
Observability
Metrics: authorization decision latency, cache hit rate, denials per minute.
Testing
Policy unit tests for edge cases; property-based tests for combinatorial policies.
Cross-Service Design Patterns & Extras
Eventing & Contracts
Use an event bus to decouple Audit/Telemetry & asynchronous tasks. Examples: user.created, login.success, login.failed, token.rotated, mfa.failed.
Keep event schema versioned.
Secrets & Keys
JWT signing keys stored in Secret Manager / Vault. Support key rotation and the kid header. Keep old public keys for validation until expiry.
Token rotation & replay detection
Rotation: issue new refresh token with every refresh. Mark previous as rotated; if an old token appears later, treat as theft -> revoke all sessions for that user and force re-authentication + alert.
Session listing & device management
TokenService/SessionService maintains a user_sessions:<userId> sorted set. Auth UI can show active sessions and allow per-device logout.
CSRF + Cookie decisions
Refresh tokens in HttpOnly cookie, SameSite=Strict/Lax depending on cross-site needs. Use an anti-CSRF token for state-changing endpoints if cookie-based.
Practical artifacts
For each service: OpenAPI spec (external APIs) + internal RPC spec (gRPC/HTTP).
Data model ERD & Redis key schema.
Sequence diagrams for Login, Refresh, Register, ForgotPassword, Federation (Mermaid).
Deployment manifests (k8s) with readiness/liveness probes & resource limits.
Prometheus metrics list + Grafana dashboard templates.
Threat model (STRIDE) + mitigation mapping to components.
Testing matrix (unit, integration, contract, E2E).
5. Data Design
5.1 Purpose
The Data Design section defines how all data within the authentication and authorization system is physically structured, stored, and managed.
It translates the logical entities identified in the SRS and high-level Data Design into concrete MongoDB collections and Redis key structures, ensuring optimal performance and integrity.
This section establishes the foundation for data consistency, security, and scalability across all modules that handle user identities, authentication tokens, sessions, roles, permissions, and telemetry data.
It also defines relationships between collections, indexing strategies, encryption and hashing requirements, and data lifecycle policies — covering how data is created, updated, retained, archived, and eventually purged.
Ultimately, this section ensures that every component interacting with the data layer adheres to unified design principles that support high availability, fault tolerance, and compliance with privacy standards.
5.2 Data Design Overview
The authentication system’s data layer is composed of:
| Layer | Technology | Purpose |
| Primary Database | MongoDB | Persistent storage for user profiles, roles, permissions, audit logs |
| Ephemeral Store | Redis | Fast storage for sessions, refresh tokens, rate limits, OTPs |
| Long-term Logs | ELK / Loki | Append-only audit and security events for monitoring and forensics |
| Secrets Storage | Secret Manager / Vault | Signing keys, OAuth client secrets, MFA seed encryption keys |
Database Decision
Step 1: Identify the Nature of Each Data Type
We’ll classify every entity by data lifetime, volatility, access pattern, and sensitivity:
| Entity | Data Type | Lifetime | Access Pattern | Security Sensitivity |
| users | Core identity | Long-term | Moderate read/write | 🔥 Very High |
| last_devices | Ephemeral behavior log | Medium-term | Frequent writes, occasional reads | Medium |
| mfa_credential | Secret data | Long-term | Rare writes, critical reads | 🔥 Extreme |
| role, permission, user_role, role_permission | Access control definitions | Long-term | Read-heavy | High |
| audit_log | Immutable security logs | Long-term | Write-heavy, rare reads | High |
| oauth_provider | Token/linked account data | Medium-term | Rare writes, occasional reads | 🔥 High |
| telemetry_event | Behavior data | Short to medium-term | Write-heavy | Medium |
| session_store | Temporary session tokens | Short-lived | Constant read/write | 🔥 High |
Step 2: Match Each Type to the Right Database
MongoDB (Primary Persistent Store)
For:
users
role, permission, user_role, role_permission
oauth_provider
Why:
Structured but flexible (perfect for identity & role data).
Easy JSON-based querying for user & role relations.
You can easily embed or reference relationships (1:N, N:M).
Encryption & Indexing:
Encrypt sensitive fields (password_hash, tokens).
Index on email, username, user_id.
Redis (Ephemeral, Fast Store)
For:
session_store
Possibly last_devices (if used for recent login cache)
Why:
Blazing fast, built for TTL (time-to-live) sessions.
Perfect for session invalidation, token rotation, and device caching.
Native expiration = no cleanup cron jobs.
Vault / KMS (Secret Store)
For:
mfa_credential.secret_hash
Encryption keys for JWT signing, refresh tokens, OAuth tokens
Why:
Redis/Mongo can be compromised; Vault isolates and encrypts secrets at rest.
You never expose raw secrets to your DB.
(Note: in your schema, secret should be a reference or encrypted placeholder, not the actual secret.)
Elasticsearch / PostgreSQL (Log Store)
For:
audit_log
telemetry_event
Why:
These are append-only, massive, and often queried by time.
Elasticsearch gives you full-text and fast time-based search (ideal for login analysis, fraud detection).
If you prefer structured relational logs, PostgreSQL with partitioned tables works fine too.
My Approach
Start with MongoDB + Redis, and later scale into a hybrid:
MongoDB → Users, Roles, OAuth, Permissions
Redis → Session Store, Last Devices
Vault → Secrets (MFA, keys)
Elasticsearch → Audit, Telemetry
Step 3: Visualize (Simple Overview)
              ┌──────────────────────────┐
              │    Authentication API    │
              └────────────┬─────────────┘
                           │
       ┌───────────────────┼───────────────────┐
       ▼                   ▼                   ▼
┌─────────────┐    ┌──────────────┐    ┌─────────────┐
│   MongoDB   │    │    Redis     │    │  Vault/KMS  │
│ users       │    │ sessions     │    │ mfa secrets │
│ roles       │    │ last_devices │    └─────────────┘
│ oauth_prov  │    └──────────────┘
│ permissions │
└──────┬──────┘
       │
       ▼
┌───────────────┐
│ Elasticsearch │
│ audit_logs    │
│ telemetry     │
└───────────────┘
Summary
| Table | Best DB | Reason |
| users | MongoDB | Long-term core data |
| last_devices | Redis / MongoDB | Depends on if you want cache or history |
| mfa_credential | Vault + MongoDB ref | Secrets must be isolated |
| role / permission / user_role | MongoDB | Stable structure |
| audit_log | Elasticsearch | Time-based search and analytics |
| oauth_provider | MongoDB | Rarely updated identity link |
| telemetry_event | Elasticsearch | Heavy event ingestion |
| session_store | Redis | Fast token/session management |
5.3 Logical Data Model
Core Collections (MongoDB)
| Collection | Key Fields | Description |
| users | id (PK), email, username, password_hash, mfa_enabled, email_verified, phone_verified, is_locked, created_at, updated_at | Core user identity record. Primary lookup by email. |
| user_oauth | provider_id (PK), user_id (FK), provider_name, provider_user_id, access_token (encrypted), refresh_token (encrypted), expires_at | Stores external IdP linkages for federation (Google, GitHub). |
| roles | role_id (PK), name, description | Defines named roles such as admin, user, etc. |
| permissions | permission_id (PK), name, description | Granular access rights that can be assigned to roles. |
| role_permissions | role_id (FK), permission_id (FK) | Many-to-many mapping of roles to permissions. |
| user_roles | user_id (FK), role_id (FK) | Many-to-many mapping of users to roles. |
| audit_logs | log_id (PK), timestamp, user_id, event_type, ip, device, metadata | Immutable append-only logs for compliance, breach detection, and analytics. |
| mfa_credentials | user_id (FK), type, secret_hash, created_at, last_used | TOTP and OTP configuration per user. |
Relationships
users (1) to user_oauth (M): one user can link multiple providers
users (M) to roles (M) through user_roles
roles (M) to permissions (M) through role_permissions
users (1) to audit_logs (M)
users (1) to mfa_credentials (1), if MFA enabled
5.4 Physical Data Model (Redis + MongoDB Keys)
| Data Type | Storage | Key Pattern / Collection | Purpose | TTL |
| Access Tokens | Client memory / cookie | JWT (not stored server-side) | Short-lived access tokens for API auth | 15 min |
| Refresh Tokens | Redis | refresh:<tokenId> → {user_id, device_id, issued_at, expires_at} | Long-lived tokens for session refresh | 7–30 days |
| User Session Index | Redis | user_sessions:<userId> → [tokenIds] | Track all devices/sessions per user | TTL = same as refresh |
| OTP Codes | Redis | otp:<userId> → {code_hash, expires_at} | Short-lived OTPs for MFA/forgot password | 2–5 min |
| Rate-Limit Counters | Redis | rate_limit:<ip> | Track login attempts per IP | 1 min |
| Audit Logs (Hot) | MongoDB | audit_logs | Critical user actions | 90 days |
| Audit Logs (Cold) | Object Storage | /archive/audit/yyyy-mm-dd.log | Archived logs beyond 90 days | 1 year |
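The access-token row above can be illustrated with a minimal sketch of a stateless, short-lived JWT. It assumes HS256 signing with a shared secret and uses only the Python standard library; the function names, the `sub`/`iat`/`exp` claim choice, and the secret handling are illustrative assumptions, not the project's actual TokenService.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

ACCESS_TTL_SECONDS = 15 * 60  # 15 min, taken from the table above


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_access_token(user_id: str, secret: bytes, now: Optional[int] = None) -> str:
    """Build a signed token; nothing is stored server-side."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "iat": now, "exp": now + ACCESS_TTL_SECONDS}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_access_token(token: str, secret: bytes, now: Optional[int] = None) -> Optional[dict]:
    """Return the payload if the signature is valid and the token is unexpired."""
    now = int(time.time()) if now is None else now
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        # The algorithm is fixed to HS256 here rather than read from the header.
        expected = hmac.new(
            secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
        ).digest()
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        payload = json.loads(_b64url_decode(payload_b64))
    except (ValueError, TypeError):
        return None  # malformed token
    if payload.get("exp", 0) <= now:
        return None  # expired
    return payload
```

Because validation needs only the secret and the clock, any Auth API instance can verify a token without a Redis or MongoDB lookup, which is why only refresh tokens appear in the server-side rows of the table.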
5.5 Data Entities and Fields
5.5.1 users
| Field | Type | Description |
| id | String (PK) | Unique identifier (UUID). |
| name | String | Display name. |
| username | String | Unique system username. |
| email | String | Unique email, used for login. |
| phone_number | String | Optional verified phone number. |
| bio | String | Optional user bio. |
| password_hash | String | Argon2/bcrypt hash of password. |
| profile_image | String (URL) | Optional avatar path. |
| email_verified | Boolean | Email verification status. |
| phone_verified | Boolean | Phone verification status. |
| is_locked | Boolean | Indicates if account is temporarily locked due to failed logins. |
| mfa_enabled | Boolean | Whether user has MFA enabled. |
| is_logged_in | Boolean | Real-time login status (for analytics or session visualization). |
| forgot_password_token | String | Token for password reset (hashed, short TTL). |
| verification_token | String | Token for email/phone verification. |
| lock_until | DateTime | Account lock expiry time. |
| forgot_password_expiry | DateTime | Token expiry. |
| created_at | DateTime | Account creation timestamp. |
| updated_at | DateTime | Last profile update. |
| last_login_at | DateTime | Last successful login time. |
| last_failed_at | DateTime | Last failed login attempt. |
Indexes:
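email (unique)
username (unique)
lock_until (optional, supports auto-unlock checks; a MongoDB TTL index would delete the whole user document, so the lock is cleared by comparing lock_until with the current time rather than by TTL expiry)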
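The is_locked and lock_until fields can be combined into a lock check that clears itself once the expiry has passed. The sketch below is a minimal illustration under assumptions: `UserLockState`, `lock_account`, and the 15-minute default duration are hypothetical and not taken from the SRS.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class UserLockState:
    """Only the lock-related fields of the users record."""
    is_locked: bool = False
    lock_until: Optional[datetime] = None


def lock_account(user: UserLockState, now: datetime,
                 duration: timedelta = timedelta(minutes=15)) -> None:
    # The 15-minute default is a placeholder, not a documented policy.
    user.is_locked = True
    user.lock_until = now + duration


def is_currently_locked(user: UserLockState, now: datetime) -> bool:
    """Return True while the lock is active; clear it once lock_until has passed."""
    if not user.is_locked:
        return False
    if user.lock_until is not None and now >= user.lock_until:
        user.is_locked = False
        user.lock_until = None
        return False
    return True
```

Checking the expiry at login time keeps the user document intact and avoids a background unlock job.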
5.5.2 user_oauth
| Field | Type | Description |
| provider_id | String (PK) | Unique ID for this provider entry. |
| user_id | String (FK → users.id) | Reference to local user. |
| provider_name | String | e.g., google, github. |
| provider_user_id | String | Provider-side user ID. |
| access_token | String (encrypted) | OAuth access token. |
| refresh_token | String (encrypted) | OAuth refresh token. |
| expires_at | DateTime | Token expiration time. |
Indexes:
Compound (provider_name, provider_user_id), unique.
user_id, indexed for reverse lookup.
5.5.3 roles, permissions, role_permissions
roles
| Field | Type | Description |
| role_id | String (PK) | Unique role ID. |
| name | String | Role name (admin, user, etc.). |
| description | String | Human-readable description. |
permissions
| Field | Type | Description |
| permission_id | String (PK) | Unique permission ID. |
| name | String | Permission keyword (e.g., USER_CREATE, VIEW_AUDIT). |
| description | String | Human-readable permission label. |
role_permissions
| Field | Type | Description |
| role_id | String (FK → roles.role_id) | Linked role. |
| permission_id | String (FK → permissions.permission_id) | Linked permission. |
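Since MongoDB has no joins across these mapping collections by default, the effective permissions of a user have to be resolved in application code. The following is a minimal sketch of that resolution over in-memory rows; the function name and the tuple-based row shape are assumptions for illustration only.

```python
from typing import Dict, Iterable, Set, Tuple


def permissions_for_user(
    user_id: str,
    user_roles: Iterable[Tuple[str, str]],        # rows of (user_id, role_id)
    role_permissions: Iterable[Tuple[str, str]],  # rows of (role_id, permission id)
    permission_names: Dict[str, str],             # permission id -> name
) -> Set[str]:
    """Follow user -> roles -> permissions and return the permission names."""
    role_ids = {role_id for uid, role_id in user_roles if uid == user_id}
    return {
        permission_names[perm_id]
        for role_id, perm_id in role_permissions
        if role_id in role_ids and perm_id in permission_names
    }


# Usage with sample rows (IDs are made up for the example).
user_roles = [("u1", "r-admin"), ("u2", "r-user")]
role_permissions = [("r-admin", "p1"), ("r-admin", "p2"), ("r-user", "p2")]
permission_names = {"p1": "USER_CREATE", "p2": "VIEW_AUDIT"}

admin_permissions = permissions_for_user("u1", user_roles, role_permissions, permission_names)
```

In a deployed service the resolved set would likely be cached per user, since role assignments change far less often than access checks run.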
5.5.4 mfa_credentials
| Field | Type | Description |
| id | String (PK) | Unique ID. |
| user_id | String (FK → users.id) | Owner user. |
| type | Enum(TOTP, SMS, EMAIL) | MFA method. |
| secret_hash | String | Encrypted or hashed secret. |
| backup_codes | Array[String] | Hashed backup codes. |
| created_at | DateTime | Creation timestamp. |
| last_used | DateTime | Last time MFA verified. |
5.5.5 audit_logs
| Field | Type | Description |
| log_id | String (PK) | Unique log entry ID. |
| timestamp | DateTime | Event timestamp. |
| event_type | String | Type (e.g., LOGIN_SUCCESS, PASSWORD_RESET). |
| user_id | String (FK → users.id) | User related to event. |
| token_id | String (optional) | Related token, if applicable. |
| ip_address | String | Origin IP. |
| device_info | Object | User-agent or device metadata. |
| metadata | JSON | Arbitrary contextual details. |
5.6 Redis Schema (Ephemeral Data)
| Key Pattern | Type | Description | TTL |
| refresh:{tokenId} | Hash | { userId, deviceId, issuedAt, expiresAt, rotated } | 30d |
| user_sessions:{userId} | Sorted Set | Active sessions ordered by issuedAt | 30d |
| rate_limit:{ip} | Counter | Request throttling per IP | 1 min |
| otp:{userId} | Hash | Temporary OTPs for password reset / MFA | 5m |
Redis stores no permanent user data—only volatile session and token state.
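The per-IP counter in the schema above behaves like a fixed-window rate limiter: increment a key, and let the key expire when the window ends. The sketch below imitates that with an in-memory dictionary standing in for Redis; the class name, the limit of 5 attempts, and the 60-second window are assumptions chosen for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowRateLimiter:
    """In-memory stand-in for a Redis counter with an expiry per key."""

    def __init__(self, max_attempts: int = 5, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_attempts
        self._window = window_seconds
        self._clock = clock
        # key -> (attempt count, time at which the window expires)
        self._counters: Dict[str, Tuple[int, float]] = {}

    def allow(self, ip: str) -> bool:
        """Count one attempt for this IP and report whether it is within the limit."""
        key = f"rate_limit:{ip}"
        now = self._clock()
        count, expires_at = self._counters.get(key, (0, 0.0))
        if now >= expires_at:
            # Window elapsed (or first request): start a fresh window.
            count, expires_at = 0, now + self._window
        count += 1
        self._counters[key] = (count, expires_at)
        return count <= self._max
```

A fixed window is simple but allows short bursts at window boundaries; a sliding window or token bucket would smooth that out at the cost of more state per key.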
5.7 Index & Performance Design
| Collection | Index | Type | Purpose |
| users | email (unique) | B-tree | Fast login lookups |
| users | phone_number (unique) | B-tree | Account linking |
| user_oauth | (provider_name, provider_user_id) | Composite | Fast OAuth lookups |
| audit_logs | user_id, timestamp | Compound | Time-series queries |
| roles | name (unique) | B-tree | Role lookup |
| role_permissions | (role_id, permission_id) | Composite | Access check joins |
| user_roles | (user_id, role_id) | Composite | Role assignment checks |
All indexes are optimized for read-heavy workloads (login, token refresh, session validation).
5.8 Relationships Diagram (Conceptual)
erDiagram
USERS ||--o{ USER_OAUTH : "has"
USERS ||--o{ MFA_CREDENTIALS : "has"
USERS ||--o{ AUDIT_LOGS : "generates"
USERS ||--o{ USER_ROLES : "assigned"
ROLES ||--o{ USER_ROLES : "granted_via"
ROLES ||--o{ ROLE_PERMISSIONS : "defines"
PERMISSIONS ||--o{ ROLE_PERMISSIONS : "belongs_to"
5.9 Data Integrity & Constraints
Uniqueness Constraints: Enforced on email, username, and the (provider_name, provider_user_id) pair.
Foreign Key Consistency: Enforced at application layer since MongoDB is non-relational.
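Soft Deletion: Users and roles can be soft-deleted via status: "inactive".
TTL Indexes: Expire temporary tokens automatically. Account locks lapse when lock_until passes, checked at login, because a MongoDB TTL index removes the whole document rather than a single field.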
Encryption:
Passwords → Argon2 hash
Tokens & secrets → AES-256 (field-level encryption)
Sensitive configs → environment variables or Vault
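The password row above follows a salt, hash, then constant-time verify pattern. Argon2 is not available in the Python standard library, so the sketch below substitutes PBKDF2-HMAC-SHA256 purely to show the shape of that pattern; the stored-string format, function names, and iteration count are assumptions, and a real deployment would call an Argon2 library instead.

```python
import hashlib
import hmac
import secrets


def hash_password(password: str, iterations: int = 600_000) -> str:
    """Return a self-describing hash string: scheme$iterations$salt$digest."""
    salt = secrets.token_bytes(16)  # unique random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    try:
        scheme, iterations, salt_hex, digest_hex = stored.split("$")
        if scheme != "pbkdf2_sha256":
            return False
        expected = bytes.fromhex(digest_hex)
        actual = hashlib.pbkdf2_hmac(
            "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
        )
    except ValueError:
        return False  # malformed stored value
    return hmac.compare_digest(actual, expected)
```

Storing the scheme and cost parameters next to the digest is what makes a later migration possible: old hashes stay verifiable while new ones are written with a stronger setting.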
5.10 Data Security Model
| Area | Control |
| Passwords | Argon2 hash with salt (bcrypt as fallback) |
| Tokens | Encrypted in Redis (AES-GCM via app layer if needed) |
| Secrets (MFA, OAuth) | Encrypted using KMS / Vault; never stored in plaintext |
| PII Fields (email, phone) | Field-level encryption (FLE) or DB-level encryption |
| Transport Layer | TLS 1.2+ enforced end-to-end |
| Data at Rest | MongoDB encryption-at-rest (EBS or Atlas-managed) |
5.11 Data Lifecycle & Retention
| Data Type | Retention Policy | Notes |
| Users | Until account deletion + 30 days grace period | GDPR-style retention |
| Sessions / Refresh Tokens | Auto-expire via TTL | 7–30 days configurable |
| Audit Logs | 90 days hot storage, 1 year cold archive | For compliance |
| OTP / MFA Codes | Auto-delete after expiry | Never stored in plaintext |
| Rate Limit Counters | Auto-expire | Short TTLs (1–5 min) |
Data lifecycle management ensures minimal persistence of sensitive data and compliance readiness (GDPR-style erase-on-delete).
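The audit-log retention rule can be expressed as a small classification function that an archival job might run. This sketch assumes the one-year cold period starts after the 90-day hot period ends, which the table does not state explicitly; the function name and tier labels are also hypothetical.

```python
from datetime import datetime, timedelta, timezone

HOT_RETENTION = timedelta(days=90)    # kept in MongoDB
COLD_RETENTION = timedelta(days=365)  # kept in object storage (assumed to follow the hot period)


def audit_log_tier(event_time: datetime, now: datetime) -> str:
    """Classify an audit entry as 'hot', 'cold', or 'expired' by its age."""
    age = now - event_time
    if age <= HOT_RETENTION:
        return "hot"
    if age <= HOT_RETENTION + COLD_RETENTION:
        return "cold"
    return "expired"
```

Session, OTP, and rate-limit data need no such job because their Redis TTLs remove them automatically.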
5.12 Data Flow Summary
Example: Login + Token Lifecycle
1. Client submits credentials
2. Auth API → UserService (Mongo lookup + hash verify)
3. On success → TokenService issues access+refresh
4. Access token sent to client, refresh token saved in Redis
5. Refresh rotation on use; old token invalidated
6. Logout clears cookies + deletes Redis entry
7. AuditService logs login success/failure in Mongo
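Steps 4 to 6 of the login flow (store, rotate on use, delete on logout) can be sketched with an in-memory stand-in for Redis. The class and method names are hypothetical, and dictionaries replace the refresh:<tokenId> and user_sessions:<userId> keys; against a real Redis the read-and-delete in rotation would need to be atomic.

```python
import secrets
import time
from typing import Callable, Dict, Optional, Set


class RefreshTokenStore:
    """Dictionary-backed sketch of the refresh-token and session-index keys."""

    def __init__(self, ttl_seconds: int = 30 * 24 * 3600,
                 clock: Callable[[], float] = time.time) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._tokens: Dict[str, dict] = {}        # "refresh:<tokenId>" -> record
        self._sessions: Dict[str, Set[str]] = {}  # "user_sessions:<userId>" -> token IDs

    def issue(self, user_id: str, device_id: str) -> str:
        token_id = secrets.token_urlsafe(32)
        now = self._clock()
        self._tokens[f"refresh:{token_id}"] = {
            "user_id": user_id,
            "device_id": device_id,
            "issued_at": now,
            "expires_at": now + self._ttl,
        }
        self._sessions.setdefault(f"user_sessions:{user_id}", set()).add(token_id)
        return token_id

    def revoke(self, token_id: str) -> Optional[dict]:
        """Delete a token (logout) and drop it from the user's session index."""
        record = self._tokens.pop(f"refresh:{token_id}", None)
        if record is not None:
            self._sessions.get(f"user_sessions:{record['user_id']}", set()).discard(token_id)
        return record

    def rotate(self, token_id: str) -> Optional[str]:
        """Invalidate the presented token and issue a replacement, or return None."""
        record = self.revoke(token_id)
        if record is None or record["expires_at"] <= self._clock():
            return None  # unknown, already used, or expired
        return self.issue(record["user_id"], record["device_id"])

    def active_sessions(self, user_id: str) -> Set[str]:
        return set(self._sessions.get(f"user_sessions:{user_id}", set()))
```

Because a rotated token is deleted immediately, presenting it a second time fails, which is the property that limits the damage of a stolen refresh token.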
Example: MFA Enrollment
1. User enables MFA
2. MFAService generates secret → encrypted + stored in mfa_credentials
3. On login, MFAService verifies TOTP or OTP via Redis (for SMS/Email)
4. Audit entry written for challenge success/failure
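For the TOTP branch of step 3, verification means recomputing the code from the shared secret and the current time. The sketch below follows the algorithm in RFC 6238 (HMAC-SHA1, 30-second steps) using only the standard library; the one-step drift window and the function names are assumptions, and decrypting the stored secret is left out.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute the time-based one-time password for a given moment."""
    counter = unix_time // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, submitted: str, unix_time: int,
                step: int = 30, window: int = 1) -> bool:
    """Accept codes from the current step and `window` steps either side."""
    for drift in range(-window, window + 1):
        candidate_time = unix_time + drift * step
        if candidate_time < 0:
            continue
        expected = totp(secret, candidate_time, step, len(submitted) or 6)
        if hmac.compare_digest(expected.encode("ascii"), submitted.encode("utf-8")):
            return True
    return False
```

Note that the server must be able to recover the raw secret to recompute the code, so the value stored in mfa_credentials has to be encrypted rather than one-way hashed.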
5.13 Data Consistency & Integrity
MongoDB uses document-level atomic operations for updates.
Redis data is ephemeral by design, safe for cache/session state but not source-of-truth.
Referential integrity between collections maintained at application level (e.g., user deletion triggers cascade cleanup of sessions and MFA).
Audit logs are append-only to prevent tampering.
5.14 Backup & Recovery Strategy
Backup & Retention
| Data Type | Backup Frequency | Retention | Notes |
| MongoDB (users, roles) | Daily | 30 days | Encrypted backups |
| Redis (sessions) | None | N/A | Volatile data only |
| Audit Logs | Weekly archival | 1 year | Archived to cold storage (S3/MinIO) |
Recovery Objectives
| Component | Backup Frequency | Recovery Time Objective (RTO) | Notes |
| MongoDB | Daily snapshot | < 1 hour | Automated Atlas/Replica backup |
| Redis | Optional (RDB/AOF) | < 15 min | Only critical keys persisted |
| Audit Logs | Archived weekly | < 24 hours | Cold storage retrieval |
| Secrets | Managed by Vault / KMS | N/A | Versioned rotation |
5.15 Data Design Summary
The data design balances security, performance, and clarity:
MongoDB provides durable, structured persistence for identities and logs.
Redis ensures low-latency session management.
Encryption and TTLs protect sensitive data.
Modular schema design supports future IAM expansion (SSO, ABAC, SCIM).
Event-based updates (via audit service) ensure traceability and compliance.
5.16 Future Data Extensions
Device Fingerprinting Table — for anomaly detection and session monitoring.
Login History — separated from audit logs for faster analytics.
Session Geo-Analytics — user’s login country, device risk scoring.
Federation Metadata Table — store provider discovery endpoints and JWKS caching.
6. Detailed Workflows (UML Diagrams)
This is the heart of SDA.
Sequence Diagrams:
- User Login Flow.
Example: Detailed Login Sequence (Mermaid) — ties all 7 services
Notes on the diagram:
TokenService stores refresh tokens in Redis with keys like refresh:<tokenId> and indexes them under user_sessions:<userId>.
AuditService receives events either synchronously or via the event bus for immediate recording.
- Token Refresh.
- Password Reset.
- MFA Verification.
Activity Diagrams:
- Account lifecycle (active → locked → deleted).
State Machine:
- Session lifecycle.
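The account lifecycle named above (active → locked → deleted) can be sketched as an explicit transition table. The unlock transition (locked back to active) is an assumption inferred from the lock_until field, and treating deleted as terminal is likewise assumed; the names are illustrative.

```python
from enum import Enum


class AccountState(Enum):
    ACTIVE = "active"
    LOCKED = "locked"
    DELETED = "deleted"


# Allowed moves between states; DELETED is treated as terminal (assumption).
ALLOWED_TRANSITIONS = {
    AccountState.ACTIVE: {AccountState.LOCKED, AccountState.DELETED},
    AccountState.LOCKED: {AccountState.ACTIVE, AccountState.DELETED},
    AccountState.DELETED: set(),
}


def transition(current: AccountState, target: AccountState) -> AccountState:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping the table in one place makes it straightforward to write an audit entry on every state change and to reject invalid changes consistently.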
7. Security Architecture
Password hashing (Argon2/bcrypt).
Transport security (HTTPS, TLS 1.2 minimum, TLS 1.3 preferred).
Rate limiting strategies.
Threat model summary (brute force, replay attacks, session hijacking).
8. Scalability & Performance
Horizontal scaling of Auth API (stateless).
Session store in Redis cluster.
Token revocation lists (Bloom filter / DB).
Expected throughput (10k logins/sec).
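The Bloom filter option for token revocation works as a fast pre-check: a miss means the token was definitely not revoked, while a hit must be confirmed against the authoritative list because false positives are possible. The sketch below is a minimal version under assumptions; the sizes, the SHA-256 based hashing, and the `is_revoked` helper are illustrative only.

```python
import hashlib
from typing import Iterator, Set


class BloomFilter:
    """Fixed-size bit array with several hash positions per item."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4) -> None:
        self._size = size_bits
        self._num_hashes = num_hashes
        self._bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive independent positions by prefixing the item with the hash index.
        for i in range(self._num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self._size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self._bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def is_revoked(token_id: str, bloom: BloomFilter, revoked_store: Set[str]) -> bool:
    """Skip the authoritative lookup whenever the filter rules the token out."""
    if token_id not in bloom:
        return False  # Bloom filters have no false negatives
    return token_id in revoked_store  # confirm a possible false positive


# Usage: revoke one token and check two.
bloom = BloomFilter()
revoked_store: Set[str] = set()  # stands in for the DB-backed revocation list
bloom.add("tok-1")
revoked_store.add("tok-1")
```

With most tokens never revoked, nearly every check ends at the in-memory filter, which helps keep the Auth API close to stateless at the stated throughput target.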
9. Availability & Reliability
Redundancy (multi-region DB, load balancing).
Failover strategies (e.g., DB replica promotion).
Session persistence during failures.
10. External Integrations
OAuth providers (Google, GitHub, etc).
Monitoring tools (Prometheus, Grafana, ELK).
Notification services (email/SMS for OTPs).
11. Technology Stack
Chosen stack with rationale:
Backend: Node.js/Express (fast, async).
Database: MongoDB for user, role, and audit data.
Cache: Redis for sessions.
Security: JWT for access tokens; Argon2 (bcrypt as fallback) for password hashing.
Infra: Kubernetes + Docker for scaling.
12. Risks & Mitigations
DB bottlenecks → use read replicas.
Redis memory exhaustion → eviction policy.
OAuth provider downtime → fallback login.
13. Future Enhancements
Add SSO (SAML, OIDC federation).
Add adaptive MFA.
Introduce risk-based authentication.
14. Appendices
API Specs (OpenAPI/Swagger snippet).
Glossary (carry over from SRS).
References (OAuth RFC, OWASP ASVS, NIST guidelines).