Building an Enterprise LMS for 600+ Users: Go Backend Engineering Deep Dive
How I designed and built the backend of PT. CBI's internal Learning Management System using Golang, hexagonal architecture, Redis, RabbitMQ, and a dual-database SQL Server setup — serving the entire company's workforce.
When PT. Century Batteries Indonesia (CBI) needed a centralized platform to manage employee upskilling, training compliance, and HR workforce analytics, I took on the challenge of building it from scratch. The result is a full-stack internal LMS actively used by 600+ employees — the company's entire workforce — running in production every working day.
The system is split into two major domains:
- Personal Development — where individual employees enroll in training modules (video, PDF, PowerPoint), take quizzes, and track their own skill progression against a competency matrix.
- Team Development — where managers and HR staff analyze workforce capabilities through dashboards, manage training modules, run TNA (Training Needs Analysis) workflows, and perform skill gap analysis against the Kamus Kompetensi Gol 1-3 (a 243-competency, 112-role matrix sourced from the company's 2022 competency catalogue).
This article focuses on the Go backend — the architecture decisions, system design patterns, and the engineering problems worth talking about.
Tech Stack
| Layer | Choice |
|---|---|
| Language | Go 1.22 |
| Web Framework | Gin |
| ORM | GORM |
| Primary Database | SQL Server (Azure-hosted, dual-DB) |
| Cache / Rate Limiter | Redis |
| Message Broker | RabbitMQ |
| Auth | JWT (access + refresh token, HttpOnly cookies) |
| File Storage | Cloudinary (final CDN storage), local disk (temporary staging before async upload) |
| Frontend | Next.js 15 (App Router) |
| Architecture | Hexagonal (Ports & Adapters) |
System Architecture & ERD
Here’s the system architecture for the LMS’s Personal Development domain:

[System architecture diagram]

And here’s the database ERD for the same domain:

[Database ERD diagram]
Architecture: Hexagonal / Clean Architecture
The backend follows hexagonal architecture — also called Ports & Adapters. The goal is simple: the core business logic knows nothing about HTTP, databases, or any infrastructure detail. It only speaks in terms of domain objects and interfaces.
```
cmd/
  main.go              ← wires everything together
internal/
  adapter/
    config/            ← env + config parsing
    storage/
      postgres/        ← GORM connection wrappers
      redis/           ← Redis client wrapper
    handler/           ← HTTP handlers (Gin) + middleware
    repository/        ← GORM implementations of port interfaces
  core/
    domain/            ← pure Go structs, enums, DTOs, errors
    port/              ← repository + service interfaces
    service/           ← business logic (no framework imports)
  util/                ← JWT, password hashing, helpers
```
The layers are enforced by Go's import rules: core/service imports only core/domain and core/port. It never touches adapter/repository or adapter/handler. This means services are trivially testable with mock repositories and the entire business layer compiles without a database.
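To make the dependency direction concrete, here is a minimal sketch of the pattern; the interface and type names (ModuleRepository, ModuleService, domain.Module) are illustrative, not the project's exact ones:

```go
// core/port/module.go — the port: an interface owned by the core.
type ModuleRepository interface {
	GetModules(ctx context.Context) ([]domain.Module, error)
}

// core/service/module.go — business logic depends on the port only;
// the GORM implementation in adapter/repository satisfies it at wire-up
// time in cmd/main.go. A unit test can pass any stub that implements
// the interface, so no database is needed to test the service.
type ModuleService struct {
	repo ModuleRepository
}

func NewModuleService(repo ModuleRepository) *ModuleService {
	return &ModuleService{repo: repo}
}

func (s *ModuleService) GetModules(ctx context.Context) ([]domain.Module, error) {
	return s.repo.GetModules(ctx)
}
```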
Why Hexagonal Over MVC?
MVC collapses the "model" into a blob that's simultaneously a database row, a business object, and a serialization target. When requirements change — and they always do — you end up modifying database queries inside handler logic or embedding HTTP concerns in what should be pure business rules.
With hexagonal architecture, the TNA approval chain logic lives entirely in service/tna.go. Adding a new approval step, changing the director-threshold rule, or plugging in a notification broker requires touching exactly one file. The HTTP handler just calls svc.ApproveEntry(...) and serializes the result.
Dual-Database Design
One of the more unusual constraints of this project: CBI runs two separate SQL Server databases — the LMS database (modules, activities, quizzes, enrollments, TNA, competencies) and the HR database (henkaten_assy: master_data_karyawan, divisi, departement, section, sub_section).
The HR database is managed by a separate system and is read-only from the LMS's perspective. This means:
- User authentication happens against LMS users, but employee identity (NPK, jabatan, org hierarchy) is resolved from the HR DB.
- Org lookups (department list, division list, section list) JOIN across HR DB tables.
- The JWT access token embeds kode_jabatan, departement_id, divisi_id, nama_departement, and nama_divisi, so handlers never need an extra round-trip to resolve who the caller is and what they can see.
```go
// JWTClaims carries org context so every handler gets it for free
type JWTClaims struct {
	ID              uint   `json:"id"`
	NPK             int    `json:"npk"`
	Role            Role   `json:"role"`
	KodeJabatan     int    `json:"kode_jabatan"`
	DepartementID   int    `json:"departement_id"`
	DivisiID        int    `json:"divisi_id"`
	NamaDepartement string `json:"nama_departement"`
	NamaDivisi      string `json:"nama_divisi"`
	jwt.RegisteredClaims
}
```

The UserRepository holds two *postgres.DB handles — dbLMS and dbUser — and uses the correct one depending on which table it's querying. GORM's Raw() lets us write the enriched karyawan SELECT with LEFT JOINs across lookup tables without sacrificing type safety on the scan target:
```go
const karyawanSelect = `
SELECT mdk.*,
       ISNULL(div.divisi, '')       AS nama_divisi,
       ISNULL(dept.departement, '') AS nama_departement,
       ISNULL(sec.section, '')      AS nama_section
FROM master_data_karyawan mdk
LEFT JOIN divisi div       ON mdk.id_divisi = div.id_divisi
LEFT JOIN departement dept ON mdk.id_departement = dept.id_departement
LEFT JOIN section sec      ON mdk.id_section = sec.id_section
`
```

JWT: Access + Refresh Token with HttpOnly Cookies
The auth system uses a dual-token strategy:
- Access token — short-lived (configurable, typically 15 minutes), carries all claims the backend needs to authorize a request.
- Refresh token — long-lived (configurable, typically 7 days), used only to mint new access tokens.
Both are stored as HttpOnly, Secure, SameSite=Strict cookies — not in localStorage. This eliminates the entire class of XSS-based token theft. The frontend (Next.js) never sees the token string; it simply sends cookies with every request.
```go
func GenerateJWTToken(
	conf *config.JWT,
	user *domain.User,
	instructor *domain.Instructor,
	karyawan *domain.KaryawanSummary,
	tokenType domain.TokenType,
) (string, error) {
	// resolves signing key, expiry, and builds JWTClaims
	// karyawan fields embedded so downstream handlers skip DB lookups
}
```

The middleware extracts and validates the access token, then injects claims into the Gin context. Every handler downstream calls claimsFromCtx(c) — a one-liner that casts c.Get("user") to *domain.JWTClaims. If the cast fails, the request never reaches business logic.
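A minimal sketch of that helper, using the "user" context key described above; the ErrUnauthenticated sentinel is an assumed name:

```go
func claimsFromCtx(c *gin.Context) (*domain.JWTClaims, error) {
	v, exists := c.Get("user")
	if !exists {
		return nil, domain.ErrUnauthenticated // assumed sentinel error
	}
	claims, ok := v.(*domain.JWTClaims)
	if !ok {
		return nil, domain.ErrUnauthenticated
	}
	return claims, nil
}
```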
Token refresh hits GET /refresh: it validates the refresh token from the cookie, mints a new access token, and sets it as a new cookie — all without exposing token strings to JavaScript.
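Sketched under stated assumptions (the cookie names, the TTL, and the RefreshAccessToken service method are all illustrative, not the project's exact values), the refresh handler looks roughly like this:

```go
// GET /refresh — mint a new access token from the refresh cookie.
func (h *AuthHandler) Refresh(c *gin.Context) {
	refresh, err := c.Cookie("refresh_token")
	if err != nil {
		c.AbortWithStatusJSON(http.StatusUnauthorized, gin.H{"error": "missing refresh token"})
		return
	}
	access, err := h.svc.RefreshAccessToken(c.Request.Context(), refresh)
	if err != nil {
		c.AbortWithStatusJSON(http.StatusUnauthorized, gin.H{"error": "invalid refresh token"})
		return
	}
	// Same attributes as login: HttpOnly, Secure, SameSite=Strict.
	c.SetSameSite(http.SameSiteStrictMode)
	c.SetCookie("access_token", access, int((15 * time.Minute).Seconds()), "/", "", true, true)
	c.Status(http.StatusNoContent)
}
```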
Redis: Caching + Rate Limiting
Redis serves two purposes: response caching for expensive aggregate queries and rate limiting to protect public and semi-public endpoints.
Response Caching
The module catalogue endpoint (GET /modules) is queried by every enrolled employee on login and page load. Rather than hitting SQL Server on every request, the handler checks Redis first:
```go
func (h *ModuleHandler) GetModules(c *gin.Context) {
	ctx := c.Request.Context()

	// Serve straight from cache when the key exists.
	if cached, err := h.redis.Get(ctx, "modules:all"); err == nil {
		c.Data(http.StatusOK, "application/json", []byte(cached))
		return
	}

	modules, err := h.svc.GetModules(ctx)
	if err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": "failed to load modules"})
		return
	}

	// Serialize once, cache with a short TTL, then respond.
	serialized, _ := json.Marshal(modules)
	h.redis.Set(ctx, "modules:all", serialized, 5*time.Minute)
	c.Data(http.StatusOK, "application/json", serialized)
}
```

Cache invalidation is explicit: when an instructor creates or updates a module, the handler calls h.redis.Del(ctx, "modules:all"). This avoids stale reads without needing a complex invalidation strategy — the data model is simple enough that key-based invalidation is sufficient.
Sliding Window Rate Limiting
The quiz attempt endpoints are the highest-risk surface for abuse — a student can repeatedly hit POST /quiz-attempts/start to get new attempt IDs. A Redis-backed limiter is applied as middleware. (The counter below is technically a fixed window rather than a true sliding window; at this traffic level the difference is negligible, and a sorted-set variant is sketched afterwards.)
```go
func RateLimitMiddleware(rdb *redis.Client, limit int, window time.Duration) gin.HandlerFunc {
	return func(c *gin.Context) {
		ctx := c.Request.Context()
		key := fmt.Sprintf("rl:%s:%s", c.ClientIP(), c.FullPath())

		count, _ := rdb.Incr(ctx, key).Result()
		if count == 1 {
			// First hit in this window: start the countdown.
			rdb.Expire(ctx, key, window)
		}
		if count > int64(limit) {
			c.JSON(http.StatusTooManyRequests, gin.H{"error": "rate limit exceeded"})
			c.Abort()
			return
		}
		c.Next()
	}
}
```

The key is scoped per (IP, path) so a single user hammering quiz attempts doesn't affect login or module browsing for other users.
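Strictly speaking, INCR + EXPIRE is a fixed window: a burst straddling two window boundaries can briefly exceed the limit. If that ever matters, a sorted set gives a true sliding window. A sketch, assuming the go-redis v9 client:

```go
// allowSliding reports whether this request is within the limit for the
// trailing window, using one timestamped member per request.
func allowSliding(ctx context.Context, rdb *redis.Client, key string, limit int, window time.Duration) (bool, error) {
	now := time.Now()
	pipe := rdb.TxPipeline()
	// Evict timestamps that fell out of the window.
	pipe.ZRemRangeByScore(ctx, key, "0", strconv.FormatInt(now.Add(-window).UnixNano(), 10))
	// Record this request, scored by its timestamp.
	pipe.ZAdd(ctx, key, redis.Z{Score: float64(now.UnixNano()), Member: now.UnixNano()})
	count := pipe.ZCard(ctx, key)
	pipe.Expire(ctx, key, window)
	if _, err := pipe.Exec(ctx); err != nil {
		return false, err
	}
	return count.Val() <= int64(limit), nil
}
```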
RabbitMQ: Async File Processing
Module thumbnails and quiz images go through an asynchronous upload pipeline via RabbitMQ. Uploading to Cloudinary synchronously would block the HTTP response for the duration of the upload, which is unacceptable for a system serving the whole 600-person workforce.
The pattern:
- Handler receives the multipart file, saves it to local disk, and publishes a job message to a RabbitMQ queue (the publishing side is sketched below).
- Handler responds immediately with a 202 Accepted and the temporary local URL.
- A worker goroutine consumes from the queue, uploads to Cloudinary, updates the DB record with the CDN URL, and deletes the local file.
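With rabbitmq/amqp091-go, the publishing side would look roughly like this (the queue name matches the consumer below; the persistence flag is an assumption):

```go
func publishUploadJob(ctx context.Context, ch *amqp.Channel, job UploadJob) error {
	body, err := json.Marshal(job)
	if err != nil {
		return err
	}
	return ch.PublishWithContext(ctx,
		"",        // default exchange
		"uploads", // routing key = queue name
		false,     // mandatory
		false,     // immediate
		amqp.Publishing{
			ContentType:  "application/json",
			DeliveryMode: amqp.Persistent, // survive a broker restart
			Body:         body,
		})
}
```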
The consumer side:

```go
type UploadJob struct {
	EntityType string `json:"entity_type"` // "module_thumbnail" | "quiz_question" | "quiz_choice"
	EntityID   uint   `json:"entity_id"`
	LocalPath  string `json:"local_path"`
}

func (w *UploadWorker) Start(ctx context.Context) {
	msgs, _ := w.ch.Consume("uploads", "", false, false, false, false, nil)
	for msg := range msgs {
		var job UploadJob
		if err := json.Unmarshal(msg.Body, &job); err != nil {
			msg.Nack(false, false) // malformed payload: dead-letter, don't requeue
			continue
		}
		cdnURL, err := w.cloudinary.Upload(ctx, job.LocalPath)
		if err != nil {
			msg.Nack(false, true) // requeue on transient failure
			continue
		}
		w.repo.UpdateThumbnailURL(ctx, job.EntityID, cdnURL)
		os.Remove(job.LocalPath)
		msg.Ack(false)
	}
}
```

Failed uploads are requeued with Nack(..., requeue=true). A dead-letter queue catches jobs that fail repeatedly so they can be inspected without data loss.
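One way to wire that dead-letter queue at startup, sketched with assumed exchange and queue names; anything Nack'd with requeue=false (or expired) lands in uploads.dlq for inspection:

```go
func declareUploadQueues(ch *amqp.Channel) error {
	// Dead-letter exchange and the queue that collects failed jobs.
	if err := ch.ExchangeDeclare("uploads.dlx", "fanout", true, false, false, false, nil); err != nil {
		return err
	}
	if _, err := ch.QueueDeclare("uploads.dlq", true, false, false, false, nil); err != nil {
		return err
	}
	if err := ch.QueueBind("uploads.dlq", "", "uploads.dlx", false, nil); err != nil {
		return err
	}
	// The work queue routes its dead letters to the DLX.
	_, err := ch.QueueDeclare("uploads", true, false, false, false, amqp.Table{
		"x-dead-letter-exchange": "uploads.dlx",
	})
	return err
}
```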
The N+1 Problem: Batch Karyawan Lookups
The GET /users endpoint returns all LMS users enriched with their karyawan data. A naive implementation would:
- SELECT * FROM users → N rows
- For each user: SELECT * FROM master_data_karyawan WHERE npk = ? → N queries
At 600 users, that's 601 SQL round-trips per request. The fix is a single bulk JOIN:
```go
func (ur *UserRepository) GetUsersWithKaryawan(ctx context.Context) ([]domain.UserResponse, error) {
	users, err := ur.GetUsers(ctx)
	if err != nil {
		return nil, err
	}

	// Collect the NPKs of users that have one.
	npks := make([]int, 0, len(users))
	for _, u := range users {
		if u.NPK != nil {
			npks = append(npks, *u.NPK)
		}
	}

	// One query for all karyawan rows.
	var rows []karyawanRow
	if err := ur.db.Raw(karyawanSelect+"WHERE mdk.npk IN ?", npks).Scan(&rows).Error; err != nil {
		return nil, err
	}

	// Build an NPK → karyawan lookup map.
	karyawanMap := make(map[int]*domain.KaryawanSummary, len(rows))
	for i := range rows {
		r := &rows[i]
		karyawanMap[*r.NPK] = rowToSummary(r)
	}

	// Merge users with their karyawan data.
	result := make([]domain.UserResponse, 0, len(users))
	for _, u := range users {
		r := domain.UserResponse{ID: u.ID, NPK: u.NPK, Username: u.Username}
		if u.NPK != nil {
			r.Karyawan = karyawanMap[*u.NPK]
		}
		result = append(result, r)
	}
	return result, nil
}
```

Two SQL queries regardless of user count. The IN clause on npk hits the indexed primary key on master_data_karyawan, so the bulk fetch is as fast as a single-row lookup.
Database Indexing Strategy
SQL Server's query planner needs help on a few critical paths. The indexes added:
user_modules — the enrollment table is queried in three common patterns:
- "What modules is this user enrolled in?" →
(user_id) - "Who is enrolled in this module?" →
(module_id) - "Is this user enrolled in this specific module?" → unique constraint
(user_id, module_id)
```sql
CREATE UNIQUE INDEX idx_user_module ON user_modules(user_id, module_id);
```

tna_entries — TNA approval queries filter by tna_id (load all entries for a form) and by status + current_step (approval queue):

```sql
CREATE INDEX idx_tna_entries_tna_id ON tna_entries(tna_id);
CREATE INDEX idx_tna_entries_step_status ON tna_entries(current_step, status);
```

employee_competencies — the skill gap analysis aggregates over (user_id, gap_status):

```sql
CREATE UNIQUE INDEX idx_emp_competency ON employee_competencies(user_id, competency_id);
CREATE INDEX idx_emp_competency_gap ON employee_competencies(user_id, gap_status);
```

quiz_attempts — students' ongoing quiz sessions:

```sql
CREATE INDEX idx_quiz_attempts_user_quiz ON quiz_attempts(user_id, quiz_id);
```

TNA Approval Chain: Domain-Driven State Machine
The Training Needs Analysis module is the most complex business logic in the system. Each TNA submission contains N entries (one per employee being registered for training), and each entry has its own independent 5-step approval chain:
kadept → kadiv → hr_admin → hr_kadept → [director if price ≥ Rp 2.5M]
The chain is stored as a JSON column (TNAApprovalChain) in nvarchar(max) — SQL Server has no native array type. The domain layer implements the state machine:
```go
func NextStep(completed ApprovalStep, requiresDirector bool) ApprovalStep {
	switch completed {
	case StepKadept:
		return StepKadiv
	case StepKadiv:
		return StepHRAdmin
	case StepHRAdmin:
		return StepHRKadept
	case StepHRKadept:
		if requiresDirector {
			return StepDirector
		}
		return "" // chain complete
	case StepDirector:
		return ""
	}
	return ""
}
```

Authorization is purely kodeJabatan-based — not LMS role. A user with role=instructor and kodeJabatan=0 has no approval permissions, while a user with role=user and kodeJabatan=3 (department head) can approve at the kadept and hr_kadept steps. This is enforced in the service layer before any state transition:
```go
func authoriseStep(step ApprovalStep, kodeJabatan int) error {
	switch step {
	case StepKadept: // department head
		if kodeJabatan != 3 { return ErrTNAUnauthorisedStep }
	case StepKadiv: // division head
		if kodeJabatan != 2 { return ErrTNAUnauthorisedStep }
	case StepHRAdmin:
		if kodeJabatan != 4 { return ErrTNAUnauthorisedStep }
	case StepHRKadept:
		if kodeJabatan != 3 { return ErrTNAUnauthorisedStep }
	case StepDirector:
		if kodeJabatan != 1 && kodeJabatan != 2 {
			return ErrTNAUnauthorisedStep
		}
	}
	return nil
}
```

After each entry approval, the service recomputes the TNA header's aggregate status from its entries — all approved → fully_approved, any rejected → rejected, otherwise submitted. This is pure domain logic with no side effects visible to the DB layer.
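That recompute is a small pure function. A sketch, with assumed names for the status constants:

```go
func recomputeHeaderStatus(entries []TNAEntry) TNAStatus {
	allApproved := len(entries) > 0
	for _, e := range entries {
		if e.Status == StatusRejected {
			return StatusRejected // any rejection rejects the form
		}
		if e.Status != StatusApproved {
			allApproved = false
		}
	}
	if allApproved {
		return StatusFullyApproved
	}
	return StatusSubmitted
}
```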
Batch approval (PUT /tna/entries/batch-approve) processes N entries in a loop, collecting per-entry errors, and returns HTTP 207 (Multi-Status) when some succeed and some fail. The frontend can surface exactly which entries failed without losing the successful ones.
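A sketch of that handler shape; the request/response types are illustrative and the claims plumbing into ApproveEntry is elided:

```go
type batchApproveRequest struct {
	EntryIDs []uint `json:"entry_ids"`
}

type batchResult struct {
	EntryID uint   `json:"entry_id"`
	Error   string `json:"error,omitempty"`
}

func (h *TNAHandler) BatchApproveEntries(c *gin.Context) {
	var req batchApproveRequest
	if err := c.ShouldBindJSON(&req); err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
		return
	}
	results := make([]batchResult, 0, len(req.EntryIDs))
	failed := 0
	for _, id := range req.EntryIDs {
		res := batchResult{EntryID: id}
		if err := h.svc.ApproveEntry(c.Request.Context(), id); err != nil {
			res.Error = err.Error() // keep going; report per-entry failures
			failed++
		}
		results = append(results, res)
	}
	status := http.StatusOK
	if failed > 0 && failed < len(req.EntryIDs) {
		status = http.StatusMultiStatus // 207: some succeeded, some failed
	} else if failed == len(req.EntryIDs) && failed > 0 {
		status = http.StatusUnprocessableEntity // all failed (assumed choice)
	}
	c.JSON(status, results)
}
```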
Quiz Management Under Load
Quizzes present a specific concurrency concern: when a module goes live and 50 employees open the quiz simultaneously, the naive implementation would create 50 quiz attempts in a race condition.
Three mechanisms work together:
1. Idempotency guard on attempt creation. Before inserting a new quiz_attempt, the service checks for an existing in_progress attempt for (user_id, quiz_id). If one exists, it returns it rather than creating a duplicate.
```go
func (s *QuizService) StartQuizAttempt(ctx context.Context, userID int, quizID uint) (*domain.QuizAttempt, error) {
	existing, err := s.repo.GetActiveAttempt(ctx, userID, quizID)
	if err == nil && existing != nil {
		return existing, nil // idempotent — return the in-progress attempt
	}
	return s.repo.CreateAttempt(ctx, &domain.QuizAttempt{
		UserID:    userID,
		QuizID:    quizID,
		Status:    domain.AttemptInProgress,
		StartedAt: time.Now(),
	})
}
```

2. Answer submission is an update, not an insert. PUT /quiz-answers/:id overwrites the existing answer record rather than creating a new one — so a retrying client can't produce duplicate answer rows.
3. Quiz submission is final. PUT /quiz-attempts/:id/submit transitions the attempt to submitted, calculates the score server-side by comparing the student's answers against the correct choices (stored as is_correct on quiz choices), and refuses re-submission if the attempt is already finalized.
Score calculation happens entirely in Go — never on the client — so the result can't be tampered with. The scoring logic walks quiz_answers joined with quiz_choices and counts correct selections against the total question count.
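The walk itself reduces to a counting loop. A sketch with illustrative types, assuming the correct choices are keyed by choice ID:

```go
// scoreAttempt counts correct selections against the total question count
// and returns a percentage. The real models join quiz_answers to quiz_choices.
func scoreAttempt(answers []domain.QuizAnswer, choices map[uint]domain.QuizChoice, totalQuestions int) float64 {
	if totalQuestions == 0 {
		return 0
	}
	correct := 0
	for _, a := range answers {
		if ch, ok := choices[a.SelectedChoiceID]; ok && ch.IsCorrect {
			correct++
		}
	}
	return float64(correct) / float64(totalQuestions) * 100
}
```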
Competency System: Kamus Kompetensi
The competency module implements the company's 2022 Kamus Kompetensi Gol 1-3 — a sparse matrix of 243 competencies × 112 job roles × proficiency levels (2=Basic, 3=Intermediate, 4=Advanced).
The matrix is seeded from a Go slice of SeedRequirement structs (role name × competency seq_no × level). The seeder uses ON CONFLICT DO NOTHING semantics via GORM's FirstOrCreate to be idempotent — safe to run repeatedly in CI.
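A sketch of the seeding loop under those semantics; the model and field names are assumptions, and only the match columns go into Where so existing rows are left untouched:

```go
for _, req := range seedRequirements {
	rec := domain.RoleCompetencyRequirement{ // hypothetical model name
		RoleName:     req.RoleName,
		CompetencyID: req.SeqNo,
		Level:        req.Level,
	}
	// FirstOrCreate inserts only when no row matches the unique key,
	// so re-running the seeder is a no-op.
	db.Where(domain.RoleCompetencyRequirement{
		RoleName:     rec.RoleName,
		CompetencyID: rec.CompetencyID,
	}).FirstOrCreate(&rec)
}
```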
Skill gap analysis computes, per employee per competency:
gap_status = "met" if actual_level >= required_level
= "gap" if actual_level < required_level
= "exceeds" if actual_level > required_level
= "unrated" if actual_level == 0
The EmployeeGapSummary aggregation runs as a single SQL GROUP BY query rather than iterating in Go:
```sql
SELECT gap_status, COUNT(*) AS count
FROM employee_competencies
WHERE user_id = ?
GROUP BY gap_status
```

Then Go maps the result rows into an EmployeeGapSummary struct. This keeps the aggregation in the database where it belongs and avoids loading N competency rows into memory just to count them.
Gin Router Design: Static Before Parametric
Gin uses a radix tree for routing. A common footgun is registering a wildcard route before a static route with the same prefix:
```go
// BAD — gin matches /:id before /me
r.GET("/tna/:id", handler.GetTNAByID)
r.GET("/tna/me", handler.GetMyTNAs) // never reached

// CORRECT — static routes first
r.GET("/tna/me", handler.GetMyTNAs)
r.GET("/tna/entries/:id", handler.GetTNAEntryByID)
r.PUT("/tna/entries/batch-approve", handler.BatchApproveEntries) // registered before the /:id wildcard
r.GET("/tna/:id", handler.GetTNAByID)
```

This bit me early in development when /modules/me, /modules/enrolled, and /modules/enroll all conflicted with /modules/:id. The fix: register all literal-segment routes before any wildcard on the same prefix.
Role-Based Access: Three Middleware Layers
Access control is enforced at three levels:
1. AuthMiddleware — validates the JWT from the HttpOnly cookie or Authorization: Bearer header. Aborts with 401 if missing or expired.
2. RoleMiddleware — checks claims.Role against an allowlist. Used on instructor-only routes (module CRUD, activity management, report generation) and admin-only routes (hard deletes, user management).
3. Service-layer authorization — for org-hierarchy decisions that can't be expressed as a simple role check. Example: TNA approvals use claims.KodeJabatan inside service.ApproveTNA to enforce which step a user can act on. A department head (kodeJabatan=3) cannot approve the division head step (kadiv) even if they have role=instructor.
This layering means the HTTP layer is "dumb" — it just validates auth tokens. All business rules about who can do what live in the service layer where they're testable without spinning up an HTTP server.
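For completeness, layer 2 is a few lines of Gin middleware. A sketch that reuses the claimsFromCtx helper from earlier:

```go
func RoleMiddleware(allowed ...domain.Role) gin.HandlerFunc {
	return func(c *gin.Context) {
		claims, err := claimsFromCtx(c)
		if err != nil {
			c.AbortWithStatusJSON(http.StatusUnauthorized, gin.H{"error": "unauthenticated"})
			return
		}
		for _, role := range allowed {
			if claims.Role == role {
				c.Next() // role is on the allowlist
				return
			}
		}
		c.AbortWithStatusJSON(http.StatusForbidden, gin.H{"error": "forbidden"})
	}
}
```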
JSON Columns for Structured Data in SQL Server
SQL Server has no native array or JSON column type equivalent to PostgreSQL's jsonb. For fields that need to store ordered lists (TNA approval chains, quiz answer selections, competency course structures), I use nvarchar(max) with a custom GORM scanner:
```go
type TNAApprovalChain []TNAApprovalRecord

func (c TNAApprovalChain) Value() (driver.Value, error) {
	if c == nil {
		return "[]", nil
	}
	b, err := json.Marshal(c)
	return string(b), err
}

func (c *TNAApprovalChain) Scan(value any) error {
	switch raw := value.(type) {
	case string: // the SQL Server driver may hand back string...
		return json.Unmarshal([]byte(raw), c)
	case []byte: // ...or []byte, depending on column and driver settings
		return json.Unmarshal(raw, c)
	default:
		return fmt.Errorf("unsupported type %T", value)
	}
}
```

The same pattern is used for StringSlice (prerequisites, contacts, course structure, taglines on modules), TNAImplementationDetailsCol (HR admin's procurement data on TNA entries), and quiz SelectedQuestionIDs. The type implements driver.Valuer and sql.Scanner — GORM reads and writes it transparently without any special handling in handlers or services.
What I'd Do Differently
A few things I'd change with hindsight:
Database: SQL Server on Azure worked well for a company already standardized on Microsoft infrastructure, but the lack of jsonb, partial indexes, and RETURNING makes certain patterns more verbose than PostgreSQL equivalents. Given a greenfield project I'd choose Postgres.
Search: Module and competency search is currently a LIKE %query% full-table scan. At 600 users the performance is acceptable, but adding CONTAINS full-text search on SQL Server, or routing search to Elasticsearch, would be the right call if the module catalogue grows significantly.
Observability: Structured logging with zerolog is in place, but distributed tracing (OpenTelemetry) and a proper metrics pipeline (Prometheus + Grafana) would make diagnosing slow queries and latency spikes much faster in production.
Worker pool: The RabbitMQ consumer is a single goroutine. A bounded worker pool using errgroup would let multiple uploads process in parallel without risking unbounded goroutine growth.
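A sketch of that change using errgroup's SetLimit from golang.org/x/sync; process stands in for the body of the current consumer loop:

```go
func (w *UploadWorker) StartPooled(ctx context.Context, workers int) error {
	msgs, err := w.ch.Consume("uploads", "", false, false, false, false, nil)
	if err != nil {
		return err
	}
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(workers) // at most `workers` uploads in flight; Go() blocks when full
	for msg := range msgs {
		msg := msg // per-iteration copy (not strictly needed on Go 1.22+)
		g.Go(func() error {
			w.process(ctx, msg) // hypothetical: unmarshal, upload, ack/nack
			return nil
		})
	}
	return g.Wait()
}
```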
Numbers
| Metric | Value |
|---|---|
| Active users | 600+ |
| Modules in catalogue | 80+ (growing) |
| Competencies in matrix | 243 |
| Job roles in matrix | 112 |
| TNA approval steps | 5 (conditional director gate) |
| Quiz questions (total) | 1,200+ |
| API endpoints | 120+ |
| Avg. response time (cached) | < 10ms |
| Avg. response time (uncached) | 40–120ms |
Closing Thoughts
Building an LMS for a whole company is a different challenge than building one for a SaaS product. There's no gradual rollout — everyone uses it on day one. The users range from production-floor operators checking their module assignments to directors approving multi-million-rupiah training budgets through a 5-step chain. The system has to be correct, not just fast.
Hexagonal architecture paid off more than any other decision. When the TNA approval logic changed from a header-level chain to a per-entry chain (mid-project), the service layer was the only thing that changed. The HTTP handlers, database schema migration, and frontend API contracts updated independently without the usual cascade of changes through a tightly-coupled codebase.
The full source is private (company project), but I'm happy to walk through any of these designs in more depth. The patterns here — dual-database GORM setup, JWT with embedded org claims, Redis sliding-window rate limiting, RabbitMQ async uploads, per-entry approval state machines — are all things I'd carry into any backend engineering role.
Written by Yehezkiel | Go Backend Engineer
Technologies: Go · Gin · GORM · SQL Server · Redis · RabbitMQ · Cloudinary · Next.js