Observability Fundamentals
Signals that stay useful under load
Shape metric collection so dashboards stay readable when traffic spikes, without drowning operators in noise.
- Format: Cohort with live labs
- Duration: 5 weeks, evenings (KR time)
- Skill: Intermediate
- Stack: Prometheus, Grafana
Tuition (informational): KRW 420,000
Request a syllabus conversation
This cohort works through a small production-like cluster where metric cardinality is raised deliberately each week. You instrument services, tighten rollups, and rehearse how annotations and recording rules keep panels legible for on-call engineers. The emphasis is on operational judgment: when to aggregate, when to split views, and how to document decisions so the next rotation inherits context.
What is included
- Hands-on labs with Prometheus-style scrapers and remote-write paths
- Playbooks for label hygiene and cardinality budgets on busy services
- Pair review of panel drafts with instructor annotations
- Checklists for onboarding new services to shared monitoring folders
- Lightweight synthetic checks that validate scrape health
- Short exercises on writing concise runbook snippets from graphs
- A capstone where teams defend their dashboard set to peers
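As a rough illustration of the synthetic scrape-health checks listed above, the sketch below inspects the JSON returned by a Prometheus instant query for the built-in `up` metric and lists targets whose last scrape failed. The function name `down_targets` and the example payload are hypothetical and are not taken from the course labs; only the response shape follows the Prometheus HTTP API.

```python
from typing import Any


def down_targets(query_response: dict[str, Any]) -> list[str]:
    """Return sorted 'job/instance' names whose `up` sample is not 1.

    `query_response` is assumed to be the decoded JSON body of an
    instant query for `up` against /api/v1/query.
    """
    if query_response.get("status") != "success":
        raise ValueError("query did not succeed")

    down = []
    for sample in query_response["data"]["result"]:
        labels = sample["metric"]
        # Instant-vector samples are [unix_timestamp, "value-as-string"].
        _timestamp, value = sample["value"]
        if float(value) != 1.0:
            job = labels.get("job", "?")
            instance = labels.get("instance", "?")
            down.append(f"{job}/{instance}")
    return sorted(down)


# Hypothetical payload: one failed scrape, one healthy scrape.
example_response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "up", "job": "checkout", "instance": "10.0.0.7:9100"},
                "value": [1700000000.0, "0"],
            },
            {
                "metric": {"__name__": "up", "job": "checkout", "instance": "10.0.0.8:9100"},
                "value": [1700000000.0, "1"],
            },
        ],
    },
}

print(down_targets(example_response))
```

Keeping the check as a pure function over the decoded response makes it easy to test without a running Prometheus server; fetching the JSON is left to whatever HTTP client the environment already uses.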
Outcomes
- You can explain why a metric set is safe at expected peak cardinality.
- You produce a small dashboard pack that another engineer can adopt in one sitting.
- You leave with a personal list of anti-patterns you will block in code review.
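One way to reason about the first outcome is a worst-case estimate: the number of time series a metric can produce is bounded by the product of the distinct values each label can take. The sketch below applies that arithmetic against a budget. The label counts, the budget figure, and the names `worst_case_series` and `within_budget` are illustrative assumptions, not numbers or tooling from the course.

```python
from math import prod


def worst_case_series(label_values: dict[str, int]) -> int:
    """Upper bound on series for one metric: product of distinct values per label.

    A metric with no labels yields exactly one series (prod of nothing is 1).
    """
    return prod(label_values.values())


def within_budget(metrics: dict[str, dict[str, int]], budget: int) -> bool:
    """True if the summed worst-case series count stays at or under the budget."""
    total = sum(worst_case_series(labels) for labels in metrics.values())
    return total <= budget


# Hypothetical service: 5 methods x 8 status codes x 40 routes x 30 instances.
safe = {
    "http_requests_total": {"method": 5, "status": 8, "route": 40, "instance": 30},
}

# Same metric with an unbounded-looking label added (assumed 10,000 users).
risky = {
    "http_requests_total": {**safe["http_requests_total"], "user_id": 10_000},
}

print(worst_case_series(safe["http_requests_total"]))  # 48000
print(within_budget(safe, budget=50_000))              # True
print(within_budget(risky, budget=50_000))             # False
```

The bound is deliberately pessimistic, since not every label combination occurs in practice, but it shows quickly why a single high-cardinality label such as a user identifier can dominate the total.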
Instructor of record
Hana Sato
Primary feedback on labs
Former platform lead for a regional SaaS team; now focuses on teaching metric design as craft, not checkbox work.
Participant questions
No. Labs run in shared sandboxes. If you can SSH into a Linux VM and read YAML, you are ready. Bringing a work scenario as a story helps, but it is optional.
Recent voices
“Week two forced me to delete half my old metrics. Painful, but the Trace the Noise lab finally made cardinality click.”
“Instructor comments on my runbook draft were blunt in a useful way—less adjectives, more thresholds tied to user journeys.”