Operational Resilience: Remove Key-Person Risk in Ops

Operational Resilience: What Happens When Your Key Ops Person Is on Holiday?

Operational resilience is often discussed in the context of major incidents: cyberattacks, market shocks, system outages.

But in many wealth and asset management operations, the most common resilience test is far more ordinary:

Your key ops person is on holiday.

Suddenly, everyday work slows down. Questions pile up. Exceptions sit unresolved. Approvals get delayed. And the team realizes that critical steps live in someone’s head.

This is key-person risk, and it’s one of the most preventable operational vulnerabilities.

The good news: you don’t fix it by adding bureaucracy. You fix it by turning person-dependent processes into system-dependent workflows.

The real problem: “tribal knowledge” runs the process

Most firms don’t intentionally design processes to depend on one person. It happens gradually:

  • A senior operations specialist becomes the go-to for exceptions.
  • Workarounds evolve to handle edge cases.
  • Approvals happen in email because it’s “faster.”
  • Files get stored in ways that make sense locally, but not globally.

Over time, the process becomes a collection of habits rather than a repeatable system.

What it looks like in practice

  • “Ask Martina, she knows how this works.”
  • “I think the latest version is in that folder, or maybe in the email thread.”
  • “We usually do it this way for that custodian.”
  • “Compliance approved it last time, but I can’t find the rationale.”

When the person holding that context is away, the organization loses speed and confidence.

Why holiday absences expose operational fragility

A short absence reveals what’s truly standardized and what’s improvised.

1) Ownership becomes unclear

If the next step isn’t assigned to a role, it defaults to a person.

When that person is unavailable, tasks stall.

2) Exceptions become bottlenecks

Most operational pain lives in exceptions:

  • incomplete onboarding documents
  • mandate changes with special conditions
  • corporate actions with unusual treatment
  • fee disputes and billing edge cases

If exception handling isn’t systematized, it becomes “who remembers what to do.”

3) Approvals and evidence become hard to reconstruct

When approvals happen in email or chat, the organization can’t quickly answer:

  • who approved this?
  • when did it change?
  • which version applied at the time?

So teams reconstruct the story, which is slow and risky.

4) Continuity depends on file habits

If document storage relies on personal naming conventions and folder structures, continuity breaks across:

  • teams
  • locations
  • new hires
  • temporary coverage

The operational cost of key-person risk

Key-person risk isn’t only a risk topic. It’s a performance topic.

It creates:

  • delays (work waits for the one person)
  • rework (others redo steps to be safe)
  • inconsistent outcomes (different people interpret the process differently)
  • audit friction (evidence is scattered)
  • stress and burnout (the key person never truly disconnects)

And it scales poorly: the more clients, custodians, and exceptions you manage, the more fragile the model becomes.

What resilient operations look like

Resilience is not about making everything rigid. It’s about making the critical path repeatable.

A resilient operating model has three pillars.

1) Systematized workflows (repeatable steps)

A workflow defines:

  • what triggers the process
  • what happens next
  • who owns each step (by role)
  • what constitutes completion
  • how exceptions are handled

This removes ambiguity and reduces dependence on memory.
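As an illustration only, the five elements above can be written down as data rather than kept in someone’s memory. The Python sketch below assumes hypothetical `Step` and `Workflow` classes and a made-up mandate-change process; it is not taken from any particular system.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Step:
    name: str
    owner_role: str                     # owned by a role, never a named person
    done_when: str                      # what constitutes completion
    on_exception: Optional[str] = None  # exception path, if one is defined


@dataclass
class Workflow:
    trigger: str                        # what starts the process
    steps: List[Step] = field(default_factory=list)

    def next_step(self, completed: Set[str]) -> Optional[Step]:
        """Return the first step not yet completed, or None when the workflow is done."""
        for step in self.steps:
            if step.name not in completed:
                return step
        return None


# Hypothetical mandate-change workflow
mandate_change = Workflow(
    trigger="client submits a signed mandate change",
    steps=[
        Step("check documents", "Operations", "all required documents received",
             on_exception="Client Service requests the missing documents"),
        Step("compliance review", "Compliance", "approval and rationale recorded"),
        Step("update restrictions", "Portfolio Management", "new restrictions active"),
    ],
)

print(mandate_change.next_step({"check documents"}).owner_role)  # Compliance
```

Anyone covering the process can ask “what is the next step, and which role owns it?” without finding the one person who remembers.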

2) Role-based access (coverage without overexposure)

When access is role-based, coverage becomes possible:

  • the right people can step in
  • sensitive data stays controlled
  • audit trails remain intact

This is especially important when temporary coverage is needed across teams or locations.
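A minimal sketch of coverage without overexposure, assuming hypothetical role names, permission labels, and users: permissions attach to roles, and holiday cover is simply a role grant with an end date.

```python
from datetime import date
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical permission sets, attached to roles rather than to people
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "Operations": {"view_client_docs", "process_corporate_actions"},
    "Compliance": {"view_client_docs", "approve_mandate_change"},
    "Client Service": {"view_client_docs"},
}

# (user, role, expiry) grants; hypothetical users, None means no expiry
GRANTS: List[Tuple[str, str, Optional[date]]] = [
    ("martina", "Operations", None),
    ("alex", "Client Service", None),
    ("alex", "Operations", date(2024, 8, 16)),  # temporary holiday coverage
]


def can(user: str, permission: str, today: date) -> bool:
    """True if any unexpired role granted to `user` includes `permission`."""
    for grantee, role, expires in GRANTS:
        if grantee != user:
            continue
        if expires is not None and today > expires:
            continue  # coverage has lapsed
        if permission in ROLE_PERMISSIONS.get(role, set()):
            return True
    return False


print(can("alex", "process_corporate_actions", date(2024, 8, 10)))  # True
print(can("alex", "process_corporate_actions", date(2024, 8, 20)))  # False
print(can("alex", "approve_mandate_change", date(2024, 8, 10)))     # False
```

Because the temporary grant carries its own expiry, the cover person gets exactly the Operations permissions for exactly the holiday period, and nobody has to remember to revoke them.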

3) Automation that replaces “tribal knowledge”

Automation doesn’t replace judgment. It replaces manual coordination.

Examples:

  • routing tasks to the right owner automatically
  • reminders and escalations when deadlines approach
  • capturing approvals inside the process
  • logging who did what and when
  • applying retention rules consistently

This is how you keep work moving even when people rotate.
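To make “reminders and escalations when deadlines approach” concrete, here is a small sketch. The 24-hour reminder window and the choice of Management Approver as the escalation target are assumptions for illustration, not recommendations.

```python
from datetime import datetime, timedelta

REMIND_WITHIN = timedelta(hours=24)       # assumed reminder window
ESCALATION_ROLE = "Management Approver"   # assumed escalation target


def action_for(task: dict, now: datetime) -> str:
    """Decide whether an open task needs a reminder or an escalation."""
    if task["done"]:
        return "none"
    if now > task["due"]:
        return f"escalate to {ESCALATION_ROLE}"
    if task["due"] - now <= REMIND_WITHIN:
        return f"remind {task['role']}"
    return "none"


now = datetime(2024, 8, 12, 9, 0)
task = {"role": "Operations", "due": now + timedelta(hours=6), "done": False}
print(action_for(task, now))  # remind Operations
```

The reminder goes to the owning role, not to an individual, so it reaches whoever is covering.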

A practical blueprint: reduce key-person risk in 30 days

You don’t need a multi-year transformation to improve resilience. Start with the processes that create the most daily friction.

Step 1: Identify the “holiday risk” processes

Ask one question:

Which processes slow down when a specific person is away?

Typical candidates:

  • onboarding and KYC updates
  • mandate changes and restrictions
  • corporate actions processing
  • fee and billing exceptions
  • reporting pack assembly
  • document approvals and sign-offs

Pick two or three processes to start.

Step 2: Map the workflow (including exceptions)

Document the real workflow, not the ideal one:

  • intake sources
  • steps and decisions
  • exception paths
  • required approvals
  • evidence required

Most key-person risk lives in the exception paths.

Step 3: Convert “person ownership” into “role ownership”

Replace names with roles:

  • Client Service
  • Operations
  • Compliance
  • Portfolio Management
  • Management Approver

Now coverage becomes possible by design.
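One way to picture role ownership: a task is assigned to a role, and the system resolves that role to whoever is available on the day. The roster, names, and absence dates below are hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical roster: each role lists its members in order of preference
ROSTER: Dict[str, List[str]] = {
    "Operations": ["martina", "alex", "sam"],
    "Compliance": ["jo", "kim"],
}

# Hypothetical absence calendar: person -> (first day away, last day away)
ABSENCES: Dict[str, Tuple[date, date]] = {
    "martina": (date(2024, 8, 5), date(2024, 8, 16)),
}


def is_away(person: str, day: date) -> bool:
    span = ABSENCES.get(person)
    return span is not None and span[0] <= day <= span[1]


def assign(role: str, day: date) -> str:
    """Give a task owned by `role` to the first member who is not away."""
    for person in ROSTER.get(role, []):
        if not is_away(person, day):
            return person
    raise LookupError(f"no available member for role {role!r} on {day}")


print(assign("Operations", date(2024, 8, 10)))  # alex
print(assign("Operations", date(2024, 8, 20)))  # martina
```

When Martina is on holiday the work goes to Alex automatically; nobody needs to ask who is covering. If every member of a role is away, the lookup fails loudly instead of letting the task sit unnoticed.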

Step 4: Standardize documents and status

Ensure everyone can answer, at a glance:

  • what is this document?
  • what client/portfolio does it belong to?
  • what is its status (draft/signed/expired)?
  • what is the latest version?

Consistent tagging and version control reduce confusion immediately.
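A minimal sketch of what consistent tagging enables, assuming a hypothetical `Document` record whose fields mirror the four questions above and an integer version number: finding the latest version becomes a lookup rather than a search through folders and email threads.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Document:
    doc_type: str      # what is this document?
    client_id: str     # which client does it belong to?
    portfolio_id: str  # which portfolio?
    status: str        # "draft", "signed" or "expired"
    version: int       # higher number = newer version


def latest(docs: List[Document], doc_type: str, client_id: str) -> Optional[Document]:
    """Return the newest version of a document type for a client, if any."""
    matches = [d for d in docs if d.doc_type == doc_type and d.client_id == client_id]
    return max(matches, key=lambda d: d.version, default=None)


# Hypothetical client and portfolio identifiers
docs = [
    Document("mandate", "C-001", "P-01", "expired", 1),
    Document("mandate", "C-001", "P-01", "signed", 2),
    Document("KYC form", "C-001", "P-01", "signed", 1),
]
print(latest(docs, "mandate", "C-001").status)  # signed
```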

Step 5: Automate routing, approvals, and logging

Start with high-impact automation:

  • route tasks based on document type/status
  • notify the next owner
  • capture approvals in-system
  • log changes and decisions

The goal is to make the process self-explanatory, so it doesn’t rely on one person’s memory.
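As a rough sketch of this step, assuming hypothetical routing rules, document IDs, and an in-memory log (a real system would persist it): routing follows document type and status, and every hand-off and approval lands in an append-only record of who did what and when.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple

# Hypothetical routing rules: (document type, status) -> next owning role
ROUTES: Dict[Tuple[str, str], str] = {
    ("mandate", "received"): "Operations",
    ("mandate", "checked"): "Compliance",
    ("mandate", "approved"): "Portfolio Management",
}

AUDIT_LOG: List[dict] = []  # append-only: who did what, and when


def record(actor: str, action: str, doc_id: str) -> None:
    AUDIT_LOG.append({"at": datetime.now(timezone.utc), "actor": actor,
                      "action": action, "doc": doc_id})


def route(doc_id: str, doc_type: str, status: str) -> str:
    """Return the next owner role and log the hand-off.

    Unmapped (type, status) pairs raise KeyError, so gaps in the rules
    surface immediately instead of silently stalling work.
    """
    role = ROUTES[(doc_type, status)]
    record("system", f"routed to {role}", doc_id)  # notifying the role would happen here
    return role


def approve(doc_id: str, approver: str, rationale: str) -> None:
    """Capture an approval inside the process, rationale included."""
    record(approver, f"approved: {rationale}", doc_id)


print(route("DOC-42", "mandate", "checked"))  # Compliance
approve("DOC-42", "jo", "restrictions consistent with client profile")
for entry in AUDIT_LOG:
    print(entry["actor"], "|", entry["action"])
```

Because the approval and its rationale are stored with the document, “Compliance approved it last time, but I can’t find the rationale” has an answer.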

10-minute self-assessment

  1. Which processes slow down when one person is away?
  2. Are critical steps documented, or learned by shadowing?
  3. Are exceptions handled consistently or case-by-case?
  4. Do approvals happen in a system or in email/chat?
  5. Can someone else find the latest version in under 60 seconds?
  6. Is ownership defined by role, or by individual?
  7. Are tasks routed automatically, or manually forwarded?
  8. Can you show who did what and when without reconstruction?
  9. Can temporary coverage be granted safely (role-based access)?
  10. Does the key person truly disconnect, or do they get pulled back in?

If these questions feel uncomfortable, your resilience risk is likely operational, not technical.

Conclusion: resilience is a workflow outcome

Operational resilience isn’t a policy. It’s the result of how work is designed.

When workflows are systematized, access is role-based, and automation replaces manual coordination:

  • continuity improves
  • exceptions become manageable
  • audits become easier
  • teams move faster with less stress