...

  • Internal Communication: We should set up an internal communication protocol for whenever downtime is expected. Every team, irrespective of its day-to-day role, should be told when and why the downtime will occur, for example through an email chain or a dedicated Teams channel.

  • Customer Communication: Clear communication with our users before any planned downtime, via emails, pop-up banners or notifications on the website, and our social media channels, can drastically reduce confusion and frustration. Explaining what to expect and assuring customers that we are working to restore the service as soon as possible helps maintain their trust.

  • Website Maintenance Planning: Scheduling maintenance work during off-peak hours can drastically reduce the impact of downtime.

CURRENT

Current Process

Owner

Notes

Enhancement

Planned Releases/Outages

Release scheduled/approved

Deployment window scheduled around 9pm

EM

Deployments generally involve about 20-30 minutes of downtime. Does this outage time vary?

  1. Is it possible to schedule deployments later to minimize user disruption?

During deployment

Website inaccessible to users

There should be a website maintenance message that users see

This is from Webscale

  1. Ensure this is displaying (Katie Lucas is confirming with Webscale)

  2. Updates to messaging?

Pre-release communication

Email to stakeholder teams notifying of upcoming deployment and downtime window

PM (currently)

Should this be coming from delivery managers?

Who receives this today?

- Marketing?

- Support?

- Product leaders?

  1. Expanded distro list
    Leverage other means beyond email, such as a Teams channel? Do we create a communities release distro to ensure the same group is notified every time?

Post-release communication

EM replies to the distro to confirm that the deployment is complete

See above

Alert/notifications

#alerts channel notifies when sites are down and back up.

EMs or someone from engineering will generally communicate when the downtime is due to a release

Unplanned Outages/Downtime

Alert/notifications

#alerts in Slack notifies when community sites are down (and back up again)

Who monitors these channels?

What is the escalation path when there is an outage?

Communication?

Engineering Incident Manager - Teams channel

Seems to be used as a forum for communicating unplanned outages and their resolution.

Incident report published

...