May 20, 2020

Salesforce down: What you can learn from this outage

Everyone has bad days. Most companies have been through some kind of outage caused by a buggy database deployment. Even the best of the best, with highly trained staff, world-class best practices, and well-thought-out processes, make mistakes. On Friday, May 17th, 2019, Salesforce.com had a bad day.

The company deployed a faulty database change script that broke permission settings in production. As a result, users suddenly obtained read and write access to restricted data, which opened the door for unauthorized employees to steal or tamper with their company’s data. Salesforce had to take large parts of its infrastructure down to find and properly fix the issue. The outage lasted 15 hours and 8 minutes. According to Gartner’s formula ($5,600/minute), this outage cost approximately $5 million. Plus, since so many companies rely on Salesforce, it was a very visible and embarrassing outage. (It’s earned several hashtags, including #SalesforceDown and #permissiongeddon.)
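For anyone who wants to check the math, here is a quick sketch of that estimate. It assumes the $5,600/minute figure applies flatly to every minute of the outage, which is a simplification; the function and constant names are just for illustration.

```python
# Average cost of downtime per minute quoted above (USD).
# Assumption: the same rate applies to every minute of the outage.
COST_PER_MINUTE_USD = 5600


def outage_cost(hours: int, minutes: int,
                cost_per_minute: int = COST_PER_MINUTE_USD) -> int:
    """Return the estimated cost in USD of an outage of the given length."""
    total_minutes = hours * 60 + minutes
    return total_minutes * cost_per_minute


if __name__ == "__main__":
    total_minutes = 15 * 60 + 8          # 908 minutes
    cost = outage_cost(15, 8)            # 908 * 5600
    print(f"{total_minutes} minutes x ${COST_PER_MINUTE_USD:,}/minute = ${cost:,}")
    # prints: 908 minutes x $5,600/minute = $5,084,800
```

That works out to roughly $5.1 million, which is where the "approximately $5 million" figure comes from.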

Rare picture from inside the #Salesforce main office right now pic.twitter.com/gAdxgori2f

— Morgan Scott (@MorganAfix) May 17, 2019

I couldn’t help but think about all of those Salesforce employees who needed to work like mad to take the whole database down, find the offending database script, and restore everything. All because of one change script. That’s not a fun way to spend a Friday night (and Saturday).

We’ve used #salesforce to run our entire business from Sales, AR, PM, Marketing and HR and THIS IS THE FIRST TIME in 10 years-TEN YEARS- we’ve EVER had an issue. We will continue to use and support SF always.

— MoCheekyMonkey (@mocheekymonkey) May 19, 2019

Before Friday, Salesforce customers had experienced very little disruption in service. Many loyal customers were tweeting about how rock solid the service has been, and that’s impressive. Salesforce clearly had its stuff together.

But everyone has blind spots.

Look for Blind Spots in Your Database Process

The bottom line is that this outage was preventable. There are tools out there, like Datical, that can auto-generate permissions on objects to ensure this doesn’t happen. And there are other ways to prevent this, like expanding test efforts and mirroring production data with solutions like Delphix to ensure scripts aren’t pushed to production without understanding what effect they might have.
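To make the idea of checking a script before it ships a little more concrete, here is a minimal sketch of a pre-deployment check that flags permission grants to overly broad grantees. It is a hypothetical illustration, not how Datical or Delphix work, and it says nothing about what Salesforce's script actually contained. The rule set (`BROAD_GRANTEES`), the function name, and the sample script are all assumptions, and a regular expression is no substitute for a real SQL parser.

```python
import re

# Hypothetical policy: grantees considered too broad for a production change.
BROAD_GRANTEES = {"public"}

# Simplified pattern for "GRANT <privileges> ON <object> TO <grantees>".
# Assumption: statements are terminated by semicolons.
GRANT_RE = re.compile(
    r"\bGRANT\b(?P<privs>.+?)\bON\b(?P<obj>.+?)\bTO\b(?P<grantee>[^;]+)",
    re.IGNORECASE | re.DOTALL,
)


def find_risky_grants(script: str) -> list[str]:
    """Return the GRANT statements in `script` that target a broad grantee."""
    findings = []
    for match in GRANT_RE.finditer(script):
        # Take the first word of each comma-separated grantee, so trailing
        # clauses such as "WITH GRANT OPTION" do not hide the grantee name.
        grantees = {
            g.split()[0].lower()
            for g in match.group("grantee").split(",")
            if g.strip()
        }
        if grantees & BROAD_GRANTEES:
            # Collapse whitespace so multi-line statements print on one line.
            findings.append(" ".join(match.group(0).split()))
    return findings


if __name__ == "__main__":
    change_script = """
    GRANT SELECT, UPDATE ON accounts TO PUBLIC;
    GRANT SELECT ON accounts TO reporting_role;
    """
    for statement in find_risky_grants(change_script):
        print("Review before deploying:", statement)
```

A check like this could run in a build pipeline and fail the build when it finds anything, so a person has to look at the change before it reaches production.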

If you haven’t looked into more robust database automation for your company yet, it’s well worth your time. We’re experts in this area, and we’d love to talk with you about how we help our customers prevent database production errors every day. Contact us to learn more.

Erika Kalar