ManageWP status log

When we break something, this is where you'll get all the gritty details

April 25, 2018 – Scheduled Checks: Infinity War (Ongoing)

What happened

On April 25 several users reported an issue where scheduled Security checks did not run. We are currently investigating the cause, and will update this page when we have more info.

How many users are affected

Less than 1% of the user base. If you have scheduled Security checks running, please double-check whether you have been affected.

What we are doing to prevent this from happening again

To be determined after we figure out the cause.

March 15, 2018 – The queue blues 2: queue harder

What happened

At 8:00 UTC we detected an issue with scheduled backups not triggering. We quickly determined that one of the vendor libraries we use to process scheduled backups was crashing. At 10:30 UTC we rolled out a workaround, and soon after a permanent fix.

How many users were affected

Users with scheduled backups set to run on March 15 from around midnight UTC to 10:30 UTC. The scheduler has ignored these events, so we recommend triggering them manually if you need them.

Manual backups have not been affected.

What we are doing to prevent this from happening again

This issue was an edge case from the RabbitMQ failure on March 12. Now that we’ve finally propagated the update across the whole server infrastructure (and triggered this edge case where the update is incompatible with the vendor library), we can finally say that the RabbitMQ issue is over, just like John McClane’s marriage.

March 12, 2018 – The queue blues

What happened

At 6:25 UTC we started getting user feedback about manual backups getting stuck in the queue. Our developers isolated the cause: RabbitMQ, a piece of vendor software, had stopped consuming messages, effectively freezing the job queue globally.

After several reboots and a client update we finally got RabbitMQ working. But then, a plot twist hit. As we spun up another 40 instances to deal with the backup backlog, RabbitMQ crashed due to the sheer number of requests. At this point we decided to put ManageWP into maintenance mode until we could stabilize the service.

At 14:07 UTC we rebooted RabbitMQ again, throttled the requests and brought the ManageWP dashboard back online.
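For readers wondering what "throttling the requests" can look like in practice, here is a minimal token bucket sketch in Python. It is an illustration only, not the code ManageWP runs: the class name, the rate and capacity values, and the injectable clock are all assumptions made for the example.

```python
import time


class TokenBucket:
    """Hypothetical throttle: about `rate` operations per second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum tokens the bucket can hold
        self.clock = clock               # injectable so the logic can be tested
        self.tokens = float(capacity)    # start with a full bucket
        self.updated = clock()

    def try_acquire(self):
        """Return True if one request may proceed now, False if it should wait."""
        now = self.clock()
        elapsed = now - self.updated
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A worker would call `try_acquire()` before publishing each job and back off when it returns False, so a sudden backlog drains at a steady pace instead of hitting the broker all at once.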

How many users were affected

The bug was global, and affected everyone who logged into the service in that time frame. Scheduled tasks that should have run in this time frame will be requeued, and will run in the next few hours. Manual tasks that were frozen in the queue have been deleted. You will need to run these again.

What we are doing to prevent this from happening again

Unfortunately, there’s not much else that we could have done to affect the outcome. The vendor software that failed ran reliably for the past 4 years. When it started failing, we got a notification and reacted accordingly.

March 8, 2018 – The Butterfly effect

What happened

At 11:20 UTC an application bug started off as a random background service failure, blocking PHP’s connection resources. Web servers and data servers were unable to accept connections or create new ones, effectively blocking the ManageWP service. At 13:30 UTC the bug was fixed.

How many users were affected

Everyone who logged into their dashboard in that timeframe. No data was lost, except some queued jobs we couldn’t process (updates, syncs etc.).

What we are doing to prevent this from happening again

Aside from enforcing a “make love, not bugs” policy, we’re adding a New Relic alert that will notify us about a bug before it gets to a stage where it can take the service down.

February 14, 2018 – The Queue files

What happened

At 20:00 GMT we detected a bug with the Safe Updates queue. In certain scenarios the system delays an update by 5 seconds, reducing load and preventing the update from triggering while a backup is being made (among other things). The bug prevented the update from getting back into the queue after being delayed, leaving it in limbo. This in turn resulted in an endless queue in the front end of the dashboard.
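To make the delay-then-requeue step easier to picture, here is a small Python sketch of the general pattern. It is a simplified illustration under our own assumptions, not the Safe Updates implementation: `DelayedQueue`, `delay` and `promote_due` are hypothetical names, and only the 5 second default comes from the description above.

```python
import heapq
from collections import deque


class DelayedQueue:
    """Hypothetical job queue where a job can be parked and re-queued later."""

    def __init__(self):
        self.ready = deque()  # jobs that can run right now
        self._delayed = []    # min-heap of (due_time, sequence, job)
        self._seq = 0         # tie-breaker so jobs themselves are never compared

    def enqueue(self, job):
        self.ready.append(job)

    def delay(self, job, now, seconds=5.0):
        """Park a job until `now + seconds`."""
        heapq.heappush(self._delayed, (now + seconds, self._seq, job))
        self._seq += 1

    def promote_due(self, now):
        """Move every job whose delay has expired back into the ready queue."""
        moved = 0
        while self._delayed and self._delayed[0][0] <= now:
            _, _, job = heapq.heappop(self._delayed)
            self.ready.append(job)
            moved += 1
        return moved

    def pending(self):
        """Jobs still owed to the user, whether ready or parked."""
        return len(self.ready) + len(self._delayed)
```

In this sketch, the failure described above corresponds to `promote_due` never moving a parked job: it stays counted in `pending()` forever, which is what an endless queue looks like from the dashboard.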

How many users were affected

The exact number is unknown, since some updates were affected, while others worked as usual. A rough estimate is that around 10% of users were affected by at least one update delay.

What we are doing to prevent this from happening again

While we are still not certain what caused this bug, we’ve built a workaround for it, ensuring it does not happen again. We will not rest until we get to the bottom of this, no matter what it takes, Dana. The truth is out there.

January 22, 2018 – Maximum Overdrive

What happened

At 8:30 GMT a bug in the ManageWP server back end triggered a high volume of notifications that were sent to the server database. This in turn caused the server to become unstable. By 10:00 GMT we had fixed the bug and restored the service.

How many users were affected

People logging in between 8:30 and 10:00 GMT experienced intermittent glitches – some could not log in, getting a 502 error. Others would get an occasional error message on the dashboard, but were otherwise able to manage their websites.

What we are doing to prevent this from happening again

Diligence, diligence, diligence. We’re constantly ramping up our efforts to test the code we push live. As a result, we’re catching bugs that would otherwise go undetected. Some bugs will inevitably sneak into production, though. It’s up to us to fix them ASAP, unless we want to inadvertently cause a machine uprising. And I’m not talking about the good kind, like The Matrix, but the Maximum Overdrive kind, with Emilio Estevez.

Over 27,000 WordPress professionals are already using ManageWP

Add as many websites as you want for free, no credit card required. Sign up and start saving time!

Have questions? Get in touch!
