ManageWP status page

When we break something, this is where you'll get all the gritty details.

Current status:

Up and running


History log

August 28, 2019 – Service disruption

What happened

Our queueing system was under siege once again. The strain of too many requests and queued processes finally ground the system to a halt. Since emptying the queue and increasing resources was only a stopgap measure, our dev team built a completely separate queueing instance for the services that were generating the increased number of requests.
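For the curious, here is a rough sketch of the idea in Python, assuming a Redis-backed queue; the service names and hostnames are made up for illustration, and this is not our actual code:

    import redis

    # Services that generate the bulk of queued requests get their own queue
    # instance, so a flood of their jobs cannot stall backups and other
    # scheduled work that shares the main queue.
    HIGH_VOLUME_SERVICES = {"uptime_monitor", "performance_check"}  # hypothetical names

    shared_queue = redis.Redis(host="queue-main.internal")           # hypothetical hosts
    dedicated_queue = redis.Redis(host="queue-highvolume.internal")

    def enqueue(service: str, payload: bytes) -> None:
        """Route jobs from high-volume services to the dedicated instance."""
        target = dedicated_queue if service in HIGH_VOLUME_SERVICES else shared_queue
        target.rpush(f"jobs:{service}", payload)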

As a consequence, some of the scheduled backups were performed after the situation was resolved.

How many users were affected

Initially only a portion of users were affected, but after the system slowed down significantly, it was put into maintenance mode until the new solution was deployed.

What are we doing to prevent this from happening again

As a short-term solution, we separated the queueing server instances for some of our services. In parallel, we are starting work on a long-term solution that includes completely rebuilding how some of our services communicate with the scheduling system.

August 24, 2019 – Service disruption

What happened

A glitch in the system prevented some of the processes from finishing, which caused them to return to the wait queue, where each opened a new process. After a while, this caused many connections to open and ‘clog’ one of the servers on the cluster. The glitch was resolved, the server was restarted, and the system started behaving normally.
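To make the pattern concrete, here is a minimal Python sketch with hypothetical helpers (an illustration, not our actual code): a failed job that goes back to the wait queue without releasing its connection leaks one connection per retry, and the fix is to always release it.

    import queue

    jobs: queue.Queue = queue.Queue()  # the wait queue

    class Connection:
        """Stand-in for a pooled server connection (hypothetical)."""
        def close(self) -> None:
            print("connection released")

    def open_connection() -> Connection:
        return Connection()

    def process(job: str, conn: Connection) -> None:
        raise RuntimeError("glitch")  # simulate the glitch that stopped jobs finishing

    def run_job_buggy(job: str) -> None:
        conn = open_connection()
        try:
            process(job, conn)
        except RuntimeError:
            jobs.put(job)  # BUG: the job returns to the wait queue, but conn is
                           # never closed; every retry leaks one more connection
                           # until the server 'clogs'

    def run_job_fixed(job: str) -> None:
        conn = open_connection()
        try:
            process(job, conn)
        except RuntimeError:
            jobs.put(job)  # retrying is safe once the connection is released
        finally:
            conn.close()   # the fix: always release the connection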

No data was lost, and all the processes that were scheduled to run ran successfully after the server was restored to normal working order.

How many users were affected

Only the users on the affected server were impacted by the disruption itself, but the system was down while we restarted the server to restore working order.

What are we doing to prevent this from happening again

Our dev team identified the bug in the code that created the glitch and fixed it before deploying the solution and restarting the server instance.

August 2, 2019 – Elevated number of errors

We had some small issues with our services that caused a bit of a slowdown. After some adjustments, things should be back to normal. Data integrity was not compromised.

July 15, 2019 – Disruption of service

What happened

Our dev team investigated and found a large increase in requests on one of our shards, which impacted system stability for the users on that shard. They scaled up the system resources, and that stabilized the system, but only temporarily, as the number of requests kept increasing. In parallel, another team traced the root of the problem. It took a couple of hours due to the huge number of incoming requests, but ultimately they identified the set of IPs the requests were coming from. After all of the identified IPs were banned, the system slowly stabilized.
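As an illustration of that kind of triage, here is a small Python sketch; the log format, path, and threshold are assumptions for the example, not our actual tooling:

    from collections import Counter

    BAN_THRESHOLD = 100_000  # requests per log window; illustrative value

    def find_abusive_ips(log_path: str) -> list[str]:
        """Count requests per source IP and flag the heaviest senders."""
        counts: Counter = Counter()
        with open(log_path) as log:
            for line in log:
                parts = line.split()
                if parts:                 # assumes common log format, IP first
                    counts[parts[0]] += 1
        return [ip for ip, n in counts.items() if n >= BAN_THRESHOLD]

    # Flagged IPs can then be blocked at the firewall, for example with
    # `iptables -A INPUT -s <ip> -j DROP`, or via a CDN firewall rule.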

How many users were affected

Only the users on the affected shard were impacted by this disruption.

What are we doing to prevent this from happening again

We have 24/7 on-call teams ready to jump in if a similar situation arises in the future.

July 4, 2019 – System Maintenance

At 16:00 CEST (06:00 PST) we will have a short maintenance period to upgrade parts of our architecture. We expect the maintenance period to last around 30 minutes, during which the ManageWP dashboard will not be accessible.

Maintenance Update:

We observed communication issues between the newly upgraded component and the rest of the system. A fix has been deployed that resolved the issue. All systems are working normally now.

June 27, 2019 – System downtime

What happened

Essentially, we had to deal with more than one issue. 

  1. Most of our systems rely on AWS and Cloudflare DNS infrastructure, which were still working only intermittently after the June 24th issues. When they were not working properly, our systems got overwhelmed. Our dev team jumped in during the night to deal with the issues and was joined a couple of hours later by the day shift; together they worked around the clock to stabilize the system.
  2. While dealing with the AWS aftermath, we had an increase in database load. It took us a while to get everything back to working order.
  3. After we deployed the changes that solved the previous problems, we ran into code performance issues that affected users with a lot of websites. Again, this required a completely new solution, which our team delivered as fast as they could.

With modifications and increased resources, everything was put back to working order on Friday/Saturday (June 28/29), with our team closely monitoring the situation. 

How many users were affected

All of our users experienced some system slowness and instabilities (some actions could not be performed) as well as system downtime while we deployed the changes.

What are we doing to prevent this from happening again

We’ve increased the number of secondary instances and database slaves so we can cope better in case of similar outages. But there is very little we can do if our main platform providers have issues.

We’ve also identified several potential areas for improvement, and we will address them in the next couple of weeks in order to further improve our performance.

June 24, 2019 – Disruption of service

What happened

Most of the US northeast was impacted by issues originating from Verizon that partially disrupted the AWS and Cloudflare networks, which in turn affected our services. Our dev teams responded quickly and started working on restoring the system, but it took a couple of hours to bring everything back once the upstream disruption was resolved.

How many users were affected

The service was unavailable during the affected hours.

What are we doing to prevent this from happening again

Since this was an issue beyond our control, all we can do is have a team in place to respond if something unexpected like this happens again.

March 1, 2019 – Unscheduled maintenance

What happened

We had an unexpected increase in load on one of our database shards, caused by a change we made in the new Worker plugin. We had to go into maintenance mode to resolve the extra load and get the system back into working order. The entire operation lasted less than 35 minutes.

How many users were affected

Service was unavailable during the maintenance.

What are we doing to prevent this from happening again

This is why we have a team on standby: to handle unexpected situations just like this one.

December 3, 2018 – Service downtime

What happened

On Friday, November 30, our primary server shard went down. This led to dashboard sync issues while our secondary systems took over. Needless to say, the redundancy systems performed their backup function well in this case.

However, over the weekend we detected that the system struggled to keep up with the increased workload. That is why on Monday morning we decided to shut down the system for maintenance and a server upgrade.

During the 30-minute downtime, the server resources were scaled up to handle the increased workload. The system was fully restored, and within a few hours the queue of scheduled tasks was cleared.

How many users were affected

All users who used the dashboard over the weekend were affected.

What are we doing to prevent this from happening again

We are increasing the resources on our primary and secondary systems to make sure they perform well in case of a similar occurrence in the future.

November 3-4, 2018 – Scheduled Backup not triggered

What happened

On Saturday, November 3, the producer responsible for sending the schedule list to the queue went down. The issue was detected on Sunday, November 4. The producer was restored and the system returned to normal working order. The issue has been resolved, and backups will continue to run normally.
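For illustration, one common way to catch a silent producer outage within minutes rather than a day is a heartbeat plus an independent watchdog; this Python sketch uses made-up paths and intervals and is not our actual setup:

    import time

    HEARTBEAT_FILE = "/shared/scheduler/producer.heartbeat"  # illustrative path on shared storage
    MAX_SILENCE_SECONDS = 15 * 60  # alert if the producer is quiet for 15 minutes

    def record_heartbeat() -> None:
        """Called by the producer after every scheduling pass."""
        with open(HEARTBEAT_FILE, "w") as f:
            f.write(str(time.time()))

    def producer_is_alive() -> bool:
        """Run by a separate watchdog; False means alert the on-call team."""
        try:
            with open(HEARTBEAT_FILE) as f:
                last_beat = float(f.read())
        except (OSError, ValueError):
            return False  # a missing or corrupt heartbeat counts as an outage
        return time.time() - last_beat <= MAX_SILENCE_SECONDS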

The backups that were scheduled to run on those two days were delayed, but all ran successfully once the producer was brought back up.

How many users were affected

Only the users that had scheduled backups for November 3 and 4 were affected.

What are we doing to prevent this from happening again

We are increasing redundancy for the secondary systems and changing the logic so something like this doesn’t happen again.

August 15-17, 2018 – Scheduled Backup not triggered

What happened

On Wednesday, August 15, the machine responsible for starting scheduled backups went down. The issue was detected on August 17; the machine was restarted and the backups were run.

How many users were affected

Less than 1% of websites were affected, which is why it took us so long to detect the issue.

What are we doing to prevent this from happening again

We allocated additional machines for this task and improved our tracking of them so outages are detected sooner.

August 11, 2018 – SEO keyword ranking not updated

What happened

On Saturday, August 11, we had an issue with the worker machines responsible for processing our SEO Ranking results. The majority of the machines ended up in an infinite loop, so the huge spike in processing we see during weekends wasn’t being handled fast enough. Unfortunately, our alerting thresholds weren’t set up correctly, so we didn’t notice any of this until it was too late. Because we use a third-party API, we couldn’t just throw more machines at the load, so we had to start shedding it. This means some keywords weren’t processed, so some results might be missing. This weekend might have additional keywords missing because of the late processing.
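To illustrate what load shedding means here, a small Python sketch; the quota and window numbers are made up for the example: when the backlog outgrows what the third-party API quota can clear in time, the excess keywords are dropped instead of being queued forever.

    API_CALLS_PER_HOUR = 10_000   # third-party quota: adding machines cannot raise it
    PROCESSING_WINDOW_HOURS = 24  # results must land within a day to be useful

    def shed_load(pending_keywords: list[str]) -> list[str]:
        """Keep only what the API quota can clear in the window; drop the rest."""
        capacity = API_CALLS_PER_HOUR * PROCESSING_WINDOW_HOURS
        kept = pending_keywords[:capacity]
        dropped = len(pending_keywords) - len(kept)
        if dropped:
            print(f"Shedding {dropped} keywords; their ranking results will be missing.")
        return kept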

How many users were affected

Almost all of the users who use the SEO Ranking addon were affected.

What are we doing to prevent this from happening again

Alerting thresholds have been adjusted, and a rework of this part of the system has been scheduled.

Over 65,000 WordPress professionals are already using ManageWP

Add as many websites as you want for free, no credit card required. Sign up and start saving time!

Have questions? Get in touch!
