At the moment, we are experiencing some difficulties with our infrastructure, and most of our services are not working properly. Our dev team is currently resolving the issue and everything is slowly returning to normal.
Due to the large load, Client Reports were taking longer to process and send. This affected both scheduled and manual reports.
The optimization widget is stuck and non-responsive for the majority of users.
On November 8, we noticed an increased rate of errors that indicated a service disruption for some customers. After investigating, our dev team discovered that our service was impacted by a bug in the software that one of our servers is using, which prevented connections from closing.
Initially, only some of the customers on a single shard were affected, but as the issue spread, other customers had trouble logging in to the service whenever there was a peak in connections.
Because the bug was in a piece of software outside our control, the short-term solution was to restart the server while another part of the team worked on a more permanent solution. We are happy to report that our dev team came up with a permanent solution and the service will not be impacted by this issue again.
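To illustrate the failure mode described above, here is a minimal sketch of one common safeguard against connections that never close: tracking the last activity of each connection and closing any that stay idle past a timeout. This is not the fix our dev team shipped. The class name `ConnectionTable`, the `idle_timeout` parameter, and the connection ids are all hypothetical and exist only for this example.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ConnectionTable:
    """Hypothetical tracker that drops connections idle past a timeout."""

    idle_timeout: float  # seconds a connection may stay silent (assumed value)
    last_seen: dict = field(default_factory=dict)  # connection id -> last activity time

    def touch(self, conn_id, now=None):
        """Record activity on a connection (registers it if it is new)."""
        self.last_seen[conn_id] = time.monotonic() if now is None else now

    def reap(self, now=None):
        """Drop every connection idle longer than idle_timeout; return their ids."""
        now = time.monotonic() if now is None else now
        stale = [c for c, t in self.last_seen.items() if now - t > self.idle_timeout]
        for conn_id in stale:
            # A real server would also close the underlying socket here.
            del self.last_seen[conn_id]
        return stale
```

Calling `reap()` on a timer keeps the number of open connections bounded even if the underlying software fails to disconnect clients on its own, so a peak in connections is less likely to block logins.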
On August 31, we noticed a service disruption that affected our Backup tool, and some websites were experiencing connection issues. After investigating, we discovered that our service was impacted by an AWS us-west-2 cluster service outage.
Connections were disrupted, and the Backup processes (clone, restore, creation of backups) slowed down or even timed out. The stored data was not affected and the security system is stable.
Since this was an issue beyond our control, all we can do is have a team in place to respond if something unexpected like this happens again.
On November 23rd, we identified anomalous traffic targeting our sign-up forms from multiple IPs. Our engineering team quickly disabled new account sign-ups while they started their investigation. The investigation found the requests came from multiple sources targeting our registration API. No other services were affected.
No existing users were affected. There was no unauthorized data access, and the service was not interrupted in any way.
We implemented additional security layers to prevent this in the future.
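As an illustration only, one common layer of protection for a registration endpoint is a sliding-window rate limiter. The sketch below is not a description of the security layers we actually deployed. The class `SlidingWindowLimiter`, its parameters, and the keys used are hypothetical.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical limiter: at most max_requests per key in any window."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key, now):
        """Return True and record the request if the key is under its limit."""
        hits = self._hits[key]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Example: allow 2 sign-up attempts per key every 60 seconds.
limiter = SlidingWindowLimiter(max_requests=2, window_seconds=60)
```

Because the traffic in this incident came from multiple IPs, a per-IP key alone might not be enough. The same limiter could also be keyed on a single global value to cap the overall sign-up rate.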
On January 15th, we began to notice a slight increase in the backup queue and a larger number of support tickets regarding backups. After investigating, our developers found an issue with one of the shards. Backup speed was affected, causing a drop in backup performance and even causing some backups to fail because of the long wait time. The dev team restarted the affected shard, re-synced it, and restored the system to working condition after a couple of hours. Although the system was stabilized, performance remained below normal because of the long queue of backups that had built up while the system was being repaired.
Unfortunately, the slowdown repeated itself a couple of days later, forcing our dev team to dig deeper. What we discovered is that a recent update of the database to a new (minor) version caused issues with our throttling speed in one of the new libraries.
It took our team some time to discover the exact library that was causing the malfunction because no effects were visible for several days after the update, which also made restoring the database to the previous version impossible by that point. Consequently, we had to rewrite how our system works using the new libraries.
We want to thank you for your patience while we worked hard to resolve this issue as soon as possible.
All users who had backups scheduled or tried to perform new backups would have noticed that the backups were running slowly or appeared ‘stuck’ during this period.
We are rewriting some of our code in order to make sure we’re fully compatible with the new database libraries.
During the investigation, we also discovered several ways of improving the backup speed, which should lead to faster backup performance after the situation is resolved.