On Wednesday, August 15, the machine responsible for starting scheduled backups went down. The issue was detected on August 17, when the machine was restarted and the pending backups were run.
Less than 1% of websites were affected, which is why it took us so long to detect the issue.
Allocated additional machines to this task and improved monitoring of these machines so that outages are detected.
On Saturday, August 11, we had an issue with the worker machines that process our SEO Ranking results. Most of these machines got stuck in an infinite loop, so the large spike in processing we see every weekend wasn't handled fast enough. Unfortunately, our alerting thresholds weren't set up correctly, so we didn't notice the problem until it was too late. Because we rely on a third-party API, we couldn't simply add more machines to absorb the load, so we had to start shedding it. As a result, some keywords weren't processed and some results might be missing. Results for this weekend might also have missing keywords because of the delayed processing.
Almost all users of the SEO Ranking addon.
Alerting thresholds have been adjusted, and a rework of this part of the system has been scheduled.