
Regarding the recent downtime - we're aware, and we're sorry

by Lee McNeil Posted on 13 December 2016

Over the last month, the Buycraft platform has experienced several days of unreliability. Since we started Buycraft, one of our main priorities has been the performance of the website. Unfortunately, our performance over the last few weeks has been unacceptable. We know this, and we cannot apologise enough. We've been working around the clock to resolve the issues, and we are now at a point where things are stable again.

What caused the issues?

We knew that Black Friday would be important to our customers, and as such, we wanted to ensure that our infrastructure was able to handle it. We decided to move to new hosting infrastructure the week before Thanksgiving, using servers on the East Coast of the US. We did this for three reasons: improved application performance, lower latency, and the ability to scale faster in the event of high load.

Buycraft has grown organically over time, which means that some of the codebase is around 4-5 years old. The move to the new infrastructure revealed that some of this legacy code wasn't up to today's standards and was putting a strain on our database servers.

Rectifying and updating this legacy code of course took time. Even with our developers working on it full time, the work ran into the Black Friday weekend. The combination of unstable code and high load (an average of 45,000 requests per minute vs. our usual 5,000) put us in a situation we should never have been in.

The next week involved doing our utmost to keep the website running while refactoring significant parts of our codebase. The result was a more stable application with more consistent response times. Even so, something still didn't seem right; deep down we were still not happy, and we knew our customers felt the same.

On Sunday evening, everything went down again. We have developers on call 24/7/365, so we were investigating within 5 minutes. We assumed the new infrastructure was to blame again, but then we noticed something else. Over the past few days, it has become clear that we were being targeted by a large DDoS attack, which had been ongoing for 10 days or more.

What have we done to fix it?

As well as refactoring our codebase where needed, we've invested in extra hardware and caching to handle any increases in load.

We've also worked with Cloudflare to mitigate these attacks. This has involved adding systems to detect and block high-volume attacks against our platform.

During the periods of worst performance, it was necessary to reduce the frequency of our plugins' 'phone home' check to once every 15 minutes. This has now been reset to its standard interval.

Is anybody there?

While we were doing everything we could to solve these issues, there was one thing we didn't do enough of - keeping you informed. We know you were being asked difficult questions by your players, and we were not giving you the answers. We need to be better at letting you know what's going on. We dropped the ball on this, and we're sorry.

To ensure that we communicate better, we have changed most of our internal policies:

  • We've improved our status page to provide more detail and given all our developers access to provide updates on any ongoing issues. This is accessible at http://status.buycraft.net/
  • We've improved our deployment process, reducing the amount of downtime required to deploy new code from 5+ minutes to 30 seconds
  • We've made changes to procedures for our support team, so they know which developer to contact to get ongoing status updates which they can pass on to you
  • We've committed to making better use of Twitter to provide real-time updates of any issues

The last couple of weeks have been tough, not just for us but for our customers, who have had to field tough questions from their players. We are sorry to everyone who was affected. At the same time, we thank you for your patience and loyalty throughout, and we are pleased that we've ended up with a stronger, more stable platform for everyone.