Originally Posted by Ov Collyer
Ok, here is what has happened so far, for those who would like to know.
- early this morning Marc Duffy started receiving alerts about issues with some of the servers
- at the same time people started posting in the forums that they were unable to log in
- we quickly realised it was an issue at the hosting company affecting connectivity to the server farm, and we contacted them
- they had been doing some scheduled work this morning which wasn't meant to have any effect on our servers, but obviously did
- they gave us an ETA of 15 minutes to reboot the necessary hardware, a time that Marc Duffy passed on to this thread
- it turned out that the reboot, which they were attempting to do remotely, couldn't be done, so they sent someone to the site to do it manually
- after the reboot, we shut down all our servers and began restarting them
- we wanted to allow the servers to process for a short time to "catch up" so that there wasn't any extra lag when everyone signed in. I estimated 30 minutes for this (and posted as much); normally it would only take 10-15 minutes, but I added some extra to be on the safe side (or so I thought). As far as I was aware at this time, it was just a normal startup of the servers, and there were no further issues.
- we then noticed that this catch-up processing was taking more time than usual, and in addition that there were errors in our game server output indicating connection problems, both from the game servers to the login servers, and amongst the game servers themselves
- further investigation revealed that there were more network issues at our hosting company, traced to another piece of hardware needing rebooting remotely
- this failed again, and we were told, again, that someone was going on site to deal with the issue manually
- this investigation is ongoing, and I can assure you that the anger levels of some of those dealing with this are way beyond anything in this thread
I cannot say "it will be down all day" and I cannot say "it will be up in X amount of time" because either would be a lie.
I simply do not know. We are waiting for our hosting company to confirm the issue has been resolved; we will then verify things at our end, make sure our servers are able to communicate with each other, and only then open the Gameworlds.
For those talking about contingency plans of housing servers elsewhere as a backup:
This is simply unworkable: the amount of data and its time-critical nature mean it wouldn't work, as I believe Graeme posted earlier in the thread.
Instead we pay a premium to have our servers hosted where they are, and this is meant to include extremely high levels of availability, and all manner of power backup facilities etc.