Author Topic: Server Outage  (Read 2940 times)

Offline Mez

  • Hero Member
  • *****
  • Posts: 648
Server Outage
« on: January 03, 2007, 05:14:29 PM »
The server that hosted all the OPU services was down for 36 hours.

All services have now been restored.

Offline Leviathan

  • Hero Member
  • *****
  • Posts: 4055
« Reply #1 on: January 04, 2007, 01:00:02 AM »
Great to be back :D

In future, people should be sure to log on to QuakeNet if they can't get onto our IRC server. We still have our channel there as a backup.

In mIRC, you can use the command /server irc.quakenet.org -j #outpost2 to connect.

Offline BlackBox

  • Administrator
  • Hero Member
  • *****
  • Posts: 3093
« Reply #2 on: January 04, 2007, 02:08:14 PM »
I think one thing worth looking into (someone else mentioned this, I think it was Tellaris) is a way to make the client try a different network, such as QuakeNet, after 3 consecutive failed attempts to connect to OPU. Most of the new members were not aware that we use QuakeNet as a backup and thus were completely unable to reach us through either the site or IRC.

As for the cause of the outage, we haven't been able to pinpoint it. We do know that the server shut down (or crashed), although we aren't sure why. Similar outages have occurred before but were not much of a problem because we were able to respond quickly.
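
As a rough illustration of that fallback idea (in Python rather than mIRC script, and only a sketch, not something we actually run): try the primary server up to 3 times, then move on to the backup. The OPU hostname and the port 6667 below are placeholders I made up for the example, and the names connect_with_fallback and tcp_connect are hypothetical.

```python
import socket
from typing import Callable, Optional, Sequence, Tuple

Server = Tuple[str, int]

# Assumed addresses: the OPU hostname and both ports are placeholders for
# illustration only; irc.quakenet.org is the backup network named in this thread.
PRIMARY: Server = ("irc.opu.example", 6667)
BACKUP: Server = ("irc.quakenet.org", 6667)


def tcp_connect(server: Server, timeout: float = 10.0) -> socket.socket:
    """Open a plain TCP connection; raises OSError if it fails."""
    return socket.create_connection(server, timeout=timeout)


def connect_with_fallback(
    servers: Sequence[Server] = (PRIMARY, BACKUP),
    max_retries: int = 3,
    connect: Callable[[Server], object] = tcp_connect,
) -> Optional[Tuple[Server, object]]:
    """Try each server in order, up to max_retries times each.

    Returns (server, connection) for the first successful attempt, or
    None if every server failed max_retries times in a row.
    """
    for server in servers:
        for _attempt in range(max_retries):
            try:
                return server, connect(server)
            except OSError:
                continue  # failed attempt: retry, then move to the next server
    return None
```

Calling connect_with_fallback() with no arguments would attempt real TCP connections, so to try the logic offline you can pass a stub as the connect argument. A real client would still need to speak the IRC protocol after connecting; this only covers the retry-and-switch part.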

We are considering ideas such as spreading services onto different physical servers to lower the impact that an outage of any one server would have.

We apologize for the inconvenience.

Offline Eddy-B

  • Hero Member
  • *****
  • Posts: 1186
    • http://www.eddy-b.com
« Reply #3 on: January 04, 2007, 02:25:01 PM »
:lol:  I don't turn on my PC for a day, and I miss all the fun?!
Rule #1:  Eddy is always right
Rule #2: If you think he's wrong, see rule #1
--------------------

Outpost : Renegades - Eddy-B.com - Electronics Pit

Offline instigator

  • Jr. Member
  • **
  • Posts: 89
« Reply #4 on: January 04, 2007, 04:54:51 PM »
I've heard stuff about servers just pooping out all over... like the internet is really clogged or something. Gaming servers have been laggy too lately.

Oh, but I don't know anything :P

Offline Leviathan

  • Hero Member
  • *****
  • Posts: 4055
« Reply #5 on: January 04, 2007, 04:56:01 PM »
We need to add QuakeNet to the servers.ini file of our IRC downloads.

Offline Hooman

  • Administrator
  • Hero Member
  • *****
  • Posts: 4955
« Reply #6 on: January 05, 2007, 10:44:29 AM »
I heard something about an earthquake taking out some telecommunications equipment. I don't see why it would have affected this place though, even if the internet is being flooded with rerouted traffic.
 

Offline BlackBox

  • Administrator
  • Hero Member
  • *****
  • Posts: 3093
« Reply #7 on: January 05, 2007, 02:04:16 PM »
We have looked into the problem, and there was no failure of the telecom system or the datacenter (we were still able to reach the datacenter through the internet). The problem was with the VMM software we run on the main server. From the logs, we noticed that the VMM software was reporting "resource shortages" (I don't know any details; that's what the log output said) starting at 4 am CST on January 2nd. At approximately 11 am CST the system went down.

There's no way for us to get any other details about the outage past that point, but my opinion is that the VMM kernel crashed and took down the whole system.

We're looking at the prospect of using different VMM software because this is not the first time something like this has happened.

Offline instigator

  • Jr. Member
  • **
  • Posts: 89
« Reply #8 on: January 05, 2007, 06:40:06 PM »
Any possibility of some malicious activity?
 

Offline BlackBox

  • Administrator
  • Hero Member
  • *****
  • Posts: 3093
« Reply #9 on: January 05, 2007, 07:01:34 PM »
I don't think so; it appears to just be some internal bugs in the VMM software.

Right now, we use an OS-level VMM, which is loaded as part of the Linux kernel, to separate services on the machine. For example, the web and IRC services run in separate VMs on the machine, so if one were compromised, the other couldn't be affected, since nothing in the compromised VM would be able to access the other VM. For all intents and purposes, each VM is a separate machine to the internet even though they all run on the same physical machine.

Personally I haven't been _that_ impressed with it -- it can be a pain when one of the VMs crashes, sometimes taking up to 20 minutes for a VM restart to get the affected services back online. The creator of the software claims 50 VMs could easily be supported on one machine such as ours, but it always seemed to bog down after only a few VMs began running.

Galactic and I have looked at switching to a different VMM. For example, we have thought about using a 'hypervisor' model that runs directly on the hardware and runs the OS inside of it. This would allow different OSes to be run on the same machine simultaneously. I feel this might be a bit more reliable since the hypervisor is not 'entangled' with the operating system as it is now -- if one VM goes down, restarting it is as simple as restarting the operating system it contains.

Anyway, I'm on a bit of a rant. The main point is, we plan to do something to reduce these problems in the future. Most people will see little or no change in how things work if we do switch to different VMM software or get rid of it entirely.
« Last Edit: January 05, 2007, 07:03:27 PM by op2hacker »

Offline Tellaris

  • Sr. Member
  • ****
  • Posts: 460
« Reply #10 on: January 05, 2007, 08:45:24 PM »
Yeah, I was thinking something like this:

On connection retry #3:
    If the attempt returns 0 (say 0 = fail, or no), then connect to Quakenet.whatever.whatever; otherwise return 1.
End.  (1 means nothing happens with this script)

That's what I'd think it'd look like. I don't know the exact code, but I think NNScript would be able to handle such an addition. I'm not a programmer, so I'm not sure exactly how to do it. I think mIRC is capable of it though; it may be more involved than what I wrote.
Spell Checker!   The PoWeR tOoL
Click Here For Coolness
Self Proclaimed OPU Help desk.