instability

My server, which runs Planet Lisp, xach.com, etc., has been flaking out with increasing frequency lately.

Here's the setup:

  • Relion 1XT 1U Pentium 4 3.0 GHz from Penguin Computing
  • 2GB ECC memory
  • Two 80GB SATA drives in md-raid 1 mounted on /
  • Fedora Core 4, kernel 2.6.13-1.1532_FC4

It's gotten to the point where it is locking up every few days. I can't even compile a kernel; it either segfaults or I get this in random include files:

error: static or type qualifiers in non-parameter array declarator

Everything screams "hardware problem". I'm extremely bummed about it. I didn't save any of the packing material to ship the unit back, if that proves necessary, and I don't really want weeks of downtime waiting for some resolution. Anyone have any comments or suggestions?

UPDATE: The server will be going down for overnight maintenance today. Planet Lisp should be back up sometime on Thursday.

Comments

Personally, I'd start by running a memory test. My usual tool is memtest86, written out as a boot image on a CD, but it works just as well from a floppy. I'd suggest running the memory test with no OS running on the machine, which obviously means some downtime. Switch it to exhaustive testing; I think by default it runs a slightly less complete set of tests. It will most likely take a while to complete.

If the memory checks out, I'd start suspecting the disk controller. If the CPU were fried, memtest86 itself would most likely crash.
