

My server, which runs Planet Lisp, xach.com, etc., has been flaking out with increasing frequency lately.

Here's the setup:

  • Relion 1XT 1U Pentium 4 3.0GHz from Penguin Computing
  • 2GB ECC memory
  • Two 80GB SATA drives in md-raid 1 mounted on /
  • Fedora Core 4, kernel 2.6.13-1.1532_FC4

It's gotten to the point where it is locking up every few days. I can't even compile a kernel; it either segfaults or I get this in random include files:

error: static or type qualifiers in non-parameter array declarator
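Random segfaults and nonsensical compiler errors in code that builds fine elsewhere are typical of flaky memory, cache, or CPU. One quick userspace cross-check is to hash the same large file many times and see whether the results ever disagree; on sound hardware every digest should be identical. This is a hypothetical Python sketch: the function names and the 20-round default are illustrative, not part of any standard tool, and a consistent result does not prove the hardware is healthy.

```python
import hashlib


def digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def distinct_digests(path, rounds=20):
    """Hash the same file `rounds` times and return the set of digests seen.

    On healthy hardware the set has exactly one element. More than one
    means identical bytes hashed differently, which points at memory,
    cache, or CPU rather than at software.
    """
    return {digest(path) for _ in range(rounds)}
```

For example, `distinct_digests("/path/to/large.tar", 50)` (the path is a placeholder) returning more than one digest would be strong evidence of a hardware fault.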

Everything screams "hardware problem". I'm extremely bummed about it. I didn't keep any of the packing material needed to ship the unit back, if that proves necessary, and I don't really want to have weeks of downtime waiting for some resolution. Anyone have any comments or suggestions?

UPDATE The server will be going down for overnight maintenance today. Planet Lisp should be back up sometime on Thursday.


I would suspect RAM. How much RAM is normally free on the system? Perhaps you have bad memory in a high range, say around 1.5GB, and it isn't noticeable until you try to compile a kernel.

If not RAM, it may be bad L2 cache.

memtest86 is really the right way to test this.
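memtest86 boots on its own and can test nearly all physical memory, so it remains the better tool. If the machine can't be taken down right away, a rough userspace approximation is to fill a large buffer with fixed byte patterns, then store each word's own index in it, and check that everything reads back unchanged. This is a hypothetical Python sketch (function names and sizes are illustrative). The kernel decides which physical pages the buffer lands on, so it can easily miss a bad region like the one around 1.5GB suggested above, and CPU caches may serve some reads without touching RAM at all.

```python
from array import array

MIB = 1024 * 1024


def pattern_test(size_mb=64, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Write each byte pattern across a buffer and read it back.

    Returns the number of bytes that did not hold the pattern after it
    was written (0 means no errors were observed).
    """
    size = size_mb * MIB
    buf = bytearray(size)
    errors = 0
    for p in patterns:
        block = bytes([p]) * MIB
        for off in range(0, size, MIB):      # write the pattern 1 MiB at a time
            buf[off:off + MIB] = block
        errors += size - buf.count(p)        # bytes that no longer hold p
    return errors


def address_test(words=1 << 20):
    """Store each word's own index in it, then verify.

    A mismatch suggests an addressing fault, where a write intended for
    one location lands somewhere else.
    """
    a = array("L", range(words))
    return sum(1 for i, v in enumerate(a) if v != i)
```

Calling `pattern_test(1024)` would exercise about 1GB; on a 2GB machine, going much higher risks pushing the system into swap, at which point the test is no longer touching RAM.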

Is the system crashing with a kernel panic and any stack trace, or just goes dead?
Kernel panic and stacktrace the last few times.
Got any details? If it is not bad RAM (and I think it will be), then you may have some sort of oddball kernel bug to track down. I've had production machines that ran beautifully for a year, then passed a certain workload threshold and started getting kernel panics due to SCSI/RAID controller driver bugs. I have also seen kernel panics (not in recent years) on multiprocessor machines with certain drivers that were not properly spinlocked, which only showed up under the right set of conditions (but you don't have an MP machine).

Sometimes the only way to really catch things like this is to redirect the console to a serial port, set up a laptop running minicom to log the serial port output, and wait.
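On the server side, the kernel can mirror its messages to the first serial port with boot parameters along the lines of `console=tty0 console=ttyS0,115200`, so a panic's stack trace goes out over a null-modem cable even if it never reaches the disk. On the receiving end minicom's capture feature does the logging; as an alternative, here is a hypothetical stdlib-only Python sketch that puts the port into raw mode and appends every line to a file with a timestamp. The port name, the 115200 baud rate, and the function names are assumptions for illustration.

```python
import os
import termios
import time
import tty


def configure_serial(fd, baud=termios.B115200):
    """Put the serial tty open on fd into raw 8-bit mode at the given speed."""
    tty.setraw(fd)                              # no echo, no line editing, 8 data bits
    attrs = termios.tcgetattr(fd)
    attrs[2] |= termios.CLOCAL | termios.CREAD  # ignore modem lines, enable receiver
    attrs[4] = baud                             # input speed
    attrs[5] = baud                             # output speed
    termios.tcsetattr(fd, termios.TCSANOW, attrs)


def capture(fd, out, clock=time.time):
    """Copy bytes from fd to the binary file `out` until EOF.

    Each complete line is prefixed with a timestamp, and `out` is flushed
    after every read so the log on disk stays current.
    """
    pending = b""
    while True:
        chunk = os.read(fd, 4096)
        if not chunk:
            break
        pending += chunk
        *lines, pending = pending.split(b"\n")  # keep any partial last line
        for line in lines:
            out.write(b"%.3f " % clock() + line + b"\n")
        out.flush()
    if pending:                                 # trailing text without a newline
        out.write(b"%.3f " % clock() + pending + b"\n")
        out.flush()
```

On the laptop this might be used as `fd = os.open("/dev/ttyS0", os.O_RDONLY | os.O_NOCTTY)`, then `configure_serial(fd)`, then `capture(fd, log)` with `log` opened in `"ab"` mode. The device path is an assumption; USB serial adapters usually appear as `/dev/ttyUSB0` instead.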