X/Windows Lockups in Ubuntu 10.04 Lucid

Posted by JD 08/17/2010 at 17:15

For the last few months, X/Windows has locked up without warning on one of my fastest systems (Core i5). This is very unusual. I’ve run Linux systems for over 17 years and X has never been this bad. Never. X/Windows used to lock up about once every 3 years, but it has happened at least every 4 days for the last 2 months. Killing the Xorg process doesn’t work. That X process uses 100% of a core for hours and never recovers. The GUI is locked, but remote access from other systems works, as do the background processes. Still, X can’t be killed; only a remote reboot brings the X GUI back.
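
A process that ignores a kill is often stuck inside the kernel, and its state code can hint at that. The following is a minimal sketch, assuming a Linux /proc filesystem and a working SSH login; the function name proc_state is made up for illustration.

```shell
#!/bin/sh
# Print the kernel's one-letter state code for the PID given as $1 (Linux only).
# R = running, S = sleeping, D = uninterruptible sleep, Z = zombie, T = stopped.
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}
```

From a remote session it could be called with the PID reported by `pidof Xorg` (the process may be named X instead on some systems). A process spinning at 100% would most likely show R; D would point at a blocked kernel call. Neither result alone identifies the cause.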

I don’t use Gnome or KDE. I’m running LXDE on Ubuntu Server x64 10.04 LTS. It is patched weekly.

It sure would be nice if Ubuntu had not disabled the ctrl-alt-backspace keystroke to kill the X server, wouldn’t it?

setxkbmap -option terminate:ctrl_alt_bksp 
fixes that issue.
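
The setxkbmap setting lasts only for the current X session. A minimal sketch of making it stick, assuming a startup script such as ~/.xprofile is read at login (that depends on the display manager, and the post does not say which file was used); enable_zap is a hypothetical helper name.

```shell
#!/bin/sh
# Append the setxkbmap call to a startup script, but only if it is not there yet.
# $1 is the script to edit; ~/.xprofile is an assumed default, not one the post names.
enable_zap() {
    target=${1:-"$HOME/.xprofile"}
    line='setxkbmap -option terminate:ctrl_alt_bksp'
    touch "$target" || return 1
    # -x matches the whole line, -F treats the pattern as a fixed string, -q stays silent
    grep -qxF -- "$line" "$target" || printf '%s\n' "$line" >> "$target"
}
```

Running it twice leaves a single copy of the line, so it is safe to call from a setup script.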

Possible Causes

I can’t tell exactly what the cause of the problem is. The syslog, messages, and other logs are less than helpful. From those logs, it appears to be an nVidia driver issue. I was using the proprietary nvidia drivers that are built into the Ubuntu repositories (universe? I think it was the “196” version). I’ve searched the ether for probable causes and came up with these ideas:

  1. nvidia drivers
  2. Firefox / Flash issues
  3. VLC direct video issues
  4. x64 issues – they always blame 64-bit issues, don’t they?
  5. Pulse Audio issues
  6. Software RAID driver issues
  7. VirtualBox graphics issues – I use video editing software inside virtualbox.

The lockup seems to happen only after virtualbox has run. The last time, it was not running, but the 3 kernel modules were loaded and had been used. VLC was being used to watch some recorded TV on a 2nd monitor.
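
Whether the VirtualBox modules are still loaded can be checked without starting VirtualBox. A minimal sketch, assuming the module names begin with "vbox" (vboxdrv, vboxnetflt, and vboxnetadp are the usual ones, but the post does not name them); loaded_modules is a made-up function name.

```shell
#!/bin/sh
# List loaded kernel modules whose names start with a prefix.
# $1 = prefix (default "vbox"), $2 = module table (default /proc/modules).
# Each line of /proc/modules starts with the module name; the third field is its use count.
loaded_modules() {
    prefix=${1:-vbox}
    table=${2:-/proc/modules}
    awk -v p="$prefix" 'index($1, p) == 1 { print $1, "refcount=" $3 }' "$table"
}
```

Calling `loaded_modules` with no arguments reads the live module table. An empty result means no matching module is loaded.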

First Corrective Action

The first thing I did was update the nvidia drivers using the latest available directly from nvidia’s website; NVIDIA-Linux-x86_64-256.44.run is the filename. When a nasty error was displayed during the pre-installation checks, I searched for the necessary tricks to make it work and found a guide at ubuntugeek. Followed it. Here’s the source that he used. The short guide included blacklisting a few modules, rebooting, and running the installer again. Everything seems to be working post-install. The X/Windows login came up at the end and all the dual monitor settings were still there and working.
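
Blacklisting a module means adding a "blacklist NAME" line to a file under /etc/modprobe.d/. A minimal sketch of generating those lines; the post does not say which modules the guide listed, so the names are left to the caller, and blacklist_lines is a hypothetical helper.

```shell
#!/bin/sh
# Print one modprobe "blacklist" line for each module name given as an argument.
# The caller chooses the names; the post does not say which modules the guide listed.
blacklist_lines() {
    for mod in "$@"; do
        printf 'blacklist %s\n' "$mod"
    done
}
```

The output would be redirected, as root, into a .conf file under /etc/modprobe.d/ before rebooting. Which file name and which modules apply to a given system is an assumption to verify against the installer's own error message.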

So far, so good.

Rather than running VLC, I’ll use mplayer for the next few days before trying VirtualBox or VLC again. System stability is important to me. If I wanted a system that needed to be rebooted every few days, I wouldn’t run Linux.

Some new equipment is scheduled to arrive tomorrow, so I’ll be busy enough over the next few days to leave this specific machine alone.

Cross your fingers for me, please!

  1. JD 08/21/2010 at 10:16

    So, it has been 4 days since the last lockup on my desktop machine. I haven’t used VirtualBox at all the entire time. There was a kernel update today, so it had to be rebooted anyway.

    OTOH, it could be that the updated nVidia drivers are more stable too.

    Speaking about stability, the new Xen kernel, 2.6.24-28, updated this week won’t boot on the main Xen server here. Gotta love a boot-time kernel panic on Dom0. Worse, they pushed an in-place update, so the old packages were overwritten. I’m forced to run the 2.6.24-27 kernel from months ago, unless I want to recover from a 6 hr old system backup.

    Lastly, I’ll be at DC404 today, if anyone else is going. See you there.

  2. JD 08/25/2010 at 14:18

    The Xen Kernel Bug that I had is known and there is a fix being tested. Seems the update came in less than 24 hrs before it was released. Because it was listed as security related, it didn’t get looked at very closely.

    There is a fix and I’ve applied it. So far, so good.

    The issue happened in 2.6.24-28.75 and the fix is in 2.6.24-28.77-pre6. 2.6.24-28.73 was fine. I really wish Ubuntu would include the extra version information in their package releases. The package that I installed which caused the issue was an update WITH THE SAME NAME AS WHAT WAS ALREADY INSTALLED. That meant that the old kernel was removed and I couldn’t easily swap back.
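
The version strings above split into two parts: the ABI part, which Ubuntu puts in the kernel package name, and the upload number, which appears only in the package version. A minimal sketch of that split using plain parameter expansion; the function names are made up for illustration.

```shell
#!/bin/sh
# Split an Ubuntu kernel version such as 2.6.24-28.75 into the ABI part,
# which appears in the package name, and the upload number, which does not.
kernel_abi()    { printf '%s\n' "${1%.*}"; }   # drop the shortest ".suffix" from the end
kernel_upload() { printf '%s\n' "${1##*.}"; }  # keep only what follows the last "."
```

Both 2.6.24-28.73 and 2.6.24-28.75 give the same ABI part, which would explain why the update replaced the installed package instead of sitting beside it. On a Debian-style system, `dpkg-query -W 'linux-image-*'` prints the full version of each installed image.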

    Listing the running VMs showed some other issues with the changes.

    # xm list
    Traceback (most recent call last):
      File "/usr/sbin/xm", line 8, in <module>
        from xen.xm import main
      File "/usr/lib/python2.5/site-packages/xen/xm/main.py", line 54, in <module>
        from xen.util.acmpolicy import ACM_LABEL_UNLABELED_DISPLAY
      File "/usr/lib/python2.5/site-packages/xen/util/acmpolicy.py", line 30, in <module>
        import xen.util.xsm.acm.acm as security
      File "/usr/lib/python2.5/site-packages/xen/util/xsm/acm/acm.py", line 39, in <module>
        security_dir_prefix = XendOptions.instance().get_xend_security_path()
      File "/usr/lib/python2.5/site-packages/xen/xend/XendOptions.py", line 442, in instance
        inst = XendOptionsFile()
      File "/usr/lib/python2.5/site-packages/xen/xend/XendOptions.py", line 130, in __init__
        self.configure()
      File "/usr/lib/python2.5/site-packages/xen/xend/XendOptions.py", line 145, in configure
        self.loglevel_default))
      File "/usr/lib/python2.5/site-packages/xen/xend/XendLogging.py", line 137, in init
        logfilename = tempfile.mkstemp("-xend.log")[1]
      File "/usr/lib/python2.5/tempfile.py", line 295, in mkstemp
        dir = gettempdir()
      File "/usr/lib/python2.5/tempfile.py", line 262, in gettempdir
        tempdir = _get_default_tempdir()
      File "/usr/lib/python2.5/tempfile.py", line 209, in _get_default_tempdir
        ("No usable temporary directory found in %s" % dirlist))
    IOError: [Errno 2] No usable temporary directory found in ['/tmp', '/var/tmp', '/usr/tmp', '/root']

    The /tmp directory definitely DOES exist as does /var/tmp, so I’m not happy. I’ve already taken production servers down today during work hours, so any troubleshooting will need to wait until early tomorrow AM.
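
Python's tempfile module picks a directory by trying to create and write a small file in each candidate, so this error can appear even when the directories exist, for example on a full or read-only filesystem. A minimal sketch of the same kind of probe in shell; dir_usable is a made-up name, and the directory list is copied from the error message.

```shell
#!/bin/sh
# Return 0 only if a file can be created, written, and removed inside directory $1.
# Existence of the directory alone is not enough to pass.
dir_usable() {
    probe=$(mktemp "$1/probe.XXXXXX" 2>/dev/null) || return 1
    { printf 'blat' > "$probe"; } 2>/dev/null
    status=$?
    rm -f "$probe"
    return "$status"
}

# Probe the fixed candidates named in the error message.
for d in /tmp /var/tmp /usr/tmp; do
    if dir_usable "$d"; then
        echo "$d: usable"
    else
        echo "$d: NOT usable"
    fi
done
```

If every directory reports NOT usable, `df` and `mount` output for those filesystems would be the next thing to look at.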

  3. JD 08/27/2010 at 07:06

    The fix for the Xen Kernel issue has been released.
    http://www.ubuntu.com/usn/usn-974-2

    I tested rc6 of the patch and it worked for me, although a few VMs did show disk corruption issues.

  4. JD 08/30/2010 at 16:02

    Looks like nVidia drivers may have been the stability issue too. Read more about the latest beta driver released. I haven’t run VirtualBox on the problem machine in a few weeks. It has been stable.

  5. JD 03/07/2011 at 13:15

    I swapped out the nvidia drivers for the nouveau drivers hoping for greater stability. It was an unfulfilled dream: the nouveau drivers allowed a complete system lockup within a few hours of use, and Google Maps was proven to cause it.

    • Bad X/Windows,
    • bad Nouveau,
    • bad Google.