Troubleshooting has always been one of the most frustrating aspects of computer ownership. Due to the practically infinite number of potential problems, it would be utterly impossible to write a how-to guide to fix all of them, but in this article we are going to address some of the most common problems and then present more generalized guidelines that will help you troubleshoot your own problems in an emergency.
One of the most common problems in Linux is a broken GUI configuration. The X Window System (Xorg) is the most common GUI system in use on Linux systems today. Unlike the Windows GUI (which is practically inextricable from the rest of the operating system), Xorg is simply a program that runs on top of the base Linux system. Because of this, it can be easily repaired.
The X Window System uses a file called xorg.conf to maintain the GUI configuration. It contains information about your graphics hardware, the driver it uses (in the case of NVIDIA or ATI devices), your available screen resolutions, and even settings for your input devices. The best way to avoid any problem is prevention, so you should always have at least one backup copy of important files like xorg.conf and be sure to save a copy of the current working version each time you make any modifications to it.
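The backup habit described above is easy to automate. The following is a minimal sketch in Python, not a standard tool: the function name `backup_file` and the `.bak` naming scheme are our own assumptions, and it simply copies a file to a timestamped name so that earlier versions are not overwritten.

```python
import shutil
import time
from pathlib import Path


def backup_file(path, backup_dir=None):
    """Copy `path` to a timestamped backup and return the backup's path.

    `backup_file` is a hypothetical helper, not part of any distro tool.
    The copy goes next to the original unless `backup_dir` is given.
    """
    source = Path(path)
    target_dir = Path(backup_dir) if backup_dir else source.parent
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = target_dir / f"{source.name}.{stamp}.bak"
    shutil.copy2(source, target)  # copy2 also keeps timestamps and permissions
    return target
```

On most distros the file lives at /etc/X11/xorg.conf, so a call such as `backup_file("/etc/X11/xorg.conf", "/root/config-backups")` (the backup directory is only an example) before each edit would leave a dated copy to fall back on.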
Don't panic if the worst happens and you find yourself without a working xorg.conf. The base system is probably still operational and it is possible to carry on without the GUI, although some distros might complain a bit if Xorg refuses to start. Many modern distros (like Mandriva, Ubuntu, and others) have a safe mode or recovery mode that provides a root-level command prompt. From there, you have access to the bash-friendly utilities and tools that are essential to fix your system. Even better than that, your distro probably has a special tool which can repair or completely regenerate xorg.conf.
During our test where we deliberately deleted xorg.conf, Ubuntu 9.04's GUI continued to work, much to our amazement (we rebooted several times to make sure it was not a glitch). If that is not the case for you, Ubuntu's Recovery Mode has an automated tool called xfix that will try to rebuild xorg.conf without any additional interaction. Once xfix has worked, you will have the option of resuming normal boot. This tool worked perfectly for us on the first try by replacing the xorg.conf file that we erased from our test machine.
For Mandriva, boot into “Safe mode” and run “XFdrake”. This tool will walk you through the process of rebuilding xorg.conf in a step-by-step interface where you are able to select configuration options from a list. With some knowledge of your hardware and a little luck, this should get you running again.
Many Linux users have dual-boot systems, and this works very well for the most part. Since the Windows bootloader is not very friendly to non-Microsoft operating systems, most distros use a different bootloader called GRUB (Grand Unified Bootloader) that is capable of working with multiple operating systems, including Windows.
Since Windows installations tend to degrade over time (a phenomenon known as “Windows Rot”), it is inevitable that you must reinstall Windows at some point. If you are using GRUB or a different bootloader, your other operating systems will be inaccessible (but still intact) after a Windows reinstall, since the Windows setup process will replace the bootloader without asking you. Although it is possible to reinstall GRUB manually (often through a Live CD distro), there is a tool called Super GRUB that can make the process much easier.
Super GRUB is an extension of the regular GRUB bootloader. It comes as an ISO image which may be burned to a CD or placed on a USB stick (an ideal use for any old low-capacity USB flash drives you may still have). Although it is absolutely tiny by today's standards (weighing in at a little over 4 MB), Super GRUB works like a miniature operating system with several pre-defined tools that can handle many boot-related processes. Super GRUB can automatically find your operating system partitions and use this data to reinstall and configure conventional GRUB. Super GRUB can even do the exact opposite: remove conventional GRUB and restore the normal Windows bootloader.
To use Super GRUB, download and prepare the ISO image for the media format you want to use. Once you have done that, reboot the computer with the Super GRUB disc or USB stick. You will be greeted with a conventional GRUB boot menu with the Super GRUB option on it. You will soon see the Super GRUB main menu. To attempt automated repair, run the “GRUB => MBR & !Linux! (1) Auto ;-)” option. If that fails, you have the option of repairing GRUB manually. Our only criticism of Super GRUB is that its menus can be rather hard to understand, but it works very well in spite of that.
If you're lucky, your wireless device will work out of the box on Linux. Some devices from certain manufacturers (like Atheros) are very Linux-friendly. Unfortunately, some of these components are used in products under different brand names, so it can be difficult to know what parts any given device has in it without doing extensive research. Even if your wireless device is not one of the most Linux-friendly out there, it may be possible to get it working.
Some devices require additional firmware to work. The firmware is what makes the wireless device work at all; without it, a device is little more than a series of interconnected parts with no real guidelines that allow it to function. In the old days, firmware was usually built into a device; these days, it is more common to see software-driven firmware. One of the most notable examples of this is Broadcom, which at the time of this writing has made no real efforts to accommodate Linux users and does not seem to be interested in doing so. Broadcom devices are common in many notebook models from a variety of manufacturers. Fortunately, the B43 Project has reverse-engineered a method to extract the necessary firmware as “binary blobs” from the Windows drivers and then use this firmware to activate the wireless device in Linux.
Once the firmware is in place, the device is able to interact with any operating system that has the right drivers. The Linux kernel has built-in support for Broadcom, but this is useless without the firmware. Ubuntu features restricted drivers that contain the necessary firmware, while users of other distros can obtain the firmware themselves with the “b43-fwcutter” tool. Firmware should always be placed in /lib/firmware. A reboot is often required to activate it. If you have a Broadcom card, you should try this method first.
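Before rebooting, it can help to confirm that the firmware files actually landed where the kernel expects them. The sketch below is only an illustration: `list_firmware` is a hypothetical helper of ours, and the `b43` subdirectory name in the usage note is an assumption about where the extracted Broadcom files end up on your system.

```python
from pathlib import Path


def list_firmware(subdir, root="/lib/firmware"):
    """Return the sorted file names found in root/subdir, or [] if it is missing.

    Hypothetical helper; it only lists files and changes nothing.
    """
    folder = Path(root) / subdir
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.iterdir() if p.is_file())
```

Calling `list_firmware("b43")` and getting an empty list back would suggest that the extraction step did not place anything under /lib/firmware/b43, which is worth checking before blaming the driver.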
Sometimes, the only real option is to run the Windows driver under Linux with a tool called NDISwrapper. This is essentially a kludge that implements some parts of the Windows environment just enough for Windows wireless drivers to function. It is possible to work with NDISwrapper through the command-line interface or through a GTK-based frontend. Some distros that provide centralized control centers have built-in tools to use NDISwrapper. (For Ubuntu, install the “ndisgtk” package.) In any case, NDISwrapper will need the .inf file along with the actual driver for any device you want to enable. Keep in mind that NDISwrapper requires drivers that match the architecture of the host system; if the Linux host system is 64-bit, the Windows drivers you use must also be 64-bit.
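The architecture rule is simple enough to check before you download a driver. This is a rough sketch under our own assumptions: the helper names are hypothetical, and the "64 in the machine name" test is a heuristic that fits common names such as `x86_64` and `i686`, not an exhaustive check.

```python
import platform


def host_bits(machine=None):
    """Guess whether the host is 64-bit or 32-bit from its machine name.

    Heuristic only: names like 'x86_64' contain '64', names like 'i686' do not.
    """
    machine = (machine or platform.machine()).lower()
    return 64 if "64" in machine else 32


def driver_matches_host(driver_bits, machine=None):
    """Return True if a Windows driver of `driver_bits` suits this host."""
    return driver_bits == host_bits(machine)
```

For example, `driver_matches_host(32, "x86_64")` is false, which is the situation in which a 32-bit Windows driver will not load on a 64-bit Linux system.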
The ability to solve a problem through deductive reasoning is a valuable skill since it can be applied to any operating system (not just Linux) in virtually any situation, provided that enough information is available to figure out the problem. This part of our guide uses the standard scientific method to diagnose and repair computer problems: formation of a hypothesis, experimentation, and observation of the end result. Our goal with this technique is to help you learn how to think both creatively and logically when it comes to solving problems, since not every problem out there has a step-by-step procedure to help you solve it.
There are several qualifications that you must have before being able to troubleshoot effectively, which is why we recommend this technique for more advanced users. Unless you meet these prerequisites to some extent, you are probably not going to get very far in your efforts.
First, you must have the patience and willingness to fix the problem. Although your time is better spent doing other things, it is beneficial to learn how to fix your own problems since it makes you more self-sufficient in addition to saving money. Furthermore, you can't always assume that someone will be there to help you with your computer problems in an emergency.
Next, you should have a fairly good understanding of your computer's software and hardware. Although you don't have to understand everything there is to know about how it all fits together, you should at least be able to identify hardware like graphics cards, RAM, hard drives, etc. and be familiar with the inner workings of your operating system. For Linux, you should know your way around the directory structure and the terminal in addition to the most important configuration/log files. For Windows, you should know the directory structure and have at least a working understanding of device drivers and the registry.
Last, you should be willing to experiment. Many new users are afraid to enter unfamiliar territory because they think they will only end up making the situation worse. It is important that you get over that fear if you have it; any problems you are experiencing will only persist or get worse if you ignore them. If something is already broken, your repair efforts are not likely to make it worse than it already is if the proper measures (like recording any changes you make) are taken.
Before you can solve a problem, you must be aware of what the problem is. Although this sounds ridiculously simple, it is often easier said than done. The key is to understand the nature of a problem in a comprehensive sense rather than abstractly. Fortunately, there are always clues which can help you achieve this level of understanding. These clues are the symptoms of the underlying problem, and they are your main guide. Ultimately, you must look at each problem in a forensic or diagnostic sense like a physician or a detective. When you encounter a problem, try to apply the following questions to it and then answer them:
You must first identify what the specific problem is. On the surface, this sounds trivial and even self-evident, but there is often more to it than reading an error message. Error messages are often less than ideal at describing the particular problem they are supposed to address, so you must learn to look past them and identify what has really happened.
Ideally, you should try to understand the problem as specifically as possible; instead of saying “Program A doesn't work”, you should think of it as “Program A produces Result B when I try to perform Task C under Condition D”. Once you are able to accurately fill in the variables, you will have a more concrete understanding of the situation than if you try to deal with the problem more abstractly. Googling this concise description of the problem once you have it figured out often yields results that may help you fix the problem or may even offer a solution outright. Try to think of the reasons as to why you are seeing a particular error message, and then form a hypothesis based on that assumption.
When forming your hypothesis, you must take into account any changes that have been made to the system. Quite often, problems arise due to a change in the system rather than for unrelated reasons. As a result, any system changes should play a central role in your hypothesis. In this instance, the system logs would likely hold useful information, so that would be a good place to start looking.
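As a starting point for reading the logs, a short script can pull out the lines most likely to matter. This is a minimal sketch: the function name and the keyword list are our own choices, and the log path in the usage note varies between distros.

```python
def find_suspicious_lines(lines, keywords=("error", "fail", "segfault")):
    """Return (line_number, text) pairs for lines containing any keyword.

    Matching is case-insensitive; the keyword list is only a starting guess.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(word in lowered for word in keywords):
            hits.append((number, line.rstrip("\n")))
    return hits
```

One way to use it is `find_suspicious_lines(open("/var/log/syslog", errors="replace"))`, assuming your distro writes its main log there. The hits are clues to weigh against your hypothesis, not a diagnosis in themselves.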
Next, you must understand how the problem is manifested. The best place to start is with the symptoms of the problem. You must ask yourself what the symptoms are trying to tell you; is there something wrong with the hardware, or is it a software problem? How is this problem being shown, and does the situation fit with your assumptions? This is the reason why knowing about your hardware and software is a prerequisite to troubleshooting, since deductive reasoning relies on your ability to discard a hypothesis if it turns out to be unlikely or even impossible when weighed against what you know to be true.
A good hypothesis stands up when weighed against the evidence. It is important to answer all the questions listed above when forming your hypothesis, since you need as much observation data as possible to fine-tune the hypothesis. Failure to do so results in a bad hypothesis. For example, if you believe that a problem must be hardware-based because it happens consistently at a certain time, your hypothesis is far from sound. You may be right in that particular instance, but it is difficult to approach the problem with any certainty since you failed to take other factors into account. To see whether your hypotheses are true, you must be prepared to test them.
Problems must often be addressed in a specific order. For example, you may have to fix something separate yet related to the main problem before you are able to attack the main problem itself. Based on your observations and your prerequisite knowledge of how your computer operates, you must decide what to focus on first and where to go on from there until everything is fixed. Your hypothesis plus your planned repair method becomes your strategy for fixing the problem.
Quite often, the key to solving a problem is a process of elimination in which one strategy after another is tested and then discarded or revised until the correct solution is found. Keep in mind that there may be more than one way to fix a problem, and the least risky methods should be attempted first. It is possible that testing may reveal more clues that were not present before, and you should definitely take these into account and revise your hypotheses and strategies accordingly.
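The process of elimination can be pictured as a loop over candidate strategies, ordered from least to most risky. The sketch below is purely illustrative: the function name, the numeric risk scores, and the idea of representing each repair as a callable are all assumptions of ours.

```python
def try_strategies(strategies):
    """Try strategies in order of increasing risk until one succeeds.

    `strategies` is a list of (name, risk, attempt) tuples, where `attempt`
    is a function returning True on success. Returns (winner, tried), where
    `winner` is None if nothing worked and `tried` lists the names attempted.
    """
    tried = []
    for name, risk, attempt in sorted(strategies, key=lambda s: s[1]):
        tried.append(name)
        if attempt():
            return name, tried
    return None, tried
```

The returned list of names doubles as a record of what has already been ruled out, which is useful when you revise your hypothesis.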
Another thing you must consider is whether you have seen this problem before in another situation. No two separate problems are exactly alike, but comparing the present situation to a past experience may grant you additional insight as to what is going on. Try to focus on similarities and differences between this situation and previous ones; the more factors you are able to correlate or eliminate, the better. After doing that, you should try to remember what you did to solve the previous problem(s) and then try to find out if the same solution can be applicable to your current problem. If this tactic is successful, you could be well on your way to solving the problem. If not, you have uncovered more clues that you can use to help understand the situation further. For example, if a previous solution will not work on the current problem even though it may be similar, try to understand why. Ask the following questions and try to answer them:
1. How does this situation differ from the previous one?
2. How is this situation similar to the previous one?
3. What happens when the previous solution is applied to this problem?
4. What does the answer to question #3 tell me?
At this stage, it is critical that you document each repair attempt before you do it. Take note of any change you make, and make a note of the previous setting. If you change a registry value or edit a configuration file, you should make a note of how it was before. Many configuration files allow lines to be commented out; it is generally a good idea to use the commenting system to make notes of how the file used to be whenever you change a setting. If you replace files, make a copy of the original file before you replace it. That way, if your repair strategy backfires, you will be able to go back to the previous configuration and try something else without having made the problem irrevocably worse.
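The commenting technique can be sketched in a few lines. In this example, `change_setting` is a hypothetical helper of ours that assumes one setting per line with the setting name as the first word and `#` as the comment character, which fits xorg.conf but not every configuration format.

```python
def change_setting(text, key, new_line, comment="#"):
    """Replace each line whose first word is `key`, keeping the old line as a comment.

    Returns the new file contents as a string; nothing is written to disk.
    """
    result = []
    for line in text.splitlines():
        words = line.split()
        if words and words[0] == key:
            result.append(f"{comment} previous: {line.strip()}")
            result.append(new_line)
        else:
            result.append(line)
    return "\n".join(result) + "\n"
```

For instance, switching a device section from the `nv` driver to `vesa` would leave a `# previous: Driver "nv"` line directly above the new setting, so the old value is still visible if the change has to be undone.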
To be effective, you should also document the results of each experiment so you will be able to go back and remember what worked and what didn't. Your experimentation attempts will allow you to pinpoint or eliminate specific factors as the cause of the underlying problem. This works on the principle that if you eliminate all the wrong possibilities, you will eventually arrive at the correct solution.
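A plain notebook works fine for this, but a small log file is easier to search later. The following is one possible sketch: the function names, the field names, and the choice of one JSON object per line are our own assumptions rather than an established convention.

```python
import json
import time


def record_attempt(log_path, hypothesis, change, result):
    """Append one repair attempt to a log file (one JSON object per line)."""
    entry = {
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "hypothesis": hypothesis,
        "change": change,
        "result": result,
    }
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry


def read_attempts(log_path):
    """Return every recorded attempt, oldest first."""
    with open(log_path, encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]
```

Recording the hypothesis alongside the change and the result makes it clear afterwards which possibilities each experiment eliminated.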
Although you can solve many computer problems through deductive reasoning, there are still limits to what it can accomplish. Some computer problems are beyond any degree of troubleshooting, since it is often impossible to know whether or not the problem has been completely fixed. In such cases, your only real strategy is to recover what data you can and start over with a clean operating system installation.
You have to know when to stop working on a problem or your attempts will become counter-productive. You can spend hours on a single problem, but if you make no real progress, you are just wasting your time. Also, you must know if your attempts are effective. If your repair attempts are causing more problems, it is time to stop since you will never be able to fix all of them.