Linux Troubleshooting Guide: Fix the Most Common Problems
Understand the Nature of the Problem
Before you can solve a problem, you must be aware of what the problem is. Although this sounds ridiculously simple, it is often easier said than done. The key is to understand the nature of a problem in a comprehensive sense rather than abstractly. Fortunately, there are always clues which can help you achieve this level of understanding. These clues are the symptoms of the underlying problem, and they are your main guide. Ultimately, you must look at each problem in a forensic or diagnostic sense like a physician or a detective. When you encounter a problem, try to apply the following questions to it and then answer them:
- What is the specific problem?
- Why am I seeing this problem?
- What has changed in the time between now and when everything last worked perfectly?
- How is this problem manifested?
- What are the symptoms?
- What is the extent of the problem?
- Under what conditions does the problem appear?
You must first identify what the specific problem is. On the surface, this sounds trivial and even self-evident, but there is often more to it than reading an error message. Error messages are often less than ideal at describing the particular problem they are supposed to address, so you must learn to look past them and identify what has really happened.
Ideally, you should try to understand the problem as specifically as possible; instead of saying “Program A doesn't work”, you should think of it as “Program A produces Result B when I try to perform Task C under Condition D”. Once you are able to accurately fill in the variables, you will have a more concrete understanding of the situation than if you try to deal with the problem more abstractly. Searching the web for this concise description of the problem often turns up results that help you fix it, or even a solution outright. Think about why you are seeing a particular error message, and then form a hypothesis based on that reasoning.
When forming your hypothesis, you must take into account any changes that have been made to the system. Quite often, problems arise due to a change in the system rather than for unrelated reasons. As a result, any system changes should play a central role in your hypothesis. The system logs often record such changes, so they are a good place to start looking; on most distributions this means the files under /var/log, and on systemd-based systems the journal read with journalctl.
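One way to answer "what has changed?" is to list the files that were modified recently. The following is a minimal sketch, not a definitive tool: the function name recently_modified and the two-day default window are assumptions made for illustration, and a modification time only hints that a file changed, not who changed it or why.

```python
import os
import stat
import time
from pathlib import Path


def recently_modified(root, max_age_days=2.0, now=None):
    """Return (mtime, path) pairs for regular files under root that were
    modified within the last max_age_days, newest first."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    hits = []
    # os.walk silently skips directories it is not allowed to read.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.lstat()  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_mtime >= cutoff:
                hits.append((st.st_mtime, str(path)))
    hits.sort(reverse=True)
    return hits


# Example use (the choice of /etc is an assumption; pick the directory
# that matters for your problem):
# for mtime, path in recently_modified("/etc"):
#     print(time.strftime("%Y-%m-%d %H:%M", time.localtime(mtime)), path)
```

Running it against a configuration directory gives a short list of candidates to compare with the time the problem first appeared.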
Next, you must understand how the problem is manifested. The best place to start is with the symptoms of the problem. You must ask yourself what the symptoms are trying to tell you; is there something wrong with the hardware, or is it a software problem? How is this problem being shown, and does the situation fit with your assumptions? This is the reason why knowing about your hardware and software is a prerequisite to troubleshooting, since deductive reasoning relies on your ability to discard a hypothesis if it turns out to be unlikely or even impossible when weighed against what you know to be true.
A good hypothesis stands up when weighed against the evidence. It is important to answer all the questions listed above when forming your hypothesis, since you need as much observation data as possible to fine-tune the hypothesis. Failure to do so results in a bad hypothesis. For example, if you believe that a problem must be hardware-based because it happens consistently at a certain time, your hypothesis is far from sound. You may be right in that particular instance, but it is difficult to approach the problem with any certainty since you failed to take other factors into account. To see whether your hypotheses are true, you must be prepared to test them.
Problems must often be addressed in a specific order. For example, you may have to fix something separate yet related to the main problem before you are able to attack the main problem itself. Based on your observations and your prerequisite knowledge of how your computer operates, you must decide what to focus on first and where to go on from there until everything is fixed. Your hypothesis plus your planned repair method becomes your strategy for fixing the problem.
Experiment to Solve the Problem
Quite often, the key to solving a problem is a process of elimination in which one strategy after another is tested and then discarded or revised until the correct solution is found. Keep in mind that there may be more than one way to fix a problem, and the least risky methods should be attempted first. It is possible that testing may reveal more clues that were not present before, and you should definitely take these into account and revise your hypotheses and strategies accordingly.
Another thing you must consider is whether you have seen this problem before in another situation. No two separate problems are exactly alike, but comparing the present situation to a past experience may grant you additional insight as to what is going on. Try to focus on similarities and differences between this situation and previous ones; the more factors you are able to correlate or eliminate, the better. After doing that, you should try to remember what you did to solve the previous problem(s) and then try to find out if the same solution can be applicable to your current problem. If this tactic is successful, you could be well on your way to solving the problem. If not, you have uncovered more clues that you can use to help understand the situation further. For example, if a previous solution will not work on the current problem even though it may be similar, try to understand why. Ask the following questions and try to answer them:
1. How does this situation differ from the previous one?
2. How is this situation similar to the previous one?
3. What happens when the previous solution is applied to this problem?
4. What does the answer to question #3 tell me?
At this stage, it is critical that you document each repair attempt before you do it. Take note of any change you make, and make a note of the previous setting. If you change a system setting or edit a configuration file, you should make a note of how it was before. Many configuration files allow lines to be commented out; it is generally a good idea to use the commenting system to make notes of how the file used to be whenever you change a setting. If you replace files, make a copy of the original file before you replace it. That way, if your repair strategy backfires, you will be able to go back to the previous configuration and try something else without having made the problem irrevocably worse.
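The advice to copy a file before changing it can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names backup_file and restore_file and the ".bak" naming scheme are hypothetical, and system files will usually need root privileges to copy or restore.

```python
import shutil
import time
from pathlib import Path


def backup_file(path):
    """Copy path to a timestamped sibling file and return the backup's Path."""
    src = Path(path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.name}.{stamp}.bak")
    if dst.exists():
        # Never overwrite an earlier backup by accident.
        raise FileExistsError(f"backup already exists: {dst}")
    shutil.copy2(src, dst)  # copy2 also keeps timestamps and permission bits
    return dst


def restore_file(backup, original):
    """Put a saved copy back in place of the file that was edited."""
    shutil.copy2(backup, original)
```

A typical sequence would be to call backup_file on the configuration file, make the edit, test, and call restore_file with the returned path if the change made things worse.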
To be effective, you should also document the results of each experiment so you will be able to go back and remember what worked and what didn't. Your experimentation attempts will allow you to pinpoint or eliminate specific factors as the cause of the underlying problem. This works on the principle that if you eliminate all the wrong possibilities, you will eventually arrive at the correct solution.
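A notebook works perfectly well for this, but if you prefer to keep the record on the machine itself, a small append-only log is enough. The sketch below is one possible layout, not a standard: the function names and the fields hypothesis, change, and result are assumptions chosen to mirror the steps described above.

```python
import json
import time


def log_attempt(logfile, hypothesis, change, result):
    """Append one repair attempt to logfile as a single JSON line."""
    entry = {
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "hypothesis": hypothesis,  # what you believed the cause was
        "change": change,          # what you changed, including the old value
        "result": result,          # what happened afterwards
    }
    with open(logfile, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


def read_attempts(logfile):
    """Return every logged attempt, oldest first."""
    with open(logfile, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because each attempt is appended rather than overwritten, the file doubles as a list of possibilities you have already eliminated.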
Know When to Stop
Although you can solve many computer problems through deductive reasoning, there are still limits to what it can accomplish. Some problems are beyond practical troubleshooting, because you can never be sure that they have been completely fixed. In such cases, your only real strategy is to recover what data you can and start over with a clean operating system installation.
You have to know when to stop working on a problem or your attempts will become counter-productive. You can spend hours on a single problem, but if you make no real progress, you are just wasting your time. Also, you must know if your attempts are effective. If your repair attempts are causing more problems, it is time to stop since you will never be able to fix all of them.