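Example: With GCC, the -O flags tell the compiler what to optimize for: -O1, -O2, and -O3 optimize for execution speed at increasing levels, -Os optimizes for size, and -O2 is the usual middle ground if you want a bit of both.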
When optimizing for execution speed, one thing the compiler may do is what's called loop unrolling. Basically, it does exactly what the name suggests.
Let's say that you're going through an array of length 10, so you use a for loop.
When you tell the compiler to compile with no -O flag, the generated code will be some instructions (whatever you want to do to your array), with a branch instruction at the bottom that decides whether to jump back up and perform those instructions again or, if the loop is finished, to continue on.
If you were to tell the compiler to optimize for speed, then the compiler might find that no matter what the conditions are, you're going to go through this loop ten times. Therefore, since speed is most important (you told the compiler this by using the -O3 flag), the compiler may "unroll" the loop. Instead of using a branch instruction, it simply uses a brute-force approach. Basically it generates every instruction that would have been executed in the above solution, and strings them together. This can greatly reduce the number of branch instructions that are executed.
So why does this matter? Branch instructions can be BAD for performance. Why are they so bad? Because every time the processor reaches a branch instruction, it must make a decision. Modern processors have something called a pipeline. Basically, it allows each part of the processor to be used every clock cycle. It does this by looking ahead in the code and loading instructions ahead of the instruction currently being executed. However, when the processor comes to a branch, it has to make a decision: which set of instructions should it start putting in the pipeline? If it guesses correctly, then there's no problem. However, when it guesses wrong, it has to flush the pipeline (throwing away precious clock cycles) and start over with the correct path.
Code:
Loop:
for(int a=0; a<10; a++){
    array[a]=array[a]*2;
}
Now to unroll the loop:
array[0]=array[0]*2;
array[1]=array[1]*2;
array[2]=array[2]*2;
array[3]=array[3]*2;
....
array[9]=array[9]*2;
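If you want to try this yourself, here's a minimal sketch that puts both versions side by side and checks that they give the same answer. The function names (double_rolled, double_unrolled, same_result) are just ones I made up for illustration, and the unrolled version is written by hand to show roughly what the compiler might generate; it is not actual compiler output.

```c
#include <string.h>

#define N 10

/* Rolled version: one compare-and-branch at the bottom of every iteration. */
void double_rolled(int array[N]) {
    for (int a = 0; a < N; a++) {
        array[a] = array[a] * 2;
    }
}

/* Hand-unrolled version: the same ten statements with no loop branch.
   This only illustrates the idea; a real compiler decides for itself. */
void double_unrolled(int array[N]) {
    array[0] = array[0] * 2;
    array[1] = array[1] * 2;
    array[2] = array[2] * 2;
    array[3] = array[3] * 2;
    array[4] = array[4] * 2;
    array[5] = array[5] * 2;
    array[6] = array[6] * 2;
    array[7] = array[7] * 2;
    array[8] = array[8] * 2;
    array[9] = array[9] * 2;
}

/* Returns 1 if both versions produce identical arrays, 0 otherwise. */
int same_result(void) {
    int x[N] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    int y[N] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    double_rolled(x);
    double_unrolled(y);
    return memcmp(x, y, sizeof x) == 0;
}
```

To see what your compiler really does with the rolled version, call these from a main(), then compile once with gcc -O0 -S and once with gcc -O3 -S and compare the assembly. Whether the loop actually gets unrolled depends on your GCC version and target.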
By unrolling the loop and avoiding having to execute branch instructions, we can speed up the execution of an application, sometimes by a lot. (However, this speedup unfortunately comes at the cost of code size.) Wikipedia can probably give you a better description of loop unrolling than I can...it's been a couple semesters lol.
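Because of that size cost, a common middle ground is to unroll only partially: do several elements per trip through the loop, so you still have a branch but execute it far less often. Here's a rough sketch assuming an unroll factor of 4 (an arbitrary choice on my part) and a hypothetical function name double_by_four. It also handles lengths that aren't a multiple of 4 with a small cleanup loop.

```c
#include <stddef.h>

/* Doubles every element of array[0..n-1], four elements per loop branch.
   The unroll factor of 4 is an assumption chosen for illustration. */
void double_by_four(int *array, size_t n) {
    size_t i = 0;

    /* Main loop: one branch for every four elements. */
    for (; i + 4 <= n; i += 4) {
        array[i]     = array[i]     * 2;
        array[i + 1] = array[i + 1] * 2;
        array[i + 2] = array[i + 2] * 2;
        array[i + 3] = array[i + 3] * 2;
    }

    /* Cleanup loop: handles the 0 to 3 leftover elements. */
    for (; i < n; i++) {
        array[i] = array[i] * 2;
    }
}
```

For the length-10 array from before, the main loop runs twice (8 elements) and the cleanup loop picks up the last 2, so you execute far fewer loop branches than the plain version without writing out all ten statements.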
Basically, a good understanding of how microprocessors operate and how machine code works is a great tool, even when programming in very high-level languages. By being able to visualize what the assembly code may look like for a certain statement or group of statements, you can modify your code so that it runs much more efficiently. Knowing how your code is compiled, and then how it will run on the microprocessor that you're programming for, is an awesome tool!
Btw, if you want a great book to learn more about microprocessors and assembly language, consider "Computer Organization and Design: The Hardware/Software Interface" by Patterson and Hennessy. I used this book a couple semesters ago (college junior-level course in Computer Architecture), and loved it. Prof was terrible, book was amazing. I would recommend getting the edition that covers ARM instead of MIPS though, since the ARM architecture is so prevalent these days (pretty much every cellphone etc), so it will be relevant. Also, if you search eBay etc for an international edition, you can get it for like $30.