This forum post from an ex-NVIDIA driver engineer is full of goodies. Let me cherry-pick a few for you:
"The driver is gigantic. Think 1-2 million lines of code dealing with the hardware abstraction layers, plus another million per API supported."
"...within the existing drivers and APIs it is impossible to get more than trivial gains out of any application side multithreading. If Futuremark can only get 5% in a trivial test case, the rest of us have no chance."
"Why are games broken? Because the APIs are complex, and validation varies from decent (D3D 11) to poor (D3D 9) to catastrophic (OpenGL)."
"Threading is just a catastrophe..."