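aye, plain sse is not good for a single dot product (unlike 3dnow): it has no horizontal add, so summing across the register takes extra shuffles. and dot products are pretty common in 3d engines.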
supposedly this is meant to be improved with the sse4 instructions (sse4.1 adds a dedicated dot-product instruction, dpps).
don't underestimate random transposes.
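
a rough sketch of the two approaches in C, assuming only sse1 intrinsics from <xmmintrin.h>. the function names and the flat 16-float layout are made up for illustration. the first function does one dot product the shuffle-heavy way; the second takes one possible reading of the transpose remark, where four vectors are transposed so that the four dot products become plain vertical multiplies and adds:

```c
#include <xmmintrin.h>  /* SSE1 intrinsics only */

/* One 4-component dot product with plain SSE1 (hypothetical helper):
   multiply, then two shuffle+add steps to sum across the register. */
static float dot4_horizontal(const float *a, const float *b)
{
    __m128 p = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    /* swap neighbours and add: (p0+p1, p0+p1, p2+p3, p2+p3) */
    __m128 s = _mm_add_ps(p, _mm_shuffle_ps(p, p, _MM_SHUFFLE(2, 3, 0, 1)));
    /* swap halves and add: every lane now holds the full sum */
    s = _mm_add_ps(s, _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 0, 3, 2)));
    return _mm_cvtss_f32(s);
}

/* Four dot products at once (hypothetical helper): v holds four
   vectors back to back (16 floats), each dotted against n.
   Transposing turns rows (x y z w) into columns (x x x x), so only
   vertical mul/add is needed afterwards. */
static void dot4x4_transposed(const float *v, const float *n, float *out)
{
    __m128 r0 = _mm_loadu_ps(v + 0);
    __m128 r1 = _mm_loadu_ps(v + 4);
    __m128 r2 = _mm_loadu_ps(v + 8);
    __m128 r3 = _mm_loadu_ps(v + 12);
    __m128 acc;

    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);  /* r0 = all x, r1 = all y, ... */

    acc = _mm_mul_ps(r0, _mm_set1_ps(n[0]));
    acc = _mm_add_ps(acc, _mm_mul_ps(r1, _mm_set1_ps(n[1])));
    acc = _mm_add_ps(acc, _mm_mul_ps(r2, _mm_set1_ps(n[2])));
    acc = _mm_add_ps(acc, _mm_mul_ps(r3, _mm_set1_ps(n[3])));
    _mm_storeu_ps(out, acc);            /* out[i] = dot(vector i, n) */
}
```

if the data is already stored as separate x/y/z/w arrays, the transpose disappears and only the mul/add part is left.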

but yeah, avoid copying data from x87 to sse and back. if you use a little sse on your data, use a lot instead.
and yeah, glsl is preferred for calculating dot products for lighting, as well as skeletal transforms and stuff.