Hm. I've spent a fair amount of time adding threading to Quetoo (previously Quake2World). Let me offer a couple tips based on what I've found:
1) If you're already using libSDL, then you should use SDL_Thread. It provides portable thread creation, non-busy waiting, mutexes, conditions, etc. Very easy to use, and works across the board. Win.
2) BSP recursion (cmodel.c) can be made thread-safe by adding just a few thread-local variables (__thread keyword for GCC and Clang). Basically, there's a global trace state structure that is common to the various tracing functions that are called throughout the recursion. By making that state local to each thread, you're then free to run multiple threads through CM_BoxTrace in parallel. Be careful if your engine has any optimizations that cache things like "current trace number" or "check count" on BSP nodes or leafs: those fields are written during a trace, so they will have to be moved into that same thread-local state in order to be made safe. But by and large, making cmodel.c thread-safe isn't too hard.
3) Parallelizing the Quake engine to any measurable benefit is actually really difficult. This is why id Tech 3 and id Tech 4 use only a pair of threads, and basically run the game in one thread, and draw everything in another. If you parallelize at a more fine-grained level, any benefit is lost to the overhead of coordinating the threads. So the right way to do it, as far as I can tell, is to encapsulate an entire renderable scene into a struct, and flip-flop between two instances of that struct at each frame (think of how GL's front and back buffers work). This way, you run a frame of game logic in one thread, populating the scene, and then render the result of that frame asynchronously in the other. Note that doing this safely will require that you literally copy objects into the scene struct. Holding pointers to things that the main thread will modify would be a recipe for crashes.
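
To make tip 2) a bit more concrete, here's a rough sketch of the thread-local trace state idea. None of this is actual Quetoo or Quake code: trace_state_t, recurse, box_trace and run_parallel_traces are made-up stand-ins, and the "BSP" is just a fake binary recursion. I'm also using plain pthreads here only to keep the sketch free of dependencies; with libSDL you'd start and join the threads with SDL_CreateThread and SDL_WaitThread instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the engine's global trace state. */
typedef struct {
    int check_count;   /* a counter that might otherwise be cached on nodes/leafs */
    int leafs_visited; /* scratch data written throughout the recursion */
} trace_state_t;

/* __thread (GCC/Clang) gives every thread its own private copy. */
static __thread trace_state_t trace_state;

/* Fake recursion: a full binary tree of the given depth, no real BSP data. */
static void recurse(int depth) {
    if (depth == 0) {
        trace_state.leafs_visited++;
        return;
    }
    recurse(depth - 1);
    recurse(depth - 1);
}

/* Stand-in for CM_BoxTrace: resets the per-thread state, then recurses. */
int box_trace(int depth) {
    assert(depth >= 0);
    trace_state.check_count++;
    trace_state.leafs_visited = 0;
    recurse(depth);
    return trace_state.leafs_visited;
}

/* Thread entry point: reads the depth from *arg and overwrites it with the result. */
static void *trace_worker(void *arg) {
    int *slot = arg;
    *slot = box_trace(*slot);
    return NULL;
}

/* Runs two traces concurrently. Returns 0 on success, -1 if a thread could not start. */
int run_parallel_traces(int depth_a, int depth_b, int *out_a, int *out_b) {
    pthread_t a, b;
    *out_a = depth_a;
    *out_b = depth_b;
    if (pthread_create(&a, NULL, trace_worker, out_a) != 0) {
        return -1;
    }
    if (pthread_create(&b, NULL, trace_worker, out_b) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}
```

If you drop the __thread keyword, both workers increment the same leafs_visited counter and the results become garbage, which is exactly the problem with the stock global trace state.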

Quetoo doesn't quite work this way as of yet. Instead, I've tried to parallelize scene population itself. So, for example, all particle thinking and entity culling happens in one thread while the main thread recurses the BSP and prepares it for rendering. So far, I've only had minuscule and inconsistent gains using this method, which is what leads me to believe that 3) above is the way to go.
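
For what it's worth, here's roughly how I picture the flip-flop from 3). This is only a sketch under my own assumptions, not code from Quetoo or any id engine: scene_t, entity_t, scene_add_entity, render_scene and run_frames are hypothetical names, "rendering" just counts entities, and pthreads stands in for SDL_Thread to keep it dependency-free. A real engine would presumably keep one long-lived render thread and wake it with a condition variable rather than spawning a thread per frame like this does.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_SCENE_ENTITIES 64

/* Hypothetical game-side entity; the game thread may modify these at any time. */
typedef struct {
    float origin[3];
    int model;
} entity_t;

/* Everything the renderer needs for one frame, held by value. */
typedef struct {
    entity_t entities[MAX_SCENE_ENTITIES]; /* copies, never pointers */
    int num_entities;
} scene_t;

static scene_t scenes[2];  /* front and back scene */
static int entities_drawn; /* written only by the single active render thread */

/* Copies the entity into the scene so later game-side changes can't affect it. */
static void scene_add_entity(scene_t *s, const entity_t *e) {
    assert(s->num_entities < MAX_SCENE_ENTITIES);
    s->entities[s->num_entities++] = *e;
}

/* Render thread entry point: only ever reads the scene it was handed. */
static void *render_scene(void *arg) {
    const scene_t *s = arg;
    for (int i = 0; i < s->num_entities; i++) {
        entities_drawn++; /* stand-in for issuing draw calls */
    }
    return NULL;
}

/* Runs num_frames frames; returns the total entities "drawn", or -1 on error. */
int run_frames(int num_frames, int entities_per_frame) {
    pthread_t renderer;
    int rendering = 0;
    entities_drawn = 0;
    for (int frame = 0; frame < num_frames; frame++) {
        /* Populate the scene the renderer is NOT currently reading. */
        scene_t *back = &scenes[frame & 1];
        back->num_entities = 0;
        for (int i = 0; i < entities_per_frame; i++) {
            entity_t e = { { (float) i, 0.0f, 0.0f }, i };
            scene_add_entity(back, &e);
        }
        /* Wait for the previous frame's render, then hand over the new scene. */
        if (rendering) {
            pthread_join(renderer, NULL);
            rendering = 0;
        }
        if (pthread_create(&renderer, NULL, render_scene, back) != 0) {
            return -1;
        }
        rendering = 1;
    }
    if (rendering) {
        pthread_join(renderer, NULL);
    }
    return entities_drawn;
}
```

The point to notice is that game logic for frame N+1 overlaps with rendering of frame N, and the only synchronization is the single hand-off per frame. That's the whole appeal compared to the fine-grained approach I described above.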