There are three ways it can be done:
Firstly, by modelling and animating the guns directly onto the player model, with one player model for each weapon. As you guessed, this is how most VWEP mods do it.
Secondly, by having the guns in separate model files, with animation frames matching the player's, and superimposing one on top of the other with an entity that follows the player around and matches frames. This only works for a 3rd person view, or in an engine that supports DP_SV_NODRAWTOCLIENT or similar; otherwise the player will see the weapons floating around in front of him in 1st person view. This is how Prydon Gate does it.
Thirdly, via MD3 tagging, which is obviously only available in engines that support MD3 models. This is probably the best method if available, because the weapon models don't each need to be animated to match frames; the tagging takes care of it.
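
To make the difference between the second and third methods concrete, here is a rough sketch in plain C. It is not QuakeC and not any engine's real API: the `Entity` and `Vec3` types, the `hidden_from` field (a stand-in for the kind of per-client hiding that DP_SV_NODRAWTOCLIENT provides) and all the function names are made up for illustration, and the tag lookup ignores rotation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types for illustration; not a real engine API. */
typedef struct { float x, y, z; } Vec3;

typedef struct Entity {
    Vec3 origin;
    int frame;                         /* current animation frame */
    const struct Entity *hidden_from;  /* assumed stand-in for a nodrawtoclient-style field */
} Entity;

/* Method 2: the weapon entity copies the player's position and frame every
   tick, so the weapon model needs a matching frame for every player frame. */
static void weapon_follow(Entity *weapon, const Entity *player)
{
    weapon->origin = player->origin;
    weapon->frame = player->frame;
    weapon->hidden_from = player;      /* keep it out of the owner's 1st person view */
}

/* Returns 1 if the entity should be drawn for this viewer, 0 otherwise. */
static int should_draw(const Entity *ent, const Entity *viewer)
{
    return ent->hidden_from != viewer;
}

/* Method 3: the player model carries one tag offset per frame. The weapon is
   placed at the tag, so the weapon model needs no animation of its own.
   Rotation is left out to keep the sketch short. */
static Vec3 tag_world_position(const Entity *player,
                               const Vec3 *tag_offsets, size_t frame_count)
{
    assert(player->frame >= 0 && (size_t)player->frame < frame_count);
    Vec3 t = tag_offsets[player->frame];
    Vec3 out = { player->origin.x + t.x,
                 player->origin.y + t.y,
                 player->origin.z + t.z };
    return out;
}
```

With the second method, every weapon model has to ship the full set of player frames so that `weapon->frame` always points at something sensible. With the third, only the player model knows about frames, and the weapon just gets positioned from the tag each time.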