a modern malloc implementation will generally hand back blocks aligned to 16 bytes, as this is the alignment required by the largest commonly used registers (sse registers). aligning to only 4 bytes would be slow for such registers (and if you use the faster aligned-load instructions, like movaps, it'll outright crash).
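a quick sketch of checking and requesting that alignment yourself. is_aligned and alloc_sse_floats are made-up helper names, and this assumes a C11 library that provides aligned_alloc (msvc does not, it has _aligned_malloc instead):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if p is a multiple of align (align must be a power of two). */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: allocates room for n floats on a 16-byte boundary,
   which is what the aligned sse loads and stores expect.
   aligned_alloc wants the size to be a multiple of the alignment, so the
   byte count is rounded up to the next multiple of 16.
   Release the result with free(). */
static float *alloc_sse_floats(size_t n)
{
    size_t bytes = (n * sizeof(float) + 15) & ~(size_t)15;
    if (bytes == 0)
        bytes = 16;
    float *p = aligned_alloc(16, bytes);
    assert(p == NULL || is_aligned(p, 16));
    return p;
}
```

on most 64-bit desktop platforms plain malloc would pass the same is_aligned(p, 16) check, but asking explicitly means you don't depend on that.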
misaligned access is slower: for any write that straddles two blocks, the cpu must read the block on each side, replace the affected bytes, and write the whole thing back. on some arm chips you'll get an exception instead, and the kernel will emulate the access really really slowly (or the program will just outright crash).
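if you do have to pull a value out of a buffer at an odd offset (file formats, packets and so on), one common way to stay safe is to go through memcpy rather than casting the pointer. a minimal sketch, with load_u32 and store_u32 being names i made up:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value from an address that may not be 4-byte aligned.
   memcpy has no alignment requirement, so this is well defined everywhere.
   Compilers typically turn it into a single load where the target allows
   that, and into byte accesses where it does not. */
static uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Writes a 32-bit value to an address that may not be 4-byte aligned. */
static void store_u32(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

this doesn't make the access free on hardware that dislikes it, it just means you get slow-but-correct instead of an exception.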
if you're using random access, like linked lists, you would generally want to ensure that each node's data fits within a single cache line, so that touching a node costs one cache miss instead of two.
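as a rough sketch of what that looks like for a list node: the 64-byte line size is an assumption (it's common on x86, check your target), and the node layout and node_new are invented for illustration. needs C11 for alignas, static_assert and aligned_alloc.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64  /* assumption: typical x86 line size */

/* Hypothetical list node. Aligning the first member to CACHE_LINE raises
   the alignment of the whole struct, and the compiler pads the size up to
   a multiple of that alignment. */
struct node {
    alignas(CACHE_LINE) struct node *next;
    uint32_t key;
    float    pos[3];
    uint8_t  payload[32];
};

/* Fails to compile if someone grows the node past one line. */
static_assert(sizeof(struct node) == CACHE_LINE,
              "node must occupy exactly one cache line");

/* malloc only promises its usual alignment (often 16), so nodes are
   requested on a CACHE_LINE boundary explicitly. Release with free(). */
static struct node *node_new(uint32_t key)
{
    struct node *n = aligned_alloc(CACHE_LINE, sizeof *n);
    if (n != NULL) {
        n->next = NULL;
        n->key = key;
    }
    return n;
}
```

the static_assert is the useful part: it turns "i think this still fits" into a compile error when it stops being true.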
large global objects are bad in terms of code design. they wreck modularity and if you use them for passing data around then you're really evil. but in terms of memory use, they're not that bad.
Remember that in any system with virtual memory (i.e. anything running windows or non-crippled linux) the system will only allocate physical memory when those pages are actually poked. if you do not poke them, they consume no physical memory even if they consume 100mb of your address space.
the actual problems only come when you no longer need that data (switching from some really huge map to a really tiny one), since a global stays allocated for the life of the program. but for games the peak working set is the only real concern.
I repeat though... AVOID USING GLOBALS! THEY'RE EVIL.

progressively reallocing larger and larger blocks of memory has its own issues. malloc+free are slow, as is the memcpy required to move all the data into the new block.
if you have such allocations mixed in with smaller allocations, you can find that you run out of free chunks that are large enough, as each free region has some small chunk of live data sitting in the middle which cannot be moved (java has the upper hand here, as its garbage collector can move objects around!).
Thus for both reasons, you should always over-allocate and consume the extra as needed, to avoid excessive reallocs.
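a minimal sketch of over-allocating, assuming a simple doubling policy. struct buf, buf_append and the 64-byte starting capacity are all made up for the example, and doubling is just one reasonable choice of growth factor:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer. Zero-initialise it before first use. */
struct buf {
    unsigned char *data;
    size_t len;  /* bytes in use */
    size_t cap;  /* bytes allocated */
};

/* Appends n bytes from src. When the buffer is full the capacity is
   doubled (starting at 64) until the new data fits, so most appends just
   consume spare room and never call realloc.
   Returns 0 on success, -1 on failure (the buffer is left unchanged). */
static int buf_append(struct buf *b, const void *src, size_t n)
{
    if (n == 0)
        return 0;
    if (n > SIZE_MAX - b->len)
        return -1;                      /* len + n would overflow */

    size_t need = b->len + n;
    if (need > b->cap) {
        size_t cap = b->cap ? b->cap : 64;
        while (cap < need) {
            if (cap > SIZE_MAX / 2) {   /* doubling would overflow */
                cap = need;
                break;
            }
            cap *= 2;
        }
        unsigned char *p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}
```

with this policy, appending 1000 bytes one at a time calls realloc 5 times (capacities 64, 128, 256, 512, 1024) instead of 1000 times. free(b.data) when you're done with it.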