I think the speed is better in 16 bit because the game doesn't have to use palettes to show all the colors at once. The graphics themselves are 8 bit, but they rely on palettes to get more colors on screen at once.
DirectDraw is used in 8, 24, and 32 bit environments because palette manipulation with DirectX is far faster.
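To make the palette point a bit more concrete, here is a rough sketch of how 8 bit indexed graphics can be expanded to 16 bit pixels through a lookup table. This is only an illustration, not OP2's actual code: the `Rgb` struct, the function names, and the choice of RGB565 as the 16 bit layout are all assumptions made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical palette entry: one 8-bit value per channel. */
typedef struct { uint8_t r, g, b; } Rgb;

/* Pack 8-bit-per-channel RGB into one 16-bit RGB565 pixel (assumed layout). */
static uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Rebuild the 256-entry lookup table; only needed when the palette changes. */
static void build_lut(const Rgb *palette, uint16_t lut[256])
{
    for (int i = 0; i < 256; ++i)
        lut[i] = pack_rgb565(palette[i].r, palette[i].g, palette[i].b);
}

/* Expand 8-bit palette indices into 16-bit pixels, one table lookup each. */
static void blit_indexed_to_565(const uint8_t *src, uint16_t *dst,
                                size_t count, const uint16_t lut[256])
{
    for (size_t i = 0; i < count; ++i)
        dst[i] = lut[src[i]];
}
```

With something like this, a palette change only means rebuilding 256 table entries; the pixels already written to the 16 bit surface hold real colors and don't depend on a hardware palette.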
Also, I made this because when "hacking" OP2 I have to be able to switch quickly between several programs at once (OP2, a debugger, a disassembler, etc.).