The question is, I've noticed that this system is very similar to what Win32 does with its event-driven system. It's a LOT faster, as a lot of the information associated with input is not present. It just says what's being pressed, what's not, mouse movements, etc.
What is a lot faster? And why?
The question is, what's the best listing method for this type of system? Should I use linked lists? An internal array that can only hold so many items? Some sort of memory buffer?
Be careful with the linked list idea. You'll want to avoid frequent memory allocation and deallocation, which can be a rather slow process. To get around this, you can either allocate memory for a bunch of nodes at once (i.e. allocate an array of nodes) or have each node hold an array of data elements. In either case, you're essentially holding a pool of allocated memory to store data into as it appears, without having to call memory allocation routines for each new input event. You can also avoid calling memory deallocation routines every time an input event is popped. Although, I should point out that the STL linked lists may already do this sort of thing by default, to an extent. Just make sure to keep an eye on memory allocation, and if you're keeping a pool of unused nodes, make sure it's quick and easy to maintain. If you're spending a lot of time keeping track of allocated nodes, you might as well have let the memory system do it for you.
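To make the pool idea concrete, here's a rough sketch of a queue that pre-allocates all of its nodes and recycles them through a free list, so pushing and popping events never calls new/delete. All the names here (InputEvent, EventQueue, etc.) are made up for illustration; substitute your own event type:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical input event -- substitute your own.
struct InputEvent {
    int code;   // key or button code
    bool down;  // pressed or released
};

// A fixed pool of queue nodes: nodes are recycled through a free list
// instead of calling new/delete for every event.
struct EventQueue {
    struct Node {
        InputEvent event;
        Node* next;
    };

    static const std::size_t kPoolSize = 64;
    Node pool[kPoolSize];   // all nodes allocated up front
    Node* freeList;         // unused (but allocated) nodes
    Node* head;             // oldest queued event
    Node* tail;             // newest queued event

    EventQueue() : freeList(0), head(0), tail(0) {
        // Thread every pool node onto the free list.
        for (std::size_t i = 0; i < kPoolSize; ++i) {
            pool[i].next = freeList;
            freeList = &pool[i];
        }
    }

    bool push(const InputEvent& e) {
        if (!freeList) return false;   // pool exhausted: drop, or grow it
        Node* n = freeList;
        freeList = n->next;
        n->event = e;
        n->next = 0;
        if (tail) tail->next = n; else head = n;
        tail = n;
        return true;
    }

    bool pop(InputEvent& out) {
        if (!head) return false;
        Node* n = head;
        head = n->next;
        if (!head) tail = 0;
        out = n->event;
        n->next = freeList;   // recycle the node, no delete
        freeList = n;
        return true;
    }
};
```

Maintaining the free list here is just a couple of pointer swaps per event, which is about as cheap as the bookkeeping can get.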
A circular buffer also works nicely. (The array and buffer are kinda the same thing here.) This limits you to a fixed number of input events, unlike the above, but it guarantees you'll never have to call memory allocation/deallocation routines. You could potentially allocate a larger buffer and copy the old one into it, but this is quite slow, so you should avoid it. If you want dynamically sized input queues, you should probably go with the linked-list idea. But then, how much do you want to remember? After a certain point, unprocessed input gets kinda useless if there is a huge backlog.
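A minimal circular buffer along these lines might look like the following (names are illustrative; this version drops events when full rather than growing):

```cpp
#include <cassert>
#include <cstddef>

// Fixed-capacity circular buffer; never allocates after construction.
template <typename T, std::size_t N>
class RingBuffer {
    T data_[N];
    std::size_t head_;   // next slot to read
    std::size_t tail_;   // next slot to write
    std::size_t count_;  // number of queued items
public:
    RingBuffer() : head_(0), tail_(0), count_(0) {}

    bool push(const T& v) {
        if (count_ == N) return false;   // full: drop (or overwrite oldest)
        data_[tail_] = v;
        tail_ = (tail_ + 1) % N;
        ++count_;
        return true;
    }

    bool pop(T& out) {
        if (count_ == 0) return false;
        out = data_[head_];
        head_ = (head_ + 1) % N;
        --count_;
        return true;
    }

    bool full() const  { return count_ == N; }
    bool empty() const { return count_ == 0; }
};
```

Since the whole buffer lives in one contiguous array, it also plays well with the cache and memory-page concerns discussed below.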
Also, try to avoid fragmenting your memory. The pool idea inherent to the above methods will help with this, and the circular buffer is especially attractive in this regard. The nice thing about your input class is that it lends itself well to circular lists. It should be fairly easy to keep track of free memory (unused, but allocated) and to cycle through it in such a way that you keep things largely in order. This will also help with caching issues. Try to keep things on the same memory page if possible.
You can of course combine the above ideas. One of the most attractive is a linked list of circular buffers. If the current buffer fills up, you can allocate (or already have around) another node containing a circular buffer. Stay on the current buffer if possible: if the buffer accepting input isn't full, keep filling it, and just extract from that one circular buffer normally. If all goes well, the head will be chasing the tail, and you'll only ever have that one buffer. If the buffer is less than a page size, you should have no reason to touch a different page, and you'll avoid thrashing your cache. (Not that these things move all that quickly, since they depend on user input after all.) If one buffer fills up, you start filling the next (possibly a new one), and once the one you're extracting events from empties, follow its link to the next circular buffer. If you can catch up with the input, you'll again have the head chasing the tail, but in a new buffer. If all goes well, it'll be at most one cache miss every time you switch buffers. (Although the rest of the game engine will probably push the input queue out of the cache anyways.) Note that you'll want to stay in whatever buffer you're currently working on; there is no benefit in trying to switch back to the original buffer. (That's just another cache miss for switching buffers again.)
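A bare-bones sketch of that combined structure, a linked list of circular buffers, might look like this (all names hypothetical; for brevity this version deletes drained buffers instead of recycling them into a pool, which a real implementation would probably prefer):

```cpp
#include <cassert>
#include <cstddef>

// Events go into the newest buffer; when it fills, a fresh buffer is
// linked on. Extraction drains the oldest buffer, then follows its link.
template <typename T, std::size_t N>
struct BufferChain {
    struct Ring {
        T data[N];
        std::size_t head, tail, count;
        Ring* next;
        Ring() : head(0), tail(0), count(0), next(0) {}
    };

    Ring* first;   // buffer we extract from
    Ring* last;    // buffer we insert into

    BufferChain() { first = last = new Ring(); }
    ~BufferChain() {
        while (first) { Ring* n = first->next; delete first; first = n; }
    }

    void push(const T& v) {
        if (last->count == N) {        // current buffer full:
            last->next = new Ring();   // chain on a new one
            last = last->next;
        }
        last->data[last->tail] = v;
        last->tail = (last->tail + 1) % N;
        ++last->count;
    }

    bool pop(T& out) {
        // Skip drained buffers, freeing them as we go.
        while (first->count == 0 && first->next) {
            Ring* n = first->next;
            delete first;
            first = n;
        }
        if (first->count == 0) return false;
        out = first->data[first->head];
        first->head = (first->head + 1) % N;
        --first->count;
        return true;
    }
};
```

When input is light, push and pop both stay in the same Ring and no allocation ever happens; new rings only appear when a backlog builds up.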
Also, I figure that by using a list of some sort, input from the user that the system is unable to respond to at that very moment (for instance, the user has pressed and released a key before a frame has finished being rendered) will still be in the queue waiting to be processed by the core, making the input very responsive (as cInput will probably be running in its own thread and whatnot).
I don't think having extra threads will make this any more responsive. Extra threads often just add extra overhead and can lead to slower performance. Threads are not some magical way to speed up a program (although so many sources tout them as such :angry: ), and since most computers only have a single CPU, you don't normally gain any sort of speed benefit. If two things can be done at the same time, there is potential for lower latency on a quick operation, provided at least one of the two operations takes a long time. However, if you're only extracting and processing input between frames, there is no point in having the input queue run in a different thread. The same things have to be done, and the effect is felt at the same time. Making it multithreaded would only add locking overhead to make sure you don't destroy your queue by trying to read and write to it at the same time. That overhead is felt whether or not both operations are occurring at the same time.
Also, keep in mind this is all about user experience. If you fill an input queue as soon as the input arrives, the user doesn't know about it. The app won't appear to be responsive. But if you do some sort of processing, even just a little, that gives user feedback, then they'll know something happened. That will make things seem more responsive. So basically, do you have to wait for a certain point before you can process user input? Or can you actually make use of it as soon as it's available (in a way the user sees)?
I'd also like to leave an example of threading, and how it can actually lower total throughput and increase latency overall. The time it takes for a thread switch is significant, so frequent switching decreases throughput. This is especially bad if a thread gets control and has nothing to do but give up control again. But the overall latency increase does not depend on this overhead. For instance, if two operations come in at the same time, and each takes 2 time units, consider the case where they are done back to back versus the case where one preempts the other.
Back to back: A runs for 2 time units and finishes. B runs for 2 time units and finishes.
Here A finishes in 2 time units. B has to wait 2 time units to start, and then takes 2 time units to finish. Thus B takes a total of 4 time units to complete.
Threaded: A runs for 1 time unit, B runs for 1 time unit, A runs for 1 time unit and finishes, B runs for 1 time unit and finishes.
Here A finishes in 3 time units (2 time units executing, and 1 time unit waiting). B finishes in 4 time units (2 time units executing, and 2 time units waiting).
You'll notice that although in both cases they finish at the same time (since we ignore switching overhead), it takes longer on average for each task to complete in the threaded example. (A took longer to complete here because it was waiting while B started.)
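To make the arithmetic above concrete, here's a tiny simulation (helper names are mine, and switching overhead is ignored, as in the example) that computes each job's finish time under the two schedules:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Round-robin in 1-unit slices: every unfinished job runs for one time
// unit per pass. Returns the finish time of each job.
std::vector<int> roundRobinFinishTimes(std::vector<int> remaining) {
    std::vector<int> finish(remaining.size(), 0);
    int t = 0;
    bool anyLeft = true;
    while (anyLeft) {
        anyLeft = false;
        for (std::size_t i = 0; i < remaining.size(); ++i) {
            if (remaining[i] > 0) {
                ++t;   // job i runs for one time unit
                if (--remaining[i] == 0) finish[i] = t;
                anyLeft = true;
            }
        }
    }
    return finish;
}

// Back to back: each job's finish time is the running total of lengths.
std::vector<int> backToBackFinishTimes(const std::vector<int>& lengths) {
    std::vector<int> finish;
    int t = 0;
    for (std::size_t i = 0; i < lengths.size(); ++i) {
        t += lengths[i];
        finish.push_back(t);
    }
    return finish;
}
```

With two jobs of length 2, back to back gives finish times {2, 4} (average 3), while round-robin gives {3, 4} (average 3.5), matching the walkthrough above.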
Now, if event A took 20 time units and event B only took 2, and if event B was something the user requested and can be done while event A is being processed, then the app will look more responsive if event B is handled in short order (provided it gives user feedback). This is about the only way threading helps, unless you're certain a machine has multiple CPUs and can reasonably divide two separate tasks between them.
I hope I'm not rambling on about stuff you already know. :unsure: