Author Topic: Op:mia/op3 Programming Concepts  (Read 1458 times)

Offline leeor_net

  • Administrator
  • Hero Member
  • Posts: 2350
  • OPHD Lead Developer
    • LairWorks Entertainment
Op:mia/op3 Programming Concepts
« on: November 12, 2005, 10:23:40 AM »
Well, I've been pondering something and it turns out that my idea is actually rather complex.

For OP:MIA, I want to use a concept similar to the one I came up with for OP3's Input Module Design. Essentially, I want to create a class (we'll call it cInput here) that handles all of the input processing. However, since it's a class, I don't want it to be programmed specifically for OP:MIA/OP3; I want it to be general. Essentially, it should provide an interface in which the host/core program looks at a 'queue' of sorts provided by cInput (let's call it cInput::InputQueue(), a function that returns an array of input events). Each input 'event' describes what type of input it is (e.g., KEYBOARD, MOUSE, JOYSTICK, etc.) and what is contained in the event (e.g., for an event of type KEYBOARD, whether a key has been pressed or released and which key it is, such as 'A', 'B', '5', 'ENTER' and so on). The core looks at the first item in this list and determines what to do with it. After it has read the event, it flags cInput to say it has read the most current 'event' using the function cInput::EventPop(), which tells cInput to remove the first event in the list.
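Something like this is roughly what I'm picturing (just a sketch; the names and fields are placeholders, nothing final):

Code:
// Rough sketch of the idea -- names and fields here are placeholders.
#include <vector>

enum InputType { KEYBOARD, MOUSE, JOYSTICK };

struct InputEvent
{
    InputType type;     // KEYBOARD, MOUSE, JOYSTICK, ...
    int       code;     // which key or button ('A', 'B', '5', ENTER, ...)
    bool      pressed;  // true = pressed, false = released
    int       x, y;     // pointer position, for MOUSE events
};

class cInput
{
public:
    // The 'queue' the core looks at: events that haven't been processed yet.
    const std::vector<InputEvent>& InputQueue() const { return mEvents; }

    // The core calls this after reading the front event; cInput drops it.
    void EventPop() { if (!mEvents.empty()) mEvents.erase(mEvents.begin()); }

private:
    std::vector<InputEvent> mEvents;
};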

cInput looks at all incoming input and generates a list of input events, each following the event that occurred before it. As new events occur, they are added to the end of the list in the order in which they are caught. The core/host program signals cInput to remove events from the front of the list once they have been received.

I've noticed that this system is very similar to what Win32 does with its event-driven system, except it's a LOT faster since a lot of the information associated with input is not present. It just says what's being pressed, what's not, mouse movements, etc.

The question is, what's the best listing method for this type of system? Should I use linked lists? An internal array that can only hold so many items? Some sort of memory buffer?

I want to make cInput as independent of the rest of the system as possible so that it could be plugged into another program/application/game/whatever. I also want it to be extremely fast, which is why there won't be any sort of translate/dispatch crap and most of the extra information about a keypress/mouseclick/whatever is simply sheared away. I just want very basic information set up in the list.

Also, I figure that by using a list of some sort, input the system is unable to respond to at that very moment (for instance, the user presses and releases a key before a frame has finished rendering) will still be in the queue waiting to be processed by the core, making input very responsive (as cInput will probably be running in its own thread and whatnot).

I don't know if I'm being clear, but feedback would be great.
 

Offline Hooman

  • Administrator
  • Hero Member
  • Posts: 4954
Op:mia/op3 Programming Concepts
« Reply #1 on: November 13, 2005, 10:58:49 PM »
Quote
I've noticed that this system is very similar to what Win32 does with its event-driven system, except it's a LOT faster since a lot of the information associated with input is not present. It just says what's being pressed, what's not, mouse movements, etc.
What is a lot faster? And why?


Quote
The question is, what's the best listing method for this type of system? Should I use linked lists? An internal array that can only hold so many items? Some sort of memory buffer?
Be careful with the linked list idea. You'll want to avoid frequent memory allocation and deallocation, which can be a rather slow process. To get around this, you can either allocate memory for a bunch of nodes at once (i.e., allocate an array of nodes) or have each node hold an array of data elements. In either case, you're essentially holding a pool of allocated memory to store data into as it appears, without having to call memory allocation routines for each new input event. You can also avoid calling memory deallocation routines every time an input event is popped. Although, I should point out that the STL linked lists may already do this sort of thing by default, to an extent. Just make sure to keep an eye on memory allocation, and if you're keeping a pool of unused nodes, make sure it's quick and easy to maintain. If you're spending a lot of time keeping track of allocated nodes, you might as well have let the memory system do it for you.
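Something along these lines is what I mean by a node pool (just a sketch; the names are made up, and InputEvent stands in for whatever event struct you settle on):

Code:
// Node pool sketch: hand out nodes from one big up-front allocation
// instead of calling new/delete per event.
#include <vector>
#include <cstddef>

struct InputEvent { int type, code; };   // placeholder for your event struct

struct EventNode
{
    InputEvent data;
    EventNode* next;
};

class NodePool
{
public:
    explicit NodePool(std::size_t count) : nodes(count), freeList(0)
    {
        // Chain every preallocated node into the free list.
        for (std::size_t i = 0; i < count; ++i)
            nodes[i].next = (i + 1 < count) ? &nodes[i + 1] : 0;
        if (count > 0)
            freeList = &nodes[0];
    }

    EventNode* Acquire()              // O(1), no heap allocation
    {
        EventNode* node = freeList;
        if (node) freeList = node->next;
        return node;                  // 0 means the pool is exhausted
    }

    void Release(EventNode* node)     // O(1), no heap deallocation
    {
        node->next = freeList;
        freeList = node;
    }

private:
    std::vector<EventNode> nodes;     // one allocation for the whole pool
    EventNode* freeList;
};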

A circular buffer also works nicely. (The array and buffer are kinda the same thing here.) This limits you to a fixed number of input events, unlike the above, but it guarantees you'll never have to call memory allocation/deallocation routines. You could potentially allocate a larger buffer and copy the old one into it, but this is quite slow, so you should avoid it. If you want dynamically sized input queues, you should probably go with the linked list idea. But then, how much do you want to remember? After a certain point, unprocessed input gets kinda useless if there is a huge backlog.
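A bare-bones fixed-size version might look roughly like this (illustrative only; again, InputEvent stands in for whatever you end up with):

Code:
// Bare-bones fixed-size circular buffer of events (illustrative only).
#include <cstddef>

struct InputEvent { int type, code; };   // placeholder for your event struct

const std::size_t CAPACITY = 256;        // fixed: at most CAPACITY - 1 queued

class EventRing
{
public:
    EventRing() : head(0), tail(0) {}

    bool Push(const InputEvent& e)       // called as input arrives
    {
        std::size_t next = (tail + 1) % CAPACITY;
        if (next == head) return false;  // full -- decide how to handle it
        buffer[tail] = e;
        tail = next;
        return true;
    }

    bool Pop(InputEvent& e)              // called by the core each frame
    {
        if (head == tail) return false;  // empty
        e = buffer[head];
        head = (head + 1) % CAPACITY;
        return true;
    }

private:
    InputEvent  buffer[CAPACITY];
    std::size_t head, tail;
};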

Also, try to avoid fragmenting your memory. The pool idea inherent to the above methods will help with this, and the circular buffer is especially attractive in this regard. The nice thing about your input class is that it lends itself well to circular lists. It should be fairly easy to keep track of free memory (unused, but allocated) and to cycle through it in such a way that you keep things largely in order. This will also help with caching issues. Try to keep things on the same memory page if possible.

You can of course combine the above ideas. One of the most attractive is a linked list of circular arrays. If the array fills up, you can allocate (or already have around) another node containing a circular buffer. Stay on the current buffer if possible; that is, if the buffer currently accepting input isn't full, keep filling that one. Then just extract from one circular buffer normally. If all goes well, the head will be chasing the tail, and you'll only ever have that one buffer. If the buffer is less than a page in size, you should have no reason to use a different page and you'll avoid trashing your cache. (Not that these things move all that quickly, since they do depend on user input after all.) If one queue fills up, you start filling the next (possibly a new one), and once the one you're extracting events from empties, follow its link to the next circular buffer. If you can catch up with the input, you'll again have the head chasing the tail, but in a new buffer. If all goes well, it'll be at most one cache miss every time you switch buffers (although the rest of the game engine will probably push the input queue out of the cache anyway). Note that you'll want to stay in whatever buffer you're currently working on. There is no benefit in trying to switch back to the original buffer; that's just another cache miss for switching buffers again.
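Very roughly, the structure I'm picturing looks something like this (names made up; it leaves out the chunk pooling and page-size tuning you'd really want):

Code:
// Sketch of the linked-list-of-circular-buffers idea. Each chunk is a small
// ring; the writer only moves on when a chunk fills, and the reader follows
// the links once its chunk is drained.
#include <cstddef>

struct InputEvent { int type, code; };   // placeholder for your event struct

const std::size_t CHUNK_SIZE = 64;       // small enough to sit on one page

struct EventChunk
{
    InputEvent  events[CHUNK_SIZE];
    std::size_t head, tail;              // ring indices within this chunk
    EventChunk* next;
};

class ChunkedEventQueue
{
public:
    ChunkedEventQueue() { readChunk = writeChunk = NewChunk(); }

    ~ChunkedEventQueue()
    {
        while (readChunk)
        {
            EventChunk* n = readChunk->next;
            delete readChunk;
            readChunk = n;
        }
    }

    void Push(const InputEvent& e)
    {
        if (Full(writeChunk))                      // current ring is full:
        {
            writeChunk->next = NewChunk();         // hop to a fresh chunk
            writeChunk = writeChunk->next;
        }
        writeChunk->events[writeChunk->tail] = e;
        writeChunk->tail = (writeChunk->tail + 1) % CHUNK_SIZE;
    }

    bool Pop(InputEvent& e)
    {
        if (Empty(readChunk))
        {
            if (!readChunk->next) return false;    // nothing queued at all
            EventChunk* old = readChunk;           // drained: follow the link
            readChunk = readChunk->next;
            delete old;                            // or hand it back to a pool
        }
        e = readChunk->events[readChunk->head];
        readChunk->head = (readChunk->head + 1) % CHUNK_SIZE;
        return true;
    }

private:
    static bool Empty(EventChunk* c) { return c->head == c->tail; }
    static bool Full(EventChunk* c)  { return (c->tail + 1) % CHUNK_SIZE == c->head; }
    static EventChunk* NewChunk()
    {
        EventChunk* c = new EventChunk;
        c->head = c->tail = 0;
        c->next = 0;
        return c;
    }

    EventChunk* readChunk;    // the chunk the core is draining
    EventChunk* writeChunk;   // the chunk new input is appended to
};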


Quote
Also, I figure that by using a list of some sort, input the system is unable to respond to at that very moment (for instance, the user presses and releases a key before a frame has finished rendering) will still be in the queue waiting to be processed by the core, making input very responsive (as cInput will probably be running in its own thread and whatnot).
I don't think having extra threads will make this any more responsive. Extra threads often just add extra overhead and can lead to slower performance. Threads are not some magical way to speed up a program (although so many sources tout them as such  :angry: ), and since most computers only have a single CPU, you don't normally gain any sort of speed benefit. If two things can be done at the same time, there is potential for lower latency on a quick operation if at least one of the two operations takes a long time. However, if you're only extracting and processing input between frames, there is no point in having the input queue run in a different thread. The same things have to be done, and the effect is felt at the same time. Making it multithreaded would only add locking overhead to make sure you don't destroy your queue by trying to read and write to it at the same time. That overhead is felt whether or not both operations are occurring at the same time.

Also, keep in mind this is all about user experience. If you fill an input queue as soon as the input arrives, the user doesn't know about it. The app won't appear to be responsive. But if you do some sort of processing, even just a little, that gives user feedback, then they'll know something happened. That will make things seem more responsive. So basically, do you have to wait for a certain point before you can process user input? Or can you actually make use of it as soon as it's available (in a way the user sees)?


I'd also like to leave an example of threading, and how it'll actually lower total throughput and increase latency overall. The time it takes for a thread switch is significant, so it'll decrease throughput. This is especially bad if a thread gets control and has nothing to do but give it up. But the overall latency increase does not depend on this overhead. For instance, if two operations come in at the same time, and each takes 2 time units, consider the case where they are done back to back versus the case where one preempts the other.

Back to back: A runs for 2 time units and finishes. B runs for 2 time units and finishes.
Here A finishes in 2 time units. B has to wait 2 time units to start, and then takes 2 time units to finish. Thus B takes a total of 4 time units to complete.

Threaded: A runs for 1 time unit, B runs for 1 time unit, A runs for 1 time unit and finishes, B runs for 1 time unit and finishes.
Here A finishes in 3 time units (2 time units executing, and 1 time unit waiting). B finishes in 4 time units (2 time units executing, and 2 time units waiting).

You'll notice that although in both cases they finish at the same time (since we ignore switching overhead), it takes longer on average for each task to complete in the threaded example. (A took longer to complete here because it was waiting while B ran.)


Now, if event A took 20 time units and event B only took 2, and if event B was something the user requested and can be handled while event A is being processed, then it will make the app look more responsive if event B is handled in short order (provided it gives user feedback). This is about the only way threading helps, unless you're certain a machine has multiple CPUs and can reasonably divide the work into two separate tasks.


I hope I'm not rambling on about stuff you already know.  :unsure:  

Offline BlackBox

  • Administrator
  • Hero Member
  • Posts: 3093
Op:mia/op3 Programming Concepts
« Reply #2 on: November 16, 2005, 10:53:27 AM »
I was thinking a circular buffer just the same.

A linked list is going to waste too much time traversing the list. A circular buffer is also nice because you don't have to deal with memory allocations/deallocations during the game loop as you might with a linked list.

Avoid dereferencing as much as you can. For example, it's less efficient to store a circular buffer of pointers to 'event structures', since you still have to allocate memory for those structures and then dereference the pointers to get at them.

If the structures were allocated in a different memory page, you also have the issue of a cache miss slowing down your program further.
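In other words, something like this (just to illustrate; InputEvent stands in for whatever the event struct ends up being):

Code:
// Keep the event structs themselves in the buffer, by value, rather than
// pointers to separately allocated structs. (Illustrative only.)

struct InputEvent { int type, code; };   // placeholder for your event struct

InputEvent  buffer[256];    // good: one contiguous block, no per-event
                            // allocation, no pointer to chase

InputEvent* slower[256];    // worse: every slot needs its own new/delete,
                            // plus a dereference that may land on another page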

A stack could also work for queuing and dequeuing input data. This way, you don't have a lot of memcopies to move things around in a list; you can just work with a pointer to the current element and increment/decrement that pointer.

And yes, like Hooman said, there's no need to check input in a different thread. Since the check is a relatively short, non-blocking process (probably querying DirectInput or some other system to see if there are events), threading it would waste time (also consider the fact that you have to synchronize shared memory writes using things like critical sections, InterlockedIncrement/Decrement, etc.).

And yes, just give the user an indication that their command was stored, do your other processing, then return to the command processing later if you can't carry out the command right at that instant.

Offline Hooman

  • Administrator
  • Hero Member
  • Posts: 4954
Op:mia/op3 Programming Concepts
« Reply #3 on: November 17, 2005, 10:14:07 PM »
A stack won't really work here. You'd want to process the events in the order they came in. With a stack, you'd reverse the order (unless you never had more than 1 item queued).


With a circular buffer, there is no need to move things around in memory. You only change the indexes/pointers to the first and last items, and check them against each other to see if the list is empty or full (as opposed to checking against 0 and the size of the list). Also, if the buffer size (number of elements) is a power of 2, you can use a simple AND on an index to wrap around back to the beginning.

Oh, and to tell the difference between empty and full, circular buffers usually leave one entry blank, and so they end up storing one less item than they have space for. There are ways to make use of this space, but only really by complicating the indexing, which isn't usually worth it. (Especially since people who do this usually allocate 32 bits for essentially 1 bit of information, slow down their indexing, and increase the size of their code, which can push things out of the code cache.)
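Roughly, the indexing looks like this (just a fragment to show the power-of-two trick; the names are made up):

Code:
// Power-of-two wrap-around and the empty/full check (fragment).
#include <cstddef>

const std::size_t SIZE = 256;               // must be a power of two
const std::size_t MASK = SIZE - 1;

std::size_t head = 0;                       // index of the oldest event
std::size_t tail = 0;                       // one past the newest event

bool QueueEmpty() { return head == tail; }
bool QueueFull()  { return ((tail + 1) & MASK) == head; }   // one slot blank

void AdvanceTail() { tail = (tail + 1) & MASK; }   // after writing buffer[tail]
void AdvanceHead() { head = (head + 1) & MASK; }   // after reading buffer[head]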


Gah, ever get the feeling there is just too much to consider sometimes?