Well, I've been taking a look at the tile groups found at the end of .map files again recently. When it was first discovered what that data represented, it seemed very useful for people making new maps. But even with the use of those tile groups, it's still very laborious to make new maps. I took a second look at them to see how useful they might be for more automated map generation tools. The main complaint here is that the tilesets and tile groups don't really have any metadata associated with them that's useful for an automated tool. For example, what are all the tiles of a given terrain type? What tileset transitions are appropriate where? Some findings were good, others not so much.
First of all, every tile in all the tilesets is found in some tile group. Yes, there are no unused tiles in the tilesets (with respect to the tile groups). (Even though there are some tiles that are never used in any of the maps).
Secondly, aside from a few tiles in the lava tileset, no tile appears more than once. That is, every tile (outside of the lava tileset) appears exactly once, in exactly one tile group.
Another observation is that only tiles in the lava tileset are animated.
Ok, so what does this mean? Well, if we ignore the lava tileset for the moment, then things get simpler. No animated tiles, and every tile appears exactly once in all the tile groups. Also, since no map uses the tiles in the lava tileset (at the initial startup stored in the map file), it's quite reasonable for someone making a map to ignore this tileset. (Yes, you COULD place lava on a map. But that doesn't usually get placed until after a volcano has erupted after the start of a level.) So, I'll just ignore the oddities of the lava tileset for the rest of this.
That means the tile groups form a nice partition of all available tiles into groups of related tiles. This is useful for automated generation, since you can know what the exact use of each tile is, (that is, what group it belongs to), and place it where it's best suited. Granted, some more work could be done grouping tile groups into groups. Some groups represent small sections of ridges. These ridge groups might be combined into a collection of all ridges for a certain terrain type. Then when ridges need to be placed on a certain terrain type, the tool would know where to look. You might even go farther, and order the groups according to the directionality of the ridges.
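To make the partition idea concrete, here is a minimal sketch of how a tool might check it and build a reverse lookup from a tile to its group. It assumes the tile groups have already been loaded into memory as a list of 2D lists of tile ids; that layout and the names build_tile_index and is_partition are made up for illustration, not anything from the file format itself.

```python
from collections import Counter


def build_tile_index(tile_groups):
    """Map each tile id to (group index, x, y) and report repeated tiles.

    tile_groups is an assumed in-memory form: a list of groups, each a
    list of rows, each row a list of tile ids.
    """
    index = {}
    counts = Counter()
    for g, group in enumerate(tile_groups):
        for y, row in enumerate(group):
            for x, tile in enumerate(row):
                counts[tile] += 1
                # Keep the first location seen; repeats are reported below.
                index.setdefault(tile, (g, x, y))
    duplicates = sorted(t for t, n in counts.items() if n > 1)
    return index, duplicates


def is_partition(tile_groups, all_tiles):
    """True if every tile in all_tiles appears exactly once across the groups."""
    index, duplicates = build_tile_index(tile_groups)
    missing = set(all_tiles) - set(index)
    return not duplicates and not missing
```

With an index like this, "which group does this tile belong to, and where in it?" becomes a single dictionary lookup. The lava tileset would need to be left out before calling is_partition, since its tiles can repeat.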
Terrain transitions are also grouped nicely. So when trying to get edges between say, rock and mud, or mud and sand, you have only to look in the group that contains all tiles for that purpose. Each transition type has an associated group of 48 tiles. They appear to be mostly in a set order. There are a few oddball entries where the directionality among the different transition groups doesn't seem to match up between corresponding tiles. That's a big pain in the butt for an automated tool. But maybe it's close enough that you wouldn't notice it too much. Pretty much all the built in maps have areas with rough edges that don't quite look like they match up right. I guess you can at least hope to do no worse.
Another problem with the terrain transitions is that there isn't an obvious ordering to them. They appear largely consistent across the different transition groups, but it's not obvious how to automatically select the right tile from any group given a specific transition situation.
There are various "doodads" that can be placed on the map. (I'll use the SC editor terminology here). Some of them are usually marked as impassable in the built in maps, while others are usually marked as passable. This type of information is not present anywhere in the tilesets or tile groups. But given a tile group, it generally wouldn't be too hard for a person looking at it to guess and assign a value. If this data were saved, it could be used in an automated tool.
Cliffs also have a low side and a high side associated with them. That data isn't present anywhere in the tilesets or tile groups, and is needed to place tiles correctly and have them function as expected in the game. This is similar to the above passability issue though, since it's the same data field that controls it.
An automated scan of the built in map files could help reassociate this data, but when attempted a while back, it was found that any given tile could have a number of celltypes (passability, cliff high side/low side, etc.) associated with it. Combine that with the fact that not every tile was used in the original maps, and not every tile will have an associated celltype from such a scan. Thus we have the problem of tiles having 0, or more than 1, celltypes associated with them. How would an automated tool select the right celltype? And would you trust it if it did?
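A scan like that could be sketched roughly as follows. This is not the code from the earlier attempt, just an illustration. It assumes each map has already been decoded into (tile id, celltype) pairs, and the function name collect_celltypes is hypothetical.

```python
from collections import defaultdict


def collect_celltypes(maps, all_tiles):
    """Sort tiles by how many distinct celltypes the maps give them.

    maps: iterable of maps, each an iterable of (tile_id, celltype) pairs
          (an assumed, already decoded form of the map cells).
    Returns (unique, ambiguous, unused).
    """
    seen = defaultdict(set)
    for cells in maps:
        for tile, celltype in cells:
            seen[tile].add(celltype)
    # Tiles that never appear in any map: 0 celltypes.
    unused = sorted(t for t in all_tiles if t not in seen)
    # Tiles seen with more than 1 celltype: a person has to decide.
    ambiguous = {t: sorted(c) for t, c in seen.items() if len(c) > 1}
    # Tiles seen with exactly 1 celltype: safe to reuse as-is.
    unique = {t: next(iter(c)) for t, c in seen.items() if len(c) == 1}
    return unique, ambiguous, unused
```

The unused and ambiguous results are exactly the tiles that would need a value assigned by hand.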
I guess that leads me to think we should try to regenerate this data by hand so that each tile has exactly 1 celltype associated with it. That would make it much easier for a tool to properly handle celltypes.
Another issue is that some tile groups are meant to be placed in one big lump together, while other tile groups are meant to have a single tile from them placed. This is the difference between placing a large volcano, or a large crater, or a large boulder, as opposed to placing a small crater (there are a bunch of single tile, small craters, all placed in the same group), or a terrain transition tile (all transition tiles between a given set of terrain types are in the same group). To be really useful, these two types of tile groups will need to be sorted out.
There is also the issue of overlapping tile groups. That is, you can place a tile group down on a map, and then place another tile group on the map that partially overlaps the first one. It might still look good, or it might not. Most of the ridges that appear in the game actually fall into the case of overlapping tile groups. It allows for much more varied ridge configurations, and branching, than non-overlapping tile groups would allow. This means that a tool to automatically place ridges is either going to be limited, or likely to produce very ugly results. Definitely some more work will be needed to auto place nice ridges.
The above discussion on ridge placement is also ignoring directionality. Even if two ridge tile groups don't overlap, they might not look good placed next to each other. More metadata is needed to specify where adjacent ridge tile groups can/need to be placed. At least the directionality should be obvious from where placement is needed. (I hope).
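As a rough idea of what that extra metadata could look like if it were saved alongside the tile groups, here is a small sketch. None of this exists in the map files; the field names, the direction letters, and the group names in the usage below are all invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Placement(Enum):
    WHOLE_GROUP = "whole"    # place the entire group as one lump
    SINGLE_TILE = "single"   # place one tile picked from the group


@dataclass
class GroupMeta:
    name: str
    placement: Placement
    terrain: str = ""
    # direction ("N", "E", "S", "W") -> names of groups allowed on that side
    neighbours: dict = field(default_factory=dict)


def can_place_beside(meta, direction, other):
    """True if 'other' is listed as an allowed neighbour on that side of 'meta'."""
    return other.name in meta.neighbours.get(direction, ())
```

For example, a hand-written record such as GroupMeta("ridge_a", Placement.WHOLE_GROUP, "rock", {"E": {"ridge_b"}}) would let a tool ask whether "ridge_b" may go to the east of "ridge_a" before placing it.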
I also did some analysis of the built in maps. I wanted to see how well they could be decomposed into tile groups. I started by identifying all groups that appear exactly the same in a map. Then I would remove them and replace those sections with the blue tile (tile 0). Then I repeated the tile group finding algorithm, where a blue tile matched as a don't care condition. A helpful speed up to this matching process was that each tile appeared in exactly one tile group in exactly one place. So given a non blue tile, I could look up in an array what tile group it belonged to, and at which (x,y) offset within that group, and then do a compare with that tile group on the surrounding tiles (at an offset so the given tile matched up with where it should be in the group). If a match was found, using the blue tile as a don't care condition, then I replaced the non blue tiles in that match with blue tiles. (Had to ignore the dummy tile groups containing only blue tiles, as they were becoming a problem). I also allowed single tiles in specific groups to always match. So terrain transitions would always match on single tiles, since you'd never find the entire group of transitions all at once. I did this for a few other groups where it seemed obvious, or possibly likely. I basically iterated this procedure a few times and took a look at how much of the map would go away. I did this for a few maps to see what the results would be.
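One pass of that procedure could look something like the sketch below. It is a reconstruction from the description above, not the code I actually ran. It assumes the map and the tile groups are rectangular 2D lists of tile ids, and the function names are made up. Running erase_pass repeatedly until it returns 0 gives the "iterate a few times" behaviour.

```python
BLUE = 0  # tile 0, used as the "don't care" marker


def build_index(tile_groups):
    """Map tile id -> (group index, x, y), skipping blue tiles so the
    dummy all-blue groups never take part in matching."""
    index = {}
    for g, group in enumerate(tile_groups):
        for y, row in enumerate(group):
            for x, tile in enumerate(row):
                if tile != BLUE:
                    index.setdefault(tile, (g, x, y))
    return index


def match_group_at(grid, group, ox, oy):
    """True if group fits with its top-left at (ox, oy); blue map cells match anything."""
    h, w = len(group), len(group[0])
    if ox < 0 or oy < 0 or oy + h > len(grid) or ox + w > len(grid[0]):
        return False
    for y in range(h):
        for x in range(w):
            cell = grid[oy + y][ox + x]
            if cell != BLUE and cell != group[y][x]:
                return False
    return True


def erase_pass(grid, tile_groups, index, single_tile_groups=frozenset()):
    """Replace matched tiles with BLUE in place; return how many were erased."""
    erased = 0
    for my in range(len(grid)):
        for mx in range(len(grid[0])):
            tile = grid[my][mx]
            if tile == BLUE or tile not in index:
                continue
            g, gx, gy = index[tile]
            if g in single_tile_groups:
                # Groups such as terrain transitions always match tile by tile.
                grid[my][mx] = BLUE
                erased += 1
                continue
            group = tile_groups[g]
            # Shift so this tile lines up with its position inside the group.
            ox, oy = mx - gx, my - gy
            if match_group_at(grid, group, ox, oy):
                for y in range(len(group)):
                    for x in range(len(group[0])):
                        if grid[oy + y][ox + x] != BLUE:
                            grid[oy + y][ox + x] = BLUE
                            erased += 1
    return erased
```

Whatever is still non blue after the last pass is the part of the map that can't be explained by whole (or overlapped) tile groups.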
One thing I noticed is that ridges almost always remained. They never appeared to be a simple overlap case. That is, they never appeared to be a case where a finite number of tile groups could be overlapped to form the graphics for the ridges. In other words they did a lot of partial mixing where part of one tile group was placed next to part of another tile group, rather than an entire tile group being placed next to, or partly over another tile group. This seems to be more bad news for ridge placement.
The other thing that didn't tend to go away was vegetation. Often the vegetation didn't seem right and was graphically ugly, but not always. It seems vegetation was placed in a haphazard way, somewhat similar to ridges. There were also the ugly boulder overlaps that you can usually tell just weren't done right.
Another thing I noticed, was there were a significant number of groups that matched for the base terrain types, but unless you allowed single tiles in these groups to match, most of the terrain didn't go away. I haven't checked into the probability of the entire groups matching given terrain that was placed randomly one tile at a time though. Maybe those matches are just a case of dumb luck. Although, from my initial observations, I'm led to believe they occur much too frequently, and terrain was placed (at least some of the time) using the tile groups to place big blocks all at once. (Often overlapping blocks). However, like I said earlier, much of the terrain didn't go away without individual tile matching, so there was obviously a lot of single tile placement, or partial tile group placement.
Hopefully someone else can use some of these findings to further development of automated tools for map making.