Update - Phase 3G finished (what we planned vs what we got)

Below is Outpost2.exe only (the main game binary). Subsystem counts are index rows (Ghidra can emit more than one `.c` row per VA).

Where we started vs where we landed

| Metric | Before Phase 3G | After Phase 3G (now) |
| --- | --- | --- |
| Named (not `FUN_*`) | 1,751 (57.9%) | 1,837 (60.8%) |
| Auto `FUN_*` | 1,271 | 1,185 |
| `unclassified` subsystem rows (index) | ~1,390 | 1,242 |
| New `crt` bucket (runtime tail tagged) | (none) | 46 rows on EXE |

So: +86 readable names on the EXE, -86 `FUN_*`, and ~148 fewer `unclassified` rows. That is not the original stretch goal of 80-90% named, but it is real progress, with everything scripted and repeatable.
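
For anyone who wants to sanity-check the table, here is a small sketch that recomputes the percentages and deltas. It assumes the "Named" and "Auto `FUN_*`" rows together make up the whole function census, which the table implies but does not state outright.

```python
# Figures copied from the table above (Outpost2.exe only).
before = {"named": 1751, "auto": 1271}
after = {"named": 1837, "auto": 1185}


def named_share(census):
    """Percentage of functions with a real name rather than FUN_*.

    Assumption: named + auto is the full function count.
    """
    total = census["named"] + census["auto"]
    return 100.0 * census["named"] / total


delta_named = after["named"] - before["named"]
delta_auto = after["auto"] - before["auto"]

print(f"before: {named_share(before):.1f}% named")
print(f"after:  {named_share(after):.1f}% named")
print(f"named {delta_named:+d}, FUN_* {delta_auto:+d}")
```

Both snapshots sum to the same total, so every new name corresponds to one `FUN_*` going away.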

Step by step (expectation vs reality)

3G-1 - CRT / compiler tail
- Done: FID pass on the CRT address tail only, then rename leftover tail `FUN_*` to `CRT::r_<hex>`; the subsystem indexer gets a `crt` bucket (`crt_ranges.json`, `apply_fidb.py`, `identify_crt.py`).
- Expected: hundreds of CRT names/classifications.
- Actual: FID barely helps on this VC++5-era EXE; most of the win is the address-based tail + `CRT::r_*` renames. 46 EXE rows land in `crt`. DLLs pick up more FID hits in-range.
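
To make the address-based part concrete, here is a rough sketch of range tagging. It is not the code in `identify_crt.py`: the JSON layout, the example range, and the 8-digit hex width in the label are all assumptions for illustration.

```python
import json

# Hypothetical layout for crt_ranges.json; the real file may differ.
# The range below is a placeholder, not the actual CRT tail of Outpost2.exe.
EXAMPLE_RANGES = json.loads('[{"start": "0x00500000", "end": "0x00510000"}]')


def load_ranges(raw):
    """Convert hex-string records into (start, end) pairs, end exclusive."""
    return [(int(r["start"], 16), int(r["end"], 16)) for r in raw]


def in_crt_tail(va, ranges):
    """True if the VA falls inside any configured CRT range."""
    return any(start <= va < end for start, end in ranges)


def crt_name(current_name, va, ranges):
    """Rename only auto-named functions inside a CRT range."""
    if current_name.startswith("FUN_") and in_crt_tail(va, ranges):
        return f"CRT::r_{va:08x}"
    return current_name  # already named, or outside the tail


ranges = load_ranges(EXAMPLE_RANGES)
print(crt_name("FUN_00501234", 0x00501234, ranges))
```

Anything FID or a human already named is left alone, so the pass only ever touches leftover `FUN_*` entries.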

3G-2 - RTTI / vtables
- Done: PE-first RTTI parse (`recover_rtti.py`, `rtti_pe.py`), optional Ghidra apply, `rtti-classes.md` + JSON.
- Expected: 100-300 functions renamed from RTTI alone.
- Actual: Community Tethys VAs already named most stream/GFX vtable slots. RTTI confirms hierarchies and yields only 6 extra `FUN_*` renames on apply. OP2Shell / op2ext in the OPU tree: no MSVC RTTI anchor found.
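
As a rough illustration of what "PE-first" can mean, the sketch below scans raw image bytes for MSVC type-descriptor name strings, which start with `.?AV` (class) or `.?AU` (struct) and end with `@@`. It is not the logic in `rtti_pe.py`: it skips templates, does no section parsing, and does not follow the locator/hierarchy structures. The class names in the sample bytes are made up.

```python
import re

# MSVC type descriptor names: ".?AV" (class) or ".?AU" (struct),
# then the scoped name, then "@@" and a NUL terminator.
# Template names (which contain "?$") are deliberately not matched here.
TYPE_NAME = re.compile(rb"\.\?A[VU]([A-Za-z_][A-Za-z0-9_@]*?)@@\x00")


def find_rtti_class_names(image: bytes):
    """Return (offset, class_name) for each simple type-descriptor string."""
    hits = []
    for m in TYPE_NAME.finditer(image):
        # Nested scopes are stored innermost-first, separated by '@'.
        parts = m.group(1).decode("ascii").split("@")
        hits.append((m.start(), "::".join(reversed(parts))))
    return hits


# Synthetic bytes standing in for a data section (names are hypothetical).
sample = b"\x00\x00.?AVWidget@@\x00pad.?AUInner@Outer@@\x00"
for offset, name in find_rtti_class_names(sample):
    print(hex(offset), name)
```

A binary with no hits from a scan like this is a quick hint that there is no MSVC RTTI to anchor on.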

3G-3 - MSVC demangle (exports)
- Done: Vendored demangler, `demangled-symbols.json` for all 509 EXE exports, `apply_demangled.py` in Ghidra.
- Expected: 50-200 renames.
- Actual: 23 Ghidra renames, mostly normalizing decorated leftovers; 509/509 exports demangle in JSON. Most export entry points were already named; the apply script skips 115 existing `Tethys__*` labels.
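
The skip rule explains why the rename count is so low. A minimal sketch of that kind of filter follows. The exact conditions in `apply_demangled.py` are not shown in this post, so this is an assumed policy: keep curated `Tethys__*` labels, and only overwrite names that are still auto-generated or still MSVC-decorated (decorated C++ names start with `?`).

```python
def should_rename(current_name: str) -> bool:
    """Decide whether an export entry point may take its demangled name.

    Assumed policy, not the script's actual rules.
    """
    if current_name.startswith("Tethys__"):
        return False  # curated community label: keep it
    # Still auto-named, or still carrying a decorated MSVC name.
    return current_name.startswith("FUN_") or current_name.startswith("?")


def plan_renames(current_names):
    """Split names into (rename, keep) lists using the filter above."""
    rename = [n for n in current_names if should_rename(n)]
    keep = [n for n in current_names if not should_rename(n)]
    return rename, keep
```

With a policy like this, an export table that is mostly labelled already leaves very little for the demangler to change, even though every symbol demangles cleanly.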

3G-4 - Call-graph subsystem propagation
- Done: Parse pseudo-C call edges, iterative neighbor voting, update `subsystem-index.json` (`callgraph-classification.md`, edge JSON).
- Expected: 500-800 index rows moved off `unclassified`.
- Actual: 153 unique VAs newly classified; `unclassified` 1389 -> 1234 rows. The graph is sparse (only calls visible in decompiler output) and the 70% vote threshold avoids mis-labeling hubs, so yield is much lower than the sketch.
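
Here is a minimal sketch of iterative neighbor voting with a 70% threshold. The details are assumptions rather than the real implementation: edges are treated as undirected, only already-labeled neighbors vote, and new labels take effect at the end of each round.

```python
from collections import Counter


def propagate(edges, labels, threshold=0.70, max_rounds=10):
    """Label unclassified nodes from their already-labeled neighbors.

    edges:  iterable of (caller_va, callee_va) pairs
    labels: dict va -> subsystem for functions already classified
    Returns a new dict; the input labels are not modified.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    labels = dict(labels)
    for _ in range(max_rounds):
        updates = {}
        for va, adj in neighbors.items():
            if va in labels:
                continue
            votes = Counter(labels[n] for n in adj if n in labels)
            if not votes:
                continue  # no labeled neighbors yet
            winner, count = votes.most_common(1)[0]
            if count / sum(votes.values()) >= threshold:
                updates[va] = winner
        if not updates:
            break  # converged
        labels.update(updates)
    return labels


edges = [(1, 2), (1, 3), (1, 4), (5, 1), (6, 2), (6, 7)]
seed = {2: "gfx", 3: "gfx", 4: "gfx", 7: "sound"}
result = propagate(edges, seed)
print(result.get(1), result.get(5), result.get(6))
```

In the toy graph, node 1 is labeled in round one, node 5 follows in round two via node 1, and node 6 stays unclassified because its neighbors split 50/50. That last case is the hub behaviour the threshold is there to prevent.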

3G-5 - Singleton globals (`DAT_` + raw hex)
- Done: Extended `DAT_*` rules in `subsystem_index.py`, full-file scanner `tag_by_globals.py` against `singletons.json`.
- Expected: dozens to ~130 new classifications.
- Actual: Only 2 new `global_xref` VAs; almost everything touching known singletons was already classified. 171 `dat_global` rows after regen; `unclassified` down to ~1228 after the 3G-4 + 3G-5 pipeline.
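
The scanner idea can be sketched as below. This is not `tag_by_globals.py`: the shape of `singletons.json`, the placeholder addresses, and the "tag only when exactly one subsystem is referenced" rule are all assumptions for illustration.

```python
import re

# Hypothetical shape for singletons.json: hex address -> subsystem.
# The addresses are placeholders, not real Outpost2.exe globals.
SINGLETONS = {"0x00500000": "map", "0x00500010": "sound"}

# Ghidra-style DAT_ labels (8 hex digits) or raw hex literals.
TOKEN = re.compile(r"\bDAT_([0-9a-fA-F]{8})\b|\b0x([0-9a-fA-F]{6,8})\b")


def subsystems_touched(pseudo_c, singletons):
    """Return the set of subsystems whose singleton addresses appear."""
    known = {int(addr, 16): name for addr, name in singletons.items()}
    found = set()
    for m in TOKEN.finditer(pseudo_c):
        va = int(m.group(1) or m.group(2), 16)
        if va in known:
            found.add(known[va])
    return found


def tag(pseudo_c, singletons):
    """Tag only when exactly one subsystem's singleton is referenced."""
    hits = subsystems_touched(pseudo_c, singletons)
    return next(iter(hits)) if len(hits) == 1 else None


print(tag("x = *(int *)(DAT_00500000 + 4);", SINGLETONS))
```

Refusing to tag on mixed hits keeps glue code that touches several singletons out of any single bucket.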

3G-6 - String xrefs -> names
- Done: Ghidra `name_by_strings.py` - defined string data, xrefs into `FUN_*`, scored labels with `_msg` suffix, `--apply` + re-decompile.
- Expected: ~50-100 renames.
- Actual: 34 `FUN_*` renamed on EXE (census 1219 -> 1185 `FUN_*` at that snapshot). Many string refs sit in already-named functions; strict scoring drops format noise.
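
To show how strict scoring can drop format noise, here is a rough sketch of turning a referenced string into a label. The scoring rule (count of alphabetic words with three or more letters) and the length cap are assumptions, not the rules in `name_by_strings.py`; only the `_msg` suffix comes from the actual script.

```python
import re

# Rough printf-style conversion matcher: flags, width, precision, letter.
FORMAT_SPEC = re.compile(r"%[-+ #0]*\d*(?:\.\d+)?[a-zA-Z]")


def label_from_string(text, min_score=2, max_len=32):
    """Build a candidate function label, or return None if too noisy."""
    # Drop format conversions so they do not count as words.
    cleaned = FORMAT_SPEC.sub(" ", text)
    words = re.findall(r"[A-Za-z]{3,}", cleaned)
    if len(words) < min_score:
        return None  # pure format strings and fragments score too low
    stem = "_".join(w.lower() for w in words)[:max_len].rstrip("_")
    return f"{stem}_msg"


print(label_from_string("Could not load %s"))
print(label_from_string("%d %d %s\n"))
```

A message with real words yields a usable stem, while a pure format string yields nothing, which is the kind of rejection that keeps the rename count modest.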

3G-7 - Scenario DLL import audit
- Done: `harvest_scenario_imports.py` - union of every shipping DLL's imports from `Outpost2.exe`, diffed against exports + Tethys + index (PE only, no Ghidra).
- Expected: maybe 10-50 new names from gaps.
- Actual: 0 new Ghidra names - validation only: 279 unique imports, 0 orphans vs the export table, 0 `FUN_*` at those import VAs. This shows the mission API surface is already covered. 230 / 509 exports are never imported by any scanned DLL (engine-internal / unused).
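
The audit boils down to a few set operations. The sketch below assumes the imports and exports have already been pulled out of the PE files; the input shapes and the toy symbol names are hypothetical, and this is not the code in `harvest_scenario_imports.py`.

```python
def audit_imports(dll_imports, exe_exports, auto_named_vas):
    """Cross-check DLL imports against the EXE export table.

    dll_imports:    dict dll_name -> set of symbols imported from the EXE
    exe_exports:    dict symbol -> VA of the export entry point
    auto_named_vas: set of VAs whose function is still named FUN_*
    """
    used = set().union(*dll_imports.values())
    exported = set(exe_exports)
    return {
        "unique_imports": len(used),
        # Imported by a DLL but missing from the export table.
        "orphans": sorted(used - exported),
        # Imported and exported, but the target is still auto-named.
        "still_auto_named": sorted(
            s for s in used & exported if exe_exports[s] in auto_named_vas
        ),
        # Exported but not imported by any scanned DLL.
        "never_imported": sorted(exported - used),
    }


# Toy data only; symbol names and VAs are made up.
report = audit_imports(
    {"a.dll": {"SymA", "SymB"}, "b.dll": {"SymB", "SymX"}},
    {"SymA": 0x1000, "SymB": 0x2000, "SymC": 0x3000},
    {0x2000},
)
print(report)
```

On the real data, the first three results correspond to the 279 / 0 / 0 figures above and the last to the 230 never-imported exports (509 - 279).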

Bottom line

Phase 3G-1 through 3G-7 are done. The payoff is a reproducible pipeline (decompile, index, call graph, globals, string pass, PE cross-check) more than a single headline percentage. There is still a large `FUN_*` and `unclassified` tail on the EXE; what is left is mostly harder or riskier than these automated passes.

Cheers
Jonathan