Author Topic: Want Outpost 2 on my Mac - learning the hard way with AI  (Read 216 times)

Offline jonathangoorin

  • Newbie
  • *
  • Posts: 5
Want Outpost 2 on my Mac - learning the hard way with AI
« on: April 02, 2026, 04:11:48 PM »
Hey -

I want Outpost 2 to run on my Mac, properly, someday. I don't have a background in decompilers or this codebase - I'm flying blind. What I'm doing is throwing AI agents at the problem: peel enough understanding out of Outpost2.exe and the data files that I can turn around and tell those same agents how to rebuild pieces in something native. I have no idea if that ever works at scale. I'm going to try anyway.

How I set the project up:
- Notes, scripts, format writeups, Ghidra exports in one workspace so agents (and future me) have context.
- Python with uv for small tools: PE sniffing, harvesting addresses/names from community headers and patches, parsers/extractors for maps and archives where I've gotten that far.
- Ghidra for the binary (with whatever labels/types the community has already surfaced).
- pygame-ce for a map viewer so I can prove I'm not misreading map/tile data.
- Markdown docs: subsystems, PE notes, links to wiki / bei.pm / forum threads.
- LLM APIs for drafting RE notes from census dumps
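For context on the PE-sniffing side: here is a minimal sketch of the kind of header check those small tools start with, using only the stdlib. The helper name `pe_machine` is my illustration here, not a script from the repo.

```python
import struct

# Map a few IMAGE_FILE_HEADER machine values to readable names.
MACHINES = {0x014C: "x86", 0x8664: "x64"}

def pe_machine(data: bytes) -> str:
    """Return the target machine of a PE image given its raw bytes."""
    # DOS header: magic "MZ"; e_lfanew at offset 0x3C points to "PE\0\0".
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER follows the signature; Machine is its first WORD.
    (machine,) = struct.unpack_from("<H", data, e_lfanew + 4)
    return MACHINES.get(machine, hex(machine))
```

Outpost2.exe should report `x86` this way; anything unexpected is a quick sanity flag before deeper parsing.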

I'll probably post progress updates in this thread now and then - not a formal devlog, just for the kicks and to keep myself honest.

Hop on if you want:
If this sounds fun to you, or you want to point me at landmines before I step on them, you're welcome to join in. Ping me if you want a look at it - message me here or say so in the thread.

Cheers
Jonathan
« Last Edit: April 04, 2026, 07:34:08 AM by jonathangoorin »

Progress: ~4.7k decompiles in - here's the map
« Reply #1 on: April 03, 2026, 02:24:40 AM »
What I've done so far: run the decompiler over the shipping binaries and keep one greppable tree of per-function C to hang notes on. It is not pretty source, but it is a floor plan I can extend: search, cross-reference, and attach an indexer to instead of reinventing context every time I sit down.

At a glance

- 4663 translation units across Outpost2.exe, op2ext.dll, and OP2Shell.dll - raw material for search and cross-reference, not pretty code.
- Outpost2.exe also gets an auto subsystem index (game loop, units, UI, saves, ...) against community Tethys VAs and a few other hints.
  - DLLs land in the same export batch as the EXE; the numbered breakdowns below are EXE-only until I fold the DLLs into the same indexer.

Table: Export volume

| Binary | `.c` units |
| --- | --- |
| Outpost2.exe | 4278 |
| op2ext.dll | 173 |
| OP2Shell.dll | 212 |
| Total | 4663 |

Why 4277 rows but 3022 "places" on the map?

- The index has 4277 rows, but only 3022 distinct entry points.
  - Many addresses appear twice (e.g. a `FUN_...` row and a decorated name for the same VA).
- Any percentages below are against those 3022 uniques, not the raw row count.
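The rollup behind those numbers is essentially best-row selection per address. A minimal sketch with a hypothetical row shape and scoring (the real indexer uses more evidence tiers than this):

```python
# Hypothetical row shape: (va, name, evidence). Collapse duplicate VAs,
# preferring rows whose name looks like a Tethys "Class::member" symbol.
def best_rows(rows):
    def score(row):
        va, name, evidence = row
        if "::" in name:                 # decorated Tethys-style symbol wins
            return 2
        if not name.startswith("FUN_"):  # any curated name beats an auto name
            return 1
        return 0
    best = {}
    for row in rows:
        va = row[0]
        if va not in best or score(row) > score(best[va]):
            best[va] = row
    return best

rows = [
    (0x401000, "FUN_00401000", "auto"),
    (0x401000, "TethysGame::AddMessage", "tethys_va"),
    (0x402000, "FUN_00402000", "auto"),
]
# Three rows collapse to two unique entry points; the Tethys name survives.
uniq = best_rows(rows)
```

The VA and symbol in the example are made up for illustration; the point is only the duplicate-collapsing rule.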

Table: Mapped vs still fuzzy (EXE)

| | Addresses | Share |
| --- | --- | --- |
| In a named subsystem (units, UI, game loop, ...) | 1711 | 56.6% |
| Only "unclassified" so far | 1311 | 43.4% |

On the EXE side I have already placed a bit more than half of the entry points into a subsystem I can reason about in write-ups; the remainder is still unclassified in my tooling and is the main backlog to work through.

Where the names come from

For each address I keep one best row and prefer anything that has a proper Tethys `Class::member` string.

Table: Evidence (% of 3022)

| Evidence | Count | % of 3022 |
| --- | --- | --- |
| Tethys VA table (OPU / headers) | 994 | 32.9% |
| Curated known-address list | 200 | 6.6% |
| HFL-style DAT globals | 161 | 5.3% |
| Heuristic filename | 166 | 5.5% |
| Heuristic content | 150 | 5.0% |
| Keyword scrape | 40 | 1.3% |
| Has a subsystem assignment | 1711 | 56.6% |
| Still default / unclassified | 1311 | 43.4% |

994 of the 3022 uniques (~33%) carry a Tethys symbol on the row I keep - the strongest naming signal in my rollup.

Table: Confidence (same rollup)

| Level | Addresses | Share |
| --- | --- | --- |
| high | 1194 | 39.5% |
| medium | 471 | 15.6% |
| low | 1357 | 44.9% |

What those confidence rows mean for my index: most addresses already sit in a named subsystem; about one in three has a Tethys symbol on the row I keep; low and unclassified overlap, so 1311 addresses (~43%) are still unfinished labeling on my side - expected at this stage, not the end state.

Table: Subsystem heat (row counts, duplicates allowed)

| Subsystem | Rows |
| --- | --- |
| units | 794 |
| ui | 392 |
| save_load | 391 |
| scenario | 327 |
| game_loop | 230 |
| map | 210 |
| rendering | 192 |
| sim_tick | 152 |
| unclassified | 1390 |
| networking | 40 |
| research | 49 |
| buildings | 57 |
| audio | 53 |
| Sum | 4277 |

Going forward: fold op2ext.dll and OP2Shell.dll into the same subsystem indexer (the tables above stay EXE-focused until that lands), keep burning down the unclassified tail with more Ghidra time, and grow the parallel notes: markdown per area, VOL / map / saves write-ups, symbol scrapers, and a small pygame map view as a sanity check so I do not drift on tiles.

If a bucket looks wrong or you know a thread that would have saved me a week, I will read it; glad for pointers.

Cheers, 
Jonathan
« Last Edit: April 03, 2026, 02:42:14 AM by jonathangoorin »

Deepening the decompilation - Phase 3G: more RE steps (planned vs actual)
« Reply #2 on: April 04, 2026, 07:33:19 AM »
Update - Phase 3G finished (what we planned vs what we got)

Below is Outpost2.exe only (the main game binary). Subsystem counts are index rows (Ghidra can emit more than one `.c` row per VA).

Where we started vs where we landed

| Metric | Before Phase 3G | After Phase 3G (now) |
| --- | --- | --- |
| Named (not `FUN_*`) | 1,751 (57.9%) | 1,837 (60.8%) |
| Auto `FUN_*` | 1,271 | 1,185 |
| `unclassified` subsystem rows (index) | ~1,390 | 1,242 |
| New `crt` bucket (runtime tail tagged) | (none) | 46 rows on EXE |

So: +86 readable names on the EXE, -86 `FUN_*`, and ~148 fewer `unclassified` rows - not the original stretch goal of 80-90% named, but real progress with everything scripted and repeatable.

Step by step (expectation vs reality)

3G-1 - CRT / compiler tail

- Done: FID pass on the CRT address tail only, then rename leftover tail `FUN_*` to `CRT::r_<hex>`; subsystem indexer gets a `crt` bucket (`crt_ranges.json`, `apply_fidb.py`, `identify_crt.py`).
- Expected: hundreds of CRT names/classifications.
- Actual: FID barely helps on this VC++5-era EXE; most of the win comes from the address-based tail pass plus the `CRT::r_*` rename. 46 EXE rows land in `crt`. The DLLs pick up more FID hits in-range.
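The tail rename itself boils down to a range check. A sketch assuming a `[[start, end], ...]` shape for the ranges (hypothetical; the actual `crt_ranges.json` layout may differ):

```python
# Rename any leftover auto-named function inside a CRT address range
# to CRT::r_<hex>; everything already named is left untouched.
def crt_rename(name: str, va: int, ranges) -> str:
    in_crt = any(lo <= va < hi for lo, hi in ranges)
    if in_crt and name.startswith("FUN_"):
        return f"CRT::r_{va:08x}"
    return name
```

So `FUN_004f1000` inside a CRT range becomes `CRT::r_004f1000`, while named functions and out-of-range `FUN_*` are passed through unchanged (the addresses here are illustrative).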

3G-2 - RTTI / vtables

- Done: PE-first RTTI parse (`recover_rtti.py`, `rtti_pe.py`), optional Ghidra apply, `rtti-classes.md` + JSON.
- Expected: 100-300 functions renamed from RTTI alone.
- Actual: Community Tethys VAs already named most stream/GFX vtable slots. RTTI confirms hierarchies but yields only 6 extra `FUN_*` renames on apply. OP2Shell / op2ext in the OPU tree: no MSVC RTTI anchor found.

3G-3 - MSVC demangle (exports)

- Done: Vendored demangler, `demangled-symbols.json` for all 509 EXE exports, `apply_demangled.py` in Ghidra.
- Expected: 50-200 renames.
- Actual: 23 Ghidra renames - mostly normalizing decorated leftovers; all 509/509 exports demangle cleanly in the JSON. Most export entry points were already named, and the pass skips 115 existing `Tethys__*` labels.

3G-4 - Call-graph subsystem propagation

- Done: Parse pseudo-C call edges, iterative neighbor voting, update `subsystem-index.json` (`callgraph-classification.md`, edge JSON).
- Expected: 500-800 index rows moved off `unclassified`.
- Actual: 153 unique VAs newly classified; `unclassified` 1389 -> 1234 rows. Graph is sparse (only calls visible in decompiler output) and the 70% vote threshold avoids mis-labeling hubs - so yield is much lower than the sketch.
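The neighbor-voting pass can be sketched like this. It is a simplified stand-in for what `subsystem-index.json` plus the edge JSON drive; the 70% threshold matches the text above, everything else is illustrative:

```python
from collections import Counter

def propagate(labels, edges, threshold=0.7, rounds=5):
    """Iteratively label unclassified nodes when >= threshold of their
    labeled neighbors agree. edges: {va: set of neighbor vas}."""
    labels = dict(labels)  # don't mutate the caller's index
    for _ in range(rounds):
        changed = False
        for va, nbrs in edges.items():
            if va in labels:
                continue
            votes = Counter(labels[n] for n in nbrs if n in labels)
            if not votes:
                continue
            label, count = votes.most_common(1)[0]
            if count / sum(votes.values()) >= threshold:
                labels[va] = label
                changed = True
        if not changed:
            break  # fixed point reached
    return labels
```

The threshold is exactly why yield stays low on a sparse graph: a node whose labeled neighbors split 50/50 between two subsystems is deliberately left unclassified rather than mislabeled.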

3G-5 - Singleton globals (`DAT_` + raw hex)

- Done: Extended `DAT_*` rules in `subsystem_index.py`, full-file scanner `tag_by_globals.py` against `singletons.json`.
- Expected: dozens to ~130 new classifications.
- Actual: Only 2 new `global_xref` VAs; almost everything touching known singletons was already classified. 171 `dat_global` rows after regen; `unclassified` down to ~1228 after 4+5 pipeline.

3G-6 - String xrefs -> names

- Done: Ghidra `name_by_strings.py` - defined string data, xrefs into `FUN_*`, scored labels with `_msg` suffix, `--apply` + re-decompile.
- Expected: ~50-100 renames.
- Actual: 34 `FUN_*` renamed on EXE (census 1219 -> 1185 `FUN_*` at that snapshot). Many string refs sit in already-named functions; strict scoring drops format noise.
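The scoring idea - rename an auto-named function only when a single distinctive string points at it - can be sketched like this. This is a hypothetical helper working on plain text; the real `name_by_strings.py` works on Ghidra string data and xrefs, not regexes:

```python
import re

def propose_msg_name(func_name: str, body: str, min_len=8):
    """If an auto-named function references exactly one distinctive string
    literal, propose a <slug>_msg rename; otherwise keep the auto name."""
    if not func_name.startswith("FUN_"):
        return func_name                 # already named: never touch
    strings = [s for s in re.findall(r'"([^"\\]{%d,})"' % min_len, body)
               if not s.startswith("%")] # drop pure format strings
    if len(strings) != 1:
        return func_name                 # ambiguous or no signal: leave alone
    slug = re.sub(r"[^a-z0-9]+", "_", strings[0].lower()).strip("_")[:24]
    return f"{slug}_msg"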

3G-7 - Scenario DLL import audit

- Done: `harvest_scenario_imports.py` - union every shipping DLL's imports from `Outpost2.exe`, diff vs exports + Tethys + index (PE only, no Ghidra).
- Expected: maybe 10-50 new names from gaps.
- Actual: 0 new Ghidra names - validation only: 279 unique imports, 0 orphan vs export table, 0 `FUN_*` at those import VAs. Proves mission API surface is already covered. 230 / 509 exports are never imported by any scanned DLL (engine-internal / unused).
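Once the imports are harvested, the audit itself is just set arithmetic. A sketch with hypothetical symbol names (the real script also diffs against Tethys names and the index):

```python
def audit(dll_imports, exe_exports):
    """dll_imports: {dll_name: set of symbols it imports from Outpost2.exe}.
    Returns (orphans, unused): symbols imported but missing from the export
    table, and exports never imported by any scanned DLL."""
    union = set().union(*dll_imports.values()) if dll_imports else set()
    orphans = union - exe_exports   # should be empty if coverage is complete
    unused = exe_exports - union    # engine-internal / unused surface
    return orphans, unused
```

An empty `orphans` set is exactly the "mission API surface is already covered" result above; `unused` corresponds to the 230 exports no scanned DLL touches.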

Bottom line

Phase 3G-1 through 3G-7 are done. The payoff is a reproducible pipeline (decompile, index, call graph, globals, string pass, PE cross-check) more than a single headline percentage. There is still a large `FUN_*` and `unclassified` tail on the EXE - what is left is mostly harder or riskier than these automated passes.

Cheers
Jonathan
« Last Edit: April 04, 2026, 08:11:26 AM by jonathangoorin »

Progress: optional Gemini pass on Ghidra C (readability sidecar)
« Reply #3 on: April 05, 2026, 12:55:06 AM »
Short update - Phase 3H-8 (automated RE backlog)

The project repo is now open on GitHub and publicly accessible (clone, browse, issues welcome): 
https://github.com/jonathangoorin/outpost2-dd-re

I added an optional pipeline that takes the existing bulk Ghidra pseudo-C (one `.c` per function under `decompiled/`) and asks Google Gemini for a readability-only rewrite. Output goes to a parallel tree `decompiled_refined/` so the raw export is never overwritten. Nothing here replaces Ghidra or the binary; it is assistive text for reading and search.

What actually runs

- Script: `tools/gemini_refine_decompiled.py` (repo), using the same API key pattern as my earlier subsystem batches.
- Gates: the reply must still mention the same function name as the export banner; optionally `clang -fsyntax-only` runs against a small stub header so obviously broken C gets dropped. I can turn the syntax check off for huge Win32-heavy functions where the model drifts into types my stubs do not know.
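The two gates can be sketched like this (helper names are mine; the real script wires these into the Gemini batch loop, and here the syntax gate simply passes when `clang` is not installed):

```python
import os
import shutil
import subprocess
import tempfile

def name_gate(refined: str, func_name: str) -> bool:
    # The rewritten C must still mention the exported function's name.
    return func_name in refined

def syntax_gate(refined: str, stub_header: str = "") -> bool:
    # Reject C that clang cannot parse; skipped when clang is absent.
    clang = shutil.which("clang")
    if clang is None:
        return True
    with tempfile.NamedTemporaryFile("w", suffix=".c", delete=False) as f:
        f.write(stub_header + "\n" + refined)
        path = f.name
    try:
        res = subprocess.run([clang, "-fsyntax-only", path],
                             capture_output=True)
        return res.returncode == 0
    finally:
        os.unlink(path)
```

A candidate only lands in `decompiled_refined/` when both gates pass; everything else stays Ghidra-only.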

What happened in practice

- So far I have run it on a subset of the shipping tree (EXE + DLLs). Hundreds of functions already have a refined file where the API call and the gates both succeeded.
- I plan to run it over the rest of the corpus (~5k translation units) in the near future, using `--skip-existing` so anything already refined is skipped.
- Many candidates still end up as `validation_failed` (mostly syntax-stub gaps or the model inventing types). Those stay Ghidra-only until I widen the stubs, tighten prompts, or accept a no-syntax-check pass for specific files.

Where to read more

- `docs/re-notes/phase-3h-automated-re-techniques.md` - section 3H-8 and changelog.

Cheers, 
Jonathan
« Last Edit: April 05, 2026, 12:57:21 AM by jonathangoorin »

Update: decomp wall, LoRA musings, and Project Smart at Night
« Reply #4 on: April 05, 2026, 10:06:14 AM »
Short update -

I think I've hit the decompilation wall. I kind of expected that. Still hope a miracle drops out of the sky anyway.

This whole thing is something I do out of pure curiosity - the JFK line, not because it's easy but because it's hard. I started with basically bupkis for RE and decompilers; I'm learning little by little.

The thing I've been chewing on: could you train something like a LoRA (or similar) on top of an LLM so it learns to turn Ghidra's pseudo-C - the glue-y decompiler output - into something closer to real source?

The problems are obvious:

- There isn't enough real 90s-era Outpost 2 source lying around to build a serious dataset.
- Even if there were tons of random C++ in the world, it's all over the place - different styles, eras, engines - so a wide corpus wouldn't be specific to this one game.

But here's the twist: I don't actually care about breadth. Screw the idea of a giant generic dataset. What I need is a narrow one. I even know roughly what toolchain built the shipping binary - so in fantasy-land the training data would be *the same as the original code*, not random GitHub noise.

Another thought that clicked: the big arc of this project is still "gather enough to rebuild the game from scratch." So why not rebuild it (or chunks of it) using the same kind of tools and stack they would have used back then?

I don't have the C++ chops to crank that out by hand - and honestly I'm not planning to. Agents can write code and automate; I mostly watch over, nudge, and keep the plumbing honest.

So the two-track idea: as I build a clone in that tech stack and compile it with the same toolchain, I also build pairs - my source vs. what a decompiler would spit out for the same logic. That's dataset material. Train something on that, and maybe later it helps read the original EXE better. Two birds, one bucket of glue - the recreation feeds the model, and the model feeds insight back into the old binary.
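The pair-building step is mostly bookkeeping. A sketch of a JSONL shape such a dataset could use (the `input`/`target` field names are my illustration, not a fixed format):

```python
import json

def write_pairs(pairs, path):
    """pairs: iterable of (source_c, decompiled_c) for the same function,
    written as JSONL records a fine-tune run could consume."""
    with open(path, "w", encoding="utf-8") as f:
        for src, dec in pairs:
            # input = what the decompiler emitted, target = the real source.
            f.write(json.dumps({"input": dec, "target": src}) + "\n")
```

One record per function keeps the pairs aligned at exactly the granularity the bulk Ghidra export already uses.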

It's not the straight-line approach, but it's the interesting one. Everyone's got their thing; mine is toolchains and compilers (and containerized cross-builds, CI, that side of the house). I couldn't write pretty C++ to save my life, but I *can* wire environments.

I've christened this side effort Project Smart at Night - which means Smarty Pants where I come from.

Not promising anything. I do it as long as I get a kick out of it. More posts when there's something to tell.

Cheers, 
Jonathan
« Last Edit: April 05, 2026, 10:16:22 AM by jonathangoorin »