Author Topic: Want Outpost 2 on my Mac - learning the hard way with AI  (Read 216 times)

Offline jonathangoorin

  • Newbie
  • *
  • Posts: 5
Want Outpost 2 on my Mac - learning the hard way with AI
« on: April 02, 2026, 04:11:48 PM »
Hey -

I want Outpost 2 to run on my Mac, properly, someday. I don't have a background in decompilers or this codebase - I'm flying blind. What I'm doing is throwing AI agents at the problem: peel enough understanding out of Outpost2.exe and the data files that I can turn around and tell those same agents how to rebuild pieces in something native. I have no idea if that ever works at scale. I'm going to try anyway.

How I set the project up:
- Notes, scripts, format writeups, Ghidra exports in one workspace so agents (and future me) have context.
- Python with uv for small tools: PE sniffing, harvesting addresses/names from community headers and patches, parsers/extractors for maps and archives where I've gotten that far.
- Ghidra for the binary (with whatever labels/types the community has already surfaced).
- pygame-ce for a map viewer so I can prove I'm not misreading map/tile data.
- Markdown docs: subsystems, PE notes, links to wiki / bei.pm / forum threads.
- LLM APIs for drafting RE notes from census dumps
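For context on the PE-sniffing side: here is a minimal sketch of the kind of header check those small tools start with, using only the stdlib. The helper name `pe_machine` is my illustration here, not a script from the repo.

```python
import struct

# Map a few IMAGE_FILE_HEADER machine values to readable names.
MACHINES = {0x014C: "x86", 0x8664: "x64"}

def pe_machine(data: bytes) -> str:
    """Return the target machine of a PE image given its raw bytes."""
    # DOS header: magic "MZ"; e_lfanew at offset 0x3C points to "PE\0\0".
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER follows the signature; Machine is its first WORD.
    (machine,) = struct.unpack_from("<H", data, e_lfanew + 4)
    return MACHINES.get(machine, hex(machine))
```

Outpost2.exe should report `x86` this way; anything unexpected is a quick sanity flag before deeper parsing.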

I'll probably post progress updates in this thread now and then - not a formal devlog, just for the kicks and to keep myself honest.

Hop on if you want:
If this sounds fun to you, or you want to point me at landmines before I step on them, you're welcome to join in. Ping me if you want a look at it - message me here or say so in the thread.

Cheers
Jonathan
« Last Edit: April 04, 2026, 07:34:08 AM by jonathangoorin »

Progress: ~4.7k decompiles in - here's the map
« Reply #1 on: April 03, 2026, 02:24:40 AM »
What I've done so far: run the decompiler over the shipping binaries and keep one greppable tree of per-function C to hang notes on. It is not pretty source, but it is a floor plan I can extend: search, cross-reference, and attach an indexer to instead of reinventing context every time I sit down.

At a glance

- 4663 translation units across Outpost2.exe, op2ext.dll, and OP2Shell.dll - raw material for search and cross-reference, not pretty code.
- Outpost2.exe also gets an auto subsystem index (game loop, units, UI, saves, ...) against community Tethys VAs and a few other hints.
  - DLLs land in the same export batch as the EXE; the numbered breakdowns below are EXE-only until I fold the DLLs into the same indexer.

Table: Export volume

| Binary | `.c` units |
| --- | --- |
| Outpost2.exe | 4278 |
| op2ext.dll | 173 |
| OP2Shell.dll | 212 |
| Total | 4663 |

Why 4277 rows but 3022 "places" on the map?

- The index has 4277 rows, but only 3022 distinct entry points.
  - Many addresses appear twice (e.g. a `FUN_...` row and a decorated name for the same VA).
- Any percentages below are against those 3022 uniques, not the raw row count.
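The rollup behind those numbers is essentially best-row selection per address. A minimal sketch with a hypothetical row shape and scoring (the real indexer uses more evidence tiers than this):

```python
# Hypothetical row shape: (va, name, evidence). Collapse duplicate VAs,
# preferring rows whose name looks like a Tethys "Class::member" symbol.
def best_rows(rows):
    def score(row):
        va, name, evidence = row
        if "::" in name:                 # decorated Tethys-style symbol wins
            return 2
        if not name.startswith("FUN_"):  # any curated name beats an auto name
            return 1
        return 0
    best = {}
    for row in rows:
        va = row[0]
        if va not in best or score(row) > score(best[va]):
            best[va] = row
    return best

rows = [
    (0x401000, "FUN_00401000", "auto"),
    (0x401000, "TethysGame::AddMessage", "tethys_va"),
    (0x402000, "FUN_00402000", "auto"),
]
# Three rows collapse to two unique entry points; the Tethys name survives.
uniq = best_rows(rows)
```

The VA and symbol in the example are made up for illustration; the point is only the duplicate-collapsing rule.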

Table: Mapped vs still fuzzy (EXE)

| | Addresses | Share |
| --- | --- | --- |
| In a named subsystem (units, UI, game loop, ...) | 1711 | 56.6% |
| Only "unclassified" so far | 1311 | 43.4% |

On the EXE side I have already placed a bit more than half of the entry points into a subsystem I can reason about in write-ups; the remainder is still unclassified in my tooling and is the main backlog to work through.

Where the names come from

For each address I keep one best row and prefer anything that has a proper Tethys `Class::member` string.

Table: Evidence (% of 3022)

| Evidence | Count | % of 3022 |
| --- | --- | --- |
| Tethys VA table (OPU / headers) | 994 | 32.9% |
| Curated known-address list | 200 | 6.6% |
| HFL-style DAT globals | 161 | 5.3% |
| Heuristic filename | 166 | 5.5% |
| Heuristic content | 150 | 5.0% |
| Keyword scrape | 40 | 1.3% |
| Has a subsystem assignment | 1711 | 56.6% |
| Still default / unclassified | 1311 | 43.4% |

994 of the 3022 uniques (~33%) carry a Tethys symbol on the row I keep - the strongest naming signal in my rollup.

Table: Confidence (same rollup)

| Level | Addresses | Share |
| --- | --- | --- |
| high | 1194 | 39.5% |
| medium | 471 | 15.6% |
| low | 1357 | 44.9% |

What those confidence rows mean for my index: most addresses already sit in a named subsystem; about one in three has a Tethys symbol on the row I keep; low and unclassified overlap, so 1311 addresses (~43%) are still unfinished labeling on my side - expected at this stage, not the end state.

Table: Subsystem heat (row counts, duplicates allowed)

| Subsystem | Rows |
| --- | --- |
| units | 794 |
| ui | 392 |
| save_load | 391 |
| scenario | 327 |
| game_loop | 230 |
| map | 210 |
| rendering | 192 |
| sim_tick | 152 |
| unclassified | 1390 |
| networking | 40 |
| research | 49 |
| buildings | 57 |
| audio | 53 |
| Sum | 4277 |

Going forward: fold op2ext.dll and OP2Shell.dll into the same subsystem indexer (the tables above stay EXE-focused until that lands), keep burning down the unclassified tail with more Ghidra time, and grow the parallel notes: markdown per area, VOL / map / saves write-ups, symbol scrapers, and a small pygame map view as a sanity check so I do not drift on tiles.

If a bucket looks wrong or you know a thread that would have saved me a week, I will read it; glad for pointers.

Cheers, 
Jonathan
« Last Edit: April 03, 2026, 02:42:14 AM by jonathangoorin »

Deepening the decompilation - Phase 3G: more RE steps (planned vs actual)
« Reply #2 on: April 04, 2026, 07:33:19 AM »
Update - Phase 3G finished (what we planned vs what we got)

Below is Outpost2.exe only (the main game binary). Subsystem counts are index rows (Ghidra can emit more than one `.c` row per VA).

Where we started vs where we landed

| Metric | Before Phase 3G | After Phase 3G (now) |
| --- | --- | --- |
| Named (not `FUN_*`) | 1,751 (57.9%) | 1,837 (60.8%) |
| Auto `FUN_*` | 1,271 | 1,185 |
| `unclassified` subsystem rows (index) | ~1,390 | 1,242 |
| New `crt` bucket (runtime tail tagged) | (none) | 46 rows on EXE |

So: +86 readable names on the EXE, -86 `FUN_*`, and ~148 fewer `unclassified` rows - not the original stretch goal of 80-90% named, but real progress with everything scripted and repeatable.

Step by step (expectation vs reality)

3G-1 - CRT / compiler tail

- Done: FID pass on the CRT address tail only, then rename leftover tail `FUN_*` to `CRT::r_<hex>`; subsystem indexer gets a `crt` bucket (`crt_ranges.json`, `apply_fidb.py`, `identify_crt.py`).
- Expected: hundreds of CRT names/classifications.
- Actual: FID barely helps on this VC++5-era EXE; most of the win comes from the address-based tail pass plus the `CRT::r_*` rename. 46 EXE rows land in `crt`. The DLLs pick up more FID hits in-range.
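The tail rename itself boils down to a range check. A sketch assuming a `[[start, end], ...]` shape for the ranges (hypothetical; the actual `crt_ranges.json` layout may differ):

```python
# Rename any leftover auto-named function inside a CRT address range
# to CRT::r_<hex>; everything already named is left untouched.
def crt_rename(name: str, va: int, ranges) -> str:
    in_crt = any(lo <= va < hi for lo, hi in ranges)
    if in_crt and name.startswith("FUN_"):
        return f"CRT::r_{va:08x}"
    return name
```

So `FUN_004f1000` inside a CRT range becomes `CRT::r_004f1000`, while named functions and out-of-range `FUN_*` are passed through unchanged (the addresses here are illustrative).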

3G-2 - RTTI / vtables

- Done: PE-first RTTI parse (`recover_rtti.py`, `rtti_pe.py`), optional Ghidra apply, `rtti-classes.md` + JSON.
- Expected: 100-300 functions renamed from RTTI alone.
- Actual: Community Tethys VAs already named most stream/GFX vtable slots. RTTI confirms hierarchies but yields only 6 extra `FUN_*` renames on apply. OP2Shell / op2ext in the OPU tree: no MSVC RTTI anchor found.

3G-3 - MSVC demangle (exports)

- Done: Vendored demangler, `demangled-symbols.json` for all 509 EXE exports, `apply_demangled.py` in Ghidra.
- Expected: 50-200 renames.
- Actual: 23 Ghidra renames - mostly normalizing decorated leftovers; all 509/509 exports demangle cleanly in the JSON. Most export entry points were already named, and the pass skips 115 existing `Tethys__*` labels.

3G-4 - Call-graph subsystem propagation

- Done: Parse pseudo-C call edges, iterative neighbor voting, update `subsystem-index.json` (`callgraph-classification.md`, edge JSON).
- Expected: 500-800 index rows moved off `unclassified`.
- Actual: 153 unique VAs newly classified; `unclassified` 1389 -> 1234 rows. Graph is sparse (only calls visible in decompiler output) and the 70% vote threshold avoids mis-labeling hubs - so yield is much lower than the sketch.
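The neighbor-voting pass can be sketched like this. It is a simplified stand-in for what `subsystem-index.json` plus the edge JSON drive; the 70% threshold matches the text above, everything else is illustrative:

```python
from collections import Counter

def propagate(labels, edges, threshold=0.7, rounds=5):
    """Iteratively label unclassified nodes when >= threshold of their
    labeled neighbors agree. edges: {va: set of neighbor vas}."""
    labels = dict(labels)  # don't mutate the caller's index
    for _ in range(rounds):
        changed = False
        for va, nbrs in edges.items():
            if va in labels:
                continue
            votes = Counter(labels[n] for n in nbrs if n in labels)
            if not votes:
                continue
            label, count = votes.most_common(1)[0]
            if count / sum(votes.values()) >= threshold:
                labels[va] = label
                changed = True
        if not changed:
            break  # fixed point reached
    return labels
```

The threshold is exactly why yield stays low on a sparse graph: a node whose labeled neighbors split 50/50 between two subsystems is deliberately left unclassified rather than mislabeled.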

3G-5 - Singleton globals (`DAT_` + raw hex)

- Done: Extended `DAT_*` rules in `subsystem_index.py`, full-file scanner `tag_by_globals.py` against `singletons.json`.
- Expected: dozens to ~130 new classifications.
- Actual: Only 2 new `global_xref` VAs; almost everything touching known singletons was already classified. 171 `dat_global` rows after regen; `unclassified` down to ~1228 after 4+5 pipeline.

3G-6 - String xrefs -> names

- Done: Ghidra `name_by_strings.py` - defined string data, xrefs into `FUN_*`, scored labels with `_msg` suffix, `--apply` + re-decompile.
- Expected: ~50-100 renames.
- Actual: 34 `FUN_*` renamed on EXE (census 1219 -> 1185 `FUN_*` at that snapshot). Many string refs sit in already-named functions; strict scoring drops format noise.
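The scoring idea - rename an auto-named function only when a single distinctive string points at it - can be sketched like this. This is a hypothetical helper working on plain text; the real `name_by_strings.py` works on Ghidra string data and xrefs, not regexes:

```python
import re

def propose_msg_name(func_name: str, body: str, min_len=8):
    """If an auto-named function references exactly one distinctive string
    literal, propose a <slug>_msg rename; otherwise keep the auto name."""
    if not func_name.startswith("FUN_"):
        return func_name                 # already named: never touch
    strings = [s for s in re.findall(r'"([^"\\]{%d,})"' % min_len, body)
               if not s.startswith("%")] # drop pure format strings
    if len(strings) != 1:
        return func_name                 # ambiguous or no signal: leave alone
    slug = re.sub(r"[^a-z0-9]+", "_", strings[0].lower()).strip("_")[:24]
    return f"{slug}_msg"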

3G-7 - Scenario DLL import audit

- Done: `harvest_scenario_imports.py` - union every shipping DLL's imports from `Outpost2.exe`, diff vs exports + Tethys + index (PE only, no Ghidra).
- Expected: maybe 10-50 new names from gaps.
- Actual: 0 new Ghidra names - validation only: 279 unique imports, 0 orphan vs export table, 0 `FUN_*` at those import VAs. Proves mission API surface is already covered. 230 / 509 exports are never imported by any scanned DLL (engine-internal / unused).
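Once the imports are harvested, the audit itself is just set arithmetic. A sketch with hypothetical symbol names (the real script also diffs against Tethys names and the index):

```python
def audit(dll_imports, exe_exports):
    """dll_imports: {dll_name: set of symbols it imports from Outpost2.exe}.
    Returns (orphans, unused): symbols imported but missing from the export
    table, and exports never imported by any scanned DLL."""
    union = set().union(*dll_imports.values()) if dll_imports else set()
    orphans = union - exe_exports   # should be empty if coverage is complete
    unused = exe_exports - union    # engine-internal / unused surface
    return orphans, unused
```

An empty `orphans` set is exactly the "mission API surface is already covered" result above; `unused` corresponds to the 230 exports no scanned DLL touches.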

Bottom line

Phase 3G-1 through 3G-7 are done. The payoff is a reproducible pipeline (decompile, index, call graph, globals, string pass, PE cross-check) more than a single headline percentage. There is still a large `FUN_*` and `unclassified` tail on the EXE - what is left is mostly harder or riskier than these automated passes.

Cheers
Jonathan
« Last Edit: April 04, 2026, 08:11:26 AM by jonathangoorin »

Progress: optional Gemini pass on Ghidra C (readability sidecar)
« Reply #3 on: April 05, 2026, 12:55:06 AM »
Short update - Phase 3H-8 (automated RE backlog)

The project repo is now open on GitHub and publicly accessible (clone, browse, issues welcome): 
https://github.com/jonathangoorin/outpost2-dd-re

I added an optional pipeline that takes the existing bulk Ghidra pseudo-C (one `.c` per function under `decompiled/`) and asks Google Gemini for a readability-only rewrite. Output goes to a parallel tree `decompiled_refined/` so the raw export is never overwritten. Nothing here replaces Ghidra or the binary; it is assistive text for reading and search.

What actually runs

- Script: `tools/gemini_refine_decompiled.py` (repo), using the same API key pattern as my earlier subsystem batches.
- Gates: the reply must still mention the same function name as the export banner; optionally `clang -fsyntax-only` runs against a small stub header so obviously broken C gets dropped. I can turn the syntax check off for huge Win32-heavy functions where the model drifts into types my stubs do not know.
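The two gates can be sketched like this (helper names are mine; the real script wires these into the Gemini batch loop, and here the syntax gate simply passes when `clang` is not installed):

```python
import os
import shutil
import subprocess
import tempfile

def name_gate(refined: str, func_name: str) -> bool:
    # The rewritten C must still mention the exported function's name.
    return func_name in refined

def syntax_gate(refined: str, stub_header: str = "") -> bool:
    # Reject C that clang cannot parse; skipped when clang is absent.
    clang = shutil.which("clang")
    if clang is None:
        return True
    with tempfile.NamedTemporaryFile("w", suffix=".c", delete=False) as f:
        f.write(stub_header + "\n" + refined)
        path = f.name
    try:
        res = subprocess.run([clang, "-fsyntax-only", path],
                             capture_output=True)
        return res.returncode == 0
    finally:
        os.unlink(path)
```

A candidate only lands in `decompiled_refined/` when both gates pass; everything else stays Ghidra-only.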

What happened in practice

- So far I have run it on a subset of the shipping tree (EXE + DLLs). Hundreds of functions already have a refined file where the API call and the gates both succeeded.
- I plan to run it over the rest of the corpus (~5k translation units) in the near future, using `--skip-existing` so anything already refined is skipped.
- Many candidates still end up as `validation_failed` (mostly syntax-stub gaps or the model inventing types). Those stay Ghidra-only until I widen the stubs, tighten prompts, or accept a no-syntax-check pass for specific files.

Where to read more

- `docs/re-notes/phase-3h-automated-re-techniques.md` - section 3H-8 and changelog.

Cheers, 
Jonathan
« Last Edit: April 05, 2026, 12:57:21 AM by jonathangoorin »

Update: decomp wall, LoRA musings, and Project Smart at Night
« Reply #4 on: April 05, 2026, 10:06:14 AM »
Short update -

I think I've hit the decompilation wall. I kind of expected that. Still hope a miracle drops out of the sky anyway.

This whole thing is something I do out of pure curiosity - the JFK line, not because it's easy but because it's hard. I started with basically bupkis for RE and decompilers; I'm learning little by little.

The thing I've been chewing on: could you train something like a LoRA (or similar) on top of an LLM so it learns to turn Ghidra's pseudo-C - the glue-y decompiler output - into something closer to real source?

The problems are obvious:

- There isn't enough real 90s-era Outpost 2 source lying around to build a serious dataset.
- Even if there were tons of random C++ in the world, it's all over the place - different styles, eras, engines - so a wide corpus wouldn't be specific to this one game.

But here's the twist: I don't actually care about breadth. Screw the idea of a giant generic dataset. What I need is a narrow one. I even know roughly what toolchain built the shipping binary - so in fantasy-land the training data would be *the same as the original code*, not random GitHub noise.

Another thought that clicked: the big arc of this project is still "gather enough to rebuild the game from scratch." So why not rebuild it (or chunks of it) using the same kind of tools and stack they would have used back then?

I don't have the C++ chops to crank that out by hand - and honestly I'm not planning to. Agents can write code and automate; I mostly watch over, nudge, and keep the plumbing honest.

So the two-track idea: as I build a clone in that tech stack and compile it with the same toolchain, I also build pairs - my source vs. what a decompiler would spit out for the same logic. That's dataset material. Train something on that, and maybe later it helps read the original EXE better. Two birds, one bucket of glue - the recreation feeds the model, and the model feeds insight back into the old binary.
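The pair-building step is mostly bookkeeping. A sketch of a JSONL shape such a dataset could use (the `input`/`target` field names are my illustration, not a fixed format):

```python
import json

def write_pairs(pairs, path):
    """pairs: iterable of (source_c, decompiled_c) for the same function,
    written as JSONL records a fine-tune run could consume."""
    with open(path, "w", encoding="utf-8") as f:
        for src, dec in pairs:
            # input = what the decompiler emitted, target = the real source.
            f.write(json.dumps({"input": dec, "target": src}) + "\n")
```

One record per function keeps the pairs aligned at exactly the granularity the bulk Ghidra export already uses.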

It's not the straight-line approach, but it's the interesting one. Everyone's got their thing; mine is toolchains and compilers (and containerized cross-builds, CI, that side of the house). I couldn't write pretty C++ to save my life, but I *can* wire environments.

I've christened this side effort Project Smart at Night - which means Smarty Pants where I come from.

Not promising anything. I do it as long as I get a kick out of it. More posts when there's something to tell.

Cheers, 
Jonathan
« Last Edit: April 05, 2026, 10:16:22 AM by jonathangoorin »