Outpost 2 Programming & Development / Update: decomp wall, LoRA musings, and Project Smart at Night
« Last post by jonathangoorin on April 05, 2026, 10:06:14 AM »
Short update -
I think I've hit the decompilation wall. I kind of expected that. Still hope a miracle drops out of the sky anyway.
This whole thing is something I do out of pure curiosity - the JFK line: not because it's easy, but because it's hard. I started with basically bupkis in RE and decompilers; I'm learning little by little.
The thing I've been chewing on: could you train something like a LoRA (or similar) on top of an LLM so it learns to turn Ghidra's pseudo-C - the glue-y decompiler output - into something closer to real source?
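To make the idea concrete, a fine-tuning example for such an adapter could just be a pseudo-C/source pair serialized as one JSON line. Everything below - the field names, the sample snippets, the instruction text - is a made-up illustration of the shape of the data, not anything from the actual project:

```python
import json

def make_training_example(pseudo_c: str, original_src: str) -> str:
    """Serialize one (decompiler output -> real source) pair as a JSONL record.

    Field names are arbitrary; most instruction-tuning pipelines accept some
    variation of instruction/input/output.
    """
    record = {
        "instruction": "Rewrite this Ghidra pseudo-C as idiomatic 90s-era C++.",
        "input": pseudo_c,     # what the decompiler emits
        "output": original_src  # what a human would have written
    }
    return json.dumps(record)

# Hypothetical sample pair (not real Outpost 2 code):
line = make_training_example(
    "undefined4 FUN_00401000(int param_1) { return param_1 * 2; }",
    "int DoubleValue(int value) { return value * 2; }",
)
```

One record like that is worthless on its own; the interesting part is that the pairs come from the same compiler and the same codebase, so the model only ever sees the one dialect it actually needs.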
The problems are obvious:
- There isn't enough real 90s-era Outpost 2 source lying around to build a serious dataset.
- Even if there were tons of random C++ in the world, it's all over the place - different styles, eras, engines - so a wide corpus wouldn't be very specific to this one game.
But here's the twist: I don't actually care about breadth. Screw the idea of a giant generic dataset. What I need is a narrow one. I even know roughly what toolchain built the shipping binary - so in fantasy-land the training data would be *the same kind of code as the original*, not random GitHub noise.
Another thought that clicked: the big arc of this project is still "gather enough to rebuild the game from scratch." So why not rebuild it (or chunks of it) using the same kind of tools and stack they would have used back then?
I don't have the C++ chops to crank that out by hand - and honestly I'm not planning to. Agents can write code and automate; I mostly watch over, nudge, and keep the plumbing honest.
So the two-track idea: as I build a clone in that tech stack and compile it with the same toolchain, I also build pairs - my source vs. what a decompiler would spit out for the same logic. That's dataset material. Train something on that, and maybe later it helps read the original EXE better. Two birds, one bucket of glue - the recreation feeds the model, and the model feeds insight back into the old binary.
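The pair-building step could be sketched as a small script: for every source file in the clone, look for a sibling decompilation dump and emit the pair. The directory layout, the `.decomp.c` naming scheme, and the field names are all assumptions for illustration - the real pipeline would hang off whatever the toolchain and the Ghidra headless runs actually produce:

```python
import json
from pathlib import Path

def build_dataset(src_dir: Path, out_file: Path) -> int:
    """Pair each .cpp source file with its decompiler dump (<name>.decomp.c,
    an assumed naming scheme) and write one JSONL record per pair.
    Returns the number of pairs written."""
    count = 0
    with out_file.open("w") as out:
        for src in sorted(src_dir.rglob("*.cpp")):
            decomp = src.with_suffix(".decomp.c")  # assumed sibling file
            if not decomp.exists():
                continue  # no decompilation for this translation unit yet
            out.write(json.dumps({
                "input": decomp.read_text(),   # decompiler pseudo-C
                "output": src.read_text(),     # the clone's real source
            }) + "\n")
            count += 1
    return count
```

The nice property is that this runs as a by-product of the normal build: every new chunk of the clone that compiles is also one more training pair, for free.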
It's not the straight-line approach, but it's the interesting one. Everyone's got their thing; mine is toolchains and compilers (and containerized cross-builds, CI, that side of the house). I couldn't write pretty C++ to save my life, but I *can* wire environments.
I've christened this side effort Project Smart at Night - which means Smarty Pants where I come from.
Not promising anything. I do it as long as I get a kick out of it. More posts when there's something to tell.
Cheers,
Jonathan