Outpost 2 Programming & Development / Progress: optional Gemini pass on Ghidra C (readability sidecar)
« Last post by jonathangoorin on April 05, 2026, 12:55:06 AM »
Short update - Phase 3H-8 (automated RE backlog)
The project repo is now open on GitHub and publicly accessible (clone, browse, issues welcome):
https://github.com/jonathangoorin/outpost2-dd-re
I added an optional pipeline that takes the existing bulk Ghidra pseudo-C (one `.c` per function under `decompiled/`) and asks Google Gemini for a readability-only rewrite. Output goes to a parallel tree `decompiled_refined/` so the raw export is never overwritten. Nothing here replaces Ghidra or the binary; it is assistive text for reading and search.
What actually runs
- Script: `tools/gemini_refine_decompiled.py` (repo), using the same API key pattern as my earlier subsystem batches.
- Gates: the reply must still mention the same function name as the export banner; optionally, `clang -fsyntax-only` runs against it with a small stub header so obviously broken C gets dropped. I can turn the syntax check off for huge Win32-heavy functions where the model drifts into types my stubs do not know.
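For readers who want a concrete picture of the two gates, here is a rough sketch. It is not the code in `tools/gemini_refine_decompiled.py`; the function names (`reply_mentions_function`, `clang_syntax_command`, `passes_gates`) and the temp-file handling are hypothetical, and it assumes `clang` is on the PATH when the syntax check is enabled.

```python
import re
import subprocess
import tempfile
from pathlib import Path


def reply_mentions_function(reply: str, func_name: str) -> bool:
    """Gate 1: the rewritten text must still contain the exported function name."""
    # Whole-identifier match, so FUN_00401000 does not match FUN_004010001.
    return re.search(rf"\b{re.escape(func_name)}\b", reply) is not None


def clang_syntax_command(source: Path, stub_header: Path) -> list[str]:
    """Gate 2 command: parse only (no codegen), force-including the stub header."""
    return ["clang", "-fsyntax-only", "-x", "c",
            "-include", str(stub_header), str(source)]


def passes_gates(reply: str, func_name: str, stub_header: Path,
                 check_syntax: bool = True) -> bool:
    """Return True if the model reply survives the enabled gates (hypothetical helper)."""
    if not reply_mentions_function(reply, func_name):
        return False
    if not check_syntax:
        # Escape hatch for functions whose types the stubs do not cover.
        return True
    with tempfile.TemporaryDirectory() as tmp:
        candidate = Path(tmp) / "candidate.c"
        candidate.write_text(reply, encoding="utf-8")
        result = subprocess.run(clang_syntax_command(candidate, stub_header),
                                capture_output=True, text=True)
    # clang exits non-zero when the candidate fails to parse.
    return result.returncode == 0
```

The name check runs first because it is cheap; clang is only invoked for replies that still look like the same function.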
What happened in practice
- So far I have run it on a subset of the shipping tree (EXE + DLLs). Hundreds of functions already have a refined file, namely those where both the API call and the gates succeeded.
- I plan to keep going and run it over the rest of the corpus (~5k translation units) in the near future, using `--skip-existing` so anything already refined is skipped.
- Many candidates still end as `validation_failed` (mostly gaps in the syntax stubs or the model inventing types). Those stay Ghidra-only until I widen the stubs, tighten the prompts, or accept a no-syntax-check pass for specific files.
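The resume behaviour behind `--skip-existing` can be pictured roughly like this. This is a sketch under assumptions, not the script's actual logic: the helper names (`refined_path`, `pending_files`) are hypothetical, and it assumes `decompiled_refined/` mirrors the relative paths under `decompiled/`.

```python
from pathlib import Path


def refined_path(raw_file: Path, raw_root: Path, refined_root: Path) -> Path:
    """Map decompiled/<rel>.c to decompiled_refined/<rel>.c (assumed mirror layout)."""
    return refined_root / raw_file.relative_to(raw_root)


def pending_files(raw_root: Path, refined_root: Path,
                  skip_existing: bool = True) -> list[Path]:
    """List raw exports that still need a refinement attempt."""
    pending = []
    for raw_file in sorted(raw_root.rglob("*.c")):
        target = refined_path(raw_file, raw_root, refined_root)
        if skip_existing and target.exists():
            continue  # already refined on an earlier run
        pending.append(raw_file)
    return pending
```

Since files that end as `validation_failed` never get a refined file written, they would show up in the pending list again on the next run, which is the behaviour you want after widening the stubs.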
Where to read more
- `docs/re-notes/phase-3h-automated-re-techniques.md` - section 3H-8 and changelog.
Cheers,
Jonathan