The Touhou PC-98 Restoration Project
This project aims to perfectly reconstruct the source code of the first five Touhou Project games by ZUN Soft (now Team Shanghai Alice), which were originally released exclusively for the NEC PC-9801 system.
The original games in question are:
Since we only have the binaries, we obviously can't know how ZUN named any variables and functions, or which comments surrounded the original code. "Perfect" therefore means that the binaries compiled from the code in the ReC98 repository are indistinguishable from ZUN's original builds, making it impossible to prove that the original code couldn't have looked like this. This property is maintained for every Git commit along the way.
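As a rough illustration of what "indistinguishable" means in practice, the following sketch compares a rebuilt binary against an original one by hashing both files. It is not part of the ReC98 build process; the function names and the example paths are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_bit_perfect(rebuilt: Path, original: Path) -> bool:
    """True if both files have byte-for-byte identical contents."""
    return sha256_of(rebuilt) == sha256_of(original)


if __name__ == "__main__":
    # Hypothetical locations; adjust to wherever your own copies live.
    rebuilt = Path("bin/th05/MAIN.EXE")
    original = Path("original/th05/MAIN.EXE")
    if rebuilt.exists() and original.exists():
        print("bit-perfect" if is_bit_perfect(rebuilt, original) else "differs")
```

Running such a comparison after every commit is one way to check that the property described above still holds.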
Aside from the preservation angle and the resulting deep insight into the games' mechanics, the code can then serve as the foundation for any type of mod, or any port to non-PC-98 platforms, developed by the community. This is also why ReC98 values readable and understandable code over a pure decompilation.
There are a number of reasons why achieving moddability via full decompilation seems to be more worthwhile for the PC-98 games, in contrast to a PyTouhou-style black-box reimplementation:
Definitely. During the development of the static English patches for these games, we identified two main libraries used across all 5 games, and even found their source code. These are:
ZUN.COM -4) is a rebranded version of Promisence Soft's SPRITE16.COM, a 16-color PC-98 EGC display driver, version 0.04, which was bundled with the sample game StormySpace.
master.lib and the C/C++ runtime alone make up a sizable amount of the code in all the executables. In TH05, for example, they amount to 74% of all code in OP.EXE, and 40% of all code in MAIN.EXE. That's already quite a lot of code we do not have to deal with. Identifying the rest of the code shared across the games will further reduce the workload to a more acceptable amount.
With DOSBox-X and the Debug edition of Neko Project II, we now also have two open-source PC-9821 emulators capable of running the games. This will greatly help in understanding all hardware-specific code.
Crossed-out files are identical to their version in the previous game. ONGCHK.COM is part of the PMD sound driver by KAJA, and therefore doesn't need to be disassembled either; we only need to keep the binary to allow bit-perfect rebuilds of ZUN.COM.
This was the compiler ZUN originally used, so it's the only one that can deterministically compile this code to executables that are bit-perfect to ZUN's original ones.
Borland never made a cross compiler targeting 16-bit DOS that runs on 32-bit Windows, so the C++ parts have to be compiled using a 16-bit DOS program. The not yet decompiled ASM parts of the code, however, can be assembled using a 32-bit Windows tool. Not only does this far outperform any 16-bit solution that would have to be emulated on modern 64-bit systems, making build times, well, tolerable; it also removes any potential EMS or XMS issues we might have had with TASMX.EXE on these emulators.
These advantages were particularly relevant in the early days of ReC98, when the ASM files were pretty huge. That's also when I decided to freely use long file names that don't need to conform to the 8.3 convention… As a result, the build process still starts with a separate 32-bit part (build32b.bat), which must be run in Windows (or Wine).
In the end though, we'd definitely like to have a single-step 16-bit build process that requires no 32-bit tools. This will probably happen some time after reaching 100% position independence over all games.
Released as freeware, and as of July 2020, still sort of officially downloadable from
Needed to fulfill the role of being "just any native C++ compiler" for our own tools that either don't necessarily have to run on 16-bit DOS, or are required by the 32-bit build step, as long as that one still exists (see above).
Currently, this category of tools only includes the converter for hardcoded sprites. Since that one is written to be as platform-independent as possible, it could easily be compiled with any other native C compiler you happen to have already installed. (Which also means that future port developers hopefully have one less thing to worry about.) So, if you dislike additional dependencies, feel free to edit the build rules so that bmp2arr is compiled with any other C compiler of your choice.
However, choosing Borland C++ 5.5 as a default for everyone else fits ReC98 very well for several reasons:
A sane, parallel build system, used to ensure minimal rebuilds during the 32-bit build part. Provides perfect tracking of dependencies via code injection and hooking a compiler's file opening syscalls, allowing it to automatically add all #included files to the build dependency graph. This makes it way superior to most make implementations, which lack this vital feature, and are therefore inherently unsuited for pretty much any programming language imaginable. With no abstractions for specific compilers, Tup also fits perfectly with the ancient Borland tools required for this project.
As of September 2020, the Windows version of Tup requires Vista or higher. In case Tup can't run or isn't installed, the build process falls back on a dumb batch file, which always fully rebuilds the entire 32-bit part.
For the most part, it shouldn't matter whether you use the original DOSBox or your favorite fork. A DOSBox with dynamic recompilation is highly recommended for faster compilation, though. Make sure to enable that feature by setting the following options in its configuration file (dosbox.conf for the original version):
[cpu]
core=dynamic
cycles=max
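If you want to double-check that a given configuration file actually contains these two settings, a small script along the following lines could do it. This is only a sketch: the function name is hypothetical, and it assumes that Python's `configparser` is lenient enough to read your particular DOSBox configuration, which may not hold for every fork or every `[autoexec]` section.

```python
import configparser


def dynamic_core_enabled(conf_text: str) -> bool:
    """Check for core=dynamic and cycles=max in the [cpu] section."""
    parser = configparser.ConfigParser(
        strict=False,          # tolerate repeated lines, e.g. in [autoexec]
        allow_no_value=True,   # tolerate lines without "=", e.g. DOS commands
        delimiters=("=",),     # don't treat ":" in drive letters as a delimiter
        interpolation=None,    # don't interpret "%" in values
    )
    parser.read_string(conf_text)
    core = parser.get("cpu", "core", fallback="") or ""
    cycles = parser.get("cpu", "cycles", fallback="") or ""
    return core.strip().lower() == "dynamic" and cycles.strip().lower() == "max"


if __name__ == "__main__":
    sample = "[cpu]\ncore=dynamic\ncycles=max\n"
    print(dynamic_core_enabled(sample))
```

To check a real file, read its contents into a string first and pass that string to the function.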
The most performant OS for building ReC98 is therefore a 32-bit Windows ≥Vista, where both the 32-bit and 16-bit build parts can run natively from a single shell. The build process was tested and should work reliably on pretty much every system though – from modern 64-bit Windows and Linux, down to Windows 95, which you might use on actual PC-98 hardware.
bin/ilink32.cfg files for Borland C++ 5.5, as pointed out in its readme.txt file. This fixes errors like
Error E2209 Pipeline/bmp2arrl.c 12: Unable to open file 'io.h'
that you will encounter otherwise.
Run build32b.bat in a Windows shell, followed by build16b.bat in your DOSBox of choice.
All batch files will abort with an error if any of the necessary tools can't be found in the
The final executables will be put into bin\th0?, using the same names as the originals.
Error: Unable to execute command 'tlink.exe'
Cause: To locate TLINK, TCC needlessly copies the PATH environment variable into a statically allocated 128-byte buffer. It then constructs absolute tlink.exe filenames for each of the semicolon- or \0-terminated paths, writing these into a buffer that immediately follows the 128-byte PATH buffer in memory. The search finishes as soon as TCC finds an existing file, which gives precedence to earlier paths in the PATH. If the search hasn't completed by the time it reaches a potential "final" path that runs past the 128 bytes, the final attempted filename will consist of the part that still fit into the buffer, followed by the previously attempted path.
Workaround: Make sure that the BIN\ path to Turbo C++ 4.0J is fully contained within the first 127 bytes of the PATH inside your DOS system. (The 128th byte must either be a separating ; or the terminating \0.)
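The condition in the workaround above can be checked mechanically. The following sketch is not part of ReC98; the function name and the example paths are hypothetical, and it assumes a PATH made up of single-byte (ASCII) characters, so that character offsets equal byte offsets.

```python
def bin_dir_within_limit(path_var: str, bin_dir: str, limit: int = 127) -> bool:
    """Return True if `bin_dir` appears as an entry of `path_var` and that
    entry ends within the first `limit` bytes, so that the following byte
    is either the ';' separator or the string terminator."""
    raw = path_var.encode("ascii")  # assumption: ASCII-only PATH
    target = bin_dir.encode("ascii").rstrip(b"\\").upper()
    offset = 0
    for entry in raw.split(b";"):
        end = offset + len(entry)  # exclusive end offset of this entry
        if entry.rstrip(b"\\").upper() == target:
            return end <= limit
        offset = end + 1  # skip the ';' separator
    return False


if __name__ == "__main__":
    # Hypothetical install locations, for illustration only.
    good = "C:\\DOS;C:\\TC\\BIN"
    bad = "C:\\" + "A" * 130 + ";C:\\TC\\BIN"
    print(bin_dir_within_limit(good, "C:\\TC\\BIN"))
    print(bin_dir_within_limit(bad, "C:\\TC\\BIN"))
```

Only the first matching entry is considered, since any later occurrence would sit even further from the start of the PATH. Moving the Turbo C++ entry to the front of the PATH is the simplest way to satisfy the check.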
Loader error (0000): Unrecognized Error on 32-bit Windows ≥Vista
This can be fixed by configuring the NTVDM DPMI driver to be loaded into conventional memory rather than upper memory, by editing
REM Install DPMI support
-LH %SystemRoot%\system32\dosx
+%SystemRoot%\system32\dosx
A reboot is required for that edit to take effect.