<html>
<head>
    <title>DAT file format</title>
    <link rel="stylesheet" href="style.css" type="text/css" />
</head>
<body>
<div style="width: 1200px; margin: auto;">
<h1>DAT file format</h1>

<p>DAT files are archived data files. They contain the bulk of the data for Fallout and Fallout 2, including all game artwork, critters, scripts, message/dialogue files, sounds and speech audio files, and much more. The two DAT files used by the games are master.dat and critter.dat, both of which are located in the root game folder. </p>
<h3>DAT1 vs DAT2</h3>
<p>Two different DAT file formats were used for the Fallout games. Fallout 1 and Fallout 2 each used their own format, but both share the same file extension: *.dat. To avoid misunderstandings, this document refers to DAT1 (the Fallout 1 format) and DAT2 (the Fallout 2 format). Note that DAT2 is not an improved version of DAT1, but a completely rewritten file type that has little in common with DAT1.</p>
<h3>DAT1</h3>
<p>This specific DAT file type stores data in the <a href="http://en.wikipedia.org/wiki/Endianness">big-endian</a> format.</p>
<h4>Structure</h4>
<pre style="background: #222; padding: 10px;">
int32   4   DirectoryCount
int32   4   FolderAllocationHint. Usually 0x0A (0x5E for master.dat). Must not be less than DirectoryCount or Fallout will crash.
int32   4   Reserved1. Always 0.
int32   4   Timestamp. Unix timestamp of when the archive was created.

// Directory Name Block - for each directory (DirectoryCount times)
// master.dat starts its listing with a root node, called '.'. This node contains COLOR.PAL and several font files; be careful not to skip it, as it may appear like two extraneous bytes
byte    1   Length (number of characters) of DirectoryName
char    *   DirectoryName

// Directory Content Block - for each directory (DirectoryCount times)
int32   4   FileCount. Number of files in the directory.
int32   4   FileAllocationHint. Similar to FolderAllocationHint, must be at least FileCount.
int32   4   FixedMetadataSize. Always 0x10 (16 bytes), which is the size of the 4 DWORDS in the File List Block.
int32   4   Timestamp. Creation timestamp for this directory block.
    // File List Block - for each file in directory (FileCount times).
    byte    1   Name length (number of characters)
    char    *   Name
    int32   4   Attributes. 0x20 means plain-text, 0x40 - compressed with LZSS.
    int32   4   Offset. Position (from the beginning of the DAT file) where the file contents start.
    int32   4   Size. Original (uncompressed) file size.
    int32   4   PackedSize. Size of the compressed file in dat. If file is not compressed, PackedSize is 0.

// Data block
byte    *   File data for all files (use Offset and [Packed]Size to find where a specific file starts and ends).
</pre>
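<p>The layout above can be walked with a short reader. The following is a minimal Python sketch, not a reference implementation: the function names (<code>read_u32_be</code>, <code>read_name</code>, <code>read_dat1_index</code>) and the dictionary keys are hypothetical, and names are assumed to be plain ASCII.</p>

```python
import io


def read_u32_be(stream):
    """Read one big-endian 32-bit unsigned integer."""
    data = stream.read(4)
    if len(data) != 4:
        raise EOFError("unexpected end of DAT1 data")
    return int.from_bytes(data, "big")


def read_name(stream):
    """Read a length-prefixed name: 1 length byte, then that many characters."""
    length = stream.read(1)[0]
    return stream.read(length).decode("ascii")  # assumption: ASCII names


def read_dat1_index(stream):
    """Return {directory_name: [file_entry, ...]} for a DAT1 stream."""
    directory_count = read_u32_be(stream)
    for _ in range(3):  # FolderAllocationHint, Reserved1, Timestamp
        read_u32_be(stream)
    # Directory Name Block: all names come first, including the root node '.'
    names = [read_name(stream) for _ in range(directory_count)]
    index = {}
    for directory in names:
        # Directory Content Block
        file_count = read_u32_be(stream)
        for _ in range(3):  # FileAllocationHint, FixedMetadataSize, Timestamp
            read_u32_be(stream)
        files = []
        for _ in range(file_count):
            name = read_name(stream)
            attributes = read_u32_be(stream)
            offset = read_u32_be(stream)
            size = read_u32_be(stream)
            packed_size = read_u32_be(stream)
            files.append({
                "name": name,
                "compressed": attributes == 0x40,  # 0x40 = LZSS, 0x20 = plain
                "offset": offset,
                "size": size,
                "packed_size": packed_size,
            })
        index[directory] = files
    return index
```

<p>A caller would typically open the archive in binary mode and pass the file object, for example <code>read_dat1_index(open(path, "rb"))</code>, then use each entry's <code>offset</code> with <code>packed_size</code> (or <code>size</code> when the file is not compressed) to slice out the file data.</p>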

<h4>Technical Analysis of "Unknown" Fields</h4>
<p>Through community reverse-engineering (notably by TeamX and the Falltergeist project), the purposes of the previously unknown header fields in the DAT1 format have been identified:</p>

<h5>Allocation Hints (Unknown1 &amp; Unknown4)</h5>
<p>The Fallout engine uses these fields as <b>memory allocation hints</b> rather than dynamic counts. When the engine calls its internal <code>db_init</code> function, it uses <code>FolderAllocationHint</code> to determine the size of the memory buffer to allocate for the directory list (e.g., <code>malloc(FolderAllocationHint * sizeof(DirEntry))</code>). It then loops <code>DirectoryCount</code> times to read the data. If <code>DirectoryCount</code> exceeds the allocation hint, the engine writes out of bounds, leading to a heap corruption crash. This explains why <code>master.dat</code> has exactly 94 folders and a hint of 94, while <code>critter.dat</code> has 10 folders and a hint of 10.</p>
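<p>A packing tool could guard against this before writing an archive. The check below is only a sketch of that idea; <code>check_dat1_hints</code> and its argument layout are hypothetical and not taken from any existing tool.</p>

```python
def check_dat1_hints(directory_count, folder_hint, directories):
    """Return a list of problems; an empty list means the hints look safe.

    directories is an iterable of (name, file_count, file_hint) tuples.
    """
    problems = []
    if folder_hint < directory_count:
        problems.append(
            "FolderAllocationHint %d is below DirectoryCount %d"
            % (folder_hint, directory_count)
        )
    for name, file_count, file_hint in directories:
        if file_hint < file_count:
            problems.append(
                "FileAllocationHint %d is below FileCount %d in %s"
                % (file_hint, file_count, name)
            )
    return problems
```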

<h5>FixedMetadataSize (Unknown5)</h5>
<p>This field is always 16 (<code>0x10</code>). This corresponds exactly to the size of the four 32-bit integers (Attributes, Offset, Size, PackedSize) that define a file entry. The engine likely uses this value as an offset jump or a validation check for the record size.</p>

<h5>Timestamps (Unknown3 &amp; Unknown6)</h5>
<p>These fields store a standard 32-bit Unix timestamp (seconds since January 1, 1970, UTC). For example, the <code>master.dat</code> in the original Fallout release contains <code>0x34110398</code>, which translates to <b>September 6, 1997</b> (07:17:44 UTC), just weeks before the game's gold release. The engine reads these values but does not validate them, which is why modding tools can safely ignore them or set them to zero.</p>
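<p>Converting such a field to a readable date takes one call in most languages. A minimal Python sketch follows; the helper name <code>dat1_timestamp_to_utc</code> is hypothetical, and the value is interpreted as UTC.</p>

```python
from datetime import datetime, timezone


def dat1_timestamp_to_utc(raw):
    """Interpret a DAT1 timestamp field as seconds since 1970-01-01 UTC."""
    return datetime.fromtimestamp(raw, tz=timezone.utc)


stamp = dat1_timestamp_to_utc(0x34110398)
print(stamp.isoformat())
```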

<h4>Fallout 1 LZSS decompression algorithm</h4>
<p>Originally <a href="https://www.nma-fallout.com/threads/fallout-dat-files.160366/">written by Shadowbird</a> on the NMA forum.</p>
<ul>
<li>This is a <b>decompression algorithm for files compressed with Fallout's LZSS algorithm</b>, not a file extraction algorithm for getting them out of the DAT file! DAT unpackers already incorporate this.</li>
<li>It's pretty much a generic LZSS decompression algorithm, with a possible difference from other implementations in that it doesn't prevent overwriting dictionary values while they're being output (see the loop in @FLeven).</li>
</ul>
<hr/>
<pre style="background: #222; padding: 10px;">
DICT_SIZE = 4096; // Dictionary (a.k.a. sliding window / ring / buffer) size
MIN_MATCH = 3;
MAX_MATCH = 18;

Int16 N = 0;                      // number of bytes to read
Int16 DO = 0;                     // Dictionary offset - for reading
Int16 DI = DICT_SIZE - MAX_MATCH; // Dictionary index - for writing
Byte L = 0;                       // Match length
Byte FL = 0;                      // Flags indicating the compression status of up to 8 next characters.

@Start
* If at the end of file, exit.
* Read N from input. The absolute value of N is how many bytes of data to read (if N=0, exit).
* Go to @N<0 or @N>0 accordingly.

@N<0
* Take the absolute value of N (or multiply N by -1), and write that many bytes directly from input to output (without
        putting anything in Dictionary).
* Go to @Start.

@N>0
* Clear dictionary (fill with spaces — 0x20)
* DI = DICT_SIZE - MAX_MATCH;
* Go to @Flag.

@Flag
* Read FL from input.
* If N bytes have been read from input, go to @Start, otherwise, go to @Next.

@Next
* If this is the 9th time here since last @Flag, go to @Flag.
* If the lowest bit of FL is 1 (FL is odd), go to @FLodd; otherwise (FL is even), go to @FLeven.

@FLodd
* Read 1 byte from input, write it to output and to Dictionary (at position DI).
* If N bytes have been read from input, go to @Start.
* DI = DI + 1, or DI = 0 (if past the end of Dictionary).
* Go to @FlagNext.

@FLeven
* Read 1 byte from input to DO.
* If N bytes have been read from input, go to @Start (in a correctly compressed file this should not ever happen).
* Read L from input.
* Prepend the high-nibble (first 4 bits) from L to DO (DO = DO | ((L & 0xF0) << 4)) and remove it from L (L = L & 0x0F).
* (L + MIN_MATCH) times:
  * Read a byte from dictionary at offset DO (wrap to the start of dictionary if past the end), and write to the output.
  * Write the byte to the Dictionary also, at position DI.
  * DI = DI + 1, or DI = 0 (if past the end of Dictionary).
  * DO = DO + 1.
* Go to @FlagNext.

@FlagNext
* Divide FL by 2, rounding down (FL = FL >> 1).
* Go to @Next.
</pre>
<hr/>
<a href="https://github.com/rotators/Fo1in2/blob/master/Tools/UndatUI/src/dat.cs#L12-L158">C# implementation of the above</a><br/><br/>
<b>Read more</b><br/>
<a href="https://web.archive.org/web/20160110174426/https://oku.edu.mie-u.ac.jp/~okumura/compression/history.html">History of Data Compression in Japan</a><br/>
<a href="https://en.wikipedia.org/wiki/Lempel-Ziv-Storer-Szymanski">LZSS compression</a>
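<p>The decompression steps listed in the pseudocode of this section can be sketched in Python as follows. This is an illustrative reading of those steps, not the engine's code: the function name <code>lzss_decompress</code> is hypothetical, and N is assumed to be a signed 16-bit big-endian value (the pseudocode only says Int16; big-endian is assumed because the rest of DAT1 is big-endian).</p>

```python
DICT_SIZE = 4096
MIN_MATCH = 3
MAX_MATCH = 18


def lzss_decompress(data):
    """Decompress a Fallout 1 LZSS stream held in a bytes object."""
    out = bytearray()
    pos = 0
    while pos + 2 <= len(data):
        # @Start: assumption, N is a signed big-endian 16-bit value.
        n = int.from_bytes(data[pos:pos + 2], "big", signed=True)
        pos += 2
        if n == 0:
            break
        if n < 0:
            # @N<0: copy |N| bytes straight through, dictionary untouched.
            out += data[pos:pos - n]
            pos -= n
            continue
        # @N>0: this block occupies the next N input bytes.
        end = pos + n
        window = bytearray(b" " * DICT_SIZE)  # dictionary cleared to 0x20
        di = DICT_SIZE - MAX_MATCH
        while pos < end:
            flags = data[pos]  # @Flag
            pos += 1
            for _ in range(8):  # @Next: at most 8 items per flag byte
                if pos >= end:
                    break
                if flags & 1:
                    # @FLodd: literal byte goes to output and dictionary.
                    byte = data[pos]
                    pos += 1
                    out.append(byte)
                    window[di] = byte
                    di = (di + 1) % DICT_SIZE
                else:
                    # @FLeven: two bytes hold a 12-bit offset and 4-bit length.
                    if pos + 2 > end:
                        pos = end  # truncated pair; should not occur
                        break
                    do = data[pos]
                    length = data[pos + 1]
                    pos += 2
                    do |= (length & 0xF0) << 4
                    length = (length & 0x0F) + MIN_MATCH
                    for _ in range(length):
                        # Reads may hit bytes written earlier in this loop.
                        byte = window[do % DICT_SIZE]
                        out.append(byte)
                        window[di] = byte
                        di = (di + 1) % DICT_SIZE
                        do += 1
                flags >>= 1  # @FlagNext
    return bytes(out)
```

<p>Because the copy loop reads from the dictionary while writing to it, a match that starts one byte behind the write position repeats that byte, which is how runs are encoded.</p>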

<h3>DAT2 (Little-endian)</h3>
<p>DAT2 files are divided into three parts: the Data Block, the Directory Tree and the Fixed DAT Information block. The Data Block contains all the files stored in the DAT; some are compressed as zlib streams, others are stored as-is. The Directory Tree describes each file stored in the Data Block: the offset where it is located, whether it is compressed, its packed and unpacked sizes, and so on. Finally, the Fixed DAT Information block contains the size in bytes of both the archive data and the Directory Tree. The table below outlines the structure of a DAT2 file:</p>
<table style="background: #222; margin: 10px;">
<thead>
<tr>
<th style="width: 150px;">Part</th>
<th style="width: 150px;">Location</th>
<th>Description</th>
</tr>
</thead>
<tbody>
	<tr>
		<td>DataBlock</td>
		<td>.............X </td>
		<td>Files stored in the archive</td>
	</tr>
	<tr>
		<td>FilesTotal</td>
		<td>X+1</td>
		<td>Number of files in the archive</td>
	</tr>
	<tr>
		<td>DirTree </td>
		<td>X+5.............Z</td>
		<td>Directory listing entries</td>
	</tr>
	<tr>
		<td>TreeSize</td>
		<td>Z+1</td>
		<td>Size of DirTree in bytes</td>
	</tr>
	<tr>
		<td>DataSize</td>
		<td>Z+5</td>
		<td>Full size of the archive in bytes</td>
	</tr>
</tbody>
</table>
<ul>
	<li>FilesTotal + DirTree corresponds to Directory Tree block</li>
	<li>TreeSize + DataSize corresponds to Fixed DAT Information block</li>
</ul>
<h4>The Data Block</h4>
<p>The Data Block contains the actual file data. Technical information for these files is located in the Directory Tree. In normal Fallout 2 DATs the Data Block starts from the very beginning of a DAT file. The engine also supports an archive data section that begins later in the file: it calculates <code>dataOffset = physicalFileSize - DataSize</code>, then adds each entry's <code>Offset</code> to that base.</p>
<p>Files can be compressed or uncompressed. Compressed entries are zlib streams (RFC-1950/RFC-1951), not gzip files. Common compressed data begins with the zlib bytes <code>78 DA</code>, but parsers should trust the directory entry's <code>Type</code> flag and feed the entry data to a zlib inflate stream rather than relying only on those first two bytes.</p>
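<p>Reading one entry's data could look like the Python sketch below. The function name <code>read_dat2_entry</code> and the dictionary keys are hypothetical. As an assumption of this sketch, uncompressed entries are sliced by their real size and compressed entries by their packed size.</p>

```python
import zlib


def read_dat2_entry(archive, data_offset, entry):
    """Return the uncompressed bytes of one DAT2 entry.

    archive is the whole DAT2 file as bytes, data_offset is
    physicalFileSize - DataSize, and entry holds the directory fields.
    """
    start = data_offset + entry["offset"]
    if entry["type"] == 1:
        # Trust the Type flag, then inflate the zlib stream.
        raw = archive[start:start + entry["packed_size"]]
        data = zlib.decompress(raw)
    else:
        # Assumption: stored entries occupy real_size bytes.
        data = archive[start:start + entry["real_size"]]
    if len(data) != entry["real_size"]:
        raise ValueError("size mismatch for entry at offset %d" % entry["offset"])
    return data
```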

<h4>The Directory Tree</h4>
<p>The Directory Tree contains entries that specify the properties of each file stored in the Data Block. These entries vary in size because the Filename field (path plus file name) has a variable length, given by FilenameSize. As shown in the scheme above, the Directory Tree is divided into two parts: <b>FilesTotal</b> and the <b>DirTree</b> entries. FilesTotal is a DWORD (4 bytes) indicating how many files are stored in the DAT. The DirTree entries follow, each with a variable-length structure. All numeric fields are little-endian DWORDs unless specified otherwise. Entries are sorted alphabetically so that the engine can find them with a case-insensitive binary search.</p>

<table style="background: #222; margin: 10px;">
<thead>
<tr>
  <th style="width: 150px;">Name</th>
  <th style="width: 125px;">Type</th>
  <th>Description</th>
</tr>
</thead>
<tbody>
	<tr>
		<td>FilenameSize</td>
		<td>Dword</td>
		<td>Length of the ASCII filename. </td>
	</tr>
	<tr>
		<td>Filename</td>
		<td>String</td>
		<td>Path and name of the file, for example "text\english\game\WORLDMP.MSG". The length of the Filename is <b>FilenameSize</b>.</td>
	</tr>
	<tr>
		<td>Type</td>
		<td>Byte</td>
		<td>1 = compressed, 0 = uncompressed</td>
	</tr>
	<tr>
		<td>RealSize </td>
		<td>Dword</td>
		<td>Size of the file without compression.</td>
	</tr>
	<tr>
		<td>PackedSize</td>
		<td>Dword</td>
		<td>Size of the compressed file.</td>
	</tr>
	<tr>
		<td>Offset</td>
		<td>Dword</td>
		<td>Offset of the entry's data, relative to the start of the archive data section (<code>physicalFileSize - DataSize</code>).</td>
	</tr>
</tbody>
</table>
<ul>
<li>Dword: 4-byte integer, 0xNN NN NN NN</li>
<li>Word: 2-byte integer, 0xNN NN</li>
<li>Byte: 1-byte integer, 0xNN</li>
<li>String: a sequence of character bytes, such as "ABCDEF123456!@#$%/][\"</li>
</ul>
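<p>Since the entries are kept sorted for a case-insensitive binary search, a lookup can be sketched in Python with the standard <code>bisect</code> module. The helper name <code>build_lookup</code> is hypothetical. Comparing lower-cased names is an assumption that only approximates the engine's own comparison.</p>

```python
from bisect import bisect_left


def build_lookup(entries):
    """Return a find(path) function over a list of entry dictionaries."""
    # Re-sorting is harmless if the list is already in archive order.
    ordered = sorted(entries, key=lambda e: e["filename"].lower())
    keys = [e["filename"].lower() for e in ordered]

    def find(path):
        key = path.lower()
        i = bisect_left(keys, key)  # binary search on the sorted keys
        if i != len(keys) and keys[i] == key:
            return ordered[i]
        return None

    return find
```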

<b>Declaration of a DirEntry (C)</b>
<pre style="background: #222; padding: 10px;">
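// On-disk layout of one DirTree entry (little-endian, no padding between fields).
// filename is variable-length, so this describes the layout; it is not a struct
// that compiles as-is or that can be filled with a single read.
struct DirEntry {
    uint32_t filenameSize;            // length of filename in bytes
    char     filename[filenameSize];  // path and file name
    uint8_t  type;                    // 1 = compressed, 0 = uncompressed
    uint32_t realSize;                // size without compression
    uint32_t packedSize;              // size of the compressed file
    uint32_t offset;                  // relative to the start of the archive data section
};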
</pre>
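<p>Because each entry has a variable length, a parser has to read the fields one after another. A minimal Python sketch of that, using hypothetical helper names (<code>read_u32_le</code>, <code>parse_dir_entry</code>, <code>parse_dir_tree</code>) and assuming ASCII filenames:</p>

```python
def read_u32_le(buf, pos):
    """Read a little-endian 32-bit unsigned integer at buf[pos]."""
    return int.from_bytes(buf[pos:pos + 4], "little")


def parse_dir_entry(buf, pos):
    """Parse one DirTree entry starting at pos; return (entry, next_pos)."""
    name_len = read_u32_le(buf, pos)
    pos += 4
    filename = buf[pos:pos + name_len].decode("ascii")
    pos += name_len
    entry = {
        "filename": filename,
        "type": buf[pos],                          # 1 byte
        "real_size": read_u32_le(buf, pos + 1),
        "packed_size": read_u32_le(buf, pos + 5),
        "offset": read_u32_le(buf, pos + 9),
    }
    return entry, pos + 13                         # 1 + 4 + 4 + 4 bytes


def parse_dir_tree(buf, files_total, pos=0):
    """Parse files_total consecutive entries starting at pos."""
    entries = []
    for _ in range(files_total):
        entry, pos = parse_dir_entry(buf, pos)
        entries.append(entry)
    return entries
```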

<b>How to find the Directory Tree</b>
<p>The Directory Tree is located after the Data Block, just before the 8-byte footer. To find it, read the last 8 bytes of the physical file: the first 4 bytes are the <b>TreeSize</b>, and the next 4 bytes are the <b>DataSize</b>. In normal DAT2 archives <code>DataSize</code> matches the total file size. The engine uses the physical file size when seeking, so the most robust formula for the start of the Directory Tree is <code>physicalFileSize - TreeSize - 8</code>.</p>
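<p>The footer arithmetic can be sketched in Python as follows. The function name <code>locate_dat2_tree</code> is hypothetical. The sketch assumes that the computed position points at FilesTotal, in other words that TreeSize counts the 4-byte FilesTotal field as well as the entries; verify this against a real archive before relying on it.</p>

```python
def locate_dat2_tree(archive):
    """Locate the Directory Tree in a DAT2 file given as bytes."""
    size = len(archive)                      # physicalFileSize
    tree_size = int.from_bytes(archive[size - 8:size - 4], "little")
    data_size = int.from_bytes(archive[size - 4:size], "little")
    tree_offset = size - tree_size - 8       # assumption: points at FilesTotal
    data_offset = size - data_size           # 0 in a normal archive
    files_total = int.from_bytes(archive[tree_offset:tree_offset + 4], "little")
    return {
        "tree_offset": tree_offset,
        "entries_offset": tree_offset + 4,   # first DirTree entry
        "data_offset": data_offset,
        "files_total": files_total,
    }
```

<p>Computing every position from the physical file size keeps the reader working even when the archive data does not start at the beginning of the file.</p>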

<h4>Engine behavior</h4>
<ul>
<li>The engine only opens DAT2 entries when the requested file mode starts with <code>r</code>; DAT2 archives are read-only to the game.</li>
<li>Compressed entries use a 0x400-byte input buffer and zlib <code>inflate</code>. Seeking backwards in compressed entries rewinds and inflates forward to the requested uncompressed offset.</li>
<li>Text reads normalize CRLF to LF for both compressed and uncompressed DAT entries.</li>
</ul>
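<p>The backward-seek behavior described in the list above can be imitated with Python's <code>zlib.decompressobj</code>. This is a sketch of the idea only, not the engine's code: the class name <code>CompressedEntryReader</code> and its methods are hypothetical, and only the 0x400-byte input chunk size comes from the description.</p>

```python
import zlib

CHUNK = 0x400  # input buffer size mentioned in the list above


class CompressedEntryReader:
    """Sequential reader over one zlib-compressed entry."""

    def __init__(self, packed):
        self.packed = packed
        self._rewind()

    def _rewind(self):
        # Start over with a fresh inflate state at the first input byte.
        self.inflater = zlib.decompressobj()
        self.in_pos = 0
        self.out_pos = 0
        self.pending = b""

    def read(self, count):
        # Feed the inflater one chunk at a time until enough output exists.
        while len(self.pending) < count and self.in_pos < len(self.packed):
            chunk = self.packed[self.in_pos:self.in_pos + CHUNK]
            self.in_pos += len(chunk)
            self.pending += self.inflater.decompress(chunk)
        result, self.pending = self.pending[:count], self.pending[count:]
        self.out_pos += len(result)
        return result

    def seek(self, target):
        # Backward seeks rewind, then inflate forward and discard output.
        if target < self.out_pos:
            self._rewind()
        self.read(target - self.out_pos)
```

<p>Text-mode reads would additionally replace each CRLF pair with LF after this step, for example with <code>data.replace(b"\r\n", b"\n")</code>.</p>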

<h3>Credits</h3>
<p>DAT1 format reverse engineered by Shadowbird (gmail.com, account "shadowbird.lv").</p>
<p>DAT2 format reverse engineered by MatuX (matip@fibertel.com.ar).</p>

<h3>Tools</h3>
<a href="https://fodev.net/files/mirrors/teamx-utils/dat_explorer.rar">Dat Explorer 1.43</a><br/>
<a href="https://fodev.net/files/mirrors/teamx-utils/!_INDEX_en.html#dat">Various TeamX tools</a>

<h3>Source code</h3>
<a href="https://github.com/rotators/Fo1in2/blob/master/Tools/UndatUI/src/dat.cs">Fallout 1 DAT unpacking - C#</a><br/>
<a href="https://github.com/ghost2238/2055/tree/master/tools/undat">Fallout 1 DAT-file extractor by Abel - Pascal / ASM</a><br/>
<a href="https://github.com/rotators/tools/tree/master/DATLib">Fallout 2 DAT reading - C#</a><br/>
<a href="https://github.com/rotators/Fo1in2/blob/master/Tools/Packrat/src/dat.cs">Fallout 2 DAT creation - C#</a><br/>
<a href="https://github.com/berenm/game-data-reverse-engineering/tree/master/python">DAT unpacking for both Fallout 1 and 2 DATs - Python</a><br/>
<a href="https://github.com/falltergeist/dat-unpacker">Console utility for unpacking 1 &amp; 2 DATs - C++</a><br/>
<a href="https://github.com/alexbatalov/fallout2-ce/blob/main/src/dfile.cc">Fallout 2 Community Edition DAT2 reader - C++</a><br/>
<a href="https://github.com/alexbatalov/fallout2-ce/blob/main/src/dfile.h">Fallout 2 Community Edition DAT2 structures - C++</a>


<h2>History</h2>
<p>
2019-12-15 - Ported from <a href="https://falloutmods.fandom.com/wiki/DAT_file_format">https://falloutmods.fandom.com/wiki/DAT_file_format</a> by <a href="https://github.com/ghost2238">ghost</a><br/>
</p>

</div>
</body>
</html>