The next 16 bits after the header are unknown at the moment. In the version of data.wz that I have, it's 0x0061.
After that, the directory blocks begin. In my version there are 45 directories, and I had to hard-code that value in my extractor. The program could be smarter and count the number of subdirectories and stop once it no longer finds them, or I could figure out the encoding used for file/directory offsets. If anyone else figures it out, I'd be glad to know.
The first value is the count of how many files are in the directory. Wizet tries to store sizes compactly. If the size is less than 128, it just gives you the size as a single byte. If it's 128 or more, it flags the value as a large value (the single byte is 0x80), and the next 4 bytes make up the actual count/size.
Code:
def rPackNum(f):
    size = rU8(f)
    if(size == 0x80): size = rU32(f)
    return size
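To see the packed-number encoding in action, here is a minimal sketch run against hand-built bytes rather than a real data.wz. It is written for Python 3 (the listings in this post are Python 2). The name `read_pack_num` and the sample bytes are made up for illustration, and the little-endian byte order is an assumption on my part.

```python
import io
import struct

def read_pack_num(f):
    """Read one packed number: a single byte, or 0x80 followed by a 32-bit value."""
    size = f.read(1)[0]
    if size == 0x80:
        # Marker byte: the real value sits in the next 4 bytes
        # (little-endian is assumed here).
        size = struct.unpack('<I', f.read(4))[0]
    return size

# Hand-built sample: the small value 5, then the large value 300 (0x0000012C).
sample = io.BytesIO(bytes([0x05, 0x80, 0x2C, 0x01, 0x00, 0x00]))
small = read_pack_num(sample)
large = read_pack_num(sample)
print(small, large)
```

The small value costs one byte and the large one costs five, which is the whole point of the scheme.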
Once you know the number of files, start iterating over all of the entries. Each entry has an identifier byte, a filename (or link), a filesize, and then two unknown values.
The identifier byte can be one of the following three values: 0x04 for files, 0x03 for directories, and 0x02 for symbolic links.
After the identifier comes the filename. The filenames and strings in the data.wz are tricky because Wizet "encrypts" them. (Look! Scare quotes! I'm shaking.)
The decryption key begins with 0xAA (1010 1010). The key flips any bit of the value where there is a 1 in the key; where there is a 0 in the key, the bit is left unchanged. In other words, it is a plain XOR. If the value I'm encrypting is 0110 0111 and the key is 1010 1010, we flip bits 1, 3, 5, and 7 (counting from 0 at the least significant bit), for a final value of 1100 1101. The decryption process works the same way.
After each character, the decryption key gets incremented by 1 (from 0xAA to 0xAB to 0xAC, etc.).
The above complicated description boils down to the following 9 lines of code:
Code:
def transStr(str):
    s = ''
    p = 0xAA                                   # starting key
    for cc in str:
        a = ord(cc)
        val = ((a & ~p) ^ (~a & p)) & 0xff     # same as (a ^ p) & 0xff
        s += struct.pack("B", val )
        p += 1                                 # key advances per character
    return s
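As a sanity check on the scheme, here is a small Python 3 sketch that works on `bytes` and shows that running the transform twice gives back the input. The name `trans_bytes` and the sample string are made up for this example. Masking the key with 0xFF has the same effect as the `& 0xff` in the listing above.

```python
def trans_bytes(data, key=0xAA):
    """XOR each byte with a key that starts at 0xAA and grows by 1 per byte."""
    out = bytearray()
    for b in data:
        out.append(b ^ (key & 0xFF))  # flip exactly the bits set in the key
        key += 1
    return bytes(out)

# The single-byte example: 0110 0111 XOR 1010 1010 = 1100 1101.
one_byte = trans_bytes(b"\x67")

# Round trip on a made-up string: encrypting twice restores the original.
plain = b"hello"
scrambled = trans_bytes(plain)
restored = trans_bytes(scrambled)
print(one_byte.hex(), scrambled.hex(), restored)
```

Because XOR is its own inverse and the key sequence depends only on position, the same function serves for both directions.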
Strings are guaranteed to be less than 256 characters.
Before each string is the negation of the string length (the two's complement negative, not a bitwise NOT). So if the string length is 3, the value read in will actually be -3, which is 0xFD when read as an unsigned byte. Since strings are less than 256 characters, the length is just a signed byte.
This leads to the following 3 lines of code:
Code:
def rUStr(f):
    size = rU8(f)
    return transStr(f.read(256 - size))
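Here is a short Python 3 sketch of the length prefix plus the string transform, run on a hand-built record instead of a real file. The names `trans_bytes` and `read_wz_string` and the sample name are made up for illustration.

```python
import io

def trans_bytes(data, key=0xAA):
    """XOR each byte with a key that starts at 0xAA and grows by 1 per byte."""
    return bytes(b ^ ((key + i) & 0xFF) for i, b in enumerate(data))

def read_wz_string(f):
    """Read a negated one-byte length, then that many transformed bytes."""
    raw_len = f.read(1)[0]      # e.g. 0xFD, which is -3 as a signed byte
    length = 256 - raw_len      # undo the negation: 256 - 0xFD = 3
    return trans_bytes(f.read(length))

# Build a record the way the file appears to store it, then read it back.
name = b"abc"
record = bytes([(-len(name)) & 0xFF]) + trans_bytes(name)
decoded = read_wz_string(io.BytesIO(record))
print(hex(record[0]), decoded)
```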
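Next come the filesize and unknown1 (both PackNums), and then a 32-bit value of unknown purpose, unknown2.
Symbolic links are the only oddity. Instead of a string, a link entry gives a 32-bit offset to the entry it points to. My extractor finds the linked name by seeking to that offset plus 0x3c and skipping one more byte for the identifier.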
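The entry layout described so far can be sketched as one small reader. This is Python 3, run on hand-built bytes. The function names, the dictionary keys, and the sample values are all made up for illustration, and little-endian 32-bit fields are an assumption.

```python
import io
import struct

ENTRY_KINDS = {0x04: "file", 0x03: "directory", 0x02: "link"}

def read_pack_num(f):
    size = f.read(1)[0]
    if size == 0x80:
        size = struct.unpack('<I', f.read(4))[0]  # little-endian assumed
    return size

def trans_bytes(data, key=0xAA):
    return bytes(b ^ ((key + i) & 0xFF) for i, b in enumerate(data))

def read_wz_string(f):
    length = 256 - f.read(1)[0]
    return trans_bytes(f.read(length))

def read_entry(f):
    """Read one directory entry into a dict (field names are hypothetical)."""
    kind = ENTRY_KINDS[f.read(1)[0]]
    if kind == "link":
        # Links carry a 32-bit offset where other entries carry a name.
        name = struct.unpack('<I', f.read(4))[0]
    else:
        name = read_wz_string(f)
    size = read_pack_num(f)
    unknown1 = read_pack_num(f)
    unknown2 = struct.unpack('<I', f.read(4))[0]
    return {"kind": kind, "name": name, "size": size,
            "unknown1": unknown1, "unknown2": unknown2}

# One made-up file entry followed by one made-up link entry.
file_entry = (bytes([0x04, 0xFE]) + trans_bytes(b"ab")
              + bytes([0x10, 0x01]) + struct.pack('<I', 7))
link_entry = (bytes([0x02]) + struct.pack('<I', 0x20)
              + bytes([0x05, 0x00]) + struct.pack('<I', 0))
stream = io.BytesIO(file_entry + link_entry)
first = read_entry(stream)
second = read_entry(stream)
print(first)
print(second)
```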
Put it all together, and you get:
Code:
import struct
from sys import exit

# Helpers that print table rows and bail out on unexpected values.
def dumpFileLoc(f): print "<tr><td>FileLocation</td><td>%d</td></tr>" % f.tell()
# Note: getFileSize is not defined in this listing, and dumpFileSize is never called.
def dumpFileSize(f): print "<tr><td>FileSize</td><td>%d</td></tr>" % getFileSize(f)
def check(flag, val, f):
    if(flag != val):
        print "<tr><td>Unknown: %s</td><td>Expecting: %s</td></tr>" % (repr(flag), repr(val))
        dumpFileLoc(f)
        exit(-1)
def checkA(flag, vals, f):
    if(not flag in vals):
        print "<tr><td>Unknown: %s</td><td>Expecting: %s</td></tr>" % (repr(flag), repr(vals))
        dumpFileLoc(f)
        exit(-1)

# Fixed-width readers (native byte order).
def rU8 (f): return struct.unpack('B', f.read(1))[0]
def rU16(f): return struct.unpack('H', f.read(2))[0]
def rU32(f): return struct.unpack('I', f.read(4))[0]
def rU64(f): return struct.unpack('Q', f.read(8))[0]
def rF32(f): return struct.unpack('f', f.read(4))[0]

# Packed count/size: one byte, or 0x80 followed by 32 bits.
def rPackNum(f):
    size = rU8(f)
    if(size == 0x80): size = rU32(f)
    return size

# String "encryption": XOR with a key that starts at 0xAA and grows by 1 per character.
def transStr(str):
    s = ''
    p = 0xAA
    for cc in str:
        a = ord(cc)
        val = ((a & ~p) ^ (~a & p)) & 0xff
        s += struct.pack("B", val )
        p += 1
    return s

# Strings are prefixed with their negated length in one byte.
def rUStr(f):
    size = rU8(f)
    return transStr(f.read(256 - size))

def dumpHeader(f):
    ident = f.read(4)
    size = rU64(f)
    offset = rU32(f)
    copy = f.read(offset - f.tell())
    print "<tr><td>Header.FileIdentifier</td><td>%s</td></tr>" % ident
    print "<tr><td>Header.DataSize</td><td>%d</td></tr>" % size
    print "<tr><td>Header.DataStartOffset</td><td>0x%08x</td></tr>" % offset
    print "<tr><td>Header.Copyright</td><td>%s</td></tr>" % copy

def dumpDirInfo(f, sname):
    filename = rUStr(f)
    filesize = rPackNum(f)
    unknown1 = rPackNum(f)
    unknown2 = rU32(f)
    print "<tr><td>%s.DirName</td><td>%s</td></tr>" % (sname, filename)
    print "<tr><td>%s.DirSize</td><td>%d</td></tr>" % (sname, filesize)
    print "<tr><td>%s.Unknown1</td><td>0x%08x</td></tr>" % (sname, unknown1)
    print "<tr><td>%s.Unknown2</td><td>0x%08x</td></tr>" % (sname, unknown2)

def dumpLinkInfo(f, sname):
    fileloc = rU32(f)
    pos = f.tell()
    # Jump to the linked entry (offset + 0x3c), skip its identifier byte, read its name.
    f.seek(fileloc + 0x3c + 1, 0)
    filename = rUStr(f)
    f.seek(pos)
    filesize = rPackNum(f)
    unknown1 = rPackNum(f)
    unknown2 = rU32(f)
    print "<tr><td>%s.LinkName (from %08x)</td><td>%s</td></tr>" % (sname, fileloc + 0x3c, filename)
    print "<tr><td>%s.LinkSize</td><td>%d</td></tr>" % (sname, filesize)
    print "<tr><td>%s.Unknown1</td><td>0x%08x</td></tr>" % (sname, unknown1)
    print "<tr><td>%s.Unknown2</td><td>0x%08x</td></tr>" % (sname, unknown2)

def dumpFileInfo(f, sname):
    filename = rUStr(f)
    filesize = rPackNum(f)
    unknown1 = rPackNum(f)
    unknown2 = rU32(f)
    print "<tr><td>%s.FileName</td><td>%s</td></tr>" % (sname, filename)
    print "<tr><td>%s.FileSize</td><td>%d</td></tr>" % (sname, filesize)
    print "<tr><td>%s.Unknown1</td><td>0x%08x</td></tr>" % (sname, unknown1)
    print "<tr><td>%s.Unknown2</td><td>0x%08x</td></tr>" % (sname, unknown2)

# One directory block: an entry count, then that many entries.
def dumpDirTree(f, sname):
    fileCount = rPackNum(f)
    for i in range(fileCount):
        fileType = rU8(f)
        if  (fileType == 0x04): dumpFileInfo(f, sname + "[" + str(i) + "]")
        elif(fileType == 0x03): dumpDirInfo(f, sname + "[" + str(i) + "]")
        elif(fileType == 0x02): dumpLinkInfo(f, sname + "[" + str(i) + "]")
        else: exit(-1)

def dumpDirTrees(f):
    unknown = rU16(f)
    print "<tr><td>Unknown16</td><td>0x%04x</td></tr>" % unknown
    # 45 is the hard-coded directory count for my version of data.wz.
    for i in range(45):
        dumpDirTree(f, "Dir[" + str(i) + "]")

def dump():
    f = open("data.wz", "rb")
    print "<html>"
    print " <head>"
    print " <title>data.wz File Dump</title>"
    print " </head>"
    print " <body>"
    print " <table>"
    dumpHeader(f)
    dumpDirTrees(f)
    dumpFileLoc(f)
    print " </table>"
    print " </body>"
    print "</html>"

dump()
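The "smarter" approach mentioned earlier, counting subdirectories instead of hard-coding 45, might look something like the Python 3 sketch below. It rests on an untested assumption: every directory entry (identifier 0x03) found in any block stands for exactly one further directory block later in the stream. All function names and the sample bytes are made up, and I have not checked this against a real data.wz.

```python
import io
import struct

def read_pack_num(f):
    size = f.read(1)[0]
    if size == 0x80:
        size = struct.unpack('<I', f.read(4))[0]  # little-endian assumed
    return size

def skip_block(f):
    """Walk one directory block; return how many directory entries it holds."""
    subdirs = 0
    for _ in range(read_pack_num(f)):
        kind = f.read(1)[0]
        if kind == 0x02:
            f.read(4)                      # link: 32-bit offset, no name
        elif kind in (0x03, 0x04):
            f.read(256 - f.read(1)[0])     # name: negated length, then bytes
        else:
            raise ValueError("unknown entry type 0x%02x" % kind)
        read_pack_num(f)                   # filesize
        read_pack_num(f)                   # unknown1
        f.read(4)                          # unknown2
        if kind == 0x03:
            subdirs += 1
    return subdirs

def count_blocks(f):
    """Read blocks until every directory seen so far has had its block read."""
    pending, blocks = 1, 0                 # start with the root block
    while pending > 0:
        pending += skip_block(f) - 1
        blocks += 1
    return blocks

def make_entry(kind, name):
    """Build a made-up entry with zeroed size and unknown fields."""
    return (bytes([kind, (-len(name)) & 0xFF]) + name
            + bytes([0, 0]) + struct.pack('<I', 0))

# Root block with two directories and a file, then the two directory blocks.
root = bytes([3]) + make_entry(0x03, b"a") + make_entry(0x03, b"b") + make_entry(0x04, b"c")
data = root + bytes([1]) + make_entry(0x04, b"d") + bytes([0])
stream = io.BytesIO(data)
total = count_blocks(stream)
print(total)
```

If the assumption holds, the loop stops on its own after the last block, so the reader would not need to know the count in advance.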
The basic encryption that they used took me a long time to figure out. But it was a fun puzzle.