I am now going through the files by hand. Maybe I will make a new observation that helps to understand the files better.
And I have already made one step forward:
The first unknown value in the DirectoryTree entry of an img file is a Checksum-32:
    v = {
        "Name": baseName + "/" + filename,
        "Size": filesize,
        "Unk1": unknown1,  # This is a Checksum-32
        "Unk2": unknown2
    }
I tried this in a hex editor that can calculate checksums. I copied the whole img file, stored it in a new file and let the hex editor calculate its Checksum-32. It was the same as "Unk1".
"type-3" directories have no checksum (just one byte with 0x00).
Now I go for the second unknown value.
edit: I don't know if this is important, but the second value always seems to be divisible by 8 (I picked some random ones). I tried this because I thought that it could also be a number of bytes or bits.
But it could just be a coincidence.
Another idea was that Unk2 is not a 32-bit value, but maybe 4 single bytes, two words, or a [byte word byte] sequence. But I could not relate any value of these 4 bytes to any property of the img file.
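To make these guesses easier to repeat, here is a minimal sketch that shows one Unk2 value in all the interpretations mentioned above (4 bytes, two words, byte/word/byte) plus the divisible-by-8 test. The function name unk2_views is made up for illustration and is not part of dump.py, and little-endian byte order is only an assumption.

```python
import struct

def unk2_views(unk2):
    """Return several alternative readings of one 32-bit Unk2 value.

    Assumption: the value is stored little-endian in the file.
    """
    raw = struct.pack("<I", unk2 & 0xFFFFFFFF)  # the 4 raw bytes
    return {
        "bytes": tuple(raw),                          # 4 single bytes
        "words": struct.unpack("<HH", raw),           # two 16-bit words
        "byte_word_byte": struct.unpack("<BHB", raw), # byte, word, byte
        "divisible_by_8": unk2 % 8 == 0,
    }

views = unk2_views(0x12345678)
print(views)
```

Running this over every entry and comparing each view with the known properties (size, offset, name length) would make a match easy to spot, if there is one.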
Maybe I'll put up a new version of dump.py once it has been tested enough with the new results (renaming, removing the prevstring part, small things like that). But I am testing everything in my Delphi application (which is mainly a copy of large parts of the Python script), which is why I have almost never touched the script for testing until now.
And I'll try to rewrite the whole extract00type from scratch. Maybe I will find a shorter version (shorter = faster, and fewer possible bugs).
edit2:
Ah.. now I understand what the whole prevstring and subsize thing was for. The "link" (extract00typeAt) points to another, already existing file, because it has the same name (but different content). And the subsize reads the size from the name of this file again.
I think with this knowledge I can rewrite all the string functions.
edit3:
So, here's shorter code for the strings. I am still not totally happy with it, but at least it's a lot cleaner than before. I tried it on String.wz (which should be the "ultimate" test for a string function).
    def extract00typeAt(f, loc, baseOffset):
        pos = f.tell()
        f.seek(loc + baseOffset, 0)
        rU8(f)
        value = extract00type(f)
        f.seek(pos)
        return value

    def extract00type(f):
        size = rS8(f)
        if(size == 0):
            return ""
        if(size < 0):
            if(size == -128): size = rS32(f)
            else: size = -size
            return transStr(f.read(size))
        if(size > 0):
            if(size == 127): size = rS32(f)
            return transStr16(f.read(size*2))
The size==1 special case isn't needed. If the string is Unicode, 2 bytes per character always have to be read (which the function does). With the size==1 check, the function either reads two bytes and returns the character, or reads the two bytes twice and returns the character or the string.
So this can be removed completely.
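To check the length-prefix logic without a real wz file, here is a small self-contained sketch that runs the same branching on an in-memory buffer. It is only an illustration under assumptions: rS8 and rS32 are re-implemented here as little-endian reads, the function name read_prefixed_string is made up, and plain decoders stand in for transStr and transStr16, whose real behaviour is not shown in this post.

```python
import io
import struct

def rS8(f):
    # Assumed stand-in: read one signed byte.
    return struct.unpack("<b", f.read(1))[0]

def rS32(f):
    # Assumed stand-in: read one signed little-endian 32-bit integer.
    return struct.unpack("<i", f.read(4))[0]

def read_prefixed_string(f, decode8, decode16):
    """Same branching as extract00type, with the decoders passed in."""
    size = rS8(f)
    if size == 0:
        return ""
    if size < 0:
        # Negative prefix: 1 byte per character, -128 means a long length follows.
        size = rS32(f) if size == -128 else -size
        return decode8(f.read(size))
    # Positive prefix: 2 bytes per character, 127 means a long length follows.
    if size == 127:
        size = rS32(f)
    return decode16(f.read(size * 2))

# Placeholder decoders, NOT the real transStr / transStr16.
plain8 = lambda raw: raw.decode("latin-1")
plain16 = lambda raw: raw.decode("utf-16-le")

buf = io.BytesIO(
    bytes([0xFD]) + b"abc"                      # -3 -> three 1-byte characters
    + bytes([2]) + "한글".encode("utf-16-le")    # 2 -> two 2-byte characters
    + bytes([1]) + "A".encode("utf-16-le")      # 1 -> no special case needed
)
results = [read_prefixed_string(buf, plain8, plain16) for _ in range(3)]
print(results)
```

The third entry has size 1 and goes through the normal 2-bytes-per-character path, which is why a separate size==1 branch adds nothing.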
By the way, my idea that it could be UTF-8 seems to be wrong. It seems to be either ASCII or the Basic Multilingual Plane of Unicode, i.e. the first 65,536 code points (which also include the Korean Hangul characters). So this seems fine.
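A quick way to see the difference between the two guesses: every character of the Basic Multilingual Plane takes exactly 2 bytes in a 16-bit encoding, while UTF-8 uses 1 to 3 bytes for the same characters. The sketch below only demonstrates this with Python's built-in codecs; the helper names are made up, and it says nothing about what transStr16 does internally.

```python
def encoded_sizes(ch):
    """Byte length of one character in UTF-8 and in UTF-16 (little-endian)."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16-le": len(ch.encode("utf-16-le")),
    }

def in_bmp(ch):
    """True if the character lies in the Basic Multilingual Plane."""
    return ord(ch) <= 0xFFFF

for ch in ("A", "한"):
    print(repr(ch), in_bmp(ch), encoded_sizes(ch))
```

So a string whose byte count is always exactly twice its character count fits the fixed 2-byte reading and not UTF-8.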
Besides that, I also tried some things with the two unknown bytes directly after the header, but I have found out nothing so far.
I can say that it's not a 16-bit checksum or something like that; I tried a lot of combinations. Also, in that case all wz files should have different values in these two bytes (which they don't).
My guess is that these two bytes are an "internal" version check, so that the client can recognize whether the wz files are the correct ones for the client version.
edit4:
If someone is interested in the checksum calculation:
The mathematical form goes like this:
Checksum = Sum(byte_0..byte_EOF) modulo 2^32
In words: Sum up all bytes, divide the sum by 2^32 and take the remainder.
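The same formula as a minimal Python sketch, for anyone who wants to compare a stored "Unk1" against a freshly computed value. The function names and the chunk size are made up for illustration and are not part of dump.py.

```python
def checksum32(data):
    """Sum of all byte values, reduced modulo 2**32."""
    return sum(data) & 0xFFFFFFFF

def checksum32_file(path, chunk_size=1 << 16):
    """Same checksum, computed over a file in chunks to keep memory use low."""
    total = 0
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            total = (total + sum(chunk)) & 0xFFFFFFFF
    return total

print(checksum32(b"\x01\x02\x03"))
```

To check an img file, cut its bytes out into a separate file (as done with the hex editor above) and compare checksum32_file(path) with the "Unk1" value of that entry.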