.wz archive

Omega · Post by **Omega** » Sat Jul 07, 2007 1:01 am

kolli wrote:I'm making large steps in extracting the file tree.

But I have one more question:

At one point in the Python script (in extract73type(), when iname is "UOL") you call a function "computeName", and the output there is a string containing "image*" and the computed name.
What type is this? Is this an image or an MP3 (because it seems to appear only in Sound.wz)?

edit: The whole structure of the *.img does not go into my head...
Is the structure of the imgs the same as the folder structure?
The folder structure is like: [all level0 folders, sorted alphabetically] [all level1 folders, s. alph.] [all level2 folders, s. alph.] And every subfolder has a byte that tells us how many there are --> easy recursion.

Is this the same in the imgs for the PNGs, MP3s and the other types (numbers, text, etc.) of files?
Are the imgs built up in a way that lets us use an easy recursion?

That computeName just walks back up the directory structure to figure out where the image is located, so that it gets a unique name (or something like that). I don't think it's really necessary.

Each .wz is divided into folders which contain .img files. Each .img contains multiple blocks. There are 3 "types" of blocks: name blocks, value blocks, and content blocks (my terms...). Name blocks = 0x00 and 0x01, content blocks = 0x73 and 0x1b, and the rest are value blocks (I classify 0x09/0x08 as value even though they are really just containers for blocks). Essentially you can have multiple pictures within each .img. It's somewhat recursive, it's just that you go between three different functions.

kolli · Post by **kolli** » Sat Jul 07, 2007 1:59 pm

But the "pictures" with computeName thing are no pictures. I wanted to know what this is.
If you extract the Sound.wz, you have a lot of this stuff, for example:
1012000.1.Hit.image* 1012000/0/Hit
1012000.2.Hit.image* 1012000/0/Hit
1012000.3.Hit.image* 1012000/0/Hit

There are no pictures, only these strings.

And for the recursion:
Yes, it is a recursion. I just wanted to know if someone has found an easier one, or if someone could explain how exactly the imgs are built up.

Is it like:
0001.img
- name1(0x00 or 0x01) with content1(0x73 or 0x1B)
- name2(0x00 or 0x01) with content2(0x73 or 0x1B)
- name3(0x00 or 0x01) with content3(0x08 or 0x09)
--subname1(0x00 or 0x01) with subcontent1(0x73 or 0x1B)
--subname2(0x00 or 0x01) with subcontent2(0x73 or 0x1B)
- name4(0x00 or 0x01) with content4(0x08 or 0x09)

Or did I mix up (0x73 or 0x1B) with (0x08 or 0x09)?

And is a "block" (as you call it) already the file (like a png or mp3)? Or is the "entryValue" the file?

Omega · Post by **Omega** » Sat Jul 07, 2007 5:19 pm

Those are definitely not in sound.wz... Those look more like reactor.wz.

I haven't checked whether there is a set order in the files, though there probably is. Some blocks are "interchangeable". For example, the only entry in a 0x08 will always be a "Name" block (so 0x00 or 0x01). Other locations will have either a 0x1b or a 0x73. The third location will have one of 0x00, 0x02, 0x03, 0x04, 0x05, 0x08, or 0x09.

All files do have a "standard" structure. Each will first have a short header (check the py file for what that is) then a number telling how many subblocks. Each of these subblocks has a name and then info block (0x02 - 0x09 group). Each of those info blocks can then have subblocks and you have to figure out what they are by reading the first byte of the block.

Do you have a specific .img you need help with?

kolli · Post by **kolli** » Sat Jul 07, 2007 5:34 pm

Thank you. I don't have problems with a specific img file. I just want to figure out the general structure, because the Python code looks very complex (even if it is only a rather simple recursion). I am trying to write a simpler one, because I don't think the inventor of the file structure made it complex on purpose (the MapleStory client must be able to access the wz files fast, so there must be a simpler way).

But I think I can work with your explanation for the time being. (But any further information is still welcome.)

edit, I looked it up: These image* are in sound.wz\reactor.img

edit 2: does each subblock also have a name?

Omega · Post by **Omega** » Sat Jul 07, 2007 7:22 pm

Oh, I see which ones you are talking about. Didn't realize that code was used elsewhere.

UOL code means that the actual data is stored elsewhere. For example in 1012000, it has 0, 1, 2, 3, 4
0/Hit contains an MP3
1 to 3/Hit contains a "link"
They each contain the text "../0/Hit" meaning that they use the same mp3 as 0/Hit.

Most subblocks have names, there are some which don't, though I can't remember where exactly.
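
A rough sketch of how such a link could be resolved, assuming the link text behaves like a relative path taken from the folder that holds the link entry (which is what the "../0/Hit" example suggests). The function name resolve_uol and the slash-separated node paths are made up for illustration and are not taken from the dump script.

```python
import posixpath

def resolve_uol(node_path, link):
    """Resolve a UOL link such as "../0/Hit" against the entry that contains it.

    Assumption: the link is relative to the parent of the link entry,
    e.g. the entry 1012000/1/Hit lives in the folder 1012000/1.
    """
    parent = posixpath.dirname(node_path)
    return posixpath.normpath(posixpath.join(parent, link))

# 1/Hit, 2/Hit and 3/Hit all point at the MP3 stored in 0/Hit.
for n in (1, 2, 3):
    print(resolve_uol("1012000/%d/Hit" % n, "../0/Hit"))
```

An extractor could look the resolved path up in whatever table it keeps of already extracted entries instead of writing out a second copy of the MP3.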

kolli · Post by **kolli** » Sun Jul 08, 2007 11:08 am

Ah, I see.

Now I have a question about the data types you are using:
Some numbers you read as signed 32 bit integers, some as unsigned. What tells you which type they are? A signed integer can address up to 2 gigabytes, but none of the wz files is even larger than one gigabyte.

I only use 32 bit signed integers to store the numbers (when it comes to values with 64 bit floats, I convert them directly to strings, but for addressing positions in the file a signed 32 bit integer should be enough).

edit: one more question:
Has someone already found out what some (or all) of the "unknown" bytes are used for?
Could the two bytes after the header say how many "type 3" directories there are (in all wz files together)?

edit2: Does someone have Europe MapleStory? Do they already have the split wz files or do they still have the Data.wz?

edit3: I just noticed in the Python script that extract00type reads the size. Isn't that a packnum? At least the beginning of the function seems to do the same as packnum.

Omega · Post by **Omega** » Tue Jul 10, 2007 5:12 am

The choice of data type was mostly arbitrary, whatever worked. Some of them I'm not too sure about (for example 0x02 and 0x04) but if the numbers come out right, it must be right (I haven't verified)

I highly doubt it's the number of directories, since that number seemed to change a lot between versions.

No idea about EMS, it's been a while.

They look pretty close so you might be right. I'd suggest you try it out and see what happens. The only part that seems to mess with it is the extra check if it's positive or negative for unicode strings.

kolli · Post by **kolli** » Tue Jul 10, 2007 10:39 am

I found out that EMS already has the split wz files. I was hoping to find a Data.wz to have all the pieces together (that makes the analysis much easier).

I thought about the packnum/00type again. The first part is a packnum, that works. But when it comes to the Unicode part it does not work (I think because the whole function would have to be changed, which I haven't tried yet).
But as we are now on the 00type: I still don't fully understand why you store some strings in this "previousStrings" array. Are these again something like what you called a "link"? It seems these things only exist in the file tree (this is only done if parsingFileTree is true)?

edit6 (I deleted edits 1 to 5 because the ideas were wrong):
Does it ever occur that a negative packnum value is used?
I mean, when the byte is -128 a 32 bit integer is read. But what if it is smaller than 0 and larger than -128? Are these numbers used anywhere? In extract00type they are made positive (size = -size) and all others must be positive (because there are no images with a "negative size").

I'll try something in this direction, because I am still certain that the wz files are built with "Occam's razor" in mind (if two ideas both work, the simpler one is the correct one).

edit7: No, as far as I can see a value between -128 and 0 (excluding these two) never comes up. I'll try to change the packnum function so that it only returns either non-negative signed bytes (>= 0 and < 128) or the 32 bit signed integer.
Maybe this solves the MP3 problem where some MP3s aren't read. That is the only place I see where negative packnum values could affect the output, because the size of an MP3 could come out negative with the current Python code. I'll try that.

edit8:
I tried to change the extract00type. But the longer I think about it the harder it gets...
If the first size in extract00type is a packnum nothing changes. That's all because of the Unicode part with the extra if clauses...
I think we have to find a way around the previousStrings thing (which I still don't fully understand) and then we can just ask whether it's Unicode or not (size = packnum, if size == 127 read Unicode, else read normal string).
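
For reference, the packnum rule discussed in edit6 (one signed byte, where -128 means that a 32 bit integer follows) could be sketched like this. It is only a sketch: the name read_packnum is made up, and little-endian byte order for the 32 bit value is an assumption, not something confirmed in this thread.

```python
import io
import struct

def read_packnum(f):
    """Read one signed byte; the value -128 (0x80) means a signed 32 bit integer follows."""
    (small,) = struct.unpack("<b", f.read(1))
    if small == -128:
        # Assumption: the 32 bit value is stored little-endian.
        (value,) = struct.unpack("<i", f.read(4))
        return value
    return small

print(read_packnum(io.BytesIO(b"\x05")))                  # single-byte form
print(read_packnum(io.BytesIO(b"\x80\x10\x27\x00\x00")))  # 0x80 marker, then 0x00002710
```

Values between -127 and -1 are passed through unchanged here, so a caller that expects a size can check for them and report where they turn up.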

Omega · Post by **Omega** » Wed Jul 11, 2007 3:35 am

I think the whole previousStrings code is completely useless. IIRC lambda was trying to store all the previous strings since they are sometimes reused, however it's easier just to get them directly.

semipseudocode:

Code: Select all

if 0x00 - read string
if 0x01 - read u32 (actual string location)
    save current location
    seek to actual string location (don't remember if it's relative or absolute)
    read the string
    seek back to saved location

Code: Select all

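size = s8
if size <= 0 (regular string, 1 byte per character)
  if size == -128, size = read u32
  else size = -size
  read size bytes and decode it
if size > 0 (Unicode string, 2 bytes per character)
  if size == 127, size = read u32
  read size*2 bytes and decode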

and decode is just the basic xor thing
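
The first semipseudocode block (0x00 = string stored in place, 0x01 = string stored elsewhere) might look roughly like this in Python. This is a sketch under assumptions: read_name, read_string and base_offset are made-up names, the location is read as a little-endian u32, and base_offset is left as a parameter because it isn't settled here whether the location is relative or absolute. The string decoding itself (size byte plus xor) is not reproduced, so the demo uses a toy length-prefixed reader.

```python
import io

def read_name(f, read_string, base_offset=0):
    """Read a name block: 0x00 = string follows, 0x01 = u32 location of the string."""
    tag = f.read(1)[0]
    if tag == 0x00:
        return read_string(f)
    if tag == 0x01:
        loc = int.from_bytes(f.read(4), "little")  # assumed little-endian
        saved = f.tell()                           # save current location
        f.seek(base_offset + loc)                  # seek to the actual string
        try:
            return read_string(f)
        finally:
            f.seek(saved)                          # seek back to saved location
    raise ValueError("not a name block: 0x%02x" % tag)

def toy_string(f):
    """Stand-in reader for the demo only: one length byte, then plain ASCII."""
    n = f.read(1)[0]
    return f.read(n).decode("ascii")

# Offset 0 holds the toy string "abc"; offset 4 holds a 0x01 block pointing at offset 0.
data = io.BytesIO(b"\x03abc" + b"\x01" + (0).to_bytes(4, "little"))
data.seek(4)
print(read_name(data, toy_string), data.tell())
```

Reading the string directly from its location each time is what makes a previousStrings cache unnecessary.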

kolli · Post by **kolli** » Wed Jul 11, 2007 9:31 am

If I change the prevString code so that it just returns what it has read from the file (and not from the saved array) it finds a 01072279.img in Weapon and in Shoes in Character.wz.
But opposed to your example code at this point I'm not seeking in the file, just reading sequentially.

With the prevstring code, this img is only in shoes.

The only other wz file where this code is used is Map.wz. But I did not check what it finds/finds not in there.

It seems that the whole prevstring code is not really working properly. I'll try to remove it and just read from the file directly.

edit1: At some earlier point in this thread it was said that strings are always shorter than 256 characters. Was this proven false? Or was this only for the directory tree? Because if all strings are shorter than 256 characters, all the checks for a specific value followed by reading a 4 byte size are useless (which would lead to a much simpler function).

edit2: also in extract00type, there is this if(size == 1). What is that for? Do strings with only one char exist in the wz files? Does it check if it's Unicode or not?

I am currently trying to reduce the extract00type to something like this:

Code: Select all

size = packnum
transStr(read(size))

But I am not a Unicode expert. Is there any way to check whether a string is Unicode or not? Because this must be clear: if it's Unicode we have to read(size*2), if it's only an ASCII string we have to read(size).

And what about the subsize? If I see this right that sequence is like
[0xXX 0x80 byte,byte,byte,byte] in the longest case, but is always followed by a string.

edit3: I made some researching on string storing:
There is this class called BinaryReader. It seems to come from the .NET class library rather than the WinAPI, and other languages have similar classes.
This class has a method "ReadString()". It reads a sequence of bytes and returns a String. It has no parameters. The length of the String is stored in "7-bit bytes" before the actual string. Like in the wz files.
If the highest bit of a byte is set (0x80), the next byte is read, too. That goes on until the highest bit of a byte is 0.
Then the lower 7 bits of all the bytes read are combined (each group shifted 7 bits further than the one before) --> you have the size. That's even better than our "packnum".
If Wizet used this method for the strings in wz files it's actually pretty easy to read the strings (no "if if if else if if if else if else" stuff anymore).

And if the Unicode is UTF-8 we also solved the problem with "size*2".

I'll play around with that a bit. If it works this will be a large step in reading the strings. Then the functions transStr, transStr16, extract00type, extract00typeAt, rUStr and rUStrAt could be combined into a small function called "readStr".

edit4: I tried it --> didn't work. But I think Wizet used a similar encoding. The if(size == 127) makes me mad... Why is it suddenly 127 and not 128 anymore? I don't get Wizet's ideas...
On the one hand they tried to store values as compactly as possible (packnum), but on the other hand they built in these illogical (at least to me) things...
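
For comparison, the 7-bit length prefix that BinaryReader.ReadString() uses can be sketched as below. This illustrates the .NET scheme only; as edit4 says, it did not match the wz files. The function name is made up for this example.

```python
import io

def read_7bit_encoded_int(f):
    """Decode a 7-bit encoded integer: low 7 bits are data, the high bit means "more bytes follow"."""
    result = 0
    shift = 0
    while True:
        byte = f.read(1)[0]
        result |= (byte & 0x7F) << shift  # least significant group comes first
        if byte & 0x80 == 0:
            return result
        shift += 7

print(read_7bit_encoded_int(io.BytesIO(b"\x7f")))      # one byte is enough up to 127
print(read_7bit_encoded_int(io.BytesIO(b"\xac\x02")))  # 0x2C + (0x02 << 7)
```

The difference to packnum is visible in the second byte: here 0x80 is a continuation flag on every byte, while packnum treats the single value -128 (0x80) as "a 32 bit integer follows".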

Omega · Post by **Omega** » Thu Jul 12, 2007 2:22 am

All the previousstring stuff is rubbish and not needed.

Not too sure what your example was saying, you'll have to be more specific.

Dunno about that <256 length thing, I think they can be as long as they need to be.

Add a message for that size==1 thing and see if it's ever run. I don't think I have it.

I'll have to look into that binary reader thing, sounds cool.

Except for that size==127, there is no way to tell the difference. Unicode takes two bytes per character, regular takes 1 byte per character.

Just to note that -128 = 0x80 and 127 = 0x7F, so they're only really wasting 1 bit by doing things this way.

kolli · Post by **kolli** » Thu Jul 12, 2007 11:51 am

I am now trying to go through the files by hand. Maybe I'll make a new observation that helps to improve the understanding of the files.

And I already made a step:
The first unknown value in the DirectoryTree entry of an img file is the Checksum-32:

Code: Select all

v = {
     "Name": baseName + "/" + filename,
     "Size": filesize,
     "Unk1": unknown1,  #### This is a Checksum-32
     "Unk2": unknown2
}

I tried this in a hex editor that can calculate checksums. I copied a whole img file into a new file and let the hex editor calculate its Checksum-32. It was the same as "Unk1".

"type-3" directories have no checksum (one byte with 0x00).

Now I go for the second unknown value.

edit: I don't know if this is important, but the second value always seems to be divisible by 8 (I picked some random ones). I tried this because I thought that this could also be a number of bytes or bits.
But it could just be random.
Another idea was that Unk2 is not a 32 bit value, but maybe 4 single bytes or two words or a [byte word byte] sequence. But I could not relate any value of these 4 bytes to any property of the img file.

Maybe I'll put up a new version of the dump.py when it's tested enough with the new results (renaming, removing the prevstring part, these small things). But I am testing everything in my Delphi application (which is mainly a copy of large parts of the Python script), that's why I have almost never touched the script for testing until now.

And I'll try to rewrite the whole extract00type from the beginning. Maybe I'll find a shorter version (shorter = faster and fewer possible bugs).

edit2:
Ah.. now I understand what all the prevstring and subsize thing was for. The "link" (extract00typeAt) points to another, already existing file, because it has the same name (but other content). And the subsize reads the size from the name of this file again.
I think with this knowledge I can rewrite all the string functions.

edit3:
So, here's a shorter code for the strings. I am still not totally happy with it, but at least it's a lot cleaner than before. I tried it on String.wz (which should be the "ultimate" test for a string function).

Code: Select all

def extract00typeAt(f, loc, baseOffset):
  pos = f.tell()
  f.seek(loc + baseOffset, 0)
  rU8(f)
  value = extract00type(f)
  f.seek(pos)
  return value

def extract00type(f):
  size = rS8(f)
  if(size == 0):
    return ""

  if(size < 0):
    if(size == -128): size = rS32(f)
    else:             size = -size
    return transStr(f.read(size)) 
  
  if(size > 0):
    if(size == 127): size = rS32(f)
    return transStr16(f.read(size*2))

The size==1 thing isn't needed. If it's Unicode, 2 bytes always have to be read per character (which the function does). With the size==1 check the function either reads two bytes and returns the character, or reads the two bytes twice and returns the character or the string.
So this could be removed completely.
My idea that it could be UTF-8 seems to be false, by the way. It seems to be either ASCII or the Basic Multilingual Plane of Unicode (which HTML uses, too) with the first ~65,000 characters (which also include the Korean Hangul characters). So this seems fine.

Besides that, I also tried some things with the two unknown bytes directly after the header, but have found out nothing so far.
But I can say that it's no 16 bit checksum or something like that, I tried a lot of combinations. And in that case all wz files should have different values in those two bytes (which they don't).
My idea on those two bytes is that they are an "internal" version check, so that the client recognizes whether the wz files are the correct ones for the client version.

edit4:
If someone is interested in the checksum calculation:
the mathematical form goes like this:
Checksum = Sum(byte_0..byte_EOF) modulo 2^32

In words: Sum up all bytes, divide that by 2^32 and take the remainder.
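
As a small sketch, the formula above translates to Python like this. The function names and the chunk size are my own choices, and the second function assumes the stream handed in contains exactly the bytes of one img file.

```python
import io

def checksum32(data):
    """Sum of all byte values, keeping only the lower 32 bits."""
    return sum(data) & 0xFFFFFFFF

def checksum32_stream(f, chunk_size=65536):
    """Same sum, computed chunk by chunk from a binary stream."""
    total = 0
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            return total
        total = (total + sum(chunk)) & 0xFFFFFFFF

print(checksum32(b"\x01\x02\x03"))
print(checksum32_stream(io.BytesIO(b"\xff" * 1000)))
```

Masking with 0xFFFFFFFF gives the same result as taking the sum modulo 2^32.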

Omega · Post by **Omega** » Fri Jul 13, 2007 2:42 am

Wow, nice work. I knew it was a checksum, but good work figuring out the actual algorithm. So, the function to calculate checksums would be to just keep on adding and keep only the lower 32 bits. Why couldn't they just use crc32 >.< (which I tried).

The 2nd unk is really weird. I think it's some kind of location because all the numbers changed when there was a new patch (unlike a checksum which wouldn't have changed unless the file was). It's the only logical thing that is remaining to include.

For the two header bytes, the version was lambda's guess and mine as well.

kolli · Post by **kolli** » Fri Jul 13, 2007 8:02 am

So, here is the new version with the updated names and the easy string extraction.

The new function includes all the stuff that is needed to read a string (only the transStr is separate, maybe I'll include that later).
The parameters of the function are:
the file "f"
the integer "offset" (contains the offset from the parent block, if it's a link)
the bool "link" (must be true when it's a link to an existing filename)
the bool "pft" (must be true when called from extractDirectories)

Code: Select all

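def rStr(f, offset = 0, link = 0, pft = 0):
  # For a link: read the target location, remember where we are, jump there.
  if(link):
    loc = rS32(f)
    pos = f.tell()
    f.seek(loc + offset)
    if(pft): flag = rU8(f)  # the flag byte is read and ignored
  size = rS8(f)
  if(size == 0):
    result = ""
  elif(size > 0):
    # Positive size: Unicode, 2 bytes per character (127 = 32 bit size follows).
    if(size == 127): size = rS32(f)
    result = transStr16(f.read(size*2))
  else:
    # Negative size: 1 byte per character (-128 = 32 bit size follows).
    if(size == -128): size = rS32(f)
    else:             size = -size
    result = transStr(f.read(size))
  # Return to the position after the link entry.
  if(link): f.seek(pos)
  return result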

Now, finally for some new stuff.

edit1:
I came up with a really logical idea of what the unk2 could be. But unfortunately I don't yet know how/if it's encrypted.
What properties do files normally have (besides the content)? Correct: a name, a size, a checksum (those are all there) and what else? A date! My only problem now is that I couldn't figure out its format. It's no Unix timestamp.
I'll play around with that a bit, because this seems the only logical explanation for these bytes.

edit2:
I wrote a testing routine that saves all the info about the imgs to txt files.
Currently the last unk is displayed as an unsigned 32 bit integer, but that's not certain. It could also be a signed 32 bit integer or something completely different.
I uploaded a pack of txt files that contain all the img headers from 0.39 from all wz files.
Ah, you can only post 1 attachment... I included the txts in the old attachment.

edit3:
I checked around a bit with these numbers... I also tried to display them as signed 32 bit integers --> some of them become negative.
My first idea, that they can always be divided by 8, is false. There were some that can't.
But still no progress on what they could mean. I still believe it's some timestamp.

Omega · Post by **Omega** » Sat Jul 14, 2007 1:02 am

I highly doubt it's a date. A while back I had saved all the unk1 and unk2s, and then compared them to the after-patch values, and monitored which files changed. The files which didn't change, unk1 stayed the same; the files which did change, unk1 changed as well. The conclusion was that it was some kind of checksum (though it could also have been the date, should have thought of that). unk2 on the other hand changed for every file, even those that were unpatched. This is most likely not a date. As you were going based on logic earlier, the date is completely useless from a gameplay perspective. The name is required to reference other parts, the size is required to know what to read, the checksum is required for file integrity. The only remaining thing that could be of use is a direct pointer to where the .img file is in the bigger data file. I highly doubt they read through the data files sequentially.
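
The before/after comparison described above could be sketched like this, assuming the unk1/unk2 values of each .img were saved into dictionaries keyed by name. The function name and the sample numbers are made up for illustration and are not real values from any wz file.

```python
def compare_snapshots(before, after):
    """For every img present in both snapshots, report whether unk1 and unk2 changed.

    before/after map an img name to a (unk1, unk2) tuple.
    """
    report = {}
    for name in sorted(before.keys() & after.keys()):
        old_unk1, old_unk2 = before[name]
        new_unk1, new_unk2 = after[name]
        report[name] = {
            "unk1_changed": old_unk1 != new_unk1,
            "unk2_changed": old_unk2 != new_unk2,
        }
    return report

# Hypothetical values: one img untouched by the patch, one patched.
before = {"a.img": (1000, 5000), "b.img": (2000, 9000)}
after = {"a.img": (1000, 5100), "b.img": (2345, 9400)}
for name, flags in compare_snapshots(before, after).items():
    print(name, flags)
```

An unpatched img showing unk1 unchanged but unk2 changed is the pattern that points away from a date and towards something like a location inside the bigger data file.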
