It appears that an .aud file is nothing more than an .ogg stream with a header.
This is what I did:
I converted the .vox file to .ogg with Telltale Speech Extractor.
That gave me a headerless .ogg file. I then copied the header from the original .aud file onto the converted .ogg file and renamed the result to .aud.
This works so far: if you put the modified .aud file in the game folder, you hear the German speech in the game.
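The header copy described above could be scripted roughly like this. It is only a sketch under one assumption that I have not verified for every file: that the .aud header is everything in front of the first "OggS" capture pattern. The function name `splice_aud` is made up for illustration.

```python
OGG_CAPTURE = b"OggS"  # every Ogg page starts with this 4-byte pattern


def splice_aud(original_aud: bytes, converted_ogg: bytes) -> bytes:
    """Return the header of original_aud followed by converted_ogg.

    Assumption: the .aud header is everything before the first "OggS".
    """
    header_end = original_aud.find(OGG_CAPTURE)
    if header_end < 0:
        raise ValueError("no Ogg page found in the original .aud data")
    if not converted_ogg.startswith(OGG_CAPTURE):
        raise ValueError("converted file does not start with an Ogg page")
    # Keep the original header bytes untouched and append the new stream.
    return original_aud[:header_end] + converted_ogg
```

To use it, read both files in binary mode, pass their contents to `splice_aud`, and write the result to a new file with the .aud extension.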
The problem now is that I can't hear the last word of the spoken sentence, because the game plays the file with the wrong duration.
Instead of the 12.980 seconds of the edited .aud stream, you always hear only the 11.187 seconds of the original .aud stream...
This is the header of the original .aud stream file:
EDIT: The duration is stored in the granule position field of the last page in the Vorbis stream.
It is stored in samples, not in milliseconds or seconds, but dividing the sample count by the sample rate gives the duration in seconds.
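For reference, here is a small sketch of how that field could be read. In an Ogg page header the granule position is a 64-bit little-endian integer that starts 6 bytes after the "OggS" capture pattern. The function name `last_granule_position` is made up, and the sketch assumes that the last "OggS" in the data really is a page start and not a coincidence inside the audio payload.

```python
import struct

OGG_CAPTURE = b"OggS"
GRANULE_OFFSET = 6  # capture pattern (4) + version (1) + header type (1)


def last_granule_position(data: bytes) -> int:
    """Return the granule position of the last Ogg page found in data.

    Assumption: the last "OggS" match is a real page header.
    """
    page_start = data.rfind(OGG_CAPTURE)
    if page_start < 0:
        raise ValueError("no Ogg page found")
    if page_start + GRANULE_OFFSET + 8 > len(data):
        raise ValueError("last Ogg page header is truncated")
    # "<q" = little-endian signed 64-bit integer
    (granule,) = struct.unpack_from("<q", data, page_start + GRANULE_OFFSET)
    return granule
```

This works on the raw bytes of either a plain .ogg file or an .aud file, since the search simply skips whatever comes before the last page.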
The original English .ogg file consists of 493348 samples. The value is stored at offset 15743. The sample rate of the file is 44100 Hz.
493348 divided by 44100 equals 11.187 seconds.
The same goes for the German .aud file. Its duration is 572418 samples. The value is stored at offset 15afc, and the sample rate is also 44100 Hz.
572418 divided by 44100 equals 12.980 seconds.
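The same arithmetic as a short snippet, using only the sample counts and the sample rate given above (the function name `duration_seconds` is just for illustration):

```python
def duration_seconds(samples: int, sample_rate: int) -> float:
    """Convert a sample count into a duration in seconds."""
    return samples / sample_rate


SAMPLE_RATE = 44100  # Hz, same for both files

for name, samples in (("English", 493348), ("German", 572418)):
    print(f"{name}: {duration_seconds(samples, SAMPLE_RATE):.3f} s")
# English: 11.187 s
# German: 12.980 s
```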
So, as I said: if I add the header from the original .aud file to the German .ogg file, rename it to .aud and put it in the game pack folder, the game always plays the edited audio file with a duration of 493348 samples / 11.187 seconds, even though the duration stored in the edited .aud file is 572418 samples / 12.980 seconds.
I don't know whether some other file is missing that tells the game the right duration of this audio file, or whether I have to modify the header so that the game correctly recognizes the right duration of 12.980 seconds...