vfig on 19/12/2023 at 22:57
This is a repost from my now-defunct blog, so that the information is still available:

While working on my upcoming Thief mission, I noticed an intriguing property in Dromed: Renderer > Face Textures
Inline Image:
https://i.imgur.com/VKxCDU6.png
Smiling, wincing, and stunned faces? I had my surprised face on, at any rate. Thief never changed NPCs’ faces, did it? If I added this property to an object, the game crashed. Ah, obviously this is dead code relating to a feature cut long before the game shipped.
Things got more interesting though: when I added this property to an object, it also dropped in this related state property: Renderer > Face State
Inline Image:
https://i.imgur.com/eZxjO44.png
What is this “talking” bit about? Well, this made me remember an oddity in the list of Dromed commands that I’d half noticed earlier, but not really paid attention to:
Code:
face_process : find textures to match speech samples.
…textures to match speech samples. Does this mean Thief at one point had not just NPC faces changing to match their emotional state, but also animated mouths when they talked? That’d be a big deal: Valve had been making much of the feature in Half-Life that animated an NPC’s mouth in real time to match their speech. Huh. I did a little more poking around in Dromed and found another property that seemed related: Sound > Face Motions
Inline Image:
https://i.imgur.com/aL5rnAD.png
I was well and truly interested now, and decided to do some digging. First port of call was the Thief 2 source code, and it didn’t take long to find the corresponding code:
Inline Image:
https://i.imgur.com/28VCvIT.png
Having poked through the face code and the rendering code a little, it looks as though all the code for this feature remained in the game. After reading the code more carefully, and doing some tests in Dromed, here’s what I’ve found:
The face_process command in Dromed will generate a file facepos.str with animation frames for all speech samples whose schemas have the Sound > Face Motions property turned on. Here’s a small sample I generated with it, from a hammerite muttering:
Code:
hm1a0mu1:"14444313441431111144314441445415533444454554443144440110000"
hm1a0mu2:"033111011111111111113114411444444441144414411141110100455555354441545431144444444541111145554443444441414543334311411114444113103431111100000000"
hm1a0mu3:"0155444110011111111111244411410111144444112421111011010000410100000011121112541414211111111044411114414222124411111141214211000214112100004104422210141111210441210000000"
Each of these strings is a sequence of frames for a 16-frame-per-second facial animation for the sample in question. Each digit relates to one of the following mouth positions:
Code:
0: neutral
1: small oh
2: small ee
3: big oh
4: big ee
5: shout
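To make the timing concrete, here’s how one of those strings can be decoded, as a quick Python sketch (the constants and function name are mine, not anything from the game code):

```python
# Decode a facepos.str digit string into (time, mouth position) pairs.
# The digit-to-position table and the 16 fps rate are as described
# above; everything else here is my own illustrative naming.

MOUTH_POSITIONS = {
    "0": "neutral",
    "1": "small oh",
    "2": "small ee",
    "3": "big oh",
    "4": "big ee",
    "5": "shout",
}

FPS = 16  # one digit per 1/16th of a second

def decode_facepos(frames):
    """Return a list of (seconds, position name) for each frame digit."""
    return [(i / FPS, MOUTH_POSITIONS[d]) for i, d in enumerate(frames)]
```

So the 59 digits of hm1a0mu1 cover just under four seconds of muttering.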
I wanted to see how well these generated frames corresponded to the speech, so here’s hm1a0mu3 alongside a set of simple faces that I drew and animated with a bit of javascript:
[video=youtube;7BqYCRsuXPY]https://www.youtube.com/watch?v=7BqYCRsuXPY[/video]
I don’t know about you, but having made that animation, I was now really keen to see if I could get this working in the game somehow. Now, I haven’t got it working at all yet, but I have got as far as having it not crash when I use the Renderer > Face Textures property. Here’s how it works:
Each of the Neutral/Smile/Wince/Surprise/Stunned textures should be the prefix of a set of textures in the mesh/txt and mesh/txt16 folders for each of these visages (to use the term from the source code). Let’s say I’d put in guardneut for the neutral visage. If this visage won’t support mouth animations, then I only need to create the guardneut1.gif texture for it. If it does support animations, then I also need guardneut2.gif and so on up to guardneut6.gif. Here the digits 1-6 correspond to neutral, small oh, small ee, big oh, big ee, and shout, respectively. Note that while the frames in facepos.str start at zero, the texture names start at one, presumably to make them more artist-friendly. Strange that the programmers didn’t also do the same for the page numbering in book files:
Inline Image:
https://i.imgur.com/QlBziIB.png
But anyway, once I had the numbered textures in place, and set the visage names to the appropriate prefixes, the game no longer crashed when the guard started talking. It didn’t swap out the texture at all, but it was clearly doing something.
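To restate the naming scheme as code, here’s a small Python sketch (the function names are mine, not anything from the game):

```python
# Given a visage prefix (e.g. "guardneut"), list the texture files the
# game will look for in mesh/txt and mesh/txt16, and map a facepos.str
# frame digit to its texture. This is my own sketch of the naming
# convention described above, not code from the game.

def visage_textures(prefix, animated=True):
    """All texture names a visage prefix implies."""
    count = 6 if animated else 1
    return ["%s%d.gif" % (prefix, n) for n in range(1, count + 1)]

def texture_for_frame(prefix, frame_digit):
    """Frame digits are 0-based; texture names are 1-based."""
    return "%s%d.gif" % (prefix, int(frame_digit) + 1)
```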
But that’s as far as I’ve got just now. I haven’t got a firm handle on what the “texture to replace” field should contain, and how it corresponds to the texture name baked into the model file. I think I’m going to need to use a debugger for the next bit, to break when it’s checking the materials to replace so I can really see what’s going on. So expect an update sometime after I do that.
vfig on 19/12/2023 at 22:58
This is a repost from my now-defunct blog, so that the information is still available:

The good news is I got it working this time!
Before anything else, I’d like to clarify a point from that post: the face_process command does nothing useful unless you first set the face_path property. For example this:
Code:
set face_path ./strings/
face_process
will write to “./strings/facepos.str”, relative to the directory Dromed is in. The path is just stuck on the front of the file name, so make sure you include the trailing slash.
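Since the path is blindly prepended, a missing slash silently gives you the wrong file. A trivial Python sketch of the behaviour as I understand it (this mimics what I observed, it is not the engine code):

```python
# face_path is concatenated directly onto the file name, with no
# separator handling. Illustration only, not the actual engine logic.

def facepos_output_path(face_path):
    return face_path + "facepos.str"
```

So "./strings/" gives "./strings/facepos.str", but "./strings" would give "./stringsfacepos.str".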
So I followed through on my threat to apply a debugger to Dromed, and that had… mixed results. I installed Visual Studio, which turned out to be a waste of time, since its interface is quite confusing these days and I couldn’t figure out how to open a memory view and search it. And I needed to do that because–since Dromed.exe doesn’t have any symbols–I needed to locate some of the static strings relating to the feature, such as “facepos.str”, in order to identify the actual functions I was interested in. The goal, of course, being to put some breakpoints on the various conditionals in those functions to see which of them was responsible for the thing not actually working as expected.
So with Visual Studio out of the question, I then decided to try r2 (radare2). Since its focus is the disassembly of undocumented binaries, it’s got much better tools for searching. It’s a command-line-only thing, which is okay, but meant doing a bunch more reading in order to learn it. Still, I successfully identified all the interesting functions from face.cpp in it, and so came up with a list of addresses for breakpoints. Then I stuck breakpoints on those addresses and ran Dromed. Well, tried to. For whatever reason, I couldn’t get Dromed to run with r2 attached. Time to turn to a new debugger!
This time around it was the venerable Windbg. I fired it up, set up the breakpoints, launched Dromed, and… nothing. Went between game mode and edit mode a few times. Changed the Face Textures and Face State properties around a bit. The breakpoints just weren’t ever being hit. So then I dug through the disassembly a bit more to find the callers of the face functions, so I could put breakpoints further up. So the MeshTex functions got breakpoints too, and those weren’t being hit either! Wtf.
So as I was scratching my head over this, I took a closer look at the MeshTexPrerender() function that calls FacePrerender():
Code:
void MeshTexPrerender(ObjID Obj, mms_model *pModel)
{
    sMeshTexRemap *pRemap;

    // Does this object have our special property?
    if (!g_pMeshTexProperty->Get(Obj, &pRemap)) {
        g_pCurrentModel = 0;
        return;
    }
    ...
    FacePrerender(Obj, pModel);
}
Yeah. Look at that if there. The MeshTexPrerender function exits early if its corresponding property isn’t present. And so in that case the face stuff never gets called. This is it, I thought, all I need to do is bung the MeshTex property onto my guinea pig guard, and it’ll work.
Well, it didn’t.
I’d never noticed the MeshTex property (that’s Renderer > Mesh Textures to you) before, so I did a quick google to see if anyone had discussed it before, and the very first thing I found was the wiki page for it saying: “Not in Thief Gold”.
Aha.
Ahaha.
Ahahahahahahaaa.
So all this time I’d been chasing down this feature, thinking it had been in The Dark Project and cut before release. But no: it appears to have been a new feature for Thief 2. But I’d been using the Gold variant of NewDark, which–although it has the code in there, since NewDark is based off the Thief 2 codebase–doesn’t support the new properties from Thief 2 in game. Even though Dromed will happily let you throw them around.
So this was the second reason I’d seen nothing happening at all. So I quickly set up a Thief 2 NewDark environment with Dromed and all that, and then created a new test mission. I added the Sound > Face Motions property to the guard idle schema. Then I added a guard. Then quickly scribbled a bunch of face variations onto the guard texture and saved them into mesh/txt16. Then added the face textures property and set up the replacements. And finally, stuck the Renderer > Mesh Textures cherry on top of the layer cake of prerequisites. With trepidation I pressed Alt+G.
The moment I entered game mode, the guard coughed.
AND HIS MOUTH ANIMATED ALONG WITH THE SOUND.
Mine just hung open.
Relief and surprise and joy mixed together. So the feature does work after all! It just needed Thief 2, and this extra property.
It was finished. All that was needed now was to make a demo to show it off…
vfig on 19/12/2023 at 22:58
This is a repost from my now-defunct blog, so that the information is still available:

Nothing is ever that easy, is it?
Now that it was clear that the feature was only in Thief 2, rather than having the demo be some random guard humming and mumbling to himself, there was an obvious conversation to use to show off the feature: the archers of Lady Van Vernon and Master Willey hurling insults at each other across the rooftops, from Life of the Party. So I immediately set to work on that.
First I added a second archer to the test map, this time the one with the purple shirt. And quickly hacked a copy of his textures with the same sketched mouths as the first (red-shirted) archer. Loaded up the map and checked that he, too, moved his mouth when coughing and muttering. Good.
Identifying the speech schemas for the conversation in question just needed a quick look at CONV.SCH:
Code:
...
//////////////////////////////////
//MISSION 11 - LIFE OF THE PARTY//
//////////////////////////////////
//c1101 -Stand off
schema sg11101A //"...And I'm telling you.."
archetype AI_CONV
sg11101A
schema_voice vguard1 1 c1101 (LineNo 1 1)
schema sg51101B
archetype AI_CONV
sg51101B
schema_voice vguard5 1 c1101 (LineNo 2 2)
...
Then it was back to Dromed and find_obj sg11101A and so on to select the schemas one by one in the Object Hierarchy, and add the Sound > Face Motions property to each. I don’t know if there’s a better way to do this en masse, but for just eight schemas the manual steps are fine. And so I regenerated facepos.str.
Now conversations in Thief use the so-called pseudoscripts provided by the AI > Conversation property. A pseudoscript is basically a bunch of actions that can be performed: play a sound schema or a motion schema, turn to face a particular direction, move to someplace, add or remove a metaproperty (for behavioural changes), and so on. Due to the high barrier to writing custom .osm scripts for the game (the script language is basically a custom C++ preprocessor), many fan mission designers–I haven’t looked to see if the same is true of the official missions!–also used this pseudoscript system for other scripted behaviours. So anyway, I needed a conversation setup.
So I opened up the Life of the Party mission and found the object there that drove the archers’ conversation. With my usual hacky method for copying a few things between mission files in the editor, I went through each step of the conversation and screenshotted the edit dialog, pasting them into the copy of Photoshop I always have running. Then I can just refer to the screenshots when copying the stuff into my own mission. There’s probably a better way of doing this, but–knowing Dromed–doubtful that there’s a much better way. And it turned out to be quite a good thing that I copied it this way.
So anyway, I went back to my mission and created a conversation, linked it to my two guards, then went step by step through the eight pages and filled out the speech actions in each (not bothering with the motions and behavioural bits just yet). Added a button to trigger it, hit Alt+G to get into the game, and pushed the button.
Inline Image:
https://i.imgur.com/Az6fNoA.png
Inline Image:
https://i.imgur.com/9j009Pw.png
Looking good. And of course I’d forgotten to save the changes to the test map.
Taking a look at the source code again, I found that the speech system helpfully offers a callback interface for other parts of the game to get notified when a line of dialog plays; but it’s only used in two places: the face animation uses it to know when to start and stop animating, and the conversation system uses it to know when a line has finished, so that it can play the next step of the pseudoscript. Shouldn’t be a problem though, because these kinds of callback systems are always set up to handle multiple callbacks, and… oh.
Code:
// NOTE: if kCallbackEntryMax is not 1, a crash occurs at the end of a conversation
// inside SpeechEndCallback, because the first callback causes the speechEndCallbackHash
// to delete the callback function array, so the second callback ptr is a bullshit ptr - patmc
//#define kCallbackEntryMax 4
#define kCallbackEntryMax 1
Ah. So the speech system originally handled up to four different callbacks, but the conversation system didn’t manage its own memory properly and so someone hacked the speech system to only support one at a time. The assertion error was a check that the array of callbacks wasn’t already full: and since both the face animations and the conversation were trying to register callbacks, the first one in filled up the array. This was not going to be easy to work around.
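The failure mode is easy to model: with room for only one callback, the second registration trips the capacity check. Here’s a Python sketch of the mechanism (purely illustrative; all the names are mine, not the engine’s):

```python
# A fixed-size callback table with a capacity check, showing why two
# subscribers (face animation + conversations) cannot coexist when
# kCallbackEntryMax is 1. This models the behaviour described above;
# it is not the actual engine code.

K_CALLBACK_ENTRY_MAX = 1

class SpeechCallbacks:
    def __init__(self):
        self.entries = []

    def register(self, fn):
        # this is the check that fires the assertion
        if len(self.entries) >= K_CALLBACK_ENTRY_MAX:
            raise AssertionError("i < kCallbackEntryMax")
        self.entries.append(fn)

cbs = SpeechCallbacks()
cbs.register(lambda: "face animation")  # fills the single slot
# cbs.register(lambda: "conversation")  # would raise AssertionError
```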
Anyway, now that I knew the source of the assertion failure and why it was happening, I decided to run the test map again and ignore the failure this time. Here be dragons, but I figured they were reasonably polite dragons that wouldn’t cause much harm. And so once again I pressed the button and got the error, but this time chose Cancel. The game continued, and the first guard proceeded to spout his line of dialog, and stopped. And then the other guard just stared mutely at him. Like he’d forgotten his lines. Anyway, this is where it really clicked with me that the conversation system needed the callbacks installed to work properly, otherwise it just never moves on from the first step.
Because this limit is baked into the code and the statically allocated array of callbacks, it’s not really changeable. And who knows if the bug described still exists even if it was changed? (It probably does.) So I had to come up with a different way of handling this.
And that way was: eight different conversations! One for each line. Thankfully I still had all my screenshots of the conversation setup, so it was straightforward to refer to them again when setting up the eight conversations. But a new problem now: one button wasn’t going to be enough to set off the conversations properly, so it was time for more hacks!
I wrote a Squirrel script for the button that would turn on the first conversation, wait for a time, then turn on the second conversation, wait again, and so on. I opened up each of the sound files for the conversation in Audacity to get the length, and set the wait times accordingly. And it worked! It’s ugly as hell, and painful, and not at all robust enough to use in anything that’s going to ship, but it worked. The first guard finished his line, and a fraction of a second later the second conversation was triggered and the second guard remembered to say his line, and so on. At last time for the final spit and polish, I thought.
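The scheduling boils down to summing clip lengths. A Python sketch of the idea (the padding value here is illustrative, not the one my Squirrel script used):

```python
# Compute trigger times for a chain of one-line conversations, given
# each speech sample's length in seconds (measured in Audacity). The
# 0.25 s gap between lines is a made-up example value.

def trigger_times(clip_lengths, padding=0.25):
    times, t = [], 0.0
    for length in clip_lengths:
        times.append(t)
        t += length + padding
    return times

# e.g. three lines of 2.0 s, 3.5 s, and 1.5 s:
# trigger_times([2.0, 3.5, 1.5]) -> [0.0, 2.25, 6.0]
```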
I googled for references for mouth positions for different sounds, and found a ton of cartoon ones, and a few very bland and expressionless photographic ones, and then this one by Miko-Noire (https://miko-noire.deviantart.com/art/Mouth-References-436290183) which–although it’s not really for different sounds–had exactly the amount of liveliness I was looking for.
Inline Image:
https://i.imgur.com/koAZKqW.png
So I slapped those into the guard textures, and with a bit of slapdash colour alteration, they fit both the light-skinned and dark-skinned guards pretty well. A proper art job would have given them different mouths, of course, and probably also adjusted the cheeks to match. But I’m no artist, and this was already looking much better than my temporary version.
Inline Image:
https://i.imgur.com/assMtpw.png
There was one final bug waiting for me. You see, the face animation system will animate the face for a speech sample even if it doesn’t have an entry in facepos.str. It just plays a predefined, simplistic animation in a loop:
Code:
// If there's no animation string we change frames in a fixed
// sequence.
static int aTimedAnim[8] = { 1, 2, 2, 3, 2, 2, 1, 1 };
This had probably been happening before, but now that the faces looked better and I was paying attention to how they looked, this animation became quite noticeable: it doesn’t have the neutral “mouth closed” frame, nor either of the two frames where my new images had the mouth wide open. So randomly, one or the other of the guards would just flap their lips continually without ever really opening or closing their mouth: the telltale sign that this animation was being used.
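For reference, looping that table at 16 fps gives the fallback sequence below. This is a Python sketch of my reading of the code, so treat the looping details as an assumption:

```python
# The fixed fallback sequence from the source, looped for the length
# of a sample, with frames advancing at 16 fps like the facepos.str
# data. My own sketch, not the engine code.

A_TIMED_ANIM = [1, 2, 2, 3, 2, 2, 1, 1]

def fallback_frames(duration_seconds, fps=16):
    n = int(duration_seconds * fps)
    return [A_TIMED_ANIM[i % len(A_TIMED_ANIM)] for i in range(n)]

# Note that positions 0 (neutral), 4 (big ee), and 5 (shout) never
# appear, which is why the mouth never fully closes or opens wide.
```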
Well, I spent a lot of time trying to figure this one out. I thought maybe there was a race condition, and so the strings file needed to be preloaded (nope). I thought perhaps the speech samples needed to be preloaded (nope). I thought maybe there sometimes wasn’t enough time at the end of one line for the sample to actually finish, so a bit more delay between lines might help (nope). I thought maybe the conversation callbacks were the issue, and tried triggering the guards’ lines through a horrible hack involving AIWatchObj links and teleporting markers around–but that didn’t even work, let alone fix the problem. In the end, I just decided to live with it. In the worst case, I could record multiple takes, and just splice in each guard’s line when it actually worked. It’d be a pain, but it’d work.
So finally the last thing I did was throw in a couple of camera markers, and update the script to attach the player’s camera to each in turn, so that each guard would get a closeup. Oh, and put into the conversation the motions and behaviour changes needed to make the characters move and then start shooting at each other at the end.
And then, when I was finally ready to record, all eight lines played perfectly, with the correct mouth animations, all on the first take. So maybe there is a Master Builder after all! So here, finally, is the result:
[video=youtube;pIAyFXP-6NI]https://www.youtube.com/watch?v=pIAyFXP-6NI[/video]
If you want to muck about with this yourself, you can download the demo mission from Github (https://github.com/vfig/thiefy-stuff/raw/master/T2%20Face%20Animation%20Demo/t2_face_animation_demo.zip). You’ll need Thief 2 NewDark 1.25 to open it.
Update 2: NewDark 1.26 has just been released, with this gem in the patch notes:
Code:
- restored "kCallbackEntryMax" (speech callback) limit from 1 to 4 to avoid "i < kCallbackEntryMax" asserts
This means using Face Textures should no longer break Conversations, and vice-versa. One could actually ship a mission using this feature now!
vfig on 19/12/2023 at 23:10
footnotes:
the above posts were originally posted in june 2019. the earliest discussion of the Face Textures property i can find is from 2007 (https://www.ttlg.com/FORUMS/showthread.php?t=112174), long before the leaked source made investigating it viable.
nothing in the engine code set the visage (neutral/stunned etc) property. but if you did set that through script, i believe it would work, with the timeout and priority overrides.
it seems possible to use the visage textures without the mouth poses, or (as in the demo above) the mouth poses without visages. obviously both together makes for a pretty massive explosion in the number of textures needed: 30 textures for every npc would be a drag. however this is a replace feature, so you could make meshes with a different material for the face to simplify the problem, or at least reduce the total size and thus memory load of the textures needed.
apart from the work needed, there are a few other drawbacks to this feature:
- it only swaps textures, so it does kinda look a little weird to see the mouth open wide but the chin not move at all. a bit cartoony, at least.
- the face_process command was only a rough draft implementation, that only looked at the volume of the sound and not the spectrum, and used very little filtering. so the animations it produces tend to be very 'flappy' as the volume fluctuates during speech. so for a polished result, you would probably want a smarter program to analyse the sounds and generate the digit strings. phoneme extraction—at least to the quality level you might want for thief—is a pretty well studied problem, so making such a program wouldnt be too hard.
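as a rough idea of what a volume-only approach looks like (and why it flaps), here is a toy python version: rms per 1/16 s window, quantised to a digit. this is my guess at the general approach, not what face_process actually does, and the thresholds and digit mapping are entirely made up:

```python
import math

# Toy volume-only "lip sync": split mono samples (floats in -1.0..1.0)
# into 1/16th-second windows, take the RMS of each, and quantise
# loudness into the six mouth digits. A guess at the volume-driven
# approach described above, not the real face_process algorithm.

def faceposify(samples, sample_rate, fps=16):
    window = max(1, sample_rate // fps)
    digits = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        level = min(5, int(rms * 6))  # 0 = quiet .. 5 = loud
        digits.append(str(level))
    return "".join(digits)
```

a real replacement would do phoneme extraction on the spectrum and smooth the result, but even this shows where the flappiness comes from: the digit tracks every volume fluctuation directly.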
edit: i had written that the output from face_process "only use three or so of the possible mouth positions", as per some notes i had taken, but this was a misunderstanding on my part, and trivially contradicted by the actual output in my first post.
PinkDot on 24/12/2023 at 14:59
Thanks for reposting this. It's a good read. Reminds me of the "war stories" from "The Algorithm Design Manual".
Do you think it'd be possible to create a working mod for the original game (T2) with all the spoken lines processed to include the phonemes?
One more thing I'd like to see is eyes closed for sleeping and dead AIs. But that's more on the visage side/facial expressions - not the mouth movements.
vfig on 24/12/2023 at 21:41
Quote Posted by PinkDot
Do you think it'd be possible to create a working mod for the original game (T2) with all the spoken lines processed to include the phonemes?
totally possible. the mod would need to have a fully complete facepos.str, all the textures for the different guards with mouth variations, and then a dml to add the face properties to the ai archetypes.
(with the newdark 1.26 fix, all the shenanigans i had to do with multiple conversations to get the demo working arent needed -- the face system and the conversation system coexist together just fine as they can each have their own callback)
PinkDot on 25/12/2023 at 23:23
OK, that's good to hear. I'm not saying I will do it, but I wanted to try out the system, just out of curiosity.
Quote:
There was one final bug waiting for me. (...) In the end, I just decided to live with it.
The bug with the pre-defined animation playing back instead of the actual one - I'm guessing still no solution to that? Is this a deal breaker or just a minor nuisance? Given the low visual fidelity of the system, is this something we could get away with?
Quote:
and then a dml to add the face properties to the ai archetypes.
The DML would have to add Sound > Face Motions to all the spoken schemas, wouldn't it? My guess is that it could just as well be added to the group of schemas (the ones with capitalized names) instead of each individual one?
Quote:
And finally, stuck the Renderer > Mesh Textures cherry on top of the layer cake of prerequisites.
Does the Mesh Textures property need to be applied to the instanced AI in the mission, or to the archetype?
And if I understand correctly - it can be blank, but just needs to be present - is that correct?
vfig on 26/12/2023 at 00:10
Quote Posted by PinkDot
The bug with the pre-defined animation playing back instead of the actual one - I'm guessing still no solution to that?
that only happens if the ai speaks a line that is missing from facepos.str — i suppose its less a bug and more a 'well, some random mouth flapping looks better than speaking with the mouth closed' kind of decision.
Quote Posted by PinkDot
The DML would have to add Sound > Face Motions to all the spoken schemas, wouldn't it? My guess is that it could be as well added to the group of schemas (the ones with capitalized names) instead of each individual one, wouldn't it?
Face Motions is an inherited property, so yes, probably it could be added to the appropriate archetypes instead of each schema object.
Quote Posted by PinkDot
Does Mesh Textures property needs to be applied to the instanced AI in the mission or to the archetype?
Face Textures is an inherited property, so it can probably be added to the archetype. but Face State is not an inherited property, so it needs to exist on each concrete AI; it is possible that omitting Face State on the concrete ais would be okay, and it would be added at runtime when the AI starts speaking, but i have not experimented with this to know one way or the other.
Quote Posted by PinkDot
And if I understand correctly - it can be blank, but just needs to be present - is that correct?
oh wait, you were asking about Mesh Textures. my memory is not 100%, but i think that yes, it can be blank, as long as it exists on the ai instance or (as Mesh Textures is also an inherited property) one of its archetypes.
PinkDot on 26/12/2023 at 00:51
Quote:
that only happens if the ai speaks a line that is missing from facepos.str — i suppose its less a bug and more a 'well, some random mouth flapping looks better than speaking with the mouth closed' kind of decision.
Oh, I thought that this was randomly kicking in, despite the sound file having a face-numbers string present. If it happens only for the missing ones, then it's all fine.
Quote:
Face Motions is an inherited property, so yes, probably it could be added to the appropriate archetypes instead of each schema object.
Looks like adding this to the group of schemas made face_process generate a few hundred lines in the facepos.str file, so it seems to work.
I tried doing this for the first line of Basso in Running Interference, but the lipsync does not work. I just go into Game Mode and no changes.
Here's my setup:
Inline Image:
https://imgur.com/qItjs64
- Added Face Textures property
- I named the Neutral state BassoNT, thus created a folder mesh/txt16/BassoNT, and in there I have my textures BassoNT1.gif, BassoNT2.gif etc. up to BassoNT6.gif.
- the other states are populated just to not have them blank, but that did not make any difference.
The schema has the Face Motions property, but I'm not sure if this is actually used at runtime or just by the face_process command?
Inline Image:
https://imgur.com/s7ivFqT
(In case the images do not display: https://imgur.com/a/kqMk4q5)
What am I missing?
EDIT: OK, never mind - for some reason I thought the extra textures needed to go into separate folders. Once moved back into mesh/txt16, they work.
However they need to share the palette, if they're .gifs. I hope this works with other formats too - will try next.
Oh, and the lipsync was actually terrible. No correlation with the words whatsoever. As you pointed out, the job done by face_process isn't great and could be improved with a different solution.
EDIT 2: I actually take back my words on the quality of the lipsync. While it may not be ideal, the reason why it was off sync for me previously was that my speakers were connected via bluetooth.... :rolleyes: Once I switched to normal ones, it felt much better! Yay!
PinkDot on 26/12/2023 at 02:06
Quote:
Face Motions is an inherited property, so yes, probably it could be added to the appropriate archetypes instead of each schema object.
Looks like this isn't even an essential part of the system. Once the facepos.str is generated, the schemas do not need that property anymore.