Cheap real-time tools?

We produce huge volumes of the industry’s facial animation. Our clients trust us to deliver every shot to the same exacting standards, and to meet tight production schedules. Naturally, this requires some heavyweight technologies and extensive production expertise. Pretty much everything we do uses specialist in-house tech. But every few months, a vendor or research group puts out a demo of some new automated approach, often accompanied by “this is UNEDITED!!” and “this is REAL TIME!!”. Occasionally, one of our clients will ask about it as a production method.

Now, don’t misunderstand – there’s some fun research going on out there and we enjoy these demos as much as anyone. Moreover, our R&D team has a bunch of amazing real-time demos of our own lined up for this summer. These things make a lot of sense for consumer apps. But do we use off-the-shelf tools or real-time systems in our professional production pipelines? No. Here’s why:

As production challenges increase, the space has become more difficult to navigate, and more dangerous for producers and their budgets. The price of a particular animation tool or capture system tells you almost nothing about the full cost of final-quality animation. Our clients demand precisely defined costs and timescales to get to final quality. And the leading next-gen producers demand next-gen facial animation in every shot: not just for a few closeups in cutscenes, but in huge volumes throughout the game.

The pipeline can only be called a solution once you’re certain it will always deliver consistent final results on time and on budget. The true cost of a failed system can be immense – late delivery, disappointing quality, inconsistency, and other production stresses can have a serious impact on the wider product.

Here’s the problem with cheap off-the-shelf tools (especially so-called real-time systems): the output isn’t final, but it’s much worse than that – the output is almost always much further from final (at least by our standards) than people realize. A cool video (especially if it has nice textures) can look like a reasonable work-in-progress, but first impressions reveal very little about how much effort is required to turn it into a final result.

Consider a high-end face rig. Let’s say you look at the result on a given frame and realize the left lower lip is a little too high… what do you adjust? There could be tens, even hundreds of controls influencing that lip position – which one is wrong? Perhaps – if you’re extremely lucky – it’s just one control that’s wrong. But if you got your basic result using the wrong technology to start with, it’s far more likely the system got close – but not close enough – using the wrong controls (or even worse, a set of dumb, hard-to-interpret automatic blends). So to fix this frame, and especially those around it, you have a huge amount of work to do. So much work, in fact, that you need to question the value of the original ‘solve’.

Jaw position is a great example: go and watch some clips of real-time systems in action, and pause the video on a few random frames. Is the jaw position right? If not, it may well be the cause of the something’s-not-quite-right sense you got from the clip. So, just fix the jaw, right? Er, no, because the other controls (or automatic blends) took values which were as near as they could get to being ‘right’ in the presence of an incorrect jaw position. See the problem?

It’s worse than that. Let’s say the jaw is wrong in a particular frame, but it’s hard to tell because you can’t see the teeth. Maybe the other controls have done a reasonable job of making the image look OK. But even if you can live with that frame, what are you going to do a few frames later when the teeth appear and they’re in the wrong place? You can’t just move the jaw. Where should it move from? For temporal consistency you need to go back and fix the frame you thought looked OK!

You want your animation to look world-class? You have a ton of work to do. Anyone who can glance at a shot and claim it’s 80% (say) there without looking at the underlying data is talking unmitigated nonsense. Ignore such talk! Believing this kind of analysis will make your project expensive, late, low-quality, or all three.
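
To make the jaw argument concrete, here is a minimal sketch using a deliberately tiny, made-up linear "rig" with two controls. The functions `lip_height` and `teeth_height`, their weights, and all the numbers are assumptions chosen for illustration; they do not describe any real rig or solver. The sketch shows a fit that matches the visible lip with the wrong jaw value, and what happens when only the jaw is then corrected.

```python
# Toy illustration only: a hypothetical two-control linear "rig".
# Weights and values are invented for the example, not taken from any real rig.

def lip_height(jaw, lip_raise):
    # Opening the jaw lowers the lower lip; the lip control raises it.
    return -1.0 * jaw + 0.5 * lip_raise

def teeth_height(jaw):
    # The lower teeth follow the jaw only.
    return -1.0 * jaw

# Ground truth for this frame (assumed values).
true_jaw, true_lip_raise = 0.6, 0.2
target_lip = lip_height(true_jaw, true_lip_raise)

# An automated solve gets the jaw wrong, then picks the lip control
# that best matches the visible lip given that wrong jaw.
solved_jaw = 0.4
solved_lip_raise = (target_lip + solved_jaw) / 0.5

# The lip looks right, but the (possibly hidden) teeth are misplaced.
lip_error_before = abs(lip_height(solved_jaw, solved_lip_raise) - target_lip)
teeth_error_before = abs(teeth_height(solved_jaw) - teeth_height(true_jaw))

# "Just fix the jaw": the lip control still holds its compensating value,
# so the lip is now wrong instead.
lip_error_after = abs(lip_height(true_jaw, solved_lip_raise) - target_lip)

print(f"lip error before jaw fix:   {lip_error_before:.3f}")
print(f"teeth error before jaw fix: {teeth_error_before:.3f}")
print(f"lip error after jaw fix:    {lip_error_after:.3f}")
```

Even in this two-control toy, correcting one value means revisiting the other. With tens or hundreds of interacting controls, and neighbouring frames to keep consistent, the clean-up grows accordingly.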

Animation should respect the quality of great rigs, by animating those rigs exactly as they were designed to be animated. Don’t lower your standards because of the limitations of a software tool or capture method.

These are the reasons we never use any opaque ‘black-box’ solving method. Every Cubic Motion solver is custom-built from the ground up, by a team of scientists and facial animation experts, to produce the right structure of result from the start. Then, when our expert facial animators step in to make the shot perfect, their skills can be applied in a scalable and consistent way – everyone working with the rigs exactly as designed.

We understand what it takes to deliver at scale. We have vast experience in overcoming these problems, so that you don’t have to. We’ll get you there on time and on budget. We can handle all the latest types of capture (including depth), but we insist on extremely precise tracking and deeply engineered solvers. We would never use a system that relied on ‘setting poses’ and hoping for the best.

However, if you feel the need to test cheap tools, then at least do this: select really tough, representative shots from your own project (never use pre-selected data from the vendor). Now get the stopwatch out and measure, measure, measure. Be thorough: make sure you include every task, including the reviewers’ time (the worse your ‘first passes’ are, the more of these people you’ll need). If you’re going to rely on an apparently cheap system, be absolutely certain it scales before you even start.
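
One simple way to keep such a trial honest is to log every measured task per shot and project the total across the whole job. The sketch below assumes a hypothetical `ShotTiming` record and invented task names and timings; substitute your own measurements.

```python
from dataclasses import dataclass

@dataclass
class ShotTiming:
    """Measured minutes for one trial shot (hypothetical structure)."""
    name: str
    minutes_by_task: dict  # task name -> measured minutes

    def total_minutes(self):
        # Every task counts, including review time.
        return sum(self.minutes_by_task.values())

def projected_hours(timings, shots_in_project):
    """Scale the mean measured time per shot up to the whole project."""
    mean_minutes = sum(t.total_minutes() for t in timings) / len(timings)
    return mean_minutes * shots_in_project / 60.0

# Invented example numbers: the automated first pass is quick,
# but clean-up and review dominate the total.
trial = [
    ShotTiming("shot_a", {"capture_setup": 10, "first_pass": 5, "cleanup": 90, "review": 20}),
    ShotTiming("shot_b", {"capture_setup": 10, "first_pass": 5, "cleanup": 150, "review": 30}),
]

for t in trial:
    print(t.name, t.total_minutes(), "minutes")
print(f"projected for 1000 shots: {projected_hours(trial, 1000):.0f} hours")
```

The point of the tally is that the headline "first pass" time is only one row; the projection is driven by whatever it takes to reach final quality.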

If we ever decide that these tools can deliver better productivity than our own technologies, then of course we’ll be first in the queue to put them in our pipeline. Until then, we’ll stick to helping developers deliver on time to the highest possible artistic standard for any given budget.