AI turns 2D photos into 3D models – and renders KITT from Knight Rider

If you hadn’t already noticed that machine learning is hugely helpful for assisting 3D tasks, here’s a Pontiac Trans Am and an NVIDIA Omniverse plug-in to drive the point home.

[embedded content]

Machine learning has been a big pillar of NVIDIA’s GTC event this week. And it makes some sense – the “artificial intelligence” moniker may have captured the imagination, but a big part of what is currently driving AI trends is really graphics processing architectures. Those innovations in silicon have helped to resurrect an approach that might have been confined to history; that is, neural nets.

The double-edged sword is that marketers and the public alike may seize on the idea that this represents AI as general intelligence – something capable of emulating humans. But to NVIDIA’s credit this week, the company – and its massive research portfolio – has remained grounded in stuff that’s realistic and usable, from virus-fighting chemistry to 3D design.

And there’s been a lot of stuff – I find myself just digging through papers and GitHub repositories, and a lot of it is fantastically wonderful for artists and creatives. In a dark world, it’s nice to see that AI can be as much about expression as “automation.” NV’s ongoing mission to make us want to buy RTX cards is on track, for sure, although a lot of this stuff (shhh) doesn’t actually require an RTX architecture and you might be using a server anyway.

The research is really something. Machine learning tasks do still tend to fit particular niches rather than being as generally useful as a GPU is, let alone a CPU. But they are justifying the silicon.

I think NV did a decent job picking KITT, as it’s something people can follow. The idea has been bouncing around for a bit – using GANs (generative adversarial networks) to spit out 3D objects and textures and not just 2D graphics. But there’s updated research from October, code you can try out right now, and a coming Omniverse plug-in.

Computer, one KITT, please

Here’s the basic idea:

Start with some 2D images to train on – photos from different angles. You’ll need those different angles in training; the model can’t train and extrapolate from single viewpoints alone.

The dataset they used was actually not huge – 6000 cars, 1000 horses, 1000 birds. The cars and birds work really well, in fact; I think the horses just needed a more diverse training set. (They apparently had too many pictures from certain side angles and not the top – plus I think tails are tricky.)

It didn’t take long to train, either – four V100 GPUs over 120 hours. (And you’ll do better on a higher-end chip.)

With that training in place, here’s the fun part – the inverse graphics model can spit out 3D geometry and texture in 65 milliseconds. From just one image.

I mean, that’s huge. Maybe you need something other than cars, but this is a training task that could be easily handled by a small graphics team with some spare server hours. And once it’s trained, you can spit out tons of the models you need at will.
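To put those numbers in perspective, here is a back-of-the-envelope sketch in Python. It assumes the 120 hours are wall-clock time with all four GPUs busy, and that inference runs one image at a time with no batching; neither detail is spelled out above, and the function names are just illustrative.

```python
# Figures quoted in the article.
NUM_GPUS = 4
TRAIN_HOURS = 120      # ASSUMPTION: wall-clock hours with all four GPUs in use
INFERENCE_MS = 65      # time to produce one 3D model from one image


def gpu_hours(num_gpus, wall_clock_hours):
    """Total GPU time consumed by a training run."""
    return num_gpus * wall_clock_hours


def models_per_hour(ms_per_model):
    """Models produced per hour if inference runs back to back, one at a time."""
    ms_in_an_hour = 60 * 60 * 1000
    return ms_in_an_hour / ms_per_model


train_cost = gpu_hours(NUM_GPUS, TRAIN_HOURS)
throughput = models_per_hour(INFERENCE_MS)

print(f"Training: {train_cost:.0f} GPU-hours")                     # Training: 480 GPU-hours
print(f"Inference: about {throughput:,.0f} models per hour")       # Inference: about 55,385 models per hour
```

Under those assumptions, a one-time spend of 480 GPU-hours buys you something on the order of 55,000 models per hour afterwards, which is the "tons of models at will" part.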

This doesn’t have to put experienced 3D modelers out of work, either – or even replace depth cameras. It is a clear winner for quick mock-ups, storyboards, and other prototypes in advance of the more expensive design work. And it also suggests predictive tools that might assist you inside a 3D modeling task. That is, while it was a clever idea to share the KITT video, it’s not really that video that you should judge the tech on directly. If you want to model KITT, you’ll probably still do it by hand. But as a complement to other tools – and as a cute new Omniverse feature – this is great.

It’s worth checking out the research team here, as each of the researchers has done other great work. (GAN and machine learning followers will recognize some of those author names immediately, like Yuxuan Zhang’s efficient image labeling or, more in the general press, Sanja Fidler of University of Toronto and her team’s pop song generator and work on neuroaesthetics in fashion.)

Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering

Yuxuan Zhang, Wenzheng Chen, Huan Ling, Jun Gao, Yinan Zhang, Antonio Torralba, Sanja Fidler

Where to find the code?

Well, you should definitely start with the infamous StyleGAN, which has been faking faces and style and other things using GANs –

https://github.com/NVlabs/stylegan

But the critical stuff comes from just adding new datasets to this 2019 project, “Learning to Predict 3D Objects with an Interpolation-based Differentiable Renderer”:

https://github.com/nv-tlabs/DIB-R
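If "differentiable renderer" sounds abstract, the core trick fits in a toy. The sketch below is not DIB-R and does not use its API; it is a made-up one-dimensional example in plain Python, with every name hypothetical. The "scene" is a single edge position, the "renderer" draws it with a soft boundary so the pixels change smoothly as the edge moves, and gradient descent then recovers the edge position from a target image alone. That is the same render, compare, and adjust loop that inverse graphics relies on, just with one parameter instead of a mesh and a texture.

```python
import math


def render(edge, width=32, softness=1.5):
    """Toy 1D renderer: bright left of `edge`, dark right, with a soft boundary.

    The soft (sigmoid) boundary is what makes the image differentiable
    with respect to the edge position.
    """
    return [1.0 / (1.0 + math.exp(-(edge - x) / softness)) for x in range(width)]


def loss_and_grad(edge, target, softness=1.5):
    """Squared pixel error against `target`, and its derivative w.r.t. `edge`."""
    image = render(edge, len(target), softness)
    loss = sum((p - t) ** 2 for p, t in zip(image, target))
    # Chain rule: d(sigmoid)/d(edge) = p * (1 - p) / softness for each pixel.
    grad = sum(2.0 * (p - t) * p * (1.0 - p) / softness
               for p, t in zip(image, target))
    return loss, grad


def fit_edge(target, start=5.0, learning_rate=0.5, steps=200):
    """Recover the edge position from an image by plain gradient descent."""
    edge = start
    for _ in range(steps):
        _, grad = loss_and_grad(edge, target)
        edge -= learning_rate * grad
    return edge


target_image = render(20.0)          # the "photo" we want to explain
recovered = fit_edge(target_image)   # start from a wrong guess and optimize
print(f"Recovered edge position: {recovered:.3f}")
```

With a hard, pixel-exact edge the gradient would be zero almost everywhere and the optimizer would have nothing to follow; softening the boundary is the design choice that makes the whole thing trainable.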

I keep talking about Omniverse because NVIDIA keeps talking about Omniverse, but they’re onto something here: they’re doing effectively real-time renders with this system, delivered as just an Omniverse plug-in. Sold and sold. It’s clever stuff.

NV tells CDM they’ll have that plug-in for us some time this summer. We’ll be watching.

And since I keep talking about the potential of Omniverse for individual artists and small creative teams, we’ll look at that, too.

Anyway, if you’re asking, “but can’t I skip that Omniverse plug-in, since I can just go to GitHub and start messing around?” Yes, you’re right. I do like the idea of machine learning applied to 3D workflows generally, though, and we might even see such tools as ways of helping manage assets – like Clippy, if Clippy understood 3D assets.

But yeah, as far as a weird music video made of nothing but distorted AI-generated horses, it should be possible. Work on the horse tracks now, and then we can have spewing horse particle systems made out of AI horses as a summer project. You’re welcome.

Where to get more AI goodness:

GTC talks: https://www.nvidia.com/en-us/gtc/

Research: https://www.nvidia.com/en-us/research/

Oh yeah, and among other things, the Audio2Face app is here on Omniverse in open beta as well. You will need an RTX GPU for that, even though you can do without one for the Git repositories above.

Audio2Face is very fun.

[embedded content]

It’s not exactly relevant to this story, but I’m in Germany, there’s a car, and it’s Hasselhoff, so if you need some added inspiration for your 3D / motion / VJ work today, take it away.

[embedded content]