Generative AI has exploded in popularity over the past year, with major ramifications for virtual production and every other area of filmmaking. In this story, we will survey some of the more promising generative AI tools for filmmakers and explore important ethical considerations.
What is Generative AI?
Generative AI uses artificial intelligence techniques to create new content, such as images, videos, music, text, and other forms of data. It uses algorithms and machine learning models to generate content that can imitate or simulate human-like creativity.
Generative AI systems leverage deep learning algorithms, such as generative adversarial networks (GANs), variational autoencoders (VAEs), and recurrent neural networks (RNNs), to create content based on patterns and data input. These algorithms can learn from vast amounts of data and generate new content similar to what they have learned.
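To make the idea of "learn patterns from data, then generate something similar" concrete, here is a deliberately tiny sketch. It is not a GAN, VAE, or RNN; it swaps in a character-level Markov chain, which is far simpler but follows the same learn-then-sample loop. The function names and the training sentence are made up for illustration.

```python
import random
from collections import defaultdict


def train(text, order=2):
    """Record which character follows each `order`-character context."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context].append(text[i + order])
    return model


def generate(model, seed, length=40, rng=None):
    """Extend `seed` one character at a time by sampling from the model.

    The seed must be exactly as long as the `order` used in training.
    """
    rng = rng or random.Random()
    order = len(seed)
    out = seed
    for _ in range(length):
        followers = model.get(out[-order:])
        if not followers:  # context never seen in training: stop early
            break
        out += rng.choice(followers)
    return out


# Hypothetical training text; a real system learns from vastly more data.
corpus = "the camera pans across the canyon as the crew resets the crane"
model = train(corpus, order=2)
sample = generate(model, seed="th", length=40, rng=random.Random(0))
print(sample)
```

Every three-character run in the output also appears somewhere in the training text, which is the toy version of "new content similar to what they have learned." Deep learning models capture much richer patterns, but they inherit the same dependence on their training data.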
Ethical Considerations for AI
Generative AI can create new and original content. However, it raises ethical concerns about attribution, ownership, bias, privacy, and responsibility.
Attribution and ownership: One of the primary concerns with art created by AI tools is the question of ownership and attribution. Who should be credited for the artwork? Is it the artist who created the art used to train the AI, the programmer who wrote the code, or the AI system itself?
Bias and discrimination: AI systems can learn from biased data, resulting in generative art that perpetuates prejudice and discrimination. For example, an AI tool trained on a dataset that lacks diversity may produce art that reflects only the dominant culture, race, or gender.
Privacy: Using AI tools in generative art may require collecting and processing personal data. This raises concerns about privacy and the protection of personal information.
Manipulation and deception: AI tools can manipulate or deceive people. For example, a deepfake image created by an AI system may be used to spread misinformation or impersonate an individual.
Responsibility and accountability: As AI tools become more sophisticated, the question of who is responsible and accountable for the art they produce becomes more complex. Should the artist, the AI system, or the technology company that created the AI tool be held responsible for the negative consequences of the artwork?
A Survey Of AI Tools By Production Phase
Now that we’ve introduced the concept of generative AI and explored some ethical ramifications, let’s survey some of the current AI tools. We’ll quickly introduce each category, potential use cases, and examples of different tools. Please let us know in the comments which tools we should add to this listing. (And for a peek at the vast AI landscape, see this report.)
Pre-Production AI Tools
•Use cases: script development, concept generation, research, correspondence, and organizational support.
ChatGPT by OpenAI: A conversational chatbot whose dialogue format makes it possible to answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests.
Now incorporated into Microsoft Bing
Google Bard: An experimental conversational AI service from Google Labs.
•Use cases: previs/storyboarding/concept development/pitchvis.
Midjourney: Via a Discord server, users use the /imagine command and type in a prompt; the bot then returns an image.
DALL-E: A neural network that creates images from text captions for a wide range of concepts expressible in natural language.
Stable Diffusion: Generates detailed images conditioned on text descriptions; it can also be applied to inpainting, outpainting, and image-to-image translation guided by a text prompt.
ControlNet: A neural network structure for controlling Stable Diffusion models by adding extra conditions, improving control over and accuracy of generated images.
Cuebric: Takes AI-generated art and slices it into ready-to-use Unreal Engine environments for virtual production.
NVIDIA GET3D: Generates 3D meshes with complex topology, rich geometric details, and high-fidelity textures.
Production AI Tools
•Use cases: highly accurate digital actors, retargeting performance into a different face or the same actor at a different age, de-aging, motion capture, and non-anthropomorphic characters driven by human performances.
Deepfake: A portmanteau of deep learning and fake in which a person in an existing image or video is replaced with someone else’s likeness.
YouTube artist “Shamook” got so good at deepfakes that ILM hired him.
Used for Luke Skywalker’s de-aging in The Book of Boba Fett.
Used for Harrison Ford’s de-aging in Indiana Jones and the Dial of Destiny.
Earlier de-aging techniques were used in Tron: Legacy and The Irishman.
•Use cases: assist camera operator/crew in visualizing complex, multiplane composites.
Weta Simulcam developed for Avatar: The Way of Water
In an article I wrote for American Cinematographer, I learned about the system Weta developed for live, machine-learning-assisted compositing. “We started lidar-scanning the sets as they were nearing completion and generated a lot of training data for our neural network,” says Weta FX motion-capture supervisor Dejan Momcilovic.
The neural-network system learns the basic geometry of the set from this lidar data and can then predict that geometry for compositing. “We infer an object disparity in one camera, turn it into depth, and then project that back into space and observe it with the hero camera,” he continues. “The computer-vision camera is very fast at acquiring the image, so we’re a frame ahead and ready to composite.”
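The geometry Momcilovic describes (disparity to depth, depth to a point in space, that point viewed from the hero camera) can be sketched with the textbook pinhole-camera and stereo relations. This is only an illustration of those standard formulas, not Weta's system: the neural network that infers disparity is not modeled, the hero camera is assumed to differ from the witness camera by a simple offset with no rotation, and every number and name below is hypothetical.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Stereo relation Z = f * B / d, giving depth in metres."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def back_project(u, v, depth, focal_px, cx, cy):
    """Pixel (u, v) at a known depth -> 3D point in that camera's frame."""
    x = (u - cx) * depth / focal_px
    y = (v - cy) * depth / focal_px
    return (x, y, depth)


def project(point, focal_px, cx, cy):
    """3D point in a camera's frame -> pixel coordinates in that camera."""
    x, y, z = point
    return (focal_px * x / z + cx, focal_px * y / z + cy)


# Hypothetical camera parameters (focal length and image centre in pixels).
WITNESS = dict(focal_px=1000.0, cx=960.0, cy=540.0)
HERO = dict(focal_px=2000.0, cx=960.0, cy=540.0)
BASELINE_M = 0.125             # assumed stereo baseline in metres
HERO_OFFSET = (0.5, 0.0, 0.0)  # assumed hero position in the witness frame

# 1. Disparity observed in the witness camera -> depth.
depth = disparity_to_depth(25.0, WITNESS["focal_px"], BASELINE_M)
# 2. Depth -> 3D point in the witness camera's frame.
point = back_project(1160.0, 540.0, depth, **WITNESS)
# 3. Express the point relative to the hero camera (translation only).
point_in_hero = tuple(p - o for p, o in zip(point, HERO_OFFSET))
# 4. Observe the point with the hero camera.
u_hero, v_hero = project(point_in_hero, **HERO)
print(depth, (u_hero, v_hero))
```

With these made-up values, the object sits 5 m from the witness camera and lands at pixel (1160, 540) in the hero camera's image. A production system would also need lens distortion, camera rotation, and per-frame tracking, all of which are left out here.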
Post-Production AI Tools
•Use cases: take a source voice and train a learning model to speak in that voice, automate language localization and ADR, retarget a source voice recording to a different vocal style, and generate sound effects and music based on text prompts.
A voice cloning/machine learning technology was used for Darth Vader’s voice in the Obi-Wan Kenobi series for Disney+ and for younger Luke Skywalker’s voice in The Book of Boba Fett, among many others.
“Can closely simulate a person’s voice when given a three-second audio sample. Once it learns a specific voice, VALL-E can synthesize audio of that person saying anything—and do it in a way that attempts to preserve the speaker’s emotional tone.” –ArsTechnica.
“A neural net that generates music, including rudimentary singing, as raw audio in various genres and artistic styles.”
•Use cases: the ability to shoot footage for composites without needing an LED wall or green screen. Tools can deduce depth from pure imagery.
Copies sequence-specific effects, such as garbage matting, beauty repairs, or deblurring, from a small number of frames in a sequence and then trains a network to replicate the effect across the entire sequence, saving a lot of manual effort and time.
An algorithm developed to be robust, fast, and flexible to visual effects requirements while delivering automated and high-quality luma mattes quickly.
•Use cases: create a new video based on a source video input with a still image style input.
“Synthesize new videos by applying the composition and style of an image or text prompt to the structure of your source video. It’s like filming something new without filming anything at all.”
Machine Learning to animate still images with lifelike accuracy.
Visual dubbing enables dialogue replacement, such as removing profanity or translating to another language, while maintaining perfect lip-sync.
•Use cases: speed up repetitive, menial editorial tasks and lower the skill barrier to entry.
Various tools for color matching, morph-cut transitions, and auto-tagging/classification.
“Synthesia is an AI video generation platform enabling you to quickly create videos with AI avatars in over 120 languages. It includes templates, a screen recorder, a media library, and more tools.”
Generative AI is rapidly evolving; researchers and developers continually explore new ways to improve and enhance its capabilities. We will likely see more sophisticated and innovative generative AI applications as technology advances. The power and utility of these applications must be carefully balanced with ethical concerns to maximize the benefit and avoid the pitfalls.
Some changes are already happening to address moral and rights concerns. For example, ArtStation published a blog post announcing protections against AI misappropriation. Specifically, artists can add a “NoAI” tag to mark uploaded artwork as unauthorized for use in training AI models. ArtStation’s site search engine also added the ability to filter for or filter out AI-generated artwork.
Sketchfab followed suit with a NoAI change to its terms of service, announced in an email sent on 2/20/23 stating that “if you wish to protect your uploads from usage by generative AI programs, you may tag your models ‘NoAI.’ You (and other users) agree not to use uploads marked as ‘NoAI’ in datasets for, in the development of, or as inputs to generative AI programs.”
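For anyone assembling a dataset, honoring a tag like this could be as simple as the sketch below. It is a minimal illustration under stated assumptions, not either site's actual mechanism: the record layout, the function names, and the choice to match the tag case-insensitively are all hypothetical.

```python
def is_opted_out(tags):
    """True if any tag matches 'NoAI', ignoring case and surrounding spaces.

    Case-insensitive matching is an assumption made here to err on the
    side of respecting the creator's intent.
    """
    return any(tag.strip().lower() == "noai" for tag in tags)


def filter_training_candidates(items):
    """Keep only items whose creators have not tagged them 'NoAI'."""
    return [item for item in items if not is_opted_out(item.get("tags", []))]


# Hypothetical upload records; real platforms expose tags differently.
uploads = [
    {"id": "castle-concept", "tags": ["environment", "NoAI"]},
    {"id": "robot-model", "tags": ["character"]},
    {"id": "forest-matte", "tags": ["noai "]},
    {"id": "untagged-prop"},
]

allowed = filter_training_candidates(uploads)
print([item["id"] for item in allowed])
```

Here only "robot-model" and "untagged-prop" pass the filter. Note that the absence of a tag is not the same as permission; the tag only records an explicit opt-out.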
So many of these generative AI tools are already cutting-edge, and it can be challenging to predict their evolution. It’s hard not to imagine more AI tools appearing in future camera hardware. We already see this happening with neural engines and computational photography in phone cameras.
As responsible filmmakers, we must balance leveraging the potential benefits of AI tools with proactively guiding their appropriate use and avoiding abuse.