Netflix Releases VOID AI Model for Advanced Video Object Removal

For anyone who has spent time in a post-production suite in Los Angeles, the phrase “we’ll fix it in post” is usually a prayer whispered in desperation. Whether it’s a stray boom mic dipping into a shot at a studio in Culver City or an unwanted pedestrian wandering through a carefully choreographed scene on a street in Downtown LA, the cost of removing objects from a video has traditionally been a grueling balance of manual frame-by-frame painting and expensive reshoots. But the landscape just shifted. Netflix, in collaboration with researchers from Sofia University’s INSAIT, has open-sourced a model called VOID—which stands for Video Object and Interaction Deletion—and it is fundamentally different from the inpainting tools we’ve seen over the last few years.

The real breakthrough here isn’t just that VOID can make an object disappear; it’s that it understands the physics of what happens after that object is gone. Most AI video erasers act like a digital smudge tool, filling a gap with a static background that often looks “floaty” or unnatural. VOID, however, is designed to handle “interaction-aware” deletions. If you remove a person who was holding a guitar, the model doesn’t just leave a hole where the person was; it predicts that the guitar should now fall naturally. It’s the difference between erasing a smudge and rewriting the laws of physics for a specific scene. For the massive ecosystem of VFX houses and independent creators across Southern California, this moves the needle from simple “cleanup” to actual “scene restructuring.”

Beyond Static Inpainting: The Mechanics of Interaction Deletion

To understand why VOID is causing a stir in the AI community, you have to look at the “interaction” part of its name. In a typical video, objects don’t exist in a vacuum: they push, pull, and collide. When a standard tool removes a car from a collision scene, you’re often left with weird artifacts or a background that doesn’t react to the impact. VOID changes this. According to the research, if given a video of two vehicles colliding, VOID can remove one vehicle and generate footage where the remaining car continues down the road, while simultaneously replacing the smoke, fire, and debris from the crash with an undisturbed road surface.

Another striking example involves a person jumping into a pool. While older tools might struggle with the complex fluid dynamics of a splash, VOID can remove the person and render the pool surface as if it were completely untouched, erasing both the splash in the water and any droplets on the surrounding ground. This is possible because VOID is a vision-language system: rather than looking only at pixels, it also takes a language description of the object to be removed, which allows for much more precise control over the output.

The Technical Engine Under the Hood

This isn’t a standalone miracle; it’s a sophisticated pipeline of several high-end AI models working in tandem. VOID is built on top of Alibaba’s CogVideoX video diffusion model. To teach the model how objects actually interact, the researchers fine-tuned it using synthetic data from Adobe’s HUMOTO and Google’s Kubric. The actual process of identifying what to remove is handled by a combination of Google’s Gemini 3 Pro, which analyzes the scene to identify affected areas, and Meta’s SAM2, which handles the precise segmentation of the objects.

For those looking to implement this in a professional workflow, it’s important to note that VOID uses two sequential transformer checkpoints. The first pass serves as the base inpainting model, while an optional second pass uses a warped-noise refinement model to correct shape distortions and improve temporal consistency. This two-step process likely contributed to VOID outperforming other heavy hitters in a user survey. In a test with 25 participants, VOID was preferred 64.8 percent of the time, well ahead of Runway, which came in second at 18.4 percent, as well as tools like ProPainter, ROSE, and DiffuEraser.
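For readers who think in code, the flow described above (scene analysis, segmentation, a base inpainting pass, then an optional refinement pass) can be sketched roughly as follows. This is only an illustration of the ordering, not VOID’s actual interface: `RemovalStages`, `remove_object`, and every stage callable are hypothetical names invented here, and the toy `Frame` and `Mask` types stand in for real image arrays.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# Toy stand-ins for illustration; a real pipeline would use image arrays or tensors.
Frame = List[List[int]]
Mask = List[List[bool]]


@dataclass
class RemovalStages:
    """Hypothetical callables standing in for the stages the article describes."""
    # Scene analysis: turns the user's description into a description of
    # everything affected (the object plus its splash, debris, and so on).
    find_affected: Callable[[Sequence[Frame], str], str]
    # Segmentation: one mask per frame for the affected regions.
    segment: Callable[[Sequence[Frame], str], List[Mask]]
    # First pass: base inpainting over the masked regions.
    base_inpaint: Callable[[Sequence[Frame], List[Mask], str], List[Frame]]
    # Optional second pass: refinement for shape and temporal consistency.
    refine: Optional[Callable[[List[Frame], List[Mask]], List[Frame]]] = None


def remove_object(
    frames: Sequence[Frame],
    prompt: str,
    stages: RemovalStages,
    use_refinement: bool = True,
) -> List[Frame]:
    """Run the stages in order; the refinement pass is skipped when disabled or absent."""
    if not frames:
        raise ValueError("expected at least one frame")

    target = stages.find_affected(frames, prompt)
    masks = stages.segment(frames, target)
    if len(masks) != len(frames):
        raise ValueError("segmentation must return one mask per frame")

    result = stages.base_inpaint(frames, masks, prompt)
    if use_refinement and stages.refine is not None:
        result = stages.refine(result, masks)
    return result
```

Keeping the second pass behind a flag mirrors the article’s point that refinement is optional: a supervisor could preview the base result first and pay for the extra pass only on shots that show distortion or flicker.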

The Commercial Implications for the LA Creative Economy

The fact that Netflix has released this under the Apache 2.0 license is a massive deal for the local industry. This license means the framework can be used commercially, allowing boutique studios and freelance editors to integrate VOID into their pipelines without paying exorbitant licensing fees to a closed-source platform. However, there is a significant hardware barrier. Running VOID requires a GPU with at least 40GB of VRAM, such as an NVIDIA A100. For a lot of home-based editors in the Valley or smaller agencies in Hollywood, this means they can’t just run this on a standard laptop; they’ll need to look into cloud computing or high-end workstation upgrades.
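Before committing to a cloud rental or a workstation upgrade, it’s worth checking what your current machine actually reports. Here is a minimal sketch, assuming an NVIDIA card with the `nvidia-smi` command-line tool installed. The helper names and the 40,000 MiB cutoff are assumptions made for illustration (the article only says “at least 40GB”), so adjust the threshold to whatever the model’s own documentation specifies.

```python
import subprocess
from typing import List, Tuple

# Assumption: a round cutoff in MiB standing in for the article's "at least 40GB".
# Reported totals can differ slightly from a card's nominal size.
DEFAULT_MIN_MIB = 40_000


def parse_gpu_memory(csv_text: str) -> List[Tuple[str, int]]:
    """Parse lines of the form 'GPU name, total_memory_in_MiB'."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma inside the name would not break parsing.
        name, _, mem = line.rpartition(",")
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def gpus_meeting_requirement(csv_text: str, min_mib: int = DEFAULT_MIN_MIB) -> List[str]:
    """Return the names of GPUs whose reported memory meets the cutoff."""
    return [name for name, mib in parse_gpu_memory(csv_text) if mib >= min_mib]


def query_nvidia_smi() -> str:
    """Ask nvidia-smi for each GPU's name and total memory (MiB, no header, no units)."""
    completed = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
        check=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout
```

On a machine with an NVIDIA driver installed, `gpus_meeting_requirement(query_nvidia_smi())` returns the names of any cards that clear the bar. An empty list is the signal to start pricing cloud instances.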

We are seeing a broader trend where AI is moving from “generative art” to “precision utility.” By integrating advanced AI tools into the editing process, the goal isn’t necessarily to replace the VFX artist, but to remove the mindless “grunt work” of rotoscoping and frame-cleaning. This allows artists to focus on the creative direction of a shot rather than the tedious task of erasing a stray cable from the foreground. As these tools become more accessible, indie filmmakers in Los Angeles should be able to iterate much faster, potentially lowering the barrier to entry for high-production-value storytelling.

Navigating the New Workflow: Local Resource Guide

Given my background in analyzing the intersection of technology and local industry, it’s clear that the adoption of models like VOID will create a demand for a very specific set of expertise. If you’re a production house or an independent creator in the Los Angeles area looking to leverage this “interaction-aware” AI, you shouldn’t just look for a general editor. You need specialists who understand the bridge between traditional cinematography and latent-space manipulation.

Depending on where you are in your production, here are the three types of local professionals you should be seeking out:

AI-Integrated VFX Supervisors: Look for supervisors who don’t just know Maya or Nuke, but are proficient in deploying open-source models from Hugging Face. The key criterion here is their ability to manage a “two-pass” workflow—knowing when to stop at the base inpainting and when to apply warped-noise refinement to maintain temporal consistency across a shot.
HPC (High-Performance Compute) Consultants: Since VOID requires 40GB+ of VRAM, many local studios will need to pivot their hardware. You need consultants who can set up A100-capable cloud environments or build local workstations that can handle the memory load of CogVideoX without crashing. Avoid general IT support; look for those specializing in machine learning infrastructure.
Digital Rights & AI Legal Counsel: While the Apache 2.0 license is permissive, the use of AI-generated content in commercial film still sits in a legal grey area regarding copyright. You need legal professionals who specifically understand the nuances of open-source AI licenses and how they interact with guild regulations and studio contracts in the entertainment industry.

Integrating these tools into a professional pipeline requires a blend of artistic intuition and technical rigor. As we see more of these models move from research papers to GitHub, the competitive edge in the LA market will go to those who can implement them the fastest and most ethically.
