A new initiative from the Alibaba Group offers one of the best methods I have seen for generating full-body human avatars from a Stable Diffusion-based foundation model.
Titled MIMO (MIMicking with Object Interactions), the system combines a range of popular technologies and modules, including CGI-based human models and AnimateDiff, to enable temporally consistent character replacement in videos, or to drive a character with a user-defined skeletal pose.
From single source images, three diverse characters are driven by a 3D pose sequence (far left) using the MIMO system. See the project website and the accompanying YouTube video (embedded at the end of this article) for more examples and superior resolution. Source: https://menyifang.github.io/projects/MIMO/index.html
Generated characters, which can also be sourced from video frames and in various other ways, can be integrated into real-world footage.