Zooming On All Actors: Automatic Focus+Context Split Screen Video Generation

Moneish Kumar¹, Vineet Gandhi¹, Remi Ronfard², Michael Gleicher³

¹ IIIT Hyderabad
² Univ. Grenoble Alpes/INRIA/LJK
³ University of Wisconsin-Madison

Figure 1: Illustration of our approach, showing (a) a frame from the input video and (b) its corresponding split screen composition (SSC).


Recordings of stage performances are easy to capture with a high-resolution camera, but are difficult to watch because the actors' faces are too small. We present an approach to automatically create a split screen video that transforms these recordings to show both the context of the scene and close-up details of the actors. Given a static recording of a stage performance and tracking information about the actors' positions, our system generates videos showing a focus+context view based on close-up camera motions computed using crop-and-zoom. The key to our approach is to compute these camera motions such that they are cinematically valid close-ups and to ensure that the views of the different actors are properly coordinated and presented. We pose the computation of camera motions as a convex optimization that creates detailed views and smooth movements, subject to cinematic constraints such as not cutting faces with the edge of the frame. Additional constraints link the close-up views of the actors, causing them to merge seamlessly when actors are close. The generated views are placed in a layout that preserves the spatial relationships between actors. We demonstrate our results on a variety of staged theater and dance performances.
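To make the idea of a smooth, constrained virtual camera concrete, here is a minimal sketch for a single actor and the horizontal axis only. It is not the formulation used in our system: it assumes a fixed crop width, a simple quadratic smoothness penalty, and a single "keep the actor inside the crop" constraint, and it solves the resulting convex problem with plain projected gradient descent. The names `smooth_crop_centers`, `roughness`, `half_width`, `face_margin` and `lam` are hypothetical and chosen only for illustration.

```python
def roughness(xs):
    """Sum of squared frame-to-frame differences (lower means smoother)."""
    return sum((b - a) ** 2 for a, b in zip(xs, xs[1:]))


def smooth_crop_centers(track, half_width, face_margin, lam=10.0, iters=2000):
    """Smooth a 1D sequence of crop centers under a containment constraint.

    Minimizes  sum_t (x_t - c_t)^2 + lam * sum_t (x_{t+1} - x_t)^2
    subject to |x_t - c_t| <= half_width - face_margin,
    where c_t is the tracked actor position at frame t. The constraint keeps
    the actor, padded by face_margin on each side, inside the crop window.
    Both the objective and the constraint are illustrative assumptions.
    """
    slack = half_width - face_margin
    if slack < 0:
        raise ValueError("crop window is too narrow for the requested margin")

    n = len(track)
    lo = [c - slack for c in track]  # leftmost allowed center per frame
    hi = [c + slack for c in track]  # rightmost allowed center per frame

    x = list(track)                  # start from the raw (noisy) track
    step = 1.0 / (2.0 + 8.0 * lam)   # 1 / (upper bound on gradient Lipschitz constant)

    for _ in range(iters):
        # Gradient of the data term.
        grad = [2.0 * (x[t] - track[t]) for t in range(n)]
        # Gradient of the smoothness term.
        for t in range(n - 1):
            d = x[t + 1] - x[t]
            grad[t] -= 2.0 * lam * d
            grad[t + 1] += 2.0 * lam * d
        # Gradient step followed by projection onto the per-frame interval.
        x = [min(hi[t], max(lo[t], x[t] - step * grad[t])) for t in range(n)]
    return x


# Example: a jittery track around x = 100 pixels.
noisy_track = [100.0 + (5.0 if t % 2 else -5.0) for t in range(40)]
path = smooth_crop_centers(noisy_track, half_width=60.0, face_margin=40.0)
print(roughness(noisy_track), roughness(path))
```

Because the objective is a convex quadratic and the constraints are simple per-frame intervals, the projection step reduces to clamping each value. A practical system would also optimize the crop size and the vertical position, and would couple the paths of nearby actors.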


Input Video
The SSC obtained using a naive approach, i.e., computing the virtual camera feed for each actor and each frame independently by simply following the tracks. It uses a fixed layout, so the relative order of the actors is not preserved. Redundancies and noisy camera movement are also visible.

The video SSC is obtained with smooth camera motion and preserves the relative order.

The SSC obtained after dynamic layout selection and optimization but without the layout transition constraints, which results in jerks in the video.

The final result, including the layout transition constraints.

Results On Other Sequences

Two Actor Sequences

Three Actor Sequences

Four Actor Sequence