Supplementary Material for "Dynamic Inference: A New Approach Toward Efficient Video Action Recognition"
To better understand how videos differ in how easily their actions can be recognized, we visualize video instances that exit at different checkpoints of our method. We apply dynamic inference with MSDNet-38 [3] and show six randomly sampled test videos from the Kinetics-400 [4] validation set in Figure 1. The visualization illustrates the ability of our approach to reduce the computational requirements for recognizing “easy” videos. The top row, Fig. 1(a), shows two videos that were correctly classified and exited at the first checkpoint. The middle row, Fig. 1(b), shows two videos that were correctly classified and exited at the third checkpoint. The bottom row, Fig. 1(c), shows two “hard” examples that would have been incorrectly classified by the first few checkpoints but were passed on to the last checkpoint. The figure suggests that early checkpoints recognize prototypical class examples, whereas the last classifier handles atypical videos.
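The exit behavior described above can be illustrated with a minimal sketch. This section does not state the exit criterion, so the sketch assumes a simple rule: a video exits at the first checkpoint whose top class probability reaches a threshold, and the last checkpoint always produces a prediction. The function name `early_exit_predict`, the `threshold` parameter, and the toy probabilities are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple


def early_exit_predict(
    checkpoints: Sequence[Callable[[], List[float]]],
    threshold: float = 0.9,  # assumed confidence threshold, not from the paper
) -> Tuple[int, int]:
    """Return (checkpoint_index, predicted_class) for one video.

    Each entry of `checkpoints` is a zero-argument callable that stands in
    for running the network up to that checkpoint and returning class
    probabilities. Later callables are invoked only if the earlier ones
    were not confident, which is where the computation is saved.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    last = len(checkpoints) - 1
    for idx, run_checkpoint in enumerate(checkpoints):
        probs = run_checkpoint()
        best_class = max(range(len(probs)), key=probs.__getitem__)
        # Exit when confident enough, or when no checkpoints remain.
        if probs[best_class] >= threshold or idx == last:
            return idx, best_class
    raise AssertionError("unreachable: the last checkpoint always returns")


# Toy stand-ins for an "easy" and a "hard" video (made-up probabilities).
easy = [lambda: [0.95, 0.03, 0.02], lambda: [0.97, 0.02, 0.01]]
hard = [
    lambda: [0.40, 0.35, 0.25],
    lambda: [0.30, 0.45, 0.25],
    lambda: [0.10, 0.85, 0.05],
]

print(early_exit_predict(easy))  # (0, 0): exits at the first checkpoint
print(early_exit_predict(hard))  # (2, 1): passed on to the last checkpoint
```

Under this assumed rule, the "easy" video never triggers the later checkpoints, while the "hard" video is only resolved by the final classifier, mirroring the rows of Figure 1.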