Karl Pertsch

I am a first-year PhD student in the Cognitive Learning for Vision and Robotics (CLVR) Lab at the University of Southern California, where I work on deep learning, computer vision, and robotics with Professor Joseph Lim.

Before joining CLVR, I obtained my diploma in EE from TU Dresden, Germany, working with Professor Carsten Rother. I also had the chance to spend a year as a Fulbright Scholar in the GRASP Lab at the University of Pennsylvania, working with Professor Kostas Daniilidis.

Email  /  CV  /  Google Scholar  /  LinkedIn


  • [Apr 2019] New preprint on keyframe-based video prediction!

  • [Apr 2019] Our work on discovering an agent's action space was accepted to ICLR 2019!

  • [Dec 2018] I presented our work on unsupervised learning of an agent's action space at the Infer2Control workshop at NeurIPS 2018 in Montreal.

  • [Aug 2018] I joined USC's PhD program in Computer Science, working in the CLVR lab with Joseph Lim.

  • [Jun 2018] New arXiv preprint on unsupervised discovery of an agent's action space through stochastic video prediction.


I'm interested in machine learning, computer vision, and robotics. At the moment I am working on unsupervised learning of predictive models that can be used for planning and control. Before that, I worked on more 'classic' computer vision, e.g., 6DoF object pose estimation.

KeyIn: Discovering Subgoal Structure with Keyframe-based Video Prediction
Karl Pertsch*, Oleh Rybkin*, Jingyun Yang, Kosta Derpanis, Joseph Lim, Kostas Daniilidis, Andrew Jaegle
Preprint, 2019
project page / arXiv / poster

We propose a keyframe-based video prediction model that discovers, without supervision, the moments of interesting change (the keyframes) in the data. We show that using the predicted keyframes as subgoals for planning improves performance on a simulated pushing task.


Learning what you can do before doing anything
Oleh Rybkin*, Karl Pertsch*, Kosta Derpanis, Kostas Daniilidis, Andrew Jaegle
International Conference on Learning Representations (ICLR), 2019
project page / arXiv / poster

We learn an agent's action space, along with a predictive model, from pure visual observations. The learned model can then be used for model predictive control, requiring orders of magnitude fewer action-annotated videos.


iPose: Instance-Aware 6D Pose Estimation of Partly Occluded Objects
Omid Hosseini Jafari*, Siva Karthik Mustikovela*, Karl Pertsch, Eric Brachmann, Carsten Rother
Asian Conference on Computer Vision (ACCV), 2018

We combine a CNN-based regression of dense on-object surface labels with RANSAC-based pose fitting for accurate 6DoF pose estimation of texture-less objects under heavy occlusion.

I stole this website layout from here!