Fall 2017 - CIS 5543


Computer Vision   


Basic Information:

·  Lecture time: Tuesday 5:30-8:00pm, TTLMAN 402

·  Instructor: Haibin Ling | SERC 382 | 215-204-6973 | hbling AT temple.edu

·  Office Hours: Tuesday 3:00-5:00pm, or by appointment

·  Syllabus: PDF

Matlab Resources
Temple Univ.
CIS Dept.
DABI Center


Important Links:

·  FAQs and announcements: general FAQ, about review, about presentation.

·  Computer Vision: Algorithms and Applications, Richard Szeliski, 2010.

·  Computer Vision:  Models, Learning, and Inference, Simon J.D. Prince, 2012.


Topics and Materials (Tentative, will be revised FREQUENTLY)

Additional Readings

Week 1

General introduction. (Slide 1, 2)

·  Background

·  Topics in visual data analysis

·  Applications

·  Related fields
·  Image formation





Week 2

Background – Review of math tools (Slides)

·  Commonly used statistical inference tools (SVM, k-means, boosting, etc.)

·  Linear and nonlinear analysis (PCA, LDA, Manifold, etc.)

·  Graph algorithms (HMM, MRF, etc.)


Materials: (Slides)

·  Szeliski's book, Appendix A-C


(Mock) Paper presentation (review assignment):

· [Presenter: Peng Chu] Densely Connected Convolutional Networks, Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger, CVPR 2017.


Quiz 1.

·  Additive logistic regression: a statistical view of boosting, J. Friedman, T. Hastie, and R. Tibshirani, The Annals of Statistics 2000.

·  Nonlinear Dimensionality Reduction by Locally Linear Embedding, Sam T. Roweis, Lawrence K. Saul, Science, 2000.

·  A Global Geometric Framework for Nonlinear Dimensionality Reduction, Joshua B. Tenenbaum, Vin de Silva, John C. Langford, Science, 2000.





Last day to add or drop a Full Term 16-week course (seriously)


Week 3


Paper selection due!

Background - Optimization & Learning (Slides)

·  Basic idea and linear optimization

·  Convex optimization

·  Camera model


·  Convex optimization by Boyd and Vandenberghe.






Week 4




Background and Geometry (Slides)

·  Camera model, calibration

·  Multiview geometry

·  Introduction to 3D reconstruction



·  Szeliski's book, Ch. 1-2

·  Prince's book, Ch. V

Paper presentation (review assignment):

· [Presenter: Nooreen Dabbish] EPnP: An Accurate O(n) Solution to the PnP Problem, V. Lepetit, F. Moreno-Noguer, and P. Fua, IJCV 2009.

· [Presenter: Marija Stanojevic] Building Rome in a Day. S. Agarwal, N. Snavely, I. Simon, S. M. Seitz and R. Szeliski, ICCV 2009.

· [Presenter: Xinyi Li] Sparse to Dense 3D Reconstruction From Rolling Shutter Images, Olivier Saurer, Marc Pollefeys, Gim Hee Lee, CVPR 2016.


· A flexible new technique for camera calibration, Zhengyou Zhang, PAMI 2000.

· Bundle Adjustment: A Modern Synthesis. B. Triggs, P. McLauchlan, R. Hartley and A. Fitzgibbon, Vision Algorithms: Theory and Practice, 1999.

· Multi-view Supervision for Single-View Reconstruction via Differentiable Ray Consistency, S. Tulsiani, T. Zhou, A. A. Efros, J. Malik, CVPR 2017.

· On-the-Fly Adaptation of Regression Forests for Online Camera Relocalisation. T. Cavallari, S. Golodetz, N. A. Lord, J. Valentin, L. Di Stefano, P. H. S. Torr, CVPR 2017.




Week 5

Background - Deep Learning (Slides)



· Deep Learning, Goodfellow, Bengio, and Courville, 2016, MIT press. http://www.deeplearningbook.org/

· Deep Learning Tutorial, http://ufldl.stanford.edu/tutorial/.


Paper presentation (review assignment)

· [Presenter: Ashis Chanda] Full Resolution Image Compression with Recurrent Neural Networks. G. Toderici, D. Vincent, N. Johnston, S. Hwang, D. Minnen, J. Shor, M. Covell, CVPR 2017.


Additional Readings:

· Reducing the dimensionality of data with neural networks. Hinton, G. E. and Salakhutdinov, R. R, Science, 2006.

· Visualizing and Understanding Convolutional Networks, M. D. Zeiler and R. Fergus, ECCV 2014.

· Deep learning. Y. LeCun, Y. Bengio, and G. Hinton, Nature 2015.

· Long Short-Term Memory. Hochreiter, Sepp; Schmidhuber, Jürgen. Neural Computation, 1997.

· Global Optimality in Neural Network Training, Benjamin D. Haeffele, Rene Vidal, CVPR 2017.




Week 6

Segmentation (Slides)

·  Clustering, perceptual grouping, segmentation



·  Szeliski's book, Ch. 5

Paper presentation (review assignment):

· [Presenter: Heng Fan] Learning Hierarchical Features for Scene Labeling, C. Farabet, C. Couprie, L. Najman and Y. LeCun. PAMI, 2013.

· [Presenter: Peter Isaev] Fully Convolutional Networks for Semantic Segmentation. J. Long, E. Shelhamer, T. Darrell, CVPR 2015.


Additional Readings:

· Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images, S. Geman and D. Geman. PAMI, 1984.
· Normalized Cuts and Image Segmentation, J. Shi and J. Malik, PAMI 2000

· Fast Approximate Energy Minimization via Graph Cuts, Y. Boykov, O. Veksler and R. Zabih, PAMI, 2001.

· Efficient Graph-Based Image Segmentation, P.F. Felzenszwalb and D. P. Huttenlocher, IJCV, 2004.

· Image Parsing: Unifying Segmentation, Detection, and Object Recognition. Z. Tu, X. Chen, A. Yuille, and S-C Zhu, IJCV 2005.

· SLIC superpixels compared to state-of-the-art superpixel methods, R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, PAMI, 2012.




Week 7

Features, Representation, and Matching (Slides, slides)

· Invariant features and representations

· Matching: features, surface, and images



·  Szeliski's book, Ch. 4 - 6


Paper presentation (review assignment)

· [Presenter: Sidra Hanif] NetVLAD: CNN Architecture for Weakly Supervised Place Recognition. R Arandjelovic, P Gronat, A Torii, T Pajdla, J Sivic, CVPR 2016.

· [Presenter: Branimir Ljubic] 3D Point Cloud Registration for Localization Using a Deep Neural Network Auto-Encoder. G. Elbaz, T. Avraham, A. Fischer. CVPR 2017.

· [Presenter: Ziyu Yang] Learning Deep Representation for Face Alignment with Auxiliary Attributes. Z. Zhang, P Luo, C. Chen, X. Tang, PAMI 2016.

Additional Readings:

· Distinctive image features from scale-invariant keypoints, D. Lowe, IJCV 2004.

· Improving the Fisher kernel for large-scale image classification. F. Perronnin, J. Sanchez, and T. Mensink. ECCV 2010.

· Learning to Assign Orientations to Feature Points. K. M. Yi, Y. Verdie, P. Fua, V. Lepetit, CVPR 2016.


· Iterative Point Matching for Registration of Free-Form Curves and Surfaces, Z. Zhang, IJCV, 1994.

· A tensor-based algorithm for high-order graph matching, O. Duchenne, F. Bach, I. Kweon, and J. Ponce, PAMI, 2011.
· Alignment by Maximization of Mutual Information, by P. Viola and W. M. Wells III, IJCV, 1997

· SIFT Flow: Dense Correspondence across Scenes and its Applications, C. Liu, J. Yuen, A. Torralba, PAMI, 2011.

· Comparative Evaluation of Hand-Crafted and Learned Local Features. J. L. Schonberger, H. Hardmeier, T. Sattler, M. Pollefeys. CVPR 2017.




Week 8

Visual Recognition

·  Shape recognition (Slides): shape space, shape matching

·  Faces (Slides): Eigenface, EBGM, ASM


·  Szeliski's book, Ch. 14


Paper presentation (review assignment):

· [Presenter: Janki Kansara] 3D Shape Attributes. D. F. Fouhey, A. Gupta, A. Zisserman, CVPR 2016.

· [Presenter: Yifan Wu] Hybrid Deep Learning for Face Verification. Y. Sun, X. Wang, X. Tang PAMI 2016.

· [Presenter: Sarah Lehman] FaceNet: A Unified Embedding for Face Recognition and Clustering, F Schroff, D Kalenichenko, and J Philbin, CVPR 2015.

Additional Readings:

· Active shape models - their training and application, T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, CVIU, 1995.

· Shape Matching and Object Recognition Using Shape Contexts, S. Belongie, J. Malik and J. Puzicha, PAMI, 2002.


· Lambertian Reflectance and Linear Subspaces, R. Basri and D. Jacobs, PAMI 2003.

· Robust Face Recognition via Sparse Representation, J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, PAMI 2009.





Week 9


Midterm - Project proposal due

·  Electronic version due before class (email to me before the class starts)

·  No presentation needed

·  Template, using CVPR 2017 template, available at


·  Requirement:

-        Strictly follow the CVPR template above, including font and page sizes

-        Should contain at least the following sections: (1) Introduction, (2) Proposed research, (3) Evaluation plan, and (4) References

-        Minimum length: 2 pages, NOT including the references.


General Image Recognition

·  Bag-of-words

·  Deep learning-based recognition


·  Szeliski's book, Ch. 14


Paper presentation (review assignment):

· [Presenter: Sijia Yu] Deep Residual Learning for Image Recognition. K. He, X. Zhang, S. Ren, J. Sun, CVPR 2016.

· [Presenter: Hongzheng Wang] Dynamic Image Networks for Action Recognition. H. Bilen, B. Fernando, E. Gavves, A. Vedaldi, S. Gould, CVPR 2016.

Additional Reading:

· The Pyramid Match Kernel: Efficient Learning with Sets of Features.  K. Grauman and T. Darrell.  Journal of Machine Learning Research (JMLR), 8 (Apr): 725--760, 2007.

· Tutorial by L. Fei-Fei, R. Fergus, and A. Torralba.

· ImageNet Classification with Deep Convolutional Neural Networks, A. Krizhevsky, I. Sutskever, G. E. Hinton, NIPS 2012.


· Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition. Jianlong Fu, Heliang Zheng, Tao Mei, CVPR 2017.





Week 10

Object detection (Slides)

·  Human detection

·  Face detection

·  General object detection

Paper presentation (review assignment):

· [Presenter: Fan Yang] Object Detection with Discriminatively Trained Part Based Models, P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, PAMI, 2010.

· [Presenter: Haifu Ge] Region-based convolutional networks for accurate object detection and segmentation, R. Girshick, J. Donahue, T. Darrell, J. Malik. CVPR 2014.

· [Presenter: Rajorshi Biswas] YOLO9000: Better, Faster, Stronger. J. Redmon, A. Farhadi, CVPR, 2017.

· [Presenter: Chenglong Fu] Three-Dimensional Object Detection and Layout Prediction Using Clouds of Oriented Gradients. Z. Ren, E. B. Sudderth, CVPR 2016.

Additional Reading:

· Robust Real-time Object Detection, P. Viola and M. Jones, IJCV 2002.

· Pedestrian detection: An evaluation of the state of the art. P. Dollar, C. Wojek, B. Schiele, and P. Perona. PAMI, 2011.

· SSD: Single Shot MultiBox Detector. W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu, A. Berg, ECCV 2016.

· Fast, Accurate Detection of 100,000 Object Classes on a Single Machine, T. Dean, M. A. Ruzon, M. Segal, J. Shlens, S. Vijayanarasimhan, and J. Yagnik, CVPR 2013




Week 11

Tracking (Slides)

· Single target tracking

· Multiple target tracking

· General tracking



· Object Tracking: A Survey, A. Yilmaz, O. Javed, and M. Shah, ACM Computing Surveys, 2006.

Paper presentation (review assignment):

· [Presenter: Suhan Jiang] High-Speed Tracking with Kernelized Correlation Filters, J. F. Henriques, R. Caseiro, P. Martins, J. Batista, PAMI 2015.

· [Presenter: Sheng Zhang] Multi-target Tracking by Lagrangian Relaxation to Min-Cost Network Flow. A. A. Butt and R. T. Collins, CVPR 2013.

· [Presenter: Abrar Alrumayh] Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields. Z Cao, T Simon, S-E Wei, Y Sheikh, CVPR 2017.


Additional Reading:

· CONDENSATION - conditional density propagation for visual tracking, M. Isard and A. Blake, IJCV 1998.

· R. Collins. Multitarget data association with higher-order motion models. CVPR, 2012

· Multiple object tracking using k-shortest paths optimization. J. Berclaz, F. Fleuret, E. Turetken, and P. Fua. PAMI, 2011.

· Multi-Target Tracking by Online Learning a CRF Model of Appearance and Motion Patterns, B. Yang, R. Nevatia, IJCV 2014




Week 12

Simultaneous Localization and Mapping (SLAM) (Slides)

· Visual SLAM

· Multi-sensor SLAM



· SLAM for Dummies. A Tutorial Approach to Simultaneous Localization and Mapping. S. Riisgaard and M. R. Blas.

· Past, Present, and Future of Simultaneous Localization And Mapping: Towards the Robust-Perception Age, C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I. Reid, and J.J. Leonard, IEEE T. Robotics, 2016.


Paper presentation (review assignment):

· [Presenter: Jiayi Hu] Isometric Non-Rigid Shape-From-Motion in Linear Time. S Parashar, D Pizarro, A Bartoli, CVPR 2016.

· [Presenter: Ge Deng] ORB-SLAM: A Versatile and Accurate Monocular SLAM System. R. Mur-Artal, J. M. M. Montiel and J. D. Tardós, IEEE T. Robotics, 2015.

· [Presenter: Yubin Duan] Slow Flow: Exploiting High-Speed Cameras for Accurate and Diverse Optical Flow Reference Data. J Janai, F Guney, J Wulff, M J Black, A Geiger, CVPR 2017.


Additional Reading:

· Parallel Tracking and Mapping for Small AR Workspaces, G. Klein and D. Murray, ISMAR, 2007.

· DTAM: Dense Tracking and Mapping in Real-Time, R. A. Newcombe, S. J. Lovegrove and A. J. Davison, ICCV 2011.

· LSD-SLAM: Large-Scale Direct Monocular SLAM. J. Engel, T. Schops, D. Cremers, ECCV 2014.

· CNN-SLAM: Real-Time Dense Monocular SLAM with Learned Depth Prediction, K Tateno, F Tombari, I Laina, N Navab, CVPR 2017.

· KinectFusion: Real-Time Dense Surface Mapping and Tracking. R. A. Newcombe, et al., ISMAR 2011.




Week 13

Advanced Topics

· Augmented Reality (Slides)



· The History of Mobile Augmented Reality, C. Arth, L. Gruber, R. Grasset, T. Langlotz, A. Mulloni, D. Schmalstieg, and D. Wagner, arxiv.org/abs/1505.01319.


Paper presentation (review assignment):

· [Presenter: Haotian Chi] A Multi-State Constraint Kalman Filter for Vision-aided Inertial Navigation. A. I. Mourikis and S. I. Roumeliotis, ICRA 2007.

· [Presenter: Bingyao Huang] Holoportation: Virtual 3D Teleportation in Real-time. S. Orts-Escolano, et al., UIST 2016.


Final - Project presentation





Week 14

Advanced Topics

· Medical image analysis (Slides)


Final - Project presentation






Project report due!