Adaptation By Components

Johns Hopkins University
ICLR, 2024 & CVPR, 2024

To address unsupervised domain adaptation in the visual domain, we propose Domain Transfer with Compositionality, or 'Adaptation by Components'. The fundamental idea is that objects are made of parts, or components, and that in the annotated source domain we can learn both the spatial patterns and the appearance of an object category's components. This is possible because we can often estimate the viewpoint of the object. Like humans, we can then leverage the robust parts of these objects to transfer our source task knowledge to the nuisance-ridden target domain.

A Bayesian Approach to OOD Robustness in Image Classification

An important and unsolved problem in computer vision is ensuring that algorithms are robust to changes in image domains. We address this problem in the scenario where we have access to images from the target domains but no annotations. Motivated by the challenges of the OOD-CV benchmark, where we encounter real-world Out-of-Domain (OOD) nuisances and occlusion, we introduce a novel Bayesian approach to OOD robustness for object classification. Our work extends Compositional Neural Networks (CompNets), which have been shown to be robust to occlusion but degrade badly when tested on OOD data. We exploit the fact that CompNets contain a generative head defined over feature vectors represented by von Mises-Fisher (vMF) kernels, which correspond roughly to object parts and can be learned without supervision. We observe that some vMF kernels are similar between different domains, while others are not. This enables us to learn a transitional dictionary of vMF kernels that are intermediate between the source and target domains, and to train the generative model on this dictionary using the annotations on the source domain, followed by iterative refinement. This approach, termed Unsupervised Generative Transition (UGT), performs very well in OOD scenarios even when occlusion is present. UGT is evaluated on different OOD benchmarks, including the OOD-CV dataset, several popular datasets (e.g., ImageNet-C [9]), artificial image corruptions (including adding occluders), and synthetic-to-real domain transfer. It does well in all scenarios, outperforming SOTA alternatives (e.g., by up to 10% top-1 accuracy on the Occluded OOD-CV dataset).
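The observation that some vMF kernels are shared across domains while others are not can be illustrated with a small sketch. This is not the UGT procedure itself: it only assumes that each kernel is summarized by a mean direction on the unit sphere and that "similar" means a high cosine similarity to some source kernel. The function name `split_kernels`, the toy vectors, and the `threshold` value are all hypothetical choices made for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length (vMF mean directions lie on the unit sphere)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(u, v):
    """For unit vectors, cosine similarity is just the dot product."""
    return sum(a * b for a, b in zip(u, v))

def split_kernels(source_kernels, target_kernels, threshold=0.9):
    """Split target kernel indices into those with and without a close source match.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    source = [normalize(k) for k in source_kernels]
    shared, domain_specific = [], []
    for i, kernel in enumerate(target_kernels):
        kernel = normalize(kernel)
        best = max(cosine(kernel, s) for s in source)
        if best >= threshold:
            shared.append(i)
        else:
            domain_specific.append(i)
    return shared, domain_specific

# Toy 3-D kernels: the first target kernel nearly matches a source kernel,
# the second points in a direction no source kernel covers.
source = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
target = [[0.98, 0.1, 0.0], [0.0, 0.0, 1.0]]
shared, domain_specific = split_kernels(source, target)
print(shared, domain_specific)  # [0] [1]
```

In this toy setting, the kernels flagged as shared are the ones a transitional dictionary could plausibly reuse across domains, while the domain-specific ones are where source and target differ.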

Video Presentation

Poster

BibTeX

@inproceedings{kaushik2024bayesian,
        title={A Bayesian Approach to OOD Robustness in Image Classification},
        author={Kaushik, Prakhar and Kortylewski, Adam and Yuille, Alan},
        booktitle={41st IEEE/CVF Conference on Computer Vision and Pattern Recognition},
        year={2024},
        organization={IEEE}
}

Source-Free and Image-Only Unsupervised Domain Adaptation for Category Level Object Pose Estimation

We consider the problem of source-free unsupervised domain adaptation for category-level pose estimation: adapting to a target domain from RGB images only, without any access to source domain data or 3D annotations during adaptation. Collecting and annotating real-world 3D data and corresponding images is a laborious, expensive, yet unavoidable process, since even 3D pose domain adaptation methods require 3D data in the target domain. We introduce 3DUDA, a method capable of adapting to a nuisance-ridden target domain without 3D or depth data. Our key insight stems from the observation that specific object subparts remain stable across out-of-domain (OOD) scenarios, which lets us use these invariant subcomponents for effective model updates. We represent object categories as simple cuboid meshes and harness a generative model of neural feature activations, modeled at each mesh vertex and learnt using differentiable rendering. We focus on individual locally robust mesh vertex features and iteratively update them based on their proximity to corresponding features in the target domain, even when the global pose is not correct. Our model is then trained in an EM fashion, alternating between updating the vertex features and the feature extractor. We show that our method simulates fine-tuning on a global pseudo-labeled dataset under mild assumptions, which converges to the target domain asymptotically. Through extensive empirical validation, including a complex extreme UDA setup that combines real nuisances, synthetic noise, and occlusion, we demonstrate that our simple approach addresses the domain shift challenge and significantly improves pose estimation accuracy.
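The idea of updating only locally robust vertex features can be sketched as follows. This is an illustration under stated assumptions, not the 3DUDA update rule: it assumes each mesh vertex has one feature vector, that "proximity" is cosine similarity to the corresponding target-domain feature, and that a close vertex is moved toward the target by a moving average. The function name `update_vertex_features` and the `threshold` and `momentum` values are hypothetical. The sketch covers only the vertex-feature half of the alternation, not the feature extractor update.

```python
import math

def normalize(v):
    """Scale a feature vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def update_vertex_features(vertex_feats, target_feats, threshold=0.8, momentum=0.9):
    """Move a vertex feature toward its target feature only if the two are already close.

    Returns the new features and a mask of which vertices were updated.
    threshold and momentum are illustrative assumptions.
    """
    updated, mask = [], []
    for v, t in zip(vertex_feats, target_feats):
        v, t = normalize(v), normalize(t)
        similarity = sum(a * b for a, b in zip(v, t))
        if similarity >= threshold:
            # Treat this vertex as locally robust: blend in the target feature.
            mixed = [momentum * a + (1 - momentum) * b for a, b in zip(v, t)]
            updated.append(normalize(mixed))
            mask.append(True)
        else:
            # Treat the match as unreliable: leave the vertex untouched.
            updated.append(v)
            mask.append(False)
    return updated, mask

# Toy 2-D features: vertex 0 is close to its target feature, vertex 1 is not.
vertex_feats = [[1.0, 0.0], [0.0, 1.0]]
target_feats = [[0.9, 0.1], [1.0, 0.0]]
new_feats, mask = update_vertex_features(vertex_feats, target_feats)
print(mask)  # [True, False]
```

Gating each vertex separately means a few reliable parts can drive adaptation even when the overall pose estimate for an image is wrong, which is the intuition the abstract describes.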

Poster

BibTeX

@inproceedings{kaushik2024sourcefree,
        title={Source-Free and Image-Only Unsupervised Domain Adaptation for Category Level Object Pose Estimation},
        author={Prakhar Kaushik and Aayush Mishra and Adam Kortylewski and Alan Yuille},
        booktitle={The Twelfth International Conference on Learning Representations},
        year={2024},
        url={https://openreview.net/forum?id=UPvufoBAIs}
}