Abstract

In this paper, we present a novel approach to synthesizing realistic images from their semantic layouts. The approach rests on the hypothesis that objects with similar appearance share similar representations. Our method establishes dependencies between regions according to their appearance correlation, yielding representations that are both spatially variant and associated. Conditioning on these features, we propose a dynamic weighted network built from spatially conditional computation (in both convolution and normalization). Beyond preserving semantic distinctions, this dynamic network strengthens semantic relevance, which benefits the synthesis of both global structure and detail. Extensive experiments on benchmarks demonstrate that our method achieves compelling generation performance, both qualitatively and quantitatively.

Talk [Slides]

(To be updated)

Method

Our semantic-composition GAN (SC-GAN) decouples semantic image synthesis into two parts: semantic encoding and stylization. These are realized by the semantic vector generator G_V (SVG) and the semantic render generator G_R (SRG), respectively. As shown in the figure above, SVG takes the semantic layout S and produces multi-scale semantic vectors in the form of feature maps (we treat each feature point as a semantic vector). SRG transforms randomly sampled noise into the final synthesized image using a dynamic network. The key operators in this network (convolution and normalization) are conditionally parameterized by the semantic vectors provided by SVG and a group of weight candidates (shown below).
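To make the idea of conditionally parameterized operators more concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes that each pixel's semantic vector is projected to softmax mixing coefficients over K weight candidates, and that the per-pixel operator is the resulting weighted sum. The function names, the projection matrix `proj`, the softmax mixing, the restriction to 1x1 convolution, and the choice of normalization statistics are all hypothetical simplifications for illustration.

```python
import numpy as np


def mixing_coefficients(semantic_vectors, proj):
    """Map per-pixel semantic vectors (H, W, D) to mixing weights (H, W, K).

    `proj` (D, K) is a hypothetical learned projection; using a softmax
    here is an assumption, not something stated in the text above.
    """
    logits = semantic_vectors @ proj
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)


def spatially_conditional_conv1x1(features, semantic_vectors, candidates, proj):
    """Apply a 1x1 convolution whose weights differ at every pixel.

    features:         (H, W, C_in)
    semantic_vectors: (H, W, D)
    candidates:       (K, C_in, C_out) shared weight candidates
    """
    coeffs = mixing_coefficients(semantic_vectors, proj)        # (H, W, K)
    # Compose one weight matrix per pixel from the shared candidates.
    weights = np.einsum("hwk,kio->hwio", coeffs, candidates)    # (H, W, C_in, C_out)
    return np.einsum("hwi,hwio->hwo", features, weights)        # (H, W, C_out)


def spatially_conditional_norm(features, coeffs, gamma_candidates,
                               beta_candidates, eps=1e-5):
    """Normalize per channel, then modulate with per-pixel scale and shift.

    coeffs:           (H, W, K) mixing weights
    gamma_candidates: (K, C) scale candidates
    beta_candidates:  (K, C) shift candidates
    """
    mean = features.mean(axis=(0, 1), keepdims=True)
    var = features.var(axis=(0, 1), keepdims=True)
    normalized = (features - mean) / np.sqrt(var + eps)
    gamma = coeffs @ gamma_candidates                            # (H, W, C)
    beta = coeffs @ beta_candidates                              # (H, W, C)
    return normalized * gamma + beta
```

Under these assumptions, pixels with similar semantic vectors receive similar mixing coefficients and therefore similar operators, while distinct regions are processed by distinct compositions of the same shared candidates. With a single candidate (K = 1), the convolution reduces to an ordinary 1x1 convolution.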

Paper and Supplementary Material

Yi Wang, Lu Qi, Ying-Cong Chen, Xiangyu Zhang, Jiaya Jia.
Image Synthesis via Semantic Composition.
2021.
(hosted on ArXiv)
(Camera Ready)

[Bibtex]

Acknowledgements

This template was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project; the code can be found here.