A LEARNED REPRESENTATION FOR ARTISTIC STYLE

Vincent Dumoulin, Jonathon Shlens, Manjunath Kudlur
ICLR 2017

Abstract

construct a single, scalable deep network that can parsimoniously capture the artistic style of a diversity of paintings by reducing a painting to a point in an embedding space

Introduction

pastiche: an artistic work that imitates the style of another one
automate pastiche/style transfer: render an image in the style of another one
traditional methods: “grow” textures one pixel (or one patch) at a time using non-parametric sampling of pixels in an exemplar image
machine learning methods: neural style (optimization-based, expensive); feedforward style transfer network (fast, but the network is tied to a single style)
solution: conditional instance normalization (reduces each style image to a point in an embedding space)

STYLE TRANSFER WITH DEEP NETWORKS

style transfer: finding a pastiche image p whose content is similar to that of a content image c but whose style is similar to that of a style image s (high-level features in classifiers tend to correspond to higher levels of abstractions for visualizations)
content similarity: distance between high-level features extracted by a trained classifier
style similarity: distance between Gram matrices G of low-level features as extracted by a trained classifier (the artistic style of a painting may be interpreted as a visual texture)
neural style:

min_p L(s, c, p) = λ_s L_s(p) + λ_c L_c(p)
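A minimal NumPy sketch of the two loss terms above, assuming features have already been extracted from a trained classifier (function names, feature shapes, and the single-layer content loss are illustrative assumptions, not the paper's exact setup):

```python
import numpy as np

def gram_matrix(features):
    # features: (channels, height, width) activations from one layer;
    # the Gram matrix captures channel correlations, i.e. visual texture
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (h * w)

def content_loss(p_feat, c_feat):
    # distance between high-level features of pastiche p and content c
    return np.mean((p_feat - c_feat) ** 2)

def style_loss(p_feat, s_feat):
    # distance between Gram matrices of low-level features of p and style s
    return np.mean((gram_matrix(p_feat) - gram_matrix(s_feat)) ** 2)

def total_loss(p_feats, c_feat, s_feats, lambda_s=1.0, lambda_c=1.0):
    # L(s, c, p) = lambda_s * L_s(p) + lambda_c * L_c(p)
    ls = sum(style_loss(pf, sf) for pf, sf in zip(p_feats, s_feats))
    lc = content_loss(p_feats[-1], c_feat)
    return lambda_s * ls + lambda_c * lc
```

neural style minimizes this loss over the pixels of p directly; the feedforward method instead trains a network whose output approximately minimizes it.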

feed-forward method: style transfer network T: c → p
the network T is tied to one specific painting style

N-STYLES FEEDFORWARD STYLE TRANSFER NETWORKS

intuition: many styles probably share some degree of computation
train a single conditional style transfer network T(c,s) for N styles
to model a style, it is sufficient to specialize scaling and shifting parameters after normalization to each specific style
all convolutional weights of a style transfer network can be shared across many styles
it is sufficient to tune parameters for an affine transformation after normalization for each style

conditional instance normalization: transform a layer’s activations x into a normalized activation z specific to painting style s

z = γ_s (x − μ) / σ + β_s

μ,σ: x’s mean and standard deviation taken across spatial axes
γs,βs: obtained by selecting the row corresponding to s in the γ and β matrices
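A minimal NumPy sketch of conditional instance normalization, assuming γ and β are stored as (num_styles, channels) matrices as described above (the function name and argument layout are my own):

```python
import numpy as np

def conditional_instance_norm(x, gamma, beta, style_idx, eps=1e-5):
    # x: (channels, height, width) activations of one layer
    # gamma, beta: (num_styles, channels) per-style affine parameters
    mu = x.mean(axis=(1, 2), keepdims=True)    # per-channel spatial mean
    sigma = x.std(axis=(1, 2), keepdims=True)  # per-channel spatial std
    # select the row of gamma/beta corresponding to style s
    g = gamma[style_idx][:, None, None]
    b = beta[style_idx][:, None, None]
    return g * (x - mu) / (sigma + eps) + b
```

switching `style_idx` changes only which affine row is applied; all convolutional weights of T are shared across styles.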
integrating an (N+1)-th style into the network: the principle is simple; train only the new rows of γ and β for the added style while keeping the shared convolutional weights fixed
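A sketch of what adding a style amounts to, assuming γ and β are (N, channels) matrices (helper name and identity-style initialization are illustrative assumptions):

```python
import numpy as np

def add_style(gamma, beta):
    # gamma, beta: (N, channels) per-style normalization parameters;
    # append one new row each, initialized to the identity transform
    n, c = gamma.shape
    new_gamma = np.vstack([gamma, np.ones((1, c))])
    new_beta = np.vstack([beta, np.zeros((1, c))])
    return new_gamma, new_beta

# during fine-tuning, only the appended rows receive gradient updates;
# the convolutional weights of T stay frozen
```

because only one row per layer is trained, a new style can be incorporated far faster than training a network from scratch.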

EXPERIMENTAL RESULTS

METHODOLOGY

the same network architecture as in “Perceptual losses for real-time style transfer and super-resolution”
train the N-style network with stochastic gradient descent using the Adam optimizer

Discussion

when art stylization is posed as a feedforward network, the specific network architecture may be unable to take full advantage of its capacity: pruning the architecture leads to qualitatively similar results
the convolutional weights of the style transfer network encode transformations that represent “elements of style”
