Image Style Transfer

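Image style transfer has long been a fascinating topic. This article uses a CNN (Convolutional Neural Network) to transfer the style of one image onto another, and by adjusting parameters controls how closely the result resembles the original image. The details are explained section by section below; here I share the process and results of my own implementation.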

Introduction

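Image style transfer can be traced back to A Neural Algorithm of Artistic Style, published by Gatys et al. in 2015.
Their approach uses a VGG (Visual Geometry Group) network to extract image features. The key idea is that the extracted features fall into two kinds, content features and style features: content refers to the overall layout and outlines of an image, while style refers to finer details such as texture, contrast and orientation. So if we take the content of the original image, combine it with the style of the chosen style image, and design a loss function that balances the two, we can synthesize an image that has both the content and the style.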

Techniques and Principles

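The core of the whole model is extracting the content and style features of an image. On top of that, the synthesis is tuned by adding image preprocessing, trying different model architectures, choosing the learning rate and adjusting the loss weights.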

(Figure: from reference [3])

Image Preprocessing

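Images are first resized and normalized. torchvision's transforms resize each image to 512x512 and convert it to a tensor; ToTensor() also rescales pixel values from 0-255 to [0, 1] and puts the channel dimension first. The loader applies this pipeline to the original image, which gives a tensor of shape (3, 512, 512), and unsqueeze(0) then adds a new dimension at position 0, giving the 4-D tensor (1, 3, 512, 512). The extra leading dimension is the batch dimension of the standard (N, C, H, W) layout used by PyTorch's convolution layers. Here 3 is the number of color channels (RGB) and 512x512 is the image size in pixels.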

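```python
import torch
from PIL import Image
from torchvision import transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def image_loader(path, is_cuda=False):  # is_cuda is unused; the global device decides
    image = Image.open(path).convert("RGB")  # force 3 channels (drops any alpha channel)
    # Resize to 512x512 and convert to a float tensor in [0, 1] of shape (3, 512, 512)
    loader = transforms.Compose([transforms.Resize((512, 512)), transforms.ToTensor()])
    image = loader(image).unsqueeze(0)  # add the batch dimension: (1, 3, 512, 512)
    return image.to(device, torch.float)
```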

Model Architecture

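For the model architecture I use VGG19 pre-trained on ImageNet and reuse its convolutional part (features) directly as a fixed feature extractor, so the network itself does not need to be trained. Only the front part of VGG19's features is kept [3]; the code slices features[:29], i.e. modules 0 to 28, which covers every layer it reads from. Features are taken from ReLU layers: the candidate indices in features are '1', '3', '8', '13', '20' and '29' (relu1_1, relu1_2, relu2_2, relu3_2, relu4_1, relu5_1), and the code below reads from '3', '8', '13' and '20'. ReLU outputs are used in the hope that, after the non-linear activation, the features better capture the important parts of the image. The features are extracted in order and stored in a "feature box". Intuitively, for a machine to learn to recognize the features of an image (textures, edges and so on), each VGG layer applies different filters and produces different feature maps; the feature box is simply the collection of these feature maps, as illustrated below.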

(Figure: from reference [4])

```python
import torch.nn as nn
from torchvision import models

class VGG(nn.Module):
    def __init__(self):
        super(VGG, self).__init__()
        # Indices in vgg19().features: 3=relu1_2, 8=relu2_2, 13=relu3_2, 20=relu4_1
        self.layer_names = ['3', '8', '13', '20']
        # Keep only the first 29 modules (indices 0 to 28) and drop the rest.
        # weights=... replaces the deprecated pretrained=True (torchvision >= 0.13)
        self.model = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features[:29].eval()
        # Freeze the network: only the generated image is optimized
        self.model.requires_grad_(False)

    def forward(self, x):
        # x is the input image tensor, fed through each layer in turn
        features = []
        for layer_num, layer in enumerate(self.model):
            x = layer(x)  # activation of this layer
            # Keep the activations of the selected layers
            if str(layer_num) in self.layer_names:
                features.append(x)
        return features  # return after the loop, once all layers have run
```
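To double-check which layer each index refers to, the sketch below rebuilds the module order of torchvision's vgg19().features from the standard VGG19 configuration, in which every conv layer is followed by a ReLU and every "M" is a max-pool. The hand-written VGG19_CFG list and the vgg19_feature_names helper are illustrative assumptions, not part of torchvision; printing models.vgg19().features shows the real layout.

```python
# Assumed VGG19 configuration: numbers are conv output channels, "M" is a max-pool
VGG19_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
             512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]

def vgg19_feature_names(cfg=VGG19_CFG):
    """Hypothetical helper: name each index of vgg19().features."""
    names = []
    block, conv = 1, 0
    for v in cfg:
        if v == "M":
            names.append(f"pool{block}")  # max-pool closes the block
            block, conv = block + 1, 0
        else:
            conv += 1
            names.append(f"conv{block}_{conv}")  # Conv2d
            names.append(f"relu{block}_{conv}")  # ReLU right after it
    return names

names = vgg19_feature_names()
for idx in (1, 3, 8, 13, 20, 29):
    print(idx, names[idx])
# prints, one per line: 1 relu1_1, 3 relu1_2, 8 relu2_2, 13 relu3_2, 20 relu4_1, 29 relu5_1
```

Under this layout, relu4_2 and relu5_2 sit at indices 22 and 31, and index 29 lies outside features[:29].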

Content features

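VGG's convolutional layers extract features at every depth: each layer turns the image into a 4-D tensor of shape (batch, channels, height, width), and stacking these layer by layer gives the feature maps. The content loss is defined as the mean squared error (MSE) between the feature maps of the generated image and those of the original image. MSE is used because it compares the two sets of feature maps element by element, so the loss is small only when the generated image activates the same features at the same positions as the original, which is what preserving the content means.

The generated image is initialized as a copy of the original image and is updated iteratively to reduce the content loss.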
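Written out, the content loss computed by calc_content_loss and summed over the selected layers in calculate_loss is shown below, using notation borrowed from Gatys et al.: for layer l, F^l is the generated image's feature map and P^l the original image's, each reshaped to N_l channels by M_l = height × width positions.

```latex
\mathcal{L}_{\text{content}} = \sum_{l} \frac{1}{N_l M_l} \sum_{i=1}^{N_l} \sum_{k=1}^{M_l} \left( F^l_{ik} - P^l_{ik} \right)^2
```

Gatys et al. use half the sum of squared differences instead of the mean; the two differ only by a constant factor per layer, which can be absorbed into the weight α.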

The implementation:

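```python
import torch

def calc_content_loss(gen_feat, orig_feat):
    # Mean squared error between the generated and original feature maps
    # (Gatys et al. use 0.5 * the sum of squared errors instead of the mean)
    content_l = torch.mean((gen_feat - orig_feat) ** 2)
    return content_l
```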

Style features

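To measure style, the idea of cosine similarity is used to quantify how "similar" different features of the image are to each other. Once each feature of an image is turned into a vector, a measure of similarity between any two vectors becomes very useful. For two vectors in a vector space, the inner product tells us how related they are: when the angle between them is 90 degrees, the inner product is zero, meaning the two vectors are orthogonal and share no common direction.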

(Figure: from reference [1])

(Figure: from reference [7])
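The idea above can be sketched in a few lines of plain Python. The cosine_similarity helper below is a hypothetical illustration (not code from the original project) computing cos θ = ⟨u, v⟩ / (|u| |v|) for three toy vector pairs.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v: <u, v> / (|u| |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norms

print(cosine_similarity([1.0, 0.0], [0.0, 2.0]))    # 0.0: orthogonal (90 degrees)
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))    # 1.0: same direction
print(cosine_similarity([1.0, 2.0], [-1.0, -2.0]))  # -1.0: opposite direction
```

The Gram matrix used for style keeps only the numerator (the raw inner product) and skips the division by the norms.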

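Extending cosine similarity to images amounts to measuring how the image's features are correlated with one another; the square matrix formed by these pairwise correlations is called the Gram matrix (see [6] for details).
For a feature map with n_H × n_W pixels and n_C channels, the Gram matrix is obtained by flattening each channel over all pixel positions and taking the inner product of every pair of channels. Each entry therefore shows how strongly two features are related within an image: a large value at (i, j) means channels i and j tend to be active at the same positions. Comparing the Gram matrices of the generated image and the style image then shows how closely their styles match.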
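In formula form, following the notation of Gatys et al.: reshape the layer-l feature map into F^l with N_l = n_C rows (channels) and M_l = n_H n_W columns (positions). The Gram matrix is then the matrix of inner products between channels, which is exactly what torch.mm(F, F.t()) computes in the code:

```latex
G^l_{ij} = \sum_{k=1}^{M_l} F^l_{ik} \, F^l_{jk},
\qquad
G^l = F^l \left( F^l \right)^{\top} \in \mathbb{R}^{N_l \times N_l}
```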

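The style loss is likewise an MSE, but between the Gram matrices of the generated image and of the style image. In the code, torch.mm multiplies each stored feature map, reshaped to (channels, height × width), by its own transpose to obtain the Gram matrix.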
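Written out, with G^l and A^l denoting the N_l × N_l Gram matrices of the generated image and of the style image at layer l, the style loss computed by calc_style_loss (torch.mean over the Gram entries) and summed over layers in calculate_loss is the first line below. The second line is the per-layer normalization used by Gatys et al., for comparison, where M_l = height × width:

```latex
\mathcal{L}_{\text{style}} = \sum_{l} \frac{1}{N_l^2} \sum_{i=1}^{N_l} \sum_{j=1}^{N_l} \left( G^l_{ij} - A^l_{ij} \right)^2
\\
E_l^{\text{Gatys}} = \frac{1}{4 N_l^2 M_l^2} \sum_{i=1}^{N_l} \sum_{j=1}^{N_l} \left( G^l_{ij} - A^l_{ij} \right)^2
```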

The code:

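```python
import torch

def calc_style_loss(gen, style):
    # Gram matrices of the generated and style feature maps (assumes batch size 1)
    batch, channel, height, width = gen.shape
    G = torch.mm(gen.view(channel, height * width), gen.view(channel, height * width).t())
    A = torch.mm(style.view(channel, height * width), style.view(channel, height * width).t())
    # MSE between the two (channel x channel) Gram matrices
    style_l = torch.mean((G - A) ** 2)
    # Gatys et al. normalization would instead be:
    # torch.sum((G - A) ** 2) / (4 * channel**2 * (height * width) ** 2)
    return style_l
```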
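One way to see why the Gram matrix captures style rather than content: it sums over all pixel positions, so shuffling the positions (with the same permutation for every channel) leaves it unchanged, even though the spatial layout is destroyed. The toy check below assumes PyTorch is installed; the random 4-channel 8x8 tensor is a hypothetical stand-in for a VGG activation, and gram mirrors the computation in calc_style_loss.

```python
import torch

def gram(feat):
    """Gram matrix of a (1, C, H, W) feature map, as in calc_style_loss."""
    _, c, h, w = feat.shape
    f = feat.reshape(c, h * w)  # one row per channel, one column per pixel position
    return torch.mm(f, f.t())   # (C, C): inner products between channels

torch.manual_seed(0)
feat = torch.rand(1, 4, 8, 8)   # toy feature map: 4 channels, 8x8 positions

# Rearrange the pixel positions, using the same permutation for every channel
perm = torch.randperm(8 * 8)
shuffled = feat.reshape(1, 4, 64)[:, :, perm].reshape(1, 4, 8, 8)

print(torch.allclose(gram(feat), gram(shuffled)))  # True: same channel correlations
print(torch.equal(feat, shuffled))                 # False: the spatial layout changed
```

The content loss, by contrast, compares feature maps position by position, so it would change under the same shuffle.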

Total Loss

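To get the best result, the content loss and the style loss must be balanced, so two weights α and β are introduced to control how much content and how much style the synthesized image contains. The total loss is minimized by gradient descent using the Adam optimizer.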
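In formula form, the quantity minimized is the weighted sum below, and the optimization runs over the pixels x of the generated image rather than over network weights. The ratio on the right is the α/β = 0.01 setting used for the results later in this article:

```latex
\mathcal{L}_{\text{total}}(x) = \alpha \, \mathcal{L}_{\text{content}}(x) + \beta \, \mathcal{L}_{\text{style}}(x),
\qquad
x^{*} = \arg\min_{x} \mathcal{L}_{\text{total}}(x),
\qquad
\frac{\alpha}{\beta} = 0.01 \;\Rightarrow\; \beta = 100\,\alpha
```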

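The code first initializes both losses to zero and then, layer by layer, accumulates how much the generated image's features differ from those of the original (content) image and from those of the style image. Initially, because …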

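```python
def calculate_loss(gen_features, orig_features, style_features):
    # alpha (content weight) and beta (style weight) are global hyperparameters
    style_loss = content_loss = 0
    for gen, cont, style in zip(gen_features, orig_features, style_features):
        # Both losses are accumulated over every selected VGG layer
        content_loss += calc_content_loss(gen, cont)
        style_loss += calc_style_loss(gen, style)
    total_loss = alpha * content_loss + beta * style_loss
    return total_loss
```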

Training and Results

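To speed up training, instead of generating the synthesized image from random noise, origin_img.clone().requires_grad_(True) copies the original image as the starting "template" for the generated image. Note that the optimizer trains the image itself, not the model: at every step the pixel values are adjusted to reduce the loss until it converges. The best combination of α and β is found by trying different values across runs.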

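```python
import torch
import torch.optim as optim

# model, origin_img, style_img, calculate_loss and opt are defined earlier in the script
# Start from a copy of the original image; its pixels are the parameters being optimized
gen_img = origin_img.clone().requires_grad_(True)
optimizer = optim.Adam([gen_img], lr=opt.lr)  # opt.lr: learning rate from the command-line options

# The original and style images never change, so their features are computed once
with torch.no_grad():
    orig_features = model(origin_img)
    style_features = model(style_img)

epoch = 200
for e in range(epoch):
    gen_features = model(gen_img)
    total_loss = calculate_loss(gen_features, orig_features, style_features)
    # Optimize the pixel values of the generated image: back-propagate, then update gen_img
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
```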
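The key point above, that the optimizer updates the image's pixels while the network stays fixed, can be checked on a toy problem. This is a minimal sketch under stated assumptions: a single frozen random conv layer (the hypothetical extractor) stands in for VGG, a random 16x16 target_img stands in for the original image, only an MSE feature-matching term is used, and PyTorch is assumed to be installed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the VGG feature extractor: one frozen random conv layer
extractor = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU())
extractor.requires_grad_(False)

target_img = torch.rand(1, 3, 16, 16)
with torch.no_grad():
    target_feat = extractor(target_img)  # fixed target, computed once

gen_img = torch.rand(1, 3, 16, 16).requires_grad_(True)  # the only trainable tensor
optimizer = torch.optim.Adam([gen_img], lr=0.05)

losses = []
for step in range(100):
    loss = torch.mean((extractor(gen_img) - target_feat) ** 2)
    optimizer.zero_grad()
    loss.backward()   # gradients flow back to the pixels of gen_img
    optimizer.step()  # Adam updates gen_img only
    losses.append(loss.item())

print(losses[0], losses[-1])
print(all(p.grad is None for p in extractor.parameters()))  # True: weights get no gradients
```

Because extractor is frozen, its weights never receive gradients and only gen_img changes, which mirrors passing [gen_img] to optim.Adam in the full script.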

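After 200 iterations the synthesized image already looks quite good. Since α and β determine how far the result leans toward the style, making the result more stylized requires not only increasing β but also training for more epochs. After 7000 iterations, the colors and textures of the main subject change substantially, and the result indeed looks much more like Van Gogh's The Starry Night.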

The figure below also shows results for several different styles, generated with epoch=7000 and α/β = 0.01.

Finally, the complete code is available in reference [5]. Comments and discussion are welcome.

References

[1] Neural Networks Intuitions
[2] Deep Residual Learning for Image Recognition
[3] A Neural Algorithm of Artistic Style
[4] Deep Learning & Art: Neural Style Transfer
[5] my_github
[6] Gram matrix
[7] 格拉姆矩阵（Gram matrix）详细解读 (A detailed explanation of the Gram matrix)

About the author: site editor
