cs231n - 이해하기 2

cs231n

  • http://cs231n.stanford.edu/

이 포스팅은 딥러닝에 대한 기본 지식을 상세히 전달하기보다는 간략한 핵심과 실제 모델 개발에 유용한 팁을 위주로 정리하였습니다.

Detection and Segmentation

1) semantic segmentation :

  • sliding window
  • Fully convolutional : labeling class per every pixel
    • downsampling and upsampling : how to upsampling(unpooling)
      • nearest neighbor
      • bed of nails
      • max unpooling(remember which element was max)
      • Transpose Convolution

2) classification + localization :

  • class score : softmax loss
  • box coordiantes(x, y, w, h) : L2 loss
    • treat localization as a regression problem

multimodal - how to determine weight of two different loss function? loss값 외에 다른 지표를 참고

  • aside : Human pose estimation

3) Object Detection

  • fixed set of categories and draw box location
  • benchmark dataset : PASCAL VOC
  • each image needs a different number of outputs - not easy to solve with regression
  • sliding window : crops of the image, CNN classifies each crop as object or background
    • how to choose crop? need to apply CNN to huge number of locations and scales very expensive
  • region proposals : selective search gives 1000 region proposal -> brute force but high recall
  • R-CNN
    • region of interest(RoI) from a proposal method (~2k)
    • Warped image regions
    • forward each region through convNet
    • classify regions with SVMs
    • Box regression

      slow train and inference

  • fast R-CNN
    • Forward whole image through ConvNet
    • RoIs from proposal method on convnet feature map of image
    • RoI pooling layer
    • fully connected
    • classification and regression
  • faster R-CNN
    • make CNN do proposals
    • insert region proposal network(RPN) to predict proposals from features
    • jointly train with 4 lossess

  • detection without proposals : YOLO / SSD

4) Instance Segmentation

  • Mask R-CNN - similar to faster R-CNN
    • can also does pose : add joint coordinates
  • bechmark data : microsoft coco data

Visualizing and Understanding

what’s going on inside ConvNets? What are the intermediate features looking for?

  • visualize the filters : raw weights
    • not that interesting
  • Last Layer :
    • check Nearest Neighbors in faeture space(last fc layer)
    • dimensionality reduction : PCA, t-SNE
  • Occlusion Experiments : 부분적으로 마스크함
    • mask한 영역으로 인해 확률이 극격히 변화면 해당 영역은 크리티컬하다고 가정
  • saliency maps : 이미지의 각 픽셀들에 대해서 클래스 스코어의 그래디언트를 구함. compute gradient of class score with respect to image pixels
    • intermediate feature via guided backprop : which part of image impact to intermediate activation value
    • relu : positive gradient만 이용하면 더 나이스 이미지를 얻을수 있다
  • gradient ascent : 지금까지는 보통의 백프로파게이션을 통해 이미지의 어떤 부분이 뉴련에 영향을 주는지 알아봤다면(고정된 입력 이미지 값), 그래디언트 어센트는 뉴런의 액티베이션을 최대화하는 방향으로 이미지를 만들어내는 것임(입력 이미지 값을 생성하는 것)
    • generate a synthetic image that maximally activates a neuron
    • better regrularizer (image prior regualarization)
    • optimize in FC6 latent space instead of pixel space
  • Fooling Image
    • 엘리퍼튼 이미지를 고르고
    • 코알라 클래스 스코어를 골라
    • 코알라 클래스 스코어를 최대화하도록 이미지를 모디파이
    • 네트워크가 코알라로 분류할때까지 반복

  • DeepDream
    • choose an image and a layer in a CNN : repeat:
      • Forward : compute activations at chosen layer
      • set gradient of chosen layer equal to its activation
      • backward : compute gradient on image
      • update image
  • Feature Inversion : 피쳐벡터를 뽑고, 그 피쳐벡터에 매칭되는 다른 입풋 이미지를 만들어냄

  • Texture sythesis : given a sample patch of some texture, can we genrate a bigger image of the same texture?
    • classical approch : nearest
    • neural texture synthesis : gram matrix
  • Style Transfer

  • slow : train another model to transfer style
  • fast style transfer

Generative Models

  • addresses density estimation
  • generative models of time-series data can be used for simulation and planning

  • Fully visible belief network

  • 픽셀의 오더를 어떻게 결정하지? –> pixelRNN

  • pixelRNN
    • 코너에 있는 픽셀부터 다이어고날 방향으로 시퀄셜로 학습 using RNN (LSTM)
    • sequtional is slow
  • pixelCNN
    • 코너에 있는 픽샐부터 시작하는 것은 같으나
    • context region(previous pixels)으로부터 모델링되는 것
    • training is faster but generation must still process sequentially
  • Variational auto-encoder
    • intractible to compute p(x|z) for every z
    • in addition to decoder pθ(x|z), define additional encoder qφ(z|x) that approximates pθ(z|x)

\[\begin{align} log p_\theta(x^{(i)}) & = E_z[log p_\theta(x^{(i)}|z)] - D_{KL}(q_\phi(z|x^{(i)}||p_\theta(z))) + D_{KL}(q_\phi(z|x^{(i)}||p_\theta(z|x^{(i)}))) \\ \\ E_z[log p_\theta(x^{(i)}|z)] & \ reconstruct \ the \ input \ data \\ \\ D_{KL}(q_\phi(z|x^{(i)}||p_\theta(z))) & \ make \ approximate \ posterior \ distribution \ close \ to \ prior \\ \end{align}\]
  • GAN

reinforce learning

Categories:

Updated:

Comments