Machine Learning

10:20, 29 Nov 2018


 

Deep Learning-Based Super Resolution for Object Recognition (DSR2)

Ho Phuoc Tien
 

Currently, we focus on the problem of super resolution to enhance the performance of object recognition. Research on visual object recognition has recently achieved considerable success thanks to deep neural networks. Most of these results are based on predefined datasets in which each image contains one or a few conspicuous objects. Other works, for example Faster R-CNN, show interesting results on complex natural images, but they are likely to miss small objects.

The small size or low resolution of an object is a major obstacle for a computer to recognize it correctly. Our idea is to first apply a super-resolution step in order to obtain a higher-resolution image; it is then in this better-quality image that we hope to recognize the correct object category…
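
To make the two-stage idea concrete, here is a minimal sketch in PyTorch, assuming pretrained models; the names recognize_with_sr, sr_model, and classifier are hypothetical, not the project's actual code.

    import torch

    def recognize_with_sr(lr_image, sr_model, classifier):
        # Hypothetical two-stage pipeline: first super-resolve the
        # low-resolution input, then classify the enhanced image.
        # Both models are assumed to be pretrained torch.nn.Module instances.
        with torch.no_grad():
            hr_image = sr_model(lr_image)    # e.g. 2x or 4x upscaling
            logits = classifier(hr_image)    # object category scores
        return logits.argmax(dim=1)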
 

Deep Laplacian Pyramid Network for Text Image Super-Resolution

Hanh T M Tran and Ho Phuoc Tien

Super-resolution (SR) is an important topic in image processing and computer vision thanks to its ability to generate visually pleasing images and to facilitate tasks such as detection and recognition.

Convolutional neural networks (CNNs) have recently demonstrated interesting results for single-image super-resolution. However, these networks were trained for the super-resolution of natural images.

In this project, we adapt a deep network, originally proposed for natural-image super-resolution, to single text-image super-resolution.

  • Unlike previous work on SR of text images, we consider multiple SR scales.
  • We propose to combine different losses (i.e., the Gradient Difference Loss (GDL) with an L1/L2 loss) to enhance edges in the super-resolved image; see the sketch after this list.
  • To evaluate the network, we present a new dataset for text-image super-resolution. This dataset may also be useful for single-image super-resolution in a general context, since it contains not only paragraphs but also words and sentences from banners and newspapers.
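
Below is a minimal sketch of how an L1 loss and the GDL could be combined in PyTorch. The weight lambda_gdl and the function names are illustrative assumptions, not the exact training code of this project.

    import torch
    import torch.nn.functional as F

    def gradient_difference_loss(pred, target, alpha=1.0):
        # Penalize differences between the image gradients of the
        # prediction and the target, along both horizontal and
        # vertical directions, to sharpen edges.
        pred_dx = torch.abs(pred[:, :, :, 1:] - pred[:, :, :, :-1])
        pred_dy = torch.abs(pred[:, :, 1:, :] - pred[:, :, :-1, :])
        tgt_dx = torch.abs(target[:, :, :, 1:] - target[:, :, :, :-1])
        tgt_dy = torch.abs(target[:, :, 1:, :] - target[:, :, :-1, :])
        return (torch.abs(pred_dx - tgt_dx) ** alpha).mean() + \
               (torch.abs(pred_dy - tgt_dy) ** alpha).mean()

    def combined_loss(pred, target, lambda_gdl=0.1):
        # L1 reconstruction term plus a gradient-difference term;
        # lambda_gdl is a hypothetical weighting factor.
        return F.l1_loss(pred, target) + \
               lambda_gdl * gradient_difference_loss(pred, target)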

Quantitative and qualitative evaluations on our dataset show that adding the GDL improves the super-resolution results.

Code and models are available at https://github.com/t2mhanh/LapSRN_TextImages_git.
 

Neural networks on FPGA

Thang Viet Huynh and Huynh Minh Vu

In recent years, Artificial Neural Networks (ANNs) have been widely used in the field of pattern recognition. The trend of the Internet of Things (IoT) and the rapid rise of smart embedded devices call for on-device chips that perform recognition while satisfying strict requirements on speed, performance, and power consumption. We aim to contribute to this open issue through the development of a customizable hardware architecture for neural networks on FPGA, with applications in pattern recognition.

In this project, we have developed a customizable hardware architecture to implement ANNs on FPGA. The architecture uses a single computational layer (SHL_ANN, short for Single Hardware Layer ANN) to perform the computations of an entire multi-layer ANN. The network's inputs, outputs, and weights are represented and computed in half-precision (16-bit) floating-point format.
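
The single-hardware-layer idea can be illustrated in software: one physical layer is reused sequentially for every logical layer of the network. A minimal NumPy sketch, assuming float16 weights and a sigmoid activation (the activation choice and function name are our assumptions):

    import numpy as np

    def shl_ann_forward(x, weight_list, bias_list):
        # Emulate the SHL_ANN idea: a single computational layer is
        # reused in turn for each logical layer of the multi-layer ANN.
        # All values are kept in half-precision (float16), as on the FPGA.
        a = x.astype(np.float16)
        for W, b in zip(weight_list, bias_list):
            z = a @ W.astype(np.float16) + b.astype(np.float16)
            # exp() is evaluated in float32 for numerical stability,
            # then cast back to float16.
            a = (1.0 / (1.0 + np.exp(-z.astype(np.float32)))).astype(np.float16)
        return a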

For performance evaluation, we use handwritten digit recognition on the MNIST database and run experiments on Xilinx FPGA chips.

Code, an implementation, and demos are available at https://sites.google.com/site/nnfpga.
 

Convolutional neural networks and visual recognition

Visual recognition was long considered a privilege of the human visual system. Yet, with the recent re-emergence of convolutional neural networks (CNNs), computers can now surpass humans in some visual object recognition tasks. Many factors contribute to this success; the most important are labelled data, computing resources, and algorithms. The architecture of CNNs constantly evolves and brings about better recognition performance. However, the generalization capability of deep neural networks is still far from human ability. A closer look at the differences between humans and CNNs in object recognition reveals many interesting insights and may help to improve future deep neural networks.

CIFAR-10 dataset

Example of a residual network
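
As a concrete illustration of the residual idea in the figure above, here is a minimal residual block sketch in PyTorch; the layer sizes and class name are illustrative assumptions.

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        # A basic residual block: the input is added back to the output
        # of two convolutions (the identity "skip connection").
        def __init__(self, channels=64):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            out = self.relu(self.conv1(x))
            out = self.conv2(out)
            return self.relu(out + x)   # identity shortcut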
 

Visual Attention and Saliency Detection

When looking at a scene, human eyes concentrate on specific, salient regions in order to extract useful visual information. This ability to focus instantaneously on visually important regions is called visual attention. In the field of computer vision, a large body of research has tried to simulate visual attention and to design algorithms that predict salient regions in an image. Such studies not only help to improve and widen machine vision’s capabilities, but also lead to many interesting applications such as image compression, surveillance, and person re-identification. An effective saliency model is likely to take into account both low-level and high-level visual information; how to combine these two streams of information deserves detailed consideration. At the same time, a criterion that robustly evaluates the correspondence between a saliency map and its ground truth is also important for building an effective saliency model.
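
One widely used criterion of this kind is the linear correlation coefficient (CC) between a predicted saliency map and the human fixation map. A minimal NumPy sketch, with the metric chosen here for illustration rather than prescribed by the project:

    import numpy as np

    def correlation_coefficient(saliency_map, fixation_map):
        # Pearson correlation between a predicted saliency map and a
        # (blurred) human fixation map; a value of 1.0 means perfect
        # agreement, 0.0 means no linear relationship.
        s = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-8)
        f = (fixation_map - fixation_map.mean()) / (fixation_map.std() + 1e-8)
        return float((s * f).mean())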

Image superimposed with human fixations (left), human fixation map (centre), and saliency map generated by a saliency model (right).