Assistive screening is an interesting field that has gained popularity among startups and research labs focused on Artificial Intelligence / Machine Learning. With the vision of reducing the effective time of the screening process, the algorithms developed aim to use the knowledge of medical experts to identify important and useful data from the screening process, along with a predicted diagnosis for a sample. Image processing and computer vision algorithms are natural solutions to medical use cases. With the advent of deep learning, and with GPU technology becoming more economical and embedded into portable devices, medical image processing, like many other fields, has gained renewed interest among researchers.

Figure 1: Sample PAP-smear images and annotations from AIndra Systems dataset

Particularly in the healthcare sector, the introduction of deep neural networks (DNNs) led to the automation of several tasks, such as PAP-smear image analysis (a sample image is shown in Figure 1), MRI image analysis, dermoscopic analysis, and analysis of retinal images. Recent articles [1][2] show DNN algorithms outperforming dermatologists.

Figure 2: Performance of a deep neural net when trained with noisy MNIST labels [2]

An important fuel for a DNN is its training images. Biomedical image acquisition techniques are now quite advanced, so acquiring a sufficient number of high-quality images is easy. The major bottleneck is obtaining annotations for all these images to train the network. Labelling biomedical images at that scale is challenging in terms of manpower and cost, mainly because the task requires experts, most of whom work on it only part-time.

An important issue with annotations of biomedical images is that they are highly subjective. For this reason, such images are annotated by multiple experts [3]. We would, in fact, expect neural networks to perform much better than individual human experts: as reported in the literature [2], the agreement between different experts is only 55.4%, and an expert agrees with his own earlier annotation on a second look only 70.7% of the time.
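To make agreement numbers like these concrete, here is a minimal sketch of how pairwise percent agreement between two annotators can be computed. The expert names and labels below are entirely hypothetical, purely for illustration:

```python
def percent_agreement(labels_a, labels_b):
    """Fraction of samples on which two annotators assign the same label."""
    assert len(labels_a) == len(labels_b)
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

# Hypothetical annotations of 10 samples by two experts.
expert_1 = ["benign", "malignant", "benign", "benign", "malignant",
            "benign", "malignant", "benign", "benign", "malignant"]
expert_2 = ["benign", "benign", "benign", "malignant", "malignant",
            "benign", "malignant", "benign", "malignant", "malignant"]

print(percent_agreement(expert_1, expert_2))  # 0.7 on this toy data
```

The same function measures intra-expert (self) agreement by passing an expert's two annotation passes over the same samples. For chance-corrected agreement, metrics such as Cohen's kappa are commonly used instead.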


Figure 2 shows the performance of a deep neural net trained with noisy MNIST labels [2]. It demonstrates that the performance of a neural net is not limited by the accuracy of the experts, provided the experts' errors are random. An obvious question is how many noisily labelled training examples are worth one correctly labelled training example. In machine learning, the majority of methods use the consensus among annotators to train a model. But there can be useful information in the disagreements as well, which could be exploited to improve network performance. The challenge lies in incorporating this information so that the resulting algorithm is not biased towards any particular expert. With image recognition algorithms exceeding human-level performance [4], there is definitely scope for much more refined and engineered algorithms in the field of medical image processing. Exciting times ahead!
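The consensus approach mentioned above can be sketched as a simple majority vote across annotators, and the random label noise studied in the MNIST experiment [2] can be simulated by flipping a fraction of labels. The snippet below is a toy sketch with hypothetical labels, not the method of any of the cited papers:

```python
import random
from collections import Counter

def majority_vote(annotations):
    """Consensus label per sample via majority vote across annotators.

    `annotations` is a list of per-annotator label lists (one list per expert);
    ties resolve to whichever label appears first among the annotators.
    """
    consensus = []
    for sample_labels in zip(*annotations):
        consensus.append(Counter(sample_labels).most_common(1)[0][0])
    return consensus

def add_label_noise(labels, classes, noise_rate, seed=0):
    """Randomly flip a fraction `noise_rate` of labels to a different class."""
    rng = random.Random(seed)
    noisy = list(labels)
    for i in range(len(noisy)):
        if rng.random() < noise_rate:
            noisy[i] = rng.choice([c for c in classes if c != noisy[i]])
    return noisy

# Three hypothetical annotators labelling four samples (0 = benign, 1 = malignant).
votes = [[0, 1, 1, 0],
         [0, 1, 0, 0],
         [1, 1, 1, 0]]
print(majority_vote(votes))  # [0, 1, 1, 0]
```

Note what the majority vote discards: the disagreement on samples 1 and 3 (where one expert dissents) is exactly the signal that approaches like [3] try to retain by modelling each labeler individually.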




[1] Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists

[2] Dermatologist-level classification of skin cancer with deep neural networks

[3] Who Said What: Modeling Individual Labelers Improves Classification

[4] Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

