Published on 2023-02-05
Human brains are amazing! We can recognize shapes, colors and patterns almost instantly. Computing devices cannot! That's why we humans have to teach them image analysis algorithms. This post is part of the Computer Vision Series.
If you are into photography, you will know that every digital image is made up of pixels. For most images, pixel values are integers that range from 0 (black) to 255 (white). For grayscale images, the pixel value is typically an 8-bit value (with a range of 0 to 255) or a 16-bit value (with a range of 0 to 65535). For color images, there are 8-bit, 16-bit, 24-bit, and 30-bit colors. 24-bit color is known as true color: each pixel consists of three 8-bit channels, one each for red, green, and blue intensity.
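To make the "three 8-bit channels" idea concrete, here is a minimal sketch that packs one true-color pixel into a single 24-bit integer and unpacks it again. It assumes the common 0xRRGGBB layout (red in the highest byte); the helper names `pack_rgb` and `unpack_rgb` are hypothetical.

```python
def pack_rgb(r, g, b):
    """Pack three 8-bit channel values (0-255) into one 24-bit integer laid out as 0xRRGGBB."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must fit in 8 bits (0-255)")
    return (r << 16) | (g << 8) | b


def unpack_rgb(color):
    """Split a 24-bit 0xRRGGBB integer back into its (red, green, blue) channels."""
    return (color >> 16) & 0xFF, (color >> 8) & 0xFF, color & 0xFF


orange = pack_rgb(255, 165, 0)
print(hex(orange))         # 0xffa500
print(unpack_rgb(orange))  # (255, 165, 0)
print(2 ** 24)             # 16777216 possible true colors
```

Three 8-bit channels give 2^24 combinations, which is where the "24-bit" name comes from.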
A binary image is exactly what it sounds like. Being binary, it contains only two pixel values: usually 0 for black and either 1 or 255 for white. You might wonder why we would need to convert a colorful image into an image with only two pixel values. Once an image is reduced to two values (one for the background and one for the foreground), we can see the objects clearly, without being distracted by color. That is why binary images are especially useful for shape identification, document processing, parts counting, and inspection.
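Since both the 0/1 and the 0/255 conventions are common, a tiny sketch can help: one hypothetical helper checks that an image really has at most two distinct values, and another maps a 0/1 image to 0/255 so that the foreground shows up as white in an image viewer. The helper names are illustrative only.

```python
def is_binary_image(pixels):
    """Return True if the 2D list `pixels` holds at most two distinct values."""
    return len({value for row in pixels for value in row}) <= 2


def to_display_binary(pixels, white=255):
    """Map a 0/1 binary image to 0/`white`, so foreground pixels (1) appear white."""
    return [[white if value else 0 for value in row] for row in pixels]


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(is_binary_image(mask))       # True
print(to_display_binary(mask)[1])  # [0, 255, 255, 0]
```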
Now that we have cleared up the definition of a binary image, let's look at how to get started with image analysis:
1. Converting an image into a binary image, affectionately known as thresholding
2. Basic morphological filtering for eliminating noise
3. Counting and labeling objects (connected component analysis)
4. Feature extraction
   - centroid
   - perimeter
   - area
   - bounding box
   - roundness
To convert an image into a binary image, i.e. an image that separates the foreground from the background, we first need to know how often each pixel value occurs in the image. Normally, the background has a different color from the foreground (the objects), and its pixels fall into one value or a narrow range of similar values. So, to convert an image into a binary image, we need to know the range of pixel values (color intensities) in the image. If we plot a histogram of the pixel values against their number of occurrences, we can see the distribution. Once we know the distribution, we can decide on a threshold value for converting the image into a binary image!
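A minimal sketch of the counting step on a tiny made-up image (the pixel values and the `histogram` helper are hypothetical, for illustration only). The background and the object fall into two separate clusters of values, so any threshold in the empty gap between them separates the two; the split at 128 below is just one such choice.

```python
# A tiny hypothetical grayscale image: bright background (~200) with a darker object (~50).
toy_image = [
    [200, 201, 199, 200, 202],
    [200,  50,  52, 201, 200],
    [199,  51,  49, 200, 198],
    [200, 200, 201, 199, 200],
]


def histogram(pixels, max_gray_value=255):
    """Count how many times each gray level 0..max_gray_value occurs in a 2D list."""
    counts = [0] * (max_gray_value + 1)
    for row in pixels:
        for value in row:
            counts[value] += 1
    return counts


counts = histogram(toy_image)
dark = sum(counts[:128])    # pixels with values 0..127 (the object cluster)
bright = sum(counts[128:])  # pixels with values 128..255 (the background cluster)
print(dark, bright)  # 4 16
```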
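Before all that, how does a computing device read an image? Let's start with PGM (Portable Gray Map). A PGM image represents a grayscale graphic image, and the plain (ASCII) variant can be opened with any text editor of your choice. It follows the format below:
- Line 1: A "magic number" identifying the file type. For a plain PGM image, the magic number is the two characters "P2" (the binary variant uses "P5"), followed by whitespace (blanks, TABs, CRs, LFs).
- Line 2: The width, formatted as ASCII characters in decimal, then whitespace, then the height, again in ASCII decimal.
- Line 3: The maximum gray value (Maxval), again in ASCII decimal. It must be greater than zero and less than 65536. A single whitespace character (usually a newline) follows.
- The pixel values: for a plain PGM, width x height gray values in ASCII decimal, separated by whitespace and listed row by row from top to bottom.
Strictly speaking, the header fields only need to be separated by whitespace rather than placed on separate lines, and any text from a "#" to the end of a line is a comment.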
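Here is what a tiny plain PGM file might look like, written as a Python string so we can split it into tokens and pick out the header fields described above. The 4 x 3 image and its comment line are made up for illustration.

```python
# A tiny hypothetical 4 x 3 plain PGM image (P2) with a dark 2-pixel object in the middle row.
SAMPLE_PGM = """P2
# a comment line: anything after '#' is ignored
4 3
255
255 255 255 255
255  0   0  255
255 255 255 255
"""

# Drop comments, then split everything else on whitespace
tokens = []
for line in SAMPLE_PGM.splitlines():
    tokens.extend(line.split("#", 1)[0].split())

magic, width, height, maxval = tokens[0], int(tokens[1]), int(tokens[2]), int(tokens[3])
pixel_values = [int(v) for v in tokens[4:]]
print(magic, width, height, maxval)         # P2 4 3 255
print(len(pixel_values) == width * height)  # True
```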
OK, now that you have the idea of PGM, let's look at reading a PGM file with Python. There are many libraries you can use for image processing, and OpenCV is one of the popular ones. But we won't use any of them, because we want to understand what happens behind the scenes rather than have it hidden behind high-level libraries.
Alright, here is the code snippet, written in pure Python, to read a PGM file:
#0. Read a plain (ASCII, "P2") PGM file and output the image information in a dict object
# name: the name of the file
# cols: no. of columns in the image file, i.e. the width of the image
# rows: no. of rows in the image file, i.e. the height of the image
# max_gray_value: maximum gray value of the input image file
# pixels: the image's pixel values as a 2D list, indexed as [Row][Col]
def readPGMImage(filename):
    with open(filename, "r") as f:
        # Split the file into whitespace-separated tokens, dropping comments
        # (anything from '#' to the end of a line). Header fields are not always
        # laid out one per line, so reading tokens is safer than reading lines.
        tokens = []
        for line in f:
            tokens.extend(line.split("#", 1)[0].split())
    # The first token must be the plain-PGM magic number
    if not tokens or tokens[0] != "P2":
        raise ValueError("Not a plain (P2) PGM file")
    # The next two tokens are the width and height of the image
    cols = int(tokens[1])  # width
    rows = int(tokens[2])  # height
    # The next token is the maximum gray value
    max_gray_value = int(tokens[3])
    # The rest of the file holds the pixel values, row by row
    pixel_raw = [int(x) for x in tokens[4:]]
    if len(pixel_raw) != rows * cols:
        raise ValueError(f"Expected {rows * cols} pixel values, found {len(pixel_raw)}")
    # Convert the flat list into a 2D list of pixels in the position pixel = [Row][Column]
    pixels = [pixel_raw[i:i + cols] for i in range(0, len(pixel_raw), cols)]
    return dict(name=filename, cols=cols, rows=rows, max_gray_value=max_gray_value, pixels=pixels)
# Usage example
imgInfo = readPGMImage("image1.pgm")
print(f"\n==== Image Information ====")
print(f"Name of the file: {imgInfo['name']}")
print(f"Pixels Dimension (width x height) = {imgInfo['cols']} x {imgInfo['rows']}")
print(f"Max Gray Value = {imgInfo['max_gray_value']}")
Now that we have our pixel values (stored in imgInfo["pixels"]), we can plot the image histogram.
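An image histogram is a graph of the pixel-intensity distribution: for each gray level, it shows how many pixels have that value. The counts are often normalized by dividing by the total number of pixels, so each bin becomes a fraction between 0 and 1 and all bins sum to 1. From the image histogram, we can judge the contrast of the image and how well the objects stand out from the background. If the values are squeezed into a narrow band of intensities, the image contrast is low (not good!). The other useful aspect of an image histogram is that it helps us choose the threshold value.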
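A minimal sketch of the normalization step, plus one possible way to put a number on contrast: the standard deviation of the intensities (often called RMS contrast), computed from the normalized histogram. Using the standard deviation as the contrast measure is an assumption made for this example, and the helper names and sample images are hypothetical.

```python
import math


def normalized_histogram(pixels, max_gray_value=255):
    """Return p[v] = fraction of pixels with gray level v; the fractions sum to 1."""
    counts = [0] * (max_gray_value + 1)
    for row in pixels:
        for value in row:
            counts[value] += 1
    total = sum(counts)
    return [c / total for c in counts]


def intensity_spread(p):
    """Mean and standard deviation of the gray levels described by a normalized histogram p."""
    mean = sum(v * pv for v, pv in enumerate(p))
    variance = sum((v - mean) ** 2 * pv for v, pv in enumerate(p))
    return mean, math.sqrt(variance)


flat = [[120, 122, 121, 123], [122, 121, 120, 123]]  # values bunched together: low contrast
spread = [[10, 240, 12, 238], [245, 11, 239, 9]]     # two well-separated groups: high contrast
for name, img in (("flat", flat), ("spread", spread)):
    p = normalized_histogram(img)
    mean, std = intensity_spread(p)
    print(name, "sum of bins:", round(sum(p), 6), "std:", round(std, 1))
```

The narrow, single-peaked histogram of `flat` gives a small standard deviation, while the two separated peaks of `spread` give a large one.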
A threshold value is the pixel value that decides which pixels belong to the background and which belong to the foreground. For example, in the range 0 to 255 with a threshold of, say, 125: if a pixel value is greater than 125, set it to 0; otherwise, set it to 1. The result is a binary image with just two pixel values, 0s and 1s.
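The rule above, applied to a small made-up image with the example threshold of 125. With this rule, dark pixels become the foreground (1), which suits dark objects on a bright background; the hypothetical `dark_objects` flag flips the rule for bright objects on a dark background. Note that a pixel exactly equal to the threshold counts as "not greater" and becomes 1.

```python
def apply_threshold(pixels, threshold, dark_objects=True):
    """Return a 0/1 image. With dark_objects=True, values <= threshold become 1 (foreground),
    matching the rule in the text; with False, values > threshold become 1 instead."""
    if dark_objects:
        return [[0 if v > threshold else 1 for v in row] for row in pixels]
    return [[1 if v > threshold else 0 for v in row] for row in pixels]


image = [[30, 124, 125], [126, 250, 90]]
print(apply_threshold(image, 125))                      # [[1, 1, 1], [0, 0, 1]]
print(apply_threshold(image, 125, dark_objects=False))  # [[0, 0, 0], [1, 1, 0]]
```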
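One of the most popular and widely used methods for automatic image thresholding is Otsu's method. It picks the threshold value that minimizes the weighted within-class (intra-class) variance of the background and foreground pixels. Equivalently, it maximizes the between-class variance, which is the quantity the code below computes.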
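Why are the two formulations equivalent? The total variance of all pixels equals the within-class variance plus the between-class variance, and the total does not depend on the threshold t, so minimizing one maximizes the other. The brute-force sketch below (function names and histogram are hypothetical) checks this on a small made-up histogram. It recomputes every sum for each t, so it is much slower than a cumulative-sum implementation; it is meant only to make the definition explicit.

```python
def otsu_by_within_class_variance(hist):
    """Brute-force Otsu: return the split t with the smallest weighted within-class variance.
    Class 0 holds gray levels 0..t and class 1 holds levels t+1..len(hist)-1."""
    total = sum(hist)
    levels = range(len(hist))
    best_t, best_score = None, float("inf")
    for t in range(len(hist) - 1):
        w0 = sum(hist[: t + 1])
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class would be empty, so this split separates nothing
        mean0 = sum(v * hist[v] for v in levels[: t + 1]) / w0
        mean1 = sum(v * hist[v] for v in levels[t + 1 :]) / w1
        var0 = sum((v - mean0) ** 2 * hist[v] for v in levels[: t + 1]) / w0
        var1 = sum((v - mean1) ** 2 * hist[v] for v in levels[t + 1 :]) / w1
        score = (w0 * var0 + w1 * var1) / total  # weighted within-class variance
        if score < best_score:
            best_t, best_score = t, score
    return best_t


def otsu_by_between_class_variance(hist):
    """Same search, but keep the split with the largest between-class variance
    w0 * w1 * (mean0 - mean1) ** 2, where w0 and w1 are the class fractions."""
    total = sum(hist)
    levels = range(len(hist))
    best_t, best_score = None, -1.0
    for t in range(len(hist) - 1):
        n0 = sum(hist[: t + 1])
        n1 = total - n0
        if n0 == 0 or n1 == 0:
            continue
        mean0 = sum(v * hist[v] for v in levels[: t + 1]) / n0
        mean1 = sum(v * hist[v] for v in levels[t + 1 :]) / n1
        score = (n0 / total) * (n1 / total) * (mean0 - mean1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Hypothetical 8-level histogram with two clusters: levels 1-2 and levels 5-6.
hist = [0, 4, 6, 0, 0, 5, 5, 0]
print(otsu_by_within_class_variance(hist), otsu_by_between_class_variance(hist))  # 2 2
```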
Here is the code snippet, written in pure Python (matplotlib is used only for the optional histogram plot), for finding the threshold value of an input image with Otsu's method:
def getThresholdValuefromImgHistogram(pixels, max_gray_value, show_plot=True):
    # Initialization: one histogram bin per gray level 0..max_gray_value
    size = max_gray_value + 1
    x = range(size)
    # Loop over the pixel values and count the occurrences of each gray level
    y = [0] * size
    for row in pixels:
        for value in row:
            y[value] += 1
    # Plot the histogram chart (optional; this is the only part that needs matplotlib)
    if show_plot:
        import matplotlib.pyplot as plt
        plt.plot(x, y)
        plt.xlabel("Pixel Value")
        plt.ylabel("Number of pixels")
        plt.title("Image Histogram")
        plt.show()
    # Normalization: _bin[i] is the fraction of pixels with gray level i (the bins sum to 1)
    total_pixels = len(pixels) * len(pixels[0])
    _bin = [count / total_pixels for count in y]
    # Implementing Otsu's method
    # Compute the cumulative sum: cumulative_sum[t] is the weight of class 0 (gray levels 0..t)
    cumulative_sum = [0.0] * size  # one entry per gray level, i.e. max_gray_value + 1 entries
    cumulative_sum[0] = _bin[0]  # set the first value
    for i in range(1, size):
        cumulative_sum[i] = cumulative_sum[i - 1] + _bin[i]  # ends at 1 (up to rounding) because of normalization
    # Compute the cumulative mean: cumulative_mean[t] = sum of i * _bin[i] for i in 0..t
    cumulative_mean = [0.0] * size  # cumulative_mean[0] = 0 * _bin[0] = 0
    for i in range(1, size):
        cumulative_mean[i] = cumulative_mean[i - 1] + i * _bin[i]
    # Find the threshold t with the maximum between-class variance
    eps = 1e-12  # guards against floating-point round-off when a class is (almost) empty
    max_variance = -1
    threshold = -1
    for t in range(size):
        w0 = cumulative_sum[t]  # weight of class 0 (gray levels 0..t)
        w1 = 1 - w0             # weight of class 1 (gray levels t+1..max_gray_value)
        if w0 > eps and w1 > eps:
            background_mean = cumulative_mean[t] / w0
            foreground_mean = (cumulative_mean[max_gray_value] - cumulative_mean[t]) / w1
            variance = w0 * w1 * (background_mean - foreground_mean) ** 2
            if variance > max_variance:
                max_variance = variance
                threshold = t
    return threshold
Got it? Running the code above plots the image histogram and returns the threshold value, which for image1 is 175.
[Figure: Image Histogram]
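Once we have our threshold value, we simply convert each pixel to one of two values by checking whether pixel value > thresholdValue. Instead of 0 and 1, the code below uses 0 (black) and max_gray_value (white), so the result is still a valid PGM image that displays as black and white; with 1 as "white", the foreground would look almost black.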
def transformIntoBinaryImage(thresholdValue, pixels, max_gray_value):
    # Pixels brighter than the threshold become 0 (black); all others become max_gray_value (white)
    return [[0 if x > thresholdValue else max_gray_value for x in row] for row in pixels]
## Example Usage
thresholdValue = getThresholdValuefromImgHistogram(imgInfo["pixels"], imgInfo["max_gray_value"])
thresholded_pixels = transformIntoBinaryImage(thresholdValue, imgInfo["pixels"], imgInfo["max_gray_value"])
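To actually look at the result, the thresholded pixels can be saved back to a plain PGM file and opened in any viewer that supports PGM. This is a minimal sketch; `formatPGM` and `writePGMImage` are hypothetical helpers, not part of the original code. It wraps pixel values so that no line is longer than 70 characters, which the Netpbm spec asks for in plain PGM files.

```python
def formatPGM(pixels, max_gray_value, line_limit=70):
    """Return a plain (P2) PGM file as a string. Pixel values are written row by row and
    wrapped so that no line exceeds `line_limit` characters."""
    rows = len(pixels)
    cols = len(pixels[0]) if rows else 0
    lines = ["P2", f"{cols} {rows}", str(max_gray_value)]
    for row in pixels:
        current = ""
        for value in row:
            token = str(value)
            if current and len(current) + 1 + len(token) > line_limit:
                lines.append(current)  # this line is full; start a new one
                current = token
            else:
                current = f"{current} {token}" if current else token
        if current:
            lines.append(current)
    return "\n".join(lines) + "\n"


def writePGMImage(filename, pixels, max_gray_value):
    """Save a 2D list of gray values (indexed [row][col]) as a plain PGM file."""
    with open(filename, "w") as f:
        f.write(formatPGM(pixels, max_gray_value))


print(formatPGM([[0, 255, 0], [255, 0, 255]], 255), end="")
```

With the variables from the example usage, a call such as `writePGMImage("image1_binary.pgm", thresholded_pixels, imgInfo["max_gray_value"])` would save the binary image (the output filename is just an example).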
Let's see what our thresholded (binary) image looks like:
You will notice that the background is now black and the foreground (hearts, numbers, spade) is white. Do you also see the white dot around the row-150 mark? That is noise, which we need to clean from the image.
In the next post, we will look at basic morphological operations for filtering noise out of the image.