by Joan Barreto Ortiz
In my last post, I introduced one of my current research projects involving image analysis. In the same vein, this post describes a basic image-based analysis used to detect differences in seed size across fine fescues; it can be easily modified and applied to other small seeds or objects. We developed this system for a recent research paper that looks at differences in DNA content and ploidy level across USDA sheep fescue (Festuca ovina) PI accessions. It appeared likely that the ploidy level of the Festuca ovina complex was positively correlated with seed size, but it was very difficult to tell just by looking at such small seeds with the naked eye. This system helped us solve the problem and allowed us to find statistically significant differences across the fine fescues. A Matlab version of the script can be found in my personal GitHub account.
The semi-automatic system requires a folder with the images. In this case, we used TIFF (Tagged Image File Format) because this format can store images without lossy compression, keeping all the image data and quality. We obtained the images by placing ten seeds on a regular flatbed scanner with a red background (Figure 1a) to increase the contrast and facilitate the segmentation. We also scanned a penny (Figure 1b) as a reference to compare the size of fine fescue seeds. These are known as RGB images because they are made of 3 layers: red (R), green (G) and blue (B). Each layer, in this case, is a matrix with 2100 rows and 1620 columns that contains values between 0 (min) and 255 (max) in each cell. Combinations of values across the three layers, also known as RGB channels, make all the colors that we see in photos: a totally black photo has only zeros in every cell, and a totally white photo has 255 in every cell of all three layers. Once we have the images, the script does the rest.
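To make the layer structure concrete, here is a minimal sketch in Python with NumPy. It builds synthetic images with the same dimensions as our scans rather than loading a real TIFF, and the variable names are chosen only for illustration.

```python
import numpy as np

# Dimensions of one scan as described in the post: 2100 rows x 1620 columns.
HEIGHT, WIDTH = 2100, 1620

# An 8-bit RGB image is a (rows, columns, 3) array of values from 0 to 255.
black = np.zeros((HEIGHT, WIDTH, 3), dtype=np.uint8)      # every cell is 0
white = np.full((HEIGHT, WIDTH, 3), 255, dtype=np.uint8)  # every cell is 255

# A totally red image: the red layer is at its maximum, green and blue are 0.
red = np.zeros((HEIGHT, WIDTH, 3), dtype=np.uint8)
red[..., 0] = 255

print(red.shape)  # (2100, 1620, 3)
print(red[0, 0])  # RGB triplet of the top-left pixel
```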
The script contains a function (FFSS) that opens and analyzes one image at a time, starting by splitting the RGB image into its three individual channels. Figure 2 shows the three images that comprise the RGB image; lighter values are closer to 255 (or 100%) and darker values are closer to 0 (or 0%). If we express the values as proportions, a totally red image will have R=1, G=0, B=0, but in Figure 2 we can see a distribution of values between 0 and 1 across all three channels. Splitting the image into layers allows us to segment it, i.e. to separate the regions of interest (ROI) from the background by keeping only the values between 0 and 1 that correspond to the seeds.
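The channel split can be sketched in a few lines of Python with NumPy. This is an illustration of the idea, not the FFSS function itself; the helper name split_channels and the two-pixel test image are assumptions made for the example.

```python
import numpy as np

def split_channels(rgb):
    """Return the R, G and B layers of an 8-bit image as floats in [0, 1].

    Hypothetical helper for illustration; not part of the original script.
    """
    scaled = rgb.astype(np.float64) / 255.0  # 0..255 becomes 0..1
    return scaled[..., 0], scaled[..., 1], scaled[..., 2]

# A tiny 1x2 image: one pure red pixel and one mid-gray pixel.
tiny = np.array([[[255, 0, 0], [128, 128, 128]]], dtype=np.uint8)
r, g, b = split_channels(tiny)

print(r)  # red layer: 1.0 for the red pixel, about 0.5 for the gray one
print(g)  # green layer: 0.0 for the red pixel
```

Each returned layer is a single-channel (grayscale) image, which is what Figure 2 displays.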
Image segmentation, here done by thresholding, can be a challenging task depending on the quality and the heterogeneity of the objects in the image. Matlab has an app to segment ROI based on color called “Color Thresholder” (Figure 3). On the top right, the app displays the distribution of pixel values for each channel. These values can be manually changed until we have determined the approximate values that correspond to our ROI. To segment our ROI (seeds) we only had to restrict the green channel to values greater than 65 (Figure 3B), because we used a very contrasting and homogeneous background. Other images may require adaptive segmentation based on machine learning algorithms that “teach” the computer what the ROI are. Although our thresholding was not perfect using only Color Thresholder, it was easily improved with other filtering algorithms (Figure 3C). We also separated the seeds from each other on the scanner so that two or more would not be considered as a single ROI. Alternatively, there are algorithms that deal with touching or overlapping ROI, such as watershed; Heineck et al. (2019) provide a great example of how this algorithm can be implemented to quantify disease severity in grasses.
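The green-channel threshold can be sketched in Python with NumPy as follows. Only the cutoff of 65 comes from our analysis; the function name seed_mask and the synthetic colors are assumptions for illustration, and the additional filtering steps are not shown.

```python
import numpy as np

GREEN_THRESHOLD = 65  # green-channel cutoff reported in the post (0-255 scale)

def seed_mask(rgb, threshold=GREEN_THRESHOLD):
    """Return a boolean mask that is True where the green value exceeds threshold.

    Hypothetical helper for illustration; not part of the original script.
    """
    green = rgb[..., 1]
    return green > threshold

# Synthetic 6x6 "scan": red background with a 2x3 tan rectangle as a seed.
# These colors are made up; real scans will have different values.
scan = np.zeros((6, 6, 3), dtype=np.uint8)
scan[...] = (200, 20, 20)         # red background, low green
scan[2:4, 1:4] = (190, 160, 110)  # tan seed, high green

mask = seed_mask(scan)
print(mask.astype(int))  # 1 where a pixel was classified as seed
```

A single-channel cutoff like this works only because the background is uniform and has little green in it; a less contrasting background would likely need limits on more than one channel.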
After applying various filters, we measured the properties of each ROI using ‘regionprops’ in Matlab. This function allows measurement of multiple dimensional and spectral properties of ROI, such as size and pixel intensity or mean RGB values. The ‘regionprops’ function requires a mask, which is a binary (only zeros and ones) image where black areas have zeros and white areas (corresponding to the ROI) have ones. Once the binary masks were created, we proceeded to measure the area of each ROI as well as the major (length) and minor (width) axes of a fitted ellipse (Figure 4). Because our seeds have relatively homogeneous color, we did not add more measurements, but other types of seeds or objects may vary in color, and additional measurements would therefore be worth considering.
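For readers without Matlab, the same three measurements can be approximated in Python with NumPy and SciPy. This is a rough sketch rather than a reproduction of ‘regionprops’: it assumes the axes are taken from the ellipse that has the same second moments as the region, so values may differ slightly from Matlab's. The function name measure_regions and the test mask are made up for the example.

```python
import numpy as np
from scipy import ndimage

def measure_regions(mask):
    """Return area, major axis and minor axis (all in pixels) for each region.

    Hypothetical helper for illustration; not part of the original script.
    """
    labeled, count = ndimage.label(mask)  # 4-connected regions by default
    results = []
    for region_id in range(1, count + 1):
        rows, cols = np.nonzero(labeled == region_id)
        area = rows.size
        if area < 2:
            major = minor = 0.0  # a single pixel has no spread to measure
        else:
            # Covariance of the pixel coordinates (second central moments).
            cov = np.cov(np.vstack([rows, cols]), bias=True)
            eigenvalues = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
            # For a filled ellipse, full axis length = 4 * sqrt(variance).
            minor, major = 4.0 * np.sqrt(eigenvalues)
        results.append({
            "area": int(area),
            "major_axis": float(major),
            "minor_axis": float(minor),
        })
    return results

# Binary mask with two well-separated rectangular "seeds".
mask = np.zeros((20, 30), dtype=bool)
mask[2:6, 3:15] = True     # 4 x 12 pixels
mask[10:18, 20:24] = True  # 8 x 4 pixels

for region in measure_regions(mask):
    print(region)
```

The results are in pixels; converting them to millimeters requires the scanner resolution or an object of known size in the image.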
The success and simplicity of the system were due to using a homogeneous and contrasting background as well as the multiple algorithms and tools in Matlab and the Image Processing Toolbox. We hope the system will be useful to other researchers and users as it is improved, but what if your organization does not provide Matlab or you can’t afford it? Nothing to worry about! We also have a totally free version written in Python! Although the Python version is not as interactive as Matlab can be, it is meant to be very easy to use. It also has the option to do color analysis (color_data = True) and to save the transformed images and seed detection (plots = True); see Figure 5.
Image-based techniques have become essential tools for phenotyping in plant science research. In large research plots or agricultural fields, taking measurements can be labor-intensive and time-consuming; technologies using drones, for example, can phenotype an entire field using substantially fewer personnel and less time. The seed size measurements we calculated would have been very time-consuming and difficult to obtain using conventional tools such as calipers, and a visual characterization of the differences in size would have been highly biased. As a reference, the Python function takes about 1.5 seconds to process each image, including the color data and exporting the images. In general, imaging techniques facilitate the detection, with high precision, of quantitative variation in agronomic traits, and dramatically accelerate data collection.