Identification : Resolution

Image resolution is the detail an image holds.  Resolution can be expressed in a number of ways.  In the field, resolution is determined by one's visual acuity, the distance from the bird and the magnification of the optics being used.  In print, resolution may be defined in terms of dots per inch or dots per centimeter on paper.

On the screen, it is determined by the number of horizontal lines.  If we had no means to zoom into digital photographs on screen then the screen resolution would determine the maximum resolution for identification purposes.  Luckily, however, every image viewing software package allows us to zoom into an image and inspect its fine detail right down to its minute components, its pixels.  This zoom function is a huge advantage over photographic film.

So when it comes to digital images resolution is best expressed in terms of pixel resolution.

Measuring Pixel Resolution
It is important to measure the actual number of pixels within the surface area of the bird rather than taking the resolution of the whole image.  This makes sense.  When you take a picture of a bird, in most cases the bird will only take up a small portion of the overall image.  So, for a true representation of pixel resolution it is important to zoom right in and extract the bird and its pixels out of the scene for closer examination.  There are useful tools freely available online that will count the actual number of pixels in a defined area of the image.  In one such program, simply use the Lasso Select function to trace around the subject, and the selected area will be displayed in total pixels at the bottom of the screen.
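For anyone who would rather script the count, the idea behind a lasso selection can be sketched in a few lines of Python. This is only an illustration and not how any particular image editor works: the function names, the canvas size and the outline coordinates are all invented for the example, and a real traced outline would have far more points.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule: count the polygon edges crossed by a ray running right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_selected_pixels(width, height, polygon):
    """Count the pixels whose centres fall inside the traced outline."""
    count = 0
    for row in range(height):
        for col in range(width):
            if point_in_polygon(col + 0.5, row + 0.5, polygon):
                count += 1
    return count


# Hypothetical outline: a 40 x 20 pixel box traced on a 100 x 60 pixel image.
outline = [(10, 10), (50, 10), (50, 30), (10, 30)]
print(count_selected_pixels(100, 60, outline))  # 800
```

The point is simply that the count depends on the outline of the bird, not on the dimensions of the frame around it.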

Surely birds come in all shapes and sizes and they photograph in all kinds of poses.  How could a simple measurement of pixel area be considered good enough in all scenarios?  Well, let's put this theory to the ultimate test -  a comparison of a frame-filling bird with a narrow waif.

The ultimate comparison

In the left corner, weighing in at a colossal 16,050,000 pixels we have our heavyweight champion Obesebird, while in the right corner, at little more than 700,000 pixels we have featherweight Skinny-legged Stickbird.  In order for these two to meet in our ring we must slim down Obesebird to 700,000 pixels (a reduction of the image to approx. 21% of its original linear size should do the trick, since the pixel count falls with the square of the scale) and put them in the ring together.  Finally, to make things a bit more interesting I am going to reduce the final image to 60% (trial and error again).  That way both birds are now resolved at approx. 250,000 pixels each, which is at the upper end in terms of image quality in our first revision (Rev. 1.0) of the Birding Image Quality Tool.
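The trial and error can be replaced by a little arithmetic, because resizing an image by a linear factor changes its pixel count by the square of that factor. The sketch below uses the figures quoted above; the function names are my own and purely illustrative.

```python
import math


def linear_scale_for_target(current_pixels, target_pixels):
    """Linear resize factor needed to bring a pixel area down (or up) to a target."""
    return math.sqrt(target_pixels / current_pixels)


def pixels_after_resize(current_pixels, linear_scale):
    """Pixel area after resizing both width and height by linear_scale."""
    return current_pixels * linear_scale ** 2


# Obesebird: 16,050,000 pixels down to roughly 700,000 pixels.
print(round(linear_scale_for_target(16_050_000, 700_000), 3))  # about 0.209, i.e. approx. 21%
print(round(pixels_after_resize(16_050_000, 0.21)))            # 707805

# Both birds: 700,000 pixels reduced to 60% linear size.
print(round(pixels_after_resize(700_000, 0.60)))               # 252000
```

So a resize to 21% leaves about 4.4% of the original pixels, and a further resize to 60% leaves 36% of those, which is where the figure of approx. 250,000 pixels comes from.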

Here is the result.

Here is a portion of the image zoomed up.
While the smaller image has clearly lost some of its clarity and is pixelated, it still retains enough detail to make it useful for our ID purposes.  This gives me enough confidence to proceed with the logic that a single, simple measure and scale for pixel resolution may well be enough for our purposes.

The example above is at the extreme end of things.  I have been tinkering around with different bird images to come up with a standard set of poses for bird identification purposes.  More of that another time.  Below is an example of how pixel resolution in bird images varies around a mean.  From my studies so far I am finding that, in most cases where we are dealing with useful images for bird identification purposes, it should be possible to find an optimal pixel resolution within a margin of ±25% around a mean of roughly 300,000 pixels.
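Put as numbers, a 25% margin either side of 300,000 pixels gives a window of 225,000 to 375,000 pixels. A minimal sketch of that check follows; the function names are hypothetical and the mean and margin are just the provisional figures mentioned above.

```python
def resolution_window(mean_pixels, margin):
    """Return the (low, high) pixel counts for a fractional margin around a mean."""
    return mean_pixels * (1 - margin), mean_pixels * (1 + margin)


def within_window(pixels, mean_pixels=300_000, margin=0.25):
    """True if a measured pixel area falls inside the window."""
    low, high = resolution_window(mean_pixels, margin)
    return low <= pixels <= high


low, high = resolution_window(300_000, 0.25)
print(low, high)               # 225000.0 375000.0
print(within_window(250_000))  # True
print(within_window(700_000))  # False
```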

Human Visual Acuity Versus Digital Image Resolution
Humans have a very sophisticated visual system.  Vision, it could be said, is our most prominent and acute sensory ability.  Firstly, we have reasonably acute eyesight, focused mainly in a very small part of the retina called the fovea centralis (or fovea).  The colour receptors of the eye (cones) are most densely packed in this small space.  Much like a digital camera, the visual acuity of the fovea is mainly a product of its large number and density of photoreceptors.  Birds of prey, which have a much greater visual acuity than us, have many times more photoreceptors making up their visual system, somewhat akin to having a camera with more megapixels.

Unlike most mammals, humans observe the world in full colour, thanks to the fact that most of us have three types of colour cone in our eyes.  Most mammals only possess green and blue cones but, thanks to a genetic mutation, the ancestors of humans and related primates developed the ability to see red in addition to green and blue.  The main evolutionary benefit, it seems, has been our ability to distinguish ripened fruits from unripened green fruit and foliage, giving our ancestors a competitive advantage over other fruit-foraging species.

Our eyes are most sensitive to green light.  The digital image sensor, and colour film before it, have both attempted to mimic the human visual system by giving green extra weight.  The result from a digital imaging perspective is the Bayer Filter, which has twice as many green elements as red or blue.

The image above depicts the workings of a typical digital camera.  The Bayer Filter sits on top of the digital image receptors (photosites).  It works in much the same way as the cone cells of the human eye.  Just as a red cone cell in the eye responds mainly to red light, the red Bayer filter will only allow red light through to the digital receptor.  Each photosite therefore equates to a single pixel of the equivalent Bayer filter colour with a record of the light intensity hitting it.
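To make the idea concrete, here is a small Python sketch of a scene being sampled through a Bayer Filter. It assumes the common RGGB layout (red and green alternating on even rows, green and blue on odd rows); actual sensors vary in layout, and the function names are invented for the illustration.

```python
def bayer_channel(row, col):
    """Return 0 (red), 1 (green) or 2 (blue) for an assumed RGGB layout."""
    if row % 2 == 0:
        return 0 if col % 2 == 0 else 1
    return 1 if col % 2 == 0 else 2


def to_bayer_raw(rgb_image):
    """Keep only the one colour value that each photosite's filter lets through."""
    return [
        [pixel[bayer_channel(r, c)] for c, pixel in enumerate(row)]
        for r, row in enumerate(rgb_image)
    ]


# A flat orange-ish 4 x 4 scene: every pixel is (red=200, green=100, blue=50).
scene = [[(200, 100, 50)] * 4 for _ in range(4)]
for raw_row in to_bayer_raw(scene):
    print(raw_row)

# Tally the filter colours across the 4 x 4 grid: [red, green, blue].
counts = [0, 0, 0]
for r in range(4):
    for c in range(4):
        counts[bayer_channel(r, c)] += 1
print(counts)  # [4, 8, 4]
```

Each photosite ends up holding one number, not three, and green photosites are twice as numerous as red or blue ones.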


As the illustration above depicts, colour digital image formation using a Bayer Filter comes at a cost.  Because the initial "Bayer Raw" image consists of a mosaic of green, blue and red coloured pixels, the image must be processed to form a correctly-coloured digital image.  Called demosaicing, this process consists of an algorithm which interpolates the data from adjacent photosites (two green, a red and a blue) to create the full colour picture.
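The simplest form of demosaicing can be sketched as follows. This is a bare-bones bilinear interpolation in plain Python, assuming an RGGB layout and an image of at least 2 x 2 photosites; it is not what any particular camera or raw converter does, and the function names are my own. For each photosite it keeps the measured colour and fills in the two missing colours by averaging whichever immediate neighbours carry them.

```python
def bayer_channel(row, col):
    """Return 0 (red), 1 (green) or 2 (blue) for an assumed RGGB layout."""
    if row % 2 == 0:
        return 0 if col % 2 == 0 else 1
    return 1 if col % 2 == 0 else 2


def demosaic_bilinear(raw):
    """Rebuild (red, green, blue) per pixel by averaging same-colour neighbours."""
    height, width = len(raw), len(raw[0])
    result = []
    for r in range(height):
        out_row = []
        for c in range(width):
            sums = [0.0, 0.0, 0.0]
            counts = [0, 0, 0]
            # Visit the surrounding photosites (up to eight, fewer at the borders).
            for nr in range(max(r - 1, 0), min(r + 2, height)):
                for nc in range(max(c - 1, 0), min(c + 2, width)):
                    if (nr, nc) == (r, c):
                        continue
                    channel = bayer_channel(nr, nc)
                    sums[channel] += raw[nr][nc]
                    counts[channel] += 1
            own = bayer_channel(r, c)
            # Measured colour is kept as is; the other two are neighbour averages.
            pixel = [raw[r][c] if ch == own else sums[ch] / counts[ch] for ch in range(3)]
            out_row.append(tuple(pixel))
        result.append(out_row)
    return result


# A grey scene with a hard vertical edge: black on the left, white on the right.
edge_raw = [[0, 0, 0, 255, 255, 255] for _ in range(4)]
print(demosaic_bilinear(edge_raw)[1][2])  # (0.0, 0, 127.5)
```

The pixel printed sits on the black side of the edge, yet it comes out with a half-strength blue value borrowed from the white side. That is the cost referred to above: some softening and false colour wherever the scene changes abruptly.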

Interpolation involves averaging values so there is a significant amount of uncertainty brought about by this process.  Some camera manufacturers and raw image editing packages use more complicated algorithms to produce better results.

HERE is a nice blog posting by Adam Hooper, explaining and illustrating the difference between two common types of demosaicing interpolation methods, Bilinear and Adaptive Homogeneity-Directed (AHD).  Basically, the bilinear method takes no account of the actual image content and simply averages neighbouring pixels blindly.  A more content-aware algorithm like AHD, on the other hand, follows lines and edges between patches of colour and tries to create better definition and less blurring of colour across patches.  Consequently, AHD involves more processing and is therefore slower in creating an image from RAW.
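The difference in approach can be shown with a toy example. The sketch below is not AHD, which is considerably more involved; it is a much simpler edge-directed rule, included only to show what "following an edge" means when estimating the missing green value at a red or blue photosite. The function names and sample values are invented.

```python
def green_bilinear(up, down, left, right):
    """Blind average of the four green neighbours."""
    return (up + down + left + right) / 4


def green_edge_directed(up, down, left, right):
    """Average along whichever direction changes least, i.e. along the edge."""
    horizontal_change = abs(left - right)
    vertical_change = abs(up - down)
    if horizontal_change < vertical_change:
        return (left + right) / 2
    if vertical_change < horizontal_change:
        return (up + down) / 2
    return (up + down + left + right) / 4


# A photosite on the bright side of a vertical edge: only the left neighbour is dark.
up, down, left, right = 200, 200, 20, 200
print(green_bilinear(up, down, left, right))       # 155.0
print(green_edge_directed(up, down, left, right))  # 200.0
```

The blind average drags the bright pixel towards the dark patch next door, while the edge-directed rule interpolates up and down the edge and keeps the two patches distinct. The price is the extra comparison at every photosite, and real algorithms such as AHD do far more work than this.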

In the ideal world each photosite would work like a mini-spectrophotometer, capable of recording a complete spectral analysis of the light hitting it.  Imagine how big that image file would be, not to mind the sophisticated photosite technology required!

Human Raw Vision
When we start to look at the fine workings of a digital camera and processor it is all too easy to become critical about the loss of data and seemingly heavy processing that is going on.  But, before we get too carried away, let's compare what we have just seen with the workings of the human eye and brain.  If we could somehow zoom into the image that our eyes capture we would probably be no less critical.

The fovea is a tiny spot at the back of the retina, directly opposite the pupil of the eye.  It is packed with cone cells for acute colour vision but contains no rod cells (used for low light or night vision).  If you have ever gazed at a galaxy or comet in the night sky you will have noted that it is easier to observe if you focus on a spot slightly to the side of it.  This is because the cone cells in the fovea have relatively poor low light sensitivity.  By shifting the focus to the side of an object of interest the image of the object is projected onto the periphery, which is rich in low light sensitive rods.  Suddenly, the object materialises, albeit frustratingly blurry and poorly defined.  When we try to centre our vision on the object again, it appears to vanish as the cones cannot register its faint light.  As kids we all learnt how to find the blind spots in our eyes, where the optic nerve enters the eye.

In an earlier posting HERE I came up with a way to check one's foveal field of view using a neat scintillating pattern I had found online.  It is really amazing just how narrow and tunneled our focus actually is, and it is not too surprising that we often miss something that is literally right under our nose.  

If we think that the heavy processing going on in the camera is unpalatable, consider what the brain has to do to construct a full colour image from the light hitting such a complex arrangement of structures.  Almost every detail we consciously register comes from the cone cells in the fovea.  Our peripheral cones and rods are active by day as part of our peripheral vision.  Peripheral vision serves to widen our field of view, alerting us to movement and aiding our spatial awareness, but has little or no active or conscious role until after dark when the rods come into their own as our sole method of vision.

Above I have compared what an image of a small, distant triangle might look like if captured exactly as it appears in life (left) with what a normal modern digital camera records (centre) and what an equivalent human retina might see (right).  The digital camera sensor consists of a regular grid of green, blue and red colour photosites.  The ratio of green to blue to red is 2:1:1, which is intended to mimic the eye's greater sensitivity to green light.  Unlike the photosites of the digital sensor, the cone cells in the retina are arranged at random and vary both in size and shape (surface area exposed to the light).  Even so, the digital image starts out not that dissimilar from a "raw" human visual image.

To the brain the triangle edge must have a very odd and ever-changing shape - as the image projected on to the back of the eyeball does not remain perfectly stationary (like a photograph) but instead moves about constantly in real time (like a video recording) as our head moves relative to the subject.  The brain must process this real time image and somehow make sense of it.

How much of what we see is real and how much is a construct of the human brain as it tries to fill in gaps?  Using human vision and struggling to make sense of a distant object is not much different from someone trying to make sense of a tiny fuzzy object in a digital image.  Both involve a high degree of uncertainty.  On a visit to an optician the Snellen chart quickly reminds us of the limitations of our visual acuity.  What we need, I think, is an equivalent cue for digital image acuity.  In the Image Quality Tool I use Image Resolution as this cue.

Acutance is an intriguing concept which again draws parallels between digital imaging and human vision.  If an image appears sharp our brain will happily accept it as being sharp.  Due to demosaicing, digital images start out slightly soft in appearance.  Unsharp Masking is very effective at increasing the acutance or apparent sharpness of photographs but, as these links highlight, the net effect is actually a loss of image data at the pixel level.  When attempting to make sense of small details in images it is best to start with the original raw image if available, not the final, possibly heavily sharpened image.

The actual mechanism by which acutance works in photo-finishing appears very similar to the natural visual phenomena of Mach bands and the Cornsweet Illusion.
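A one-dimensional sketch shows roughly what Unsharp Masking does to an edge. It is a simplification: a three-pixel box blur stands in for the Gaussian blur normally used, the pixel values are invented, and the function names are my own.

```python
def box_blur(signal):
    """Three-pixel moving average, repeating the end values at the borders."""
    last = len(signal) - 1
    return [
        (signal[max(i - 1, 0)] + signal[i] + signal[min(i + 1, last)]) / 3
        for i in range(len(signal))
    ]


def unsharp_mask(signal, amount=1.0):
    """Add back the difference between the signal and a blurred copy of itself."""
    blurred = box_blur(signal)
    return [value + amount * (value - soft) for value, soft in zip(signal, blurred)]


def clip(signal, low=0, high=255):
    """Force values back into the range an 8-bit image can store."""
    return [min(max(value, low), high) for value in signal]


# A slightly soft edge running from dark (5) to bright (250).
edge = [5, 5, 5, 128, 250, 250, 250]
sharpened = unsharp_mask(edge)
print([round(v, 1) for v in sharpened])        # [5.0, 5.0, -36.0, 128.3, 290.7, 250.0, 250.0]
print([round(v, 1) for v in clip(sharpened)])  # [5.0, 5.0, 0, 128.3, 255, 250.0, 250.0]
```

The flat areas are untouched, but the pixel on the dark side of the edge is pushed darker and the one on the bright side is pushed brighter, forming a dark and a light band either side of the edge much like Mach bands. Where the overshoot runs past black or white the value is clipped and the original reading cannot be recovered, which is one way sharpening costs data at the pixel level.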

Moiré is an artefact associated with image resolution.  It can be produced wherever two regularly occurring patterns overlap.  One of these patterns may include the regular distribution of photosites making up the image sensor.  Another may be the repeating pattern of lines making up the computer screen image.  Another may be any regular pattern occurring in the digital image itself.  Lastly, moiré may be produced due to the repeating pattern within an image processing algorithm.  In bird images it occurs most commonly in the repeating pattern of flight feather fringes in the closed wing.  For more see HERE.

The top left image is of high resolution.  The images to the right of it are reduced in resolution to 25% and 12.5% of the original image size respectively.  At full crop there is no obvious difference between these three images on screen.  However, when zoomed up at roughly 20% crop the differences are obvious.  I have sharpened the images to enhance the moiré pattern.  The patterns in the 100% and 25% resolution images are much the same, consisting of a slight parallel moiré pattern in the primary and secondary fringes.  However, the added pixelation of the 12.5% resolution image adds an additional regular pattern and therefore an extra moiré pattern emerges.  The overall effect is a cross-hatch.
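The way a second regular pattern creates moiré can be imitated in one dimension. In the sketch below a row of fine stripes (standing in for feather fringes) is reduced by simply keeping every ninth pixel, with no smoothing beforehand. The stripe width, the step and the function names are all invented for the illustration, and real resizing software usually filters the image first to limit the effect.

```python
def stripes(x, period):
    """Regular pattern: bright (1) for the first half of each period, dark (0) after."""
    return 1 if (x % period) < period / 2 else 0


def downsample(length, period, step):
    """Keep every step-th pixel of the striped row, with no smoothing."""
    return [stripes(x, period) for x in range(0, length, step)]


def repeat_length(samples):
    """Smallest shift at which the sampled row matches itself, or None."""
    for candidate in range(1, len(samples) // 2 + 1):
        if all(samples[i] == samples[i + candidate] for i in range(len(samples) - candidate)):
            return candidate
    return None


original = downsample(360, period=10, step=1)
reduced = downsample(360, period=10, step=9)

print(repeat_length(original))     # 10 -> stripes repeat every 10 pixels
print(reduced[:12])                # [1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
print(repeat_length(reduced) * 9)  # 90 -> a new pattern spanning 90 original pixels
```

The stripes themselves have not changed, but the sampling grid beats against them and a much coarser band appears that was never in the scene. Each extra regular grid laid over the image, whether sensor, resize or screen, is another opportunity for this to happen.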
