Principles of digital matching

Patrick Julien
Institut géographique national, Laboratoire MATIS,
2, avenue Pasteur 94160 Saint-Mandé, France . patrick.julien@ign.fr


Abstract

The objectives of digital image matching are recalled, and the principle of best correlation of areas is described. Geometrical constraints for oriented stereoscopic pairs are introduced, suitable to decrease the amount of computation. Finally we point out the higher significance of correlation for geometrically alike details, and the advantage of rectifying geometrical distortions while matching details.


1. Definition of digital matching

Image matching designates the action of associating point to point two images of a same object or scene. For example our natural binocular vision system permanently performs image matching , in order to merge the two images seen by the eyes into a unique perceived one. This natural image matching is also of permanent use in classical analogical photogrammetry, when processing stereoscopic pairs of photographs.

Digital image matching is a substitute for visual matching with an artificial vision system, where the images are digital and the point to point association is performed by a computer program. By "point to point" association we mean that for any point P in one image, the program can tell which point Q in the other image represents the same detail, or is homologous to P. Now digital matching can be defined as follows : given two digital views of an object, find automatically all the pairs of homologous details.

julien1.jpg (7257 bytes)

Figure 1

This definition needs several complements.

1) It is better to speak of details rather than of points, because a detail is not necessarily punctual. For example, on the images of figure 1, a corner of the house can be considered as punctual ; an edge of a wall is linear, not punctual; a detail such as the bush is not punctual, and its description cannot be reduced to a single point.

2) By "all the pairs", one must understand a number of pairs, great enough to describe

completely the correspondence between images. Furthermore note that some details may have no visible homologous, because of occluding objects.

3) The definition does not assume stereoscopic views, because digital matching can apply to non-stereoscopic views, for example for image registration , or measurement of fiducial marks. Neither is there any a priori assumption about the nature and geometry of the views; they can be photographs,terrestrial or aerial ones, satellite images, or also drawings.

4) Finally note that, although the definition mentions only two images, searchers now develop digital matching on three or more images.

2. Applications of digital matching in photogrammetry.

Digital matching applies in photogrammetry whenever pairs of homologous points are needed, namely for acquisition of tie points and of dense digital terrain models in the case of aerial stereoscopic pairs ; but also for image registration and measurement of fiducial marks .

3. Best correlation principle.

Amongst the different possible approaches for digital matching, we only consider in this paper the one based on the best correlation principle, which can be stated as follows.

Let r1(c,l) , r2(c,l) be the matrices representing the digital images. In theory, two homologous details, assimilated to pixels P1=(c1,l1) , P2=(c2,l2), should appear identical in both images : r1(P1) = r2(P2) . One can easily guess that this condition cannot characterize homologous pixels, because there are lots of non-homologous pairs Q1,Q2 such that r1(Q1)=r2(Q2).

On the other hand, one can notice then if two pixels P1=(c1,l1) , P2=(c2,l2) are homologous, then their neighbourhoods must appear identical, which means :

vP1(c,l) = vP2(c,l) for -h £ c £ h , -k £ l £ k ,

where vP1(c,l)=r1(c1+c, l1+l) , vP2(c,l)=r2(c2+c, l2+l) , -h£c£h, -k£l£k, define two neighbourhoods of P1,P2. This condition appears rather more severe than the previous one, since there is a low probability to meet two neighbourhoods vQ1,vQ2 of non-homologous pixels Q1,Q2 such that vQ1=vQ2.

In brief, a necessary condition for two pixels P1,P2 to be homologous, should theoretically be :

vP1=vP2 , and vP1¹vQ2 for any pixel Q2¹P2 , and vQ1¹vP2 for any pixel Q1¹P1.

Now in the reality, two neighbourhoods vP1,vP2 are never rigorously identical ; they only are alike, with differences in radiometry and in geometry ; in particular the geometrical differences, or relative distortions, are caused by the local relief of the object (terrain). Furthermore even in the absence of geometrical distortions (locally flat terrain), the digital images of the detail are generally distinct, because of a sub-pixel shift of the sampling grid. Then it seems natural to introduce a measure of the likeness of the two neighbourhoods, which is a number I(vP1,vP2), defined either through a distance, for example I(vP1,vP2) = d1(vP1,vP2) = Sc,l|vP1(c,l)-vP2(c,l)| , or through a correlation index such as I(vP1,vP2) = Sc,lvP1(c,l)vP2(c,l).

The more the images are alike, the closer is the value I(vP1,vP2) to the optimum Iopt=I(vP1,vP1) , obtained with two identical neighbourhoods. For two homologous pixels P1,P2, the neighbourhoods are as much alike as possible, or in other words, the absolute deviation |I(vP1,vP2) - Iopt| must be minimum, which means explicitly :

| I(vP1,vP2) - Iopt | < | I(vP1,vQ2) - Iopt | for any pixel Q2¹P2 ,

and | I(vP1,vP2) - Iopt | < | I(vQ1,vP2) - Iopt | for any pixel Q1¹P1 ;

this necessary condition for homology of pixels P1,P2 is what the best correlation principle consists of.

4. Examples of likeness measures I(Vp1 ,Vp2).

The simplest measures are defined through d1 or d2 distances : I=d1 or I=d2 , with

d1(vP1,vP2)= Sc,l |vP1(c,l)-vP2(c,l)| , d2(vP1,vP2)= (Sc,l (vP1(c,l)-vP2(c,l))2 )1/2.

For these indexes , the optimum Iopt=0 is a minimum.

But the most frequently used measure is probably the correlation value, normalized and centered, also called linear correlation coefficient :

I(vP1,vP2) = cov(vP1,vP2) / s(vP1)s(vP2)

=Sc,l(vP1(c,l)-EvP1)(vP2(c,l)-EvP2) / (Sc,l(vP1(c,l)-EvP1)2)1/2(Sc,l(vP2(c,l)-EvP2)2)1/2

where vP1,vP2 are defined for -h£c£h , -k£l£k , N=(2h+1)(2k+1) , and

EvP1 = 1/N Sc,lvP1(c,l) , EvP2 = 1/N Sc,lvP2(c,l)

For this measure : -1 £ I(vP1,vP2) £ l and Iopt=1 is a maximum.

Then the necessary condition for homology of P1,P2, given in paragraph 3, becomes :

I(vP1,vP2)>I(vP1,vQ2) and I(vP1,vP2)>I(vQ1,vP2) for any pixels Q2¹P2 and Q1¹P1.

Furthermore I(vP1,vP2)=Iopt=1 if and only if vP1,vP2 are linked by a linear relation vP2=avP1+b , a>0; in other words, I(vP1,vP2) is insensible to a linear change of dynamics between the two images; this property is often useful and justifies, together with experience, the wide use of this measure.

5. Implementing the best correlation principle.

According to the best correlation principle, digital matching now comes down to evaluating the index I(vQ1,vQ2) for a number of pairs Q1,Q2.

For example figure 2 shows a portion of an aerial photograph, from which a 33x33 pixels neighbourhood vP1 of pixel P1 has been extracted. The linear correlation coefficient I(vP1,vQ2(i,j)) has been evaluated for a set of 21x21 neighbourhoods vQ2(i,j) extracted from an other photograph, and corresponding to pixels Q2(i,j) = (c2+i,l2+j), -10£ i,j £10, around an estimated homologous (c2,l2) of P1. The maximum 0.9478 of the "correlation surface" s(i,j)=I(vP1,vQ2(i,j)) occurs for i=-2, j=+3, giving the position of the true homologous P2=(c2-2, l2+3) of P1.

julien2.jpg (32342 bytes)

As an information, let us evaluate the computation time necessary to obtain the above correlation surface.

The linear correlation coefficient of two N-arrays needs about 8N floating operations ; here for N=33x33=1089 and 21x21=441 coefficients, about 3.84x106 floating operations are needed. On the basis of a 500 MHz processor and of 10 cycles per floating operation, 3.84x106x10/5x108=0.077 second is necessary to obtain an homologous P2. This is cheap if one wants to obtain a small number of homologous pairs, for example for tie points.

Now in the above example, the photograph has 4600x4600 pixels (23cmx23cm photo digitized by 0.050mm). If one wants to obtain the homologous for each pixel P1, in or

der to get a digital terrain model (DTM) , the total time is about 4600x4600x0.077s = 452 hours = 19 days; this amount, several times as much as in a classical photogrammetric survey, is unrealistic.

These estimations show it is important to reduce the computation time, more especially as digital images are often much larger, and may count up to 15000x15000 pixels.

In order to reduce the computation time, one can act on three parameters : 1) the DTM density, by processing only one pixel on 2x2, or 3x3, etc ...; 2) the neighbourhoods size; 3) the size of the correlation surface. For example processing only 1 pixel on 3x3, using 11x11-neighbourhoods instead of 33x33 ones, and computing 7x7-correlation surfaces instead of 21x21 ones, will reduce the computation time to 37 minutes for a 4600x4600 image, and 6.5 hours for a 15000x15000 image, which seems now realistic.

In the particular case of a stereoscopic pair, the correlation surface can be reduced to a "correlation curve" (1x21 instead of 21x21 in the above example) provided that the pair is oriented; this possibility follows from geometrical constraints, which we now describe.

6. Geometrical constraints for an oriented stereoscopic pair.

The geometrical constraints are of two kinds (see figure 3) : the epipolar constraint, for which the relative orientation of the pair is required, or the vertical constraint, for which the absolute orientation is required.

6.1 The epipolar constraint.

Consider a pixel P1 in the first image; P1 is the image of some detail M; the position of M is unknown, but M is known to be located on the perspective ray S1P1; so the image P2 of M in the second image is located on the line D2 -known as an epipolar line-, image of S1P1 ; then P2, the homologous of P1, must be searched for only along D2, i.e. in a one-dimensional area. In other words the "correlation surface" reduces to a "correlation curve" s(i)=I(vP1,vQ2(i)), corresponding to a set of points Q2(i)=P2+iDx+iDy , aligned on D2.

Note that determination of the epipolar line D2 assumes that the position of the second image with respect to the first one -i.e. the relative orientation of the pair- is known.

6.2 The vertical constraint.

Let M be a detail of the terrain, whose planimetric position m is given, but whose altitude is unknown : M is located somewhere on the vertical line above m; so the "correlation surface" reduces to a "correlation curve" s(i) = I(vQ1(i),vQ2(i)) , corresponding to a set of pairs Q1(i)=Q1(0)+iD1x+iD1y ,Q2(i)=Q2(0)+iD2x+iD2y images of points Mi on the vertical line.

Note that considering vertical lines assumes that the positions of the images with respect to the ground -i.e. the absolute orientation of the pair- is known.

julien3.jpg (18781 bytes)

7. Effect of the neighbourhood size on the correlation curve.

Decreasing the size of neighbourhoods vQ1,vQ2 seems an easy way to reduce the computation time. On the other hand, smaller neighbourhoods involve a higher probability to meet non-homologous pixels Q1,Q2 such that vQ1»vQ2, i.e. I(vQ1,vQ2)»Iopt . The diagrams on figure 4 give examples of correlation curves for various neighbourhood sizes : for too small neighbourhoods, the multiple maxima of the correlation curves make the research of an homologous indeterminate. In this example, for the considered pixel, the neighbourhood size should be at least 30 meters (19x19 pixels). For smaller neighbourhoods, the best correlation principle is not a sufficient condition for homology, and additional conditions, such as coherence between adjacent pixels, are necessary.

8. Coping with geometrical distortion in digital matching.

As mentioned above, too small neighbourhoods vQ1,vQ2 are proscribed. On the other hand, larger neighbourhoods may be affected by geometrical distortions, due to terrain relief. This "basic problem", according to Hobrough [1978], is illustrated in figure 5 : the strong distortion between the two neighbourhoods vQ1,vQ2 makes it difficult to appreciate their best registration, either visually, since the sum vQ1+vQ2 appears fuzzy for any shift of vQ2 with respect to vQ1, or digitally, since the correlation curve -not represented- is flat, involving an imprecise determination of the maximum, not to mention the low value 0.55 of this maximum.

julien4.jpg (113418 bytes)

 

julien5.jpg (13944 bytes)

Then the way to overcome the problem is to transform geometrically one (or both) neighbourhood(s) to get them superposable. Such a transform is a priori unknown, depending on the -unknown- shape of the terrain. So the transform has to be determined together with the shift between neighbourhoods; a simple transform -with a minimum number of parameters- is suitable, in order to preserve robustness.

A natural choice is a linear transform L, applied to vQ2, defined by parameters a,b,g,d , with which the matching problem now states : find the values a,b,g,d,i optimizing the function s(a,b,g,d,i) = I(vQ1,L(vQ2)+iDx+iDy) .

For example, if I=d2, the explicit value of s(a,b,g,d,i)2 is

Sc,l (r1(c1+c, l1+l) - r2( a(c2+c)+b(l2+l)+iDx , g(c2+c)+d(l2+l)+iDy ) )2 ;

this is the "least-squares correlation" method described by Ackermann [1984] or in Wrobel's state of the art [1988].

Another possible transform consists of a double projection of both neighbourhoods vQ1,vQ2, first on a surface approximating the terrain, then on the horizontal plane (figure 6) ; in other words the transformed neighbourhoods uQ1,uQ2 are local orthophotos. The simplest choice for the approximating surface is a plane, defined by three parameters z,h,q, with which the matching problem states : find the values z,h,q optimizing the function s(z,h,q) = I( uQ1(z,h,q) , uQ2(z,h,q) ).

julien6.jpg (12047 bytes)

Figure 7 shows the resulting transformed neighbourhoods uQ1,uQ2 after fitting the best parameters z=z1, h=z2, q=z3 ; note the plane condition z1-z2+z3-z4=0. The sum uQ1+uQ2 is now sharp and the correlation 0.90 significant, with 167m large neighbourhoods (121x121 pixels) ; as an information, the slope of the plane (or terrain since the plane fits the terrain well) is 0.65.

julien7.jpg (12173 bytes)

In this example, the function s(z1,z2,z3) was optimized by relaxation, described precisely as follows. Let (z1k,z2k,z3k) be the kth-approximation; if s(z1k, z2k, z3k) is not locally maximum with respect to z1k, then z1k is successively decreased : z1k-Dz , z1k-2Dz, ..., z1k-jDz, or increased z1k+Dz, z1k+2Dz, ..., z1k+jDz, until s(z1k-jDz, z2k, z3k) or s(z1k+jDz, z2k, z3k) is a local maximum ; then z1,k+1 is set to z1k-jDz or z1k+jDz ; z2,k+1 and z3,k+1 are obtained in the same manner.The initial approximation (z10,z20,z30) is the one minimizing a small set of values s(zj,zk,zl) , zmin£zj,zk,zl£zmax, for example zj = zmin+j(zmax-zmin)/10.

9. Conclusion.

This paper discussed some basic facts in digital matching based on correlation between neighbourhoods, or "area based" matching, but other basic facts have been omitted. For example, approaches based on other features, such as extracted contours, are also of frequent use. Also, only the geometrical aspect of digital matching has been considered; but radiometric image quality also influences digital matching. As a final remark, we wish to point out that correlation is a powerful -robust and precise- tool for digital matching, under the condition that images are made geometrically alike.

References.

HOBROUGH,G. 1978 : Digital on-line correlation. Bildmessung und Luftbildwesen 3/1978 : 79-86.

ACKERMANN,F. 1984 : Digital image correlation : performance and potential application in photogrammetry. Photogrammetric record, 11(64): 429-439 (October 1984).

WROBEL,B. 1988 : Least square methods for surface reconstruction from images. Invited paper, ISPRS Commission III, Kyoto : 806-821 (1988).


Redaction: Grégoire Ramuz Last update: October 1999