Information fusion framework

From OTBWiki
Jump to: navigation, search

This page intends to describe a possible design for a Dempster-Shafer fusion framework. This framewok will be useful to implement work of V. Poulain on databases update, and other task involving information fusion. It should be extending to support other fusion techniques like bayesian fusion or fuzzy logic.

Design for a Dempster-Shafer framework

For a very quick reminder on Dempster-Shafer theory, one can refer to the Wikipedia page.

Universal set

The universal set is the set of all the possible states. This set can be represented by a set of labels (either integer or string type). It can be indexed.

Power-set

The power set is the set of all the sub-part of the universal set, including the empty set. If the universal set contains labels {0,1,2}, then the power set is {{0},{1},{2},{0,1},{0,2},{2,3},{1,2,3}}. Therefore, the size of the power set is 2^n. An efficient way to represent on element of the power set is to use binary words, where the ith byte is set to 1 is the element contains the ith label and 0 instead. For instance, the {2,3} element would be 110 in base-2, or 6 in base-10. Therefore, the power set for the {0, ..., n} universal set can be indexed (and represented) by the set {0,...,2^n}, and the index in base-2 allows to know which element of the universal set are used.

Masses

In Dempster-Shafer theory, a source of information assigns a belief mass to each element of the power set, such that the total mass equals one. Therefore, we could represent a mass function as a std::map<long,float>, where the key represent the base-10 index of the power set elements and the floating point value represents the mass associated with a given element.

In many cases, a source of information will only bring belief to a few element of the power set, so we could save space by not including null masses into our map.

Combining two distinct masses

If two sources of information assign distinct masses to the elements of the same power set, Dempster-Shafer theory provides a rule to combine them into a joint mass function, also defined on the whole power set. To compute this joint mass, we first need to determine all the pairs of elements of the power set whose intersections are empty.

Optimized method for fusion of classifiers using the Dempster Shafer framework

Among the many bibliographic ressources available about the Dempster Shafer (DS) framework used for the fusion of classifiers, it is possible to pick out 3 publications into which an optimized algorithm for DS classifiers fusion is proposed (Cf. BIBLIOGRAPHY below).

Thus, [1] presents different methods for the fusion of classifiers with DS among them but a more detailed demonstration can be found in an older one: [2], which is especially well presented to understand how the fusion of classifiers could be made in 2 steps with DS combinations (Cf. pages 425-428 in [2]). Here is a summary of the general method which is planned to be implemented into the OTB.

Hypothesis

-Let be K images of classification (i.e. K classifiers) of the same location.

-Let M be the number of possible class labels Ai in those images.

-O is the Frame of discernment (i.e. the set of possible values Ai for the pixels of the K images). ex: O = {A, B, C}, with A, B and C the M = 3 possible class labels. Let all the subsets of elements of O be the "focal elements", ex: {A,B}, {A,C},...

-Let X be a pixel of these images. Thus X will have K values for the classification belonging to the frame O. The objective is to choose THE class label among them which satisfies a decision rule based on DS.

-Each classifier (i.e. classified image Im_Cl_i)) has its own confusion matrix from which it is possible to extract the recognition rate of each class label from O. In the DS framework we will consider 2 ways to set the basic probability assignments (bpa) (i.e. the masses noted m(Ai)):

->The 1st one will consider the recognition rate (Nb_True_Positives(Ai)/Total_Nb_Pixels) to be the mass m(Ai).
->The 2nd one will consider the predictive rates ((Nb_True_Positives(Ai) + Nb_False_Positives(Ai))/Total_Nb_Pixels) to be the mass  m(Ai)(Cf. [3]) which is apparently best because 
  it deals with wrongly classified pixels.
-Exemple K = 6 classifiers (6 input images), with, for the pixel X:
Im_Cl_1(X) = A
Im_Cl_2(X) = A
Im_Cl_3(X) = A
Im_Cl_4(X) = B
Im_Cl_5(X) = B
Im_Cl_6(X) = C
STEP I

The STEP I of the method consists in combining the classifiers with the DS rule of combination of masses, according to the labels they give to X. Thus we have 3 groups here for the pixel X:

GR_1(X) -> A

GR_2(X) -> B

GR_3(X) -> C

We note K1 the number of groups. Thus we have K1 = 3 < K here. For example, the group GR_1(X) only handle A, A_ (A_ = NOT A = O - {A}) and O.

We note (+) the DS rule of combination (already implemented in the OTB in otb::JointMassOfBeliefFilter< otb::MassOfBelief<LabelPixelType> >):

m_123(A) = m_1(+)m_2(+)m_3(A)

m_123(A_) = m_1(+)m_2(+)m_3(A_)

m_123(O) = m_1(+)m_2(+)m_3(O)

m_123 = 0 for the other sets different from A, A_ and O

By the same method:

m_45(B) = m_4(+)m_5(B)

m_45(B_) = m_4(+)m_5(B_)

m_45(O) = m_4(+)m_5(O)

m_45 = 0 for the other sets different from B, B_ and O

By the same method:

m_6(C)

m_6(C_)

m_6(O)

m_6 = 0 for the other sets different from C, C_ and O

Thus we obtain K1 = 3 classifiers with different "focal elements" {Ai, Ai_, O}.


STEP II

The second step consists in combining these new K1 classifiers with the DS method (+), which is computationally expensive according to [2] because of all the possible combinations between the differents "focal elements" {Ai, Ai_, O}. Thus, [2] proposes a method based of the combined masses of the step I to directly estimate the belief functions Bel() of the "focal elements" {Ai, Ai_, O}.

Thus, after the step II of this optimized method, the bel(Ai) functions of the focal elements present among the classifiers are exactly the same as those directly calculated with all the combinations. However, since this method only handles the focal elements {Ai} present among the classifiers, it underestimates the actual bel(Ai_) of the complementary functions (Cf. demonstration below). Thus, in our case, it is necessary to correct this with adjustment variables (Cf. demonstration below).


Decision rules

According to Xu et al. [2], several decision rules are available to choose which valu the pixel X should have after the DS fusion:

->The chosen class label for X could be the label Ai for which bel(Ai) is MAXIMAL.
->The chosen class label for X could be the label Ai for which bel(Ai) is MAXIMAL AND this MAXIMAL is greater than a threshold 0 < alpha < 1,
  otherwise the pixel X has the label "UNDECIDED".
->The chosen class label for X could be the label Ai for which di = [bel(Ai)-bel(Ai_)] is MAXIMAL AND this MAXIMAL value of di is greater than a threshold 0 < alpha < 1,
  otherwise the pixel X has the label "UNDECIDED".



Implementation ideas

Thus, the main idea would consist in processing EACH pixel X of the input images individually according to these method. However, we could imagine a preliminary step consisting in extracting the mass of belief functions (bpa) from the confusion matrices before the DS combination in 2 steps. Since the OTB already handles bpa objects (otb::MassOfBelief<LabelPixelType>) into which the masses of all the sets and subsets of the power set [1] are handled, we could imagine a filter which takes an image list (ex: with K images) with their confusion matrices as inputs and which returns the K bpas OTB objects as output from which it will be possible to extract the masses m_k(Ai) of the classifiers k for each class Ai.

Correction and adjustments of the optimized algorithm

The problem induced by the optimized algorithm

Since the optimized algorithm only calculates the A, B, C and K constants (Cf. [2]) from the sets Ai, Ai_ and O actually present among the classifiers, they do not handle the other sets Aj of the universe O, which have combined masses of belief (after the step II) NOT EQUAL to 0. Thus, these sets Aj play a role in the estimation of the belief functions bel(Ai_) of the complementary sets Ai_, with which they have NOT VOID intersections.

In the algorithm proposed by Xu et al. [2], the direct calculation of bel(Ai) is correct, since the direct DS combination of the K classifiers results in the same values for bel(Ai). However, these two methods give different values of bel(Ai_). More precisely, the optimized algorithm underestimates bel(Ai_), which means that the method does not handle the mass of belief of some subsets of the power set in the calculation of bel(Ai_).

Let us consider a concrete example to understand were the problem lies.

Illustration of the issue by an example

/!\ This example is correct only if ALL the K classifiers have mk(O) = mk(universe) = 0, which is expected to be true in our DS application of fusion, which will estimate the masses of belief of each classifier k from its confusion matrix.

Introduction

Let O = {A, B, C, D, E, F} be the Universe.

Let be K = 6 classifiers (6 input images), with, for the pixel X:

Im_Cl_1(X) = A
Im_Cl_2(X) = A
Im_Cl_3(X) = A
Im_Cl_4(X) = B
Im_Cl_5(X) = B
Im_Cl_6(X) = C

Thus, only 3 singleton classes are present among the classifiers. The classes D and E are not represented.

Let consider the combination of the K = 6 classifiers in 2 steps:

/!\ Hyp: mCl1(O) = mCl2(O) = mCl3(O) = mCl4(O) = mCl5(O) = mCl6(O) = 0

DS combination: STEP I

There are 3 resulting classifiers respectively for {A, A_}, {B, B_} and {C, C_}:

GR_1(X) -> {A, A_} <=> classifier 1 based over the 2 focal elements A and A_

GR_2(X) -> {B, B_} <=> classifier 2 based over the 2 focal elements B and B_

GR_3(X) -> {C, C_} <=> classifier 3 based over the 2 focal elements C and C_

Moreover, we can write:

m1(B) = m1(C) = 0

m2(A) = m2(C) = 0

m3(A) = m3(B) = 0

DS combination: STEP II

Let consider a direct combination of these 3 classifiers here, without using the formulas of Xu et al. [2]:

Combination of classifiers 12:

         m1(A)*[m2(B_) + m2(O)]        m1(A)*m2(B_)
m12(A) = ---------------------- = ---------------------- ; in our case
           1 - m1(A)*m2(B)           1 - m1(A)*m2(B)
         m2(B)*[m1(A_) + m1(O)]        m2(B)*m1(A_)
m12(B) = ---------------------- = ---------------------- ; in our case
           1 - m1(A)*m2(B)           1 - m1(A)*m2(B)


                                      m1(A_)*m2(B_) 
m12(C, D, E) = m12(A_ <INTER> B_) = ----------------------
                                     1 - m1(A)*m2(B)
m12(O) = m12(A_) =  m12(B_) = 0 ; in our case


Combination of classifiers 123:


           m12(A)*[m3(C_) + m3(O)]            m12(A)*m3(C_)
m123(A) = --------------------------- = --------------------------- ; in our case
         1 - m3(C)*[m12(A) + m12(B)]   1 - m3(C)*[m12(A) + m12(B)]


           m12(B)*[m3(C_) + m3(O)]            m12(B)*m3(C_)
m123(B) = --------------------------- = --------------------------- ; in our case
         1 - m3(C)*[m12(A) + m12(B)]   1 - m3(C)*[m12(A) + m12(B)]


           m3(C)*[m12(A_) + m12(B_) + m12(C, D, E)]       m3(C)*m12(C, D, E)
m123(C) = ----------------------------------------- = --------------------------- ; in our case
                  1 - m3(C)*[m12(A) + m12(B)]         1 - m3(C)*[m12(A) + m12(B)]


                                               m12(A_ <INTER> B_)*m3(C_)       m12(C, D, E)*m3(C_)
m123(D, E) = m123(A_ <INTER> B_ <INTER> C_) = --------------------------- = --------------------------- in our case
                                              1 - m3(C)*[m12(A) + m12(B)]   1 - m3(C)*[m12(A) + m12(B)]


m123(Universe) = m123(A_) = m123(B_) = m123(C_) = 0 ; in our case 
Calculation of the belief functions

By definition, bel(Ai) corresponds to the sum of the masses of ALL the subsets included in and equal to Ai. Thus, in our case, we can write:

bel(A) = m123(A) + m123(B_) + m123(C_) + m123(D_) + m123(E_) + SUM[m123(ALL other subsets intersecting A)]
bel(A) = m123(A) ; in our case

Thus, since we only handle the following focal elements A, A_, B, B_, C and C_ in the optimized algorithm, the calculations of bel(A), bel(B) and bel(C) result in similar values than those obtained with the direct DS combination. Thus, we can estimate the values of bel(Ai_) with the same definition:

bel(A_) = m123(A_) + m123(B) + m123(C) + m123(D) + m123(E) + m123(A_ <INTER> B_ <INTER> C_) + SUM[m123(ALL other subsets NOT intersecting A)]
bel(A_) = m123(B) + m123(C) + m123(D, E) ; in our case

However, the problem lies in the calculations of bel(A_), bel(B_) and bel(C_) which handle other subsets of the power set which are not present among the K classifiers (i.e. the sets D and E in our example).

The algorithm results in the following value for bel(A_): bel(A_) = m123(B) + m123(C), which does not handle the not null mass of belief m123(D, E). Thus, bel(A_) is underestimated by the optimized algorithm.

bel(D_) = m123(D_) + m123(A) + m123(B) + m123(C) + m123(E) + SUM[m123(ALL other subsets NOT intersecting D)] = m123(A) + m123(B) + m123(C) ; in our case
bel(E_) = m123(E_) + m123(A) + m123(B) + m123(C) + m123(D) + SUM[m123(ALL other subsets NOT intersecting E)] = m123(A) + m123(B) + m123(C) ; in our case

Thus, we can write:

bel(D) = bel(E) = 0 ; in our case
bel(D_) = bel(E_) = m123(A) + m123(B) + m123(C) ; in our case
Estimation of the missing masses of belief in bel(A_), bel(B_) and bel(C_)

Since the SUM of all the masses of belief of the focal elements in the power set should be equal to 1 after the step II of the DS combination, then we can deduce the missing masses of belief, with Ai equal to the <UNION> of all the subsets present in the classifiers (here {Ai = A <UNION> B <UNION> C}):

SUM[m123(ALL other subsets NOT intersecting Ai)] = m123(D, E) = 1 - m123(A) - m123(B) - m123(C) ; in our case because ALL the K classifiers have mClk(Universe) = 0

Thus, we can write:

bel(A_) = m123(B) + m123(C) + m123(D, E) = m123(B) + m123(C) + 1 - m123(A) - m123(B) - m123(C) ; in our case
bel(A_) = 1 - m123(A) = 1 - bel(A) ; in our case 

BIBLIOGRAPHY

[1] Dymitr Ruta and Bogdan Gabrys, "An Overview of Classifier Fusion Methods" Computing and Information Systems, 7 (2000) p.1-10 2000 University of Paisley [URL]: http://eprints.bournemouth.ac.uk/9649/1/CIS_2000_Ruta_Gabrys_fusion_methods_overview.pdf

[2] Xu, L.; Krzyzak, A.; Suen, C.Y.; , "Methods of combining multiple classifiers and their applications to handwriting recognition," Systems, Man and Cybernetics, IEEE Transactions on , vol.22, no.3, pp.418-435, May/Jun 1992 [URL]: http://www.cs.cuhk.hk/~lxu/papers/journal/IEEEsmc92.PDF

[3] Parikh C.R., M.J. Pont and N.B. Jones (2001) "Application of Dempster-Shafer theory in condition monitoring systems: A case study", Pattern Recognition Letters, 22 (6-7): 777-785. [URL]: http://www.le.ac.uk/engineering/mjp9/parikh1.pdf (PREPRINT VERSION)