Inference in Bayesian networks



Inference

  • infers the probability of some events from other, observed events,
  • observations (E – a set of evidence variables, e – a particular event),
  • target variables (Q – a set of query variables, q – a particular query variable),
  • Pr(q|e) to be found,
  • network is known (both graph and CPTs)
  • marginalization: Pr(X) = Σ_y Pr(X, y)
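
For instance, with the variables used in the exercise below, the marginal of P3 is obtained by summing its parent P2 out of their joint distribution:

Pr(P3) = Σ_{P2} Pr(P3, P2) = Σ_{P2} Pr(P2) Pr(P3|P2)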

[Figure: the example Bayesian network (P1 → P2 → P3, with P2 → P4) and its CPTs]

Inference by enumeration

Pr(¬p3) = Σ_{P1,P2,P4} Pr(P1, P2, ¬p3, P4) = Σ_{P1,P2,P4} Pr(P1) Pr(P2|P1) Pr(¬p3|P2) Pr(P4|P2)

Pr(p2|¬p3) = Pr(p2, ¬p3) / Pr(¬p3)

Pr(p1, ¬p3, p4) = Σ_{P2} Pr(p1, P2, ¬p3, p4) = Σ_{P2} Pr(p1) Pr(P2|p1) Pr(¬p3|P2) Pr(p4|P2)
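
A minimal enumeration sketch in Python, assuming the CPT values implied by the worked exercise below (Pr(p1) = .4, Pr(p2|p1) = .8, Pr(p2|¬p1) = .5, Pr(p3|p2) = .2, Pr(p3|¬p2) = .3); P4 is omitted because it is an unobserved leaf that sums out to 1 for both queries:

```python
from itertools import product

# CPTs implied by the worked exercise below (assumed values)
pr_p1 = {True: .4, False: .6}                                          # Pr(P1)
pr_p2 = {True: {True: .8, False: .2}, False: {True: .5, False: .5}}    # Pr(P2 | P1)
pr_p3 = {True: {True: .2, False: .8}, False: {True: .3, False: .7}}    # Pr(P3 | P2)

def joint(p1, p2, p3):
    """Pr(P1, P2, P3) factorised along the chain P1 -> P2 -> P3."""
    return pr_p1[p1] * pr_p2[p1][p2] * pr_p3[p2][p3]

# Pr(¬p3): sum the joint over all values of the hidden variables P1 and P2
pr_not_p3 = sum(joint(p1, p2, False) for p1, p2 in product([True, False], repeat=2))

# Pr(p2 | ¬p3) = Pr(p2, ¬p3) / Pr(¬p3)
pr_p2_and_not_p3 = sum(joint(p1, True, False) for p1 in [True, False])

print(pr_not_p3)                        # 0.762
print(pr_p2_and_not_p3 / pr_not_p3)     # ~0.651
```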

Variable elimination

  1. pre-computes factors to remove the inefficiency of enumeration
    • factors recycle intermediate results computed earlier
    • some variables are eliminated by summing them out
  2. does not consider variables irrelevant to the query
    • every leaf that is neither a query nor an evidence variable can be removed,
    • the rule can be applied recursively.

Exercise

calculate Pr(¬p3) and Pr(p2|¬p3)

(ii)

f_{P1}(P2) = Σ_{P1} Pr(P1, P2) = Σ_{P1} Pr(P1) Pr(P2|P1)

f_{P1}(p2) = Σ_{P1} Pr(P1, p2) = Σ_{P1} Pr(P1) Pr(p2|P1) = .4 × .8 + .6 × .5 = .62

f_{P1}(¬p2) = Σ_{P1} Pr(P1, ¬p2) = Σ_{P1} Pr(P1) Pr(¬p2|P1) = .4 × .2 + .6 × .5 = .38

(iii)

f_{P1,P2}(P3) = Σ_{P2} f_{P1}(P2) Pr(P3|P2)

f_{P1,P2}(p3) = Σ_{P2} f_{P1}(P2) Pr(p3|P2) = .62 × .2 + .38 × .3 = .238

f_{P1,P2}(¬p3) = Σ_{P2} f_{P1}(P2) Pr(¬p3|P2) = .62 × .8 + .38 × .7 = .762

(iv)

f_{P1,¬p3}(P2) = f_{P1}(P2) Pr(¬p3|P2)

f_{P1,¬p3}(p2) = .62 × .8 = .496

f_{P1,¬p3}(¬p2) = .38 × .7 = .266

Solution

Pr(¬p3) = f_{P1,P2}(¬p3) = .762

Pr(p2|¬p3) = f_{P1,¬p3}(p2) / (f_{P1,¬p3}(p2) + f_{P1,¬p3}(¬p2)) = .496 / (.496 + .266) = .6509
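
The same answers can be reproduced with a small Python sketch of the elimination steps, under the CPT assumptions listed with the enumeration example above; each dictionary corresponds to one of the factors (ii)–(iv):

```python
# CPT values assumed as in the enumeration sketch above
pr_p1 = {True: .4, False: .6}                                          # Pr(P1)
pr_p2 = {True: {True: .8, False: .2}, False: {True: .5, False: .5}}    # Pr(P2 | P1)
pr_p3 = {True: {True: .2, False: .8}, False: {True: .3, False: .7}}    # Pr(P3 | P2)
vals = [True, False]

# (ii) sum P1 out: f_{P1}(P2) = sum_P1 Pr(P1) Pr(P2 | P1)
f_P1 = {p2: sum(pr_p1[p1] * pr_p2[p1][p2] for p1 in vals) for p2 in vals}

# (iii) sum P2 out: f_{P1,P2}(P3) = sum_P2 f_{P1}(P2) Pr(P3 | P2)
f_P1P2 = {p3: sum(f_P1[p2] * pr_p3[p2][p3] for p2 in vals) for p3 in vals}

# (iv) keep P2 and fix the evidence ¬p3: f_{P1,¬p3}(P2) = f_{P1}(P2) Pr(¬p3 | P2)
f_P1_not_p3 = {p2: f_P1[p2] * pr_p3[p2][False] for p2 in vals}

print(f_P1P2[False])                                     # Pr(¬p3)    = 0.762
print(f_P1_not_p3[True] / sum(f_P1_not_p3.values()))     # Pr(p2|¬p3) ≈ 0.651
```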

Forward sampling

  1. topologically sort the network nodes
    • for every edge, the parent comes before its children in the ordering,
  2. instantiate variables along the topological ordering
    • take Pr(Pj | parents(Pj)) and randomly sample Pj,
  3. repeat step 2 for all the samples (the sample size M is given a priori)
  • samples that contradict the evidence are not used,
  • forward sampling becomes inefficient if Pr(e) is small (a sketch follows below).
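
A short forward-sampling sketch for Pr(p1|p2,¬p3), again under the CPTs assumed in the enumeration example; P4 is skipped because it is irrelevant to the query. Under those assumptions only about half of the generated samples (Pr(p2,¬p3) ≈ .5) survive the evidence filter, and the waste grows as the evidence gets rarer.

```python
import random

# Assumed CPTs, as in the enumeration sketch above
pr_p1_true = .4
pr_p2_true = {True: .8, False: .5}   # Pr(p2 | P1)
pr_p3_true = {True: .2, False: .3}   # Pr(p3 | P2)

def forward_estimate(M=100_000, seed=0):
    """Estimate Pr(p1 | p2, ¬p3); samples contradicting the evidence are discarded."""
    rng = random.Random(seed)
    kept = hits = 0
    for _ in range(M):
        p1 = rng.random() < pr_p1_true          # sample P1 first (topological order)
        p2 = rng.random() < pr_p2_true[p1]      # then P2 given P1
        p3 = rng.random() < pr_p3_true[p2]      # then P3 given P2
        if p2 and not p3:                       # keep only samples matching the evidence
            kept += 1
            hits += p1
    return hits / kept

print(forward_estimate())   # ≈ .52 under the assumed CPTs (exact value .516)
```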

Randomly generated numbers for estimating Pr(p1|p2,¬p3)

Rejection sampling

  • rejects partially generated samples as soon as they violate the evidence event e
  • sample generation may stop early → slight improvement
     s1   s2   s3   s4   s5   s6   s7   s8
P1   T    F    F    T    T    F    F    F
P2   T    F    T    F    F    T    F    T
P3   F    ?    T    ?    ?    F    ?    F
P4   ?    ?    ?    ?    ?    ?    ?    ?
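
A sketch of this early-stopping variant, with the same assumed CPTs as above; the sampler returns None as soon as the partial sample contradicts the evidence p2, ¬p3, mirroring the "?" entries in the table.

```python
import random

# Assumed CPTs, as in the forward-sampling sketch
pr_p1_true = .4
pr_p2_true = {True: .8, False: .5}   # Pr(p2 | P1)
pr_p3_true = {True: .2, False: .3}   # Pr(p3 | P2)

def sample_or_reject(rng):
    p1 = rng.random() < pr_p1_true
    p2 = rng.random() < pr_p2_true[p1]
    if not p2:                        # evidence P2 = T violated -> stop early
        return None
    p3 = rng.random() < pr_p3_true[p2]
    if p3:                            # evidence P3 = F violated -> stop early
        return None
    return p1                         # P4 would be sampled here, but it is irrelevant to the query

def rejection_estimate(M=100_000, seed=0):
    rng = random.Random(seed)
    accepted = [s for s in (sample_or_reject(rng) for _ in range(M)) if s is not None]
    return sum(accepted) / len(accepted)
```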

Likelihood weighting

  • avoids the necessity to reject samples
  • the values of E are kept fixed, only the remaining variables are sampled
  • however, not all generated events are equally probable, so the samples must be weighted
  • the weight equals the likelihood of the evidence given the sampled values (see the formula below)
      p^1   p^2   p^3   p^4   p^5   p^6
P1    T     F     F     F     F     F
P2    T     T     T     T     T     T
P3    F     F     F     F     F     F
P4    ?     ?     ?     ?     ?     ?
w_i   .64   .40   .40   .40   .40   .40

w_i = Pr(p2 | p1^i) Pr(¬p3 | p2)

Pr(p1|p2,¬p3) ≈ (Σ_i w_i δ(p1^i, p1)) / (Σ_i w_i) = .64 / 2.64 = 0.24
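
A likelihood-weighting sketch for the same query, under the assumed CPTs from the previous examples; the evidence variables are never sampled, only P1 is, and each sample is weighted by the likelihood of the evidence. With a large M the estimate settles near .52 (the exact posterior under these assumed CPTs); the six-sample table above shows why a small M can give a rougher value such as 0.24.

```python
import random

# Assumed CPTs, as above
pr_p1_true = .4
pr_p2_true = {True: .8, False: .5}   # Pr(p2 | P1)
pr_not_p3_given_p2 = .8              # Pr(¬p3 | p2)

def lw_estimate(M=100_000, seed=0):
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(M):
        p1 = rng.random() < pr_p1_true                 # sample only the non-evidence variable
        w = pr_p2_true[p1] * pr_not_p3_given_p2        # weight = likelihood of the evidence
        num += w * p1
        den += w
    return num / den

print(lw_estimate())   # ≈ .52 under the assumed CPTs
```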

Gibbs sampling

  • generate sequence of samples

  • p^m = {P1 = p1^m, ..., Pn = pn^m}, m ∈ {1, ..., M}:

  1. fix states of all observed variables from E (in all samples)
  2. the other variables are initialized randomly in p^0
  3. generate p^m from p^(m-1) (for each Pi ∉ E)
    • p1^m ∼ Pr(P1 | p2^(m-1), ..., pn^(m-1))
    • ...
    • pn^m ∼ Pr(Pn | p1^m, ..., p(n-1)^m)
  4. repeat step 3 for m = 1, ..., M
  • ignore samples at the beginning (burn-in period)

  • we don't have Pr(Pi | P1, ..., P_{i-1}, P_{i+1}, ..., Pn)

  • we can use Pr(Pi | MB(Pi)) instead
    • the Markov blanket (MB) is the node's neighborhood in the graph,
    • MB(Pi) consists of the node's parents, its children and the children's other parents,
    • MB(Pi) is the minimum set of nodes that d-separates Pi from the rest of the network.
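
A Gibbs-sampling sketch for Pr(p1|p2,¬p3), under the same assumed CPTs. With P2 and P3 clamped to the evidence, the only free variable is P1, and its Markov-blanket conditional is Pr(P1 | MB(P1)) = Pr(P1 | p2) ∝ Pr(P1) Pr(p2|P1); in this tiny chain that conditional does not depend on the previous sample, so the sketch mainly illustrates the mechanics (clamping, blanket conditional, burn-in).

```python
import random

# Assumed CPTs, as above
pr_p1_true = .4
pr_p2_true = {True: .8, False: .5}   # Pr(p2 | P1)

def gibbs_estimate(M=100_000, burn_in=1_000, seed=0):
    rng = random.Random(seed)
    p1 = rng.random() < .5                           # random initialisation of the free variable
    hits = 0
    for m in range(M + burn_in):
        w_t = pr_p1_true * pr_p2_true[True]          # Pr(p1)  Pr(p2 | p1)
        w_f = (1 - pr_p1_true) * pr_p2_true[False]   # Pr(¬p1) Pr(p2 | ¬p1)
        p1 = rng.random() < w_t / (w_t + w_f)        # resample P1 from its blanket conditional
        if m >= burn_in:                             # ignore the burn-in samples
            hits += p1
    return hits / M

print(gibbs_estimate())   # ≈ .52 under the assumed CPTs
```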