Peter Norvig on Bayesian Networks - Dictionary of Arguments
Norvig I 510
Bayesian Networks/belief networks/probabilistic networks/knowledge map/AI research/Norvig/Russell: Bayesian networks can represent essentially any full joint probability distribution and in many cases can do so very concisely.
Norvig I 511
A Bayesian network is a directed graph in which each node is annotated with quantitative probability information. The full specification is as follows:
1. Each node corresponds to a random variable, which may be discrete or continuous.
2. A set of directed links or arrows connects pairs of nodes. If there is an arrow from node X to node Y, X is said to be a parent of Y. The graph has no directed cycles (and hence is a directed acyclic graph, or DAG).
3. Each node Xi has a conditional probability distribution P(Xi |Parents(Xi)) that quantifies the effect of the parents on the node.
The topology of the network - the set of nodes and links - specifies the conditional independence relationships that hold in the domain (…). >Probability theory/Norvig, >Uncertainty/AI research.
The intuitive meaning of an arrow is typically that X has a direct influence on Y, which suggests that causes should be parents of effects. Once the topology of the Bayesian network is laid out, we need only specify a conditional probability distribution for each variable, given its parents.
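The specification above can be sketched in a few lines of code. The following is a minimal illustration (not AIMA's own implementation), using a hypothetical three-node chain Cloudy → Rain → WetGrass with Boolean variables: each node stores one conditional probability per combination of its parents' values, and the probability of a full assignment is the product of each node's conditional probability given its parents.

```python
# Minimal Bayesian-network sketch (hypothetical example network).
# Topology: Cloudy -> Rain -> WetGrass, all variables Boolean.
from itertools import product

parents = {"Cloudy": [], "Rain": ["Cloudy"], "WetGrass": ["Rain"]}

# cpt[node][(parent values...)] = P(node = True | parent values)
cpt = {
    "Cloudy":   {(): 0.5},
    "Rain":     {(True,): 0.8, (False,): 0.2},
    "WetGrass": {(True,): 0.9, (False,): 0.1},
}

def joint(assignment):
    """P(x1,...,xn) = product over i of P(xi | parents(xi))."""
    p = 1.0
    for node, value in assignment.items():
        pv = tuple(assignment[par] for par in parents[node])
        p_true = cpt[node][pv]
        p *= p_true if value else 1.0 - p_true
    return p

# Sanity check: the joint distribution sums to 1 over all 2^3 worlds.
total = sum(
    joint(dict(zip(parents, values)))
    for values in product([True, False], repeat=3)
)
```

Because the network is a DAG and each row of a CPT is a proper distribution, the products automatically sum to 1 over all assignments, which is why (as noted below) no specification can violate the axioms of probability.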
Norvig I 512
Circumstances: The probabilities actually summarize a potentially
Norvig I 513
infinite set of circumstances.
Norvig I 515
Inconsistency: If there is no redundancy, then there is no chance for inconsistency: it is impossible for the knowledge engineer or domain expert to create a Bayesian network that violates the axioms of probability.
Norvig I 517
Diagnostic models: If we try to build a diagnostic model with links from symptoms to causes (…) we end up having to specify additional dependencies between otherwise independent causes (and often between separately occurring symptoms as well).
Causal models: If we stick to a causal model, we end up having to specify fewer numbers, and the numbers will often be easier to come up with. In the domain of medicine, for example, it has been shown by Tversky and Kahneman (1982)(1) that expert physicians prefer to give probability judgments for causal rules rather than for diagnostic ones.
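The saving in numbers can be made concrete. For Boolean variables, each node needs one independent number per combination of its parents' values. In a hypothetical one-disease, k-symptom domain, the causal direction (disease → symptoms) needs 1 + 2k numbers, while the reversed diagnostic direction (symptoms → disease) needs k + 2^k, since the disease node must condition on every symptom combination:

```python
# Count the independent parameters of a Boolean Bayesian network:
# one number per combination of each node's parents' values.
def num_parameters(parents):
    return sum(2 ** len(ps) for ps in parents.values())

k = 3  # number of symptoms (hypothetical domain)

# Causal direction: Disease -> S0, S1, S2
causal = {"Disease": [], **{f"S{i}": ["Disease"] for i in range(k)}}
# Diagnostic direction: S0, S1, S2 -> Disease
diagnostic = {**{f"S{i}": [] for i in range(k)},
              "Disease": [f"S{i}" for i in range(k)]}

causal_n = num_parameters(causal)      # 1 + 2*k  = 7
diag_n = num_parameters(diagnostic)    # k + 2**k = 11
```

The gap widens exponentially with k, which is the quantitative content of the claim that causal models require fewer, and easier, probability judgments.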
Norvig I 529
Inference: because it includes inference in propositional logic as a special case, inference in Bayesian networks is NP-hard. >NP-Problems/Norvig.
There is a close connection between the complexity of Bayesian network inference and the complexity of constraint satisfaction problems (CSPs). >Constraint satisfaction problems/Norvig.
Clustering algorithms: Using clustering algorithms (also known as join tree algorithms), the time can be reduced to O(n). For this reason, these algorithms are widely used in commercial Bayesian network tools. The basic idea of clustering is to join individual nodes of the network to form cluster nodes in such a way that the resulting network is a polytree.
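The general-case baseline that clustering improves on is inference by enumeration: sum the full joint over the hidden variables and normalize, which takes time exponential in the number of variables. A minimal sketch (a hypothetical two-node network Rain → WetGrass, not a clustering algorithm itself) shows the idea:

```python
# Exact inference by enumeration over all full assignments.
# Hypothetical network: Rain -> WetGrass, Boolean variables.
from itertools import product

parents = {"Rain": [], "WetGrass": ["Rain"]}
cpt = {"Rain": {(): 0.2},
       "WetGrass": {(True,): 0.9, (False,): 0.1}}

def joint(assignment):
    p = 1.0
    for node, value in assignment.items():
        pv = tuple(assignment[par] for par in parents[node])
        pt = cpt[node][pv]
        p *= pt if value else 1.0 - pt
    return p

def query(var, evidence):
    """P(var = True | evidence), summing the joint over hidden variables."""
    nodes = list(parents)
    weights = {True: 0.0, False: 0.0}
    for values in product([True, False], repeat=len(nodes)):
        a = dict(zip(nodes, values))
        if all(a[e] == v for e, v in evidence.items()):
            weights[a[var]] += joint(a)
    return weights[True] / (weights[True] + weights[False])

# P(Rain | WetGrass=True) = 0.2*0.9 / (0.2*0.9 + 0.8*0.1) = 0.18/0.26
```

The loop visits all 2^n assignments, making the exponential cost of naive exact inference explicit; polytree and clustering algorithms avoid this blowup by exploiting the network structure.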
Norvig I 539
(…) Bayesian networks are essentially propositional: the set of random variables is fixed and finite, and each has a fixed domain of possible values. This fact limits the applicability of Bayesian networks. If we can find a way to combine probability theory with the expressive power of first-order representations, we expect to be able to increase dramatically the range of problems that can be handled.
Norvig I 540
Possible worlds/probabilities: for Bayesian networks, the possible worlds are assignments
of values to variables; for the Boolean case in particular, the possible worlds are identical to those of propositional logic. For a first-order probability model, then, it seems we need the possible worlds to be those of first-order logic—that is, a set of objects with relations among them and an interpretation that maps constant symbols to objects, predicate symbols to relations, and function symbols to functions on those objects.
Problem: the set of first-order models is infinite.
Solution: The database semantics makes the unique names assumption—here, we adopt it for the constant symbols. It also assumes domain closure - there are no more objects than those that are named. We can then guarantee a finite set of possible worlds by making the set of objects in each world be exactly the set of constant
Norvig I 541
symbols that are used. There is no uncertainty about the mapping from symbols to objects or about the objects that exist.
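Under these two assumptions the possible worlds can be enumerated explicitly. A small sketch with hypothetical names: with domain closure over two constants and a single binary predicate, each world is just a choice of which ordered pairs stand in the relation, giving 2^(2·2) = 16 worlds in total.

```python
# Enumerate the finite possible worlds guaranteed by the database
# semantics: unique names + domain closure over the named constants.
from itertools import product

constants = ["A", "B"]  # domain closure: exactly these objects exist
pairs = [(x, y) for x in constants for y in constants]

# Each world fixes the extension of one binary predicate R:
# a subset of the 4 ordered pairs, hence 2**4 = 16 worlds.
worlds = [
    {pair for pair, held in zip(pairs, bits) if held}
    for bits in product([False, True], repeat=len(pairs))
]
```

Without the two assumptions, the set of first-order models (and hence of possible worlds) would be infinite, and no such enumeration would exist.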
Relational probability models: We will call models defined in this way relational probability models, or RPMs. The name relational probability model was given by Pfeffer (2000)(2) to a slightly different representation, but the underlying ideas are the same. >Uncertainty/AI research.
Norvig I 552
Judea Pearl developed the message-passing method for carrying out inference in tree networks (Pearl, 1982a)(3) and polytree networks (Kim and Pearl, 1983)(4) and explained the importance of causal rather than diagnostic probability models, in contrast to the certainty-factor systems then in vogue. The first expert system using Bayesian networks was CONVINCE (Kim, 1983)(5). Early applications in medicine included the MUNIN system for diagnosing neuromuscular disorders (Andersen et al., 1989)(6) and the PATHFINDER system for pathology (Heckerman, 1991)(7).
Norvig I 553
Perhaps the most widely used Bayesian network systems have been the diagnosis-and-repair modules (e.g., the Printer Wizard) in Microsoft Windows (Breese and Heckerman, 1996)(8) and the Office Assistant in Microsoft Office (Horvitz et al., 1998)(9). Another important application area is biology: Bayesian networks have been used for identifying human genes by reference to mouse genes (Zhang et al., 2003)(10), inferring cellular networks (Friedman, 2004)(11), and many other tasks in bioinformatics. We could go on, but instead we’ll refer you to Pourret et al. (2008)(12), a 400-page guide to applications of Bayesian networks.
Ross Shachter (1986)(13), working in the influence diagram community, developed the first complete algorithm for general Bayesian networks. His method was based on goal-directed reduction of the network using posterior-preserving transformations. Pearl (1986)(14) developed a clustering algorithm for exact inference in general Bayesian networks, utilizing a conversion to a directed polytree of clusters in which message passing was used to achieve consistency over variables shared between clusters. A similar approach, developed by the statisticians David Spiegelhalter and Steffen Lauritzen (Lauritzen and Spiegelhalter, 1988)(15), is based on conversion to an undirected form of graphical model called a Markov network. This approach is implemented in the HUGIN system, an efficient and widely used tool for uncertain reasoning (Andersen et al., 1989)(6). Boutilier et al. (1996)(16) show how to exploit context-specific independence in clustering algorithms.
Norvig I 604
Dynamic Bayesian networks (DBNs): can be viewed as a sparse encoding of a Markov process and were first used in AI by Dean and Kanazawa (1989b)(17), Nicholson and Brady (1992)(18), and Kjaerulff (1992)(19). The last work extends the HUGIN Bayes net system to accommodate dynamic Bayesian networks. The book by Dean and Wellman (1991)(20) helped popularize DBNs and the probabilistic approach to planning and control within AI. Murphy (2002)(21) provides a thorough analysis of DBNs. Dynamic Bayesian networks have become popular for modeling a variety of complex motion processes in computer vision (Huang et al., 1994(22); Intille and Bobick, 1999)(23).
Like HMMs, they have found applications in speech recognition (Zweig and Russell, 1998(24); Richardson et al., 2000(25); Stephenson et al., 2000(26); Nefian et al., 2002(27); Livescu et al., 2003(28)),
Norvig I 605
genomics (Murphy and Mian, 1999(29); Perrin et al., 2003(30); Husmeier, 2003(31)) and robot localization (Theocharous et al., 2004)(32).
The link between HMMs and DBNs, and between the forward–backward algorithm and Bayesian network propagation, was made explicitly by Smyth et al. (1997)(33). A further unification with Kalman filters (and other statistical models) appears in Roweis and Ghahramani (1999)(34). Procedures exist for learning the parameters (Binder et al., 1997a(35); Ghahramani, 1998(36)) and structures (Friedman et al., 1998)(37) of DBNs.
1. Tversky, A. and Kahneman, D. (1982). Causal schemata in judgements under uncertainty. In Kahneman, D., Slovic, P., and Tversky, A. (Eds.), Judgement Under Uncertainty: Heuristics and Biases.
Cambridge University Press.
2. Pfeffer, A. (2000). Probabilistic Reasoning for Complex Systems. Ph.D. thesis, Stanford University.
3. Pearl, J. (1982a). Reverend Bayes on inference engines: A distributed hierarchical approach. In AAAI-82, pp. 133–136.
4. Kim, J. H. and Pearl, J. (1983). A computational model for combined causal and diagnostic reasoning
in inference systems. In IJCAI-83, pp. 190–193.
5. Kim, J. H. (1983). CONVINCE: A Conversational Inference Consolidation Engine. Ph.D. thesis, Department of Computer Science, University of California at Los Angeles.
6. Andersen, S. K., Olesen, K. G., Jensen, F. V., and Jensen, F. (1989). HUGIN—A shell for building
Bayesian belief universes for expert systems. In IJCAI-89, Vol. 2, pp. 1080–1085.
7. Heckerman, D. (1991). Probabilistic Similarity Networks. MIT Press.
8. Breese, J. S. and Heckerman, D. (1996). Decision-theoretic troubleshooting: A framework for repair and experiment. In UAI-96, pp. 124–132.
9. Horvitz, E. J., Breese, J. S., Heckerman, D., and Hovel, D. (1998). The Lumiere project: Bayesian
user modeling for inferring the goals and needs of software users. In UAI-98, pp. 256–265.
10. Zhang, L., Pavlovic, V., Cantor, C. R., and Kasif, S. (2003). Human-mouse gene identification by comparative evidence integration and evolutionary analysis. Genome Research, pp. 1–13.
11. Friedman, N. (2004). Inferring cellular networks using probabilistic graphical models. Science.
12. Pourret, O., Naïm, P., and Marcot, B. (2008). Bayesian Networks: A Practical Guide to Applications.
13. Shachter, R. D. (1986). Evaluating influence diagrams. Operations Research, 34, 871–882.
14. Pearl, J. (1986). Fusion, propagation, and structuring in belief networks. AIJ, 29, 241–288.
15. Lauritzen, S. and Spiegelhalter, D. J. (1988). Local computations with probabilities on graphical structures and their application to expert systems. J. Royal Statistical Society, B 50(2), 157–224.
16. Boutilier, C., Friedman, N., Goldszmidt, M., and Koller, D. (1996). Context-specific independence in
Bayesian networks. In UAI-96, pp. 115–123.
17. Dean, T. and Kanazawa, K. (1989b). A model for reasoning about persistence and causation. Computational Intelligence, 5(3), 142–150.
18. Nicholson, A. and Brady, J. M. (1992). The data association problem when monitoring robot vehicles using dynamic belief networks. In ECAI-92, pp. 689–693.
19. Kjaerulff, U. (1992). A computational scheme for reasoning in dynamic probabilistic networks. In
UAI-92, pp. 121–129.
20. Dean, T. and Wellman, M. P. (1991). Planning and Control. Morgan Kaufmann.
21. Murphy, K. (2002). Dynamic Bayesian Networks: Representation, Inference and Learning. Ph.D. thesis, UC Berkeley
22. Huang, T., Koller, D., Malik, J., Ogasawara, G., Rao, B., Russell, S. J., and Weber, J. (1994). Automatic symbolic traffic scene analysis using belief networks. In AAAI-94, pp. 966–972
23. Intille, S. and Bobick, A. (1999). A framework for recognizing multi-agent action from visual evidence. In AAAI-99, pp. 518–525.
24. Zweig, G. and Russell, S. J. (1998). Speech recognition with dynamic Bayesian networks. In AAAI-98, pp. 173–180.
25. Richardson, M., Bilmes, J., and Diorio, C. (2000). Hidden-articulator Markov models: Performance improvements and robustness to noise. In ICASSP-00.
26. Stephenson, T., Bourlard, H., Bengio, S., and Morris, A. (2000). Automatic speech recognition using dynamic Bayesian networks with both acoustic and articulatory features. In ICSLP-00, pp. 951–954.
27. Nefian, A., Liang, L., Pi, X., Liu, X., and Murphy, K. (2002). Dynamic Bayesian networks for audiovisual speech recognition. EURASIP Journal of Applied Signal Processing, 11, 1–15.
28. Livescu, K., Glass, J., and Bilmes, J. (2003). Hidden feature modeling for speech recognition using dynamic Bayesian networks. In EUROSPEECH-2003, pp. 2529–2532
29. Murphy, K. and Mian, I. S. (1999). Modelling gene expression data using Bayesian networks.
30. Perrin, B. E., Ralaivola, L., and Mazurie, A. (2003). Gene networks inference using dynamic Bayesian networks. Bioinformatics, 19, ii138–ii148.
31. Husmeier, D. (2003). Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks. Bioinformatics, 19(17), 2271–2282.
32. Theocharous, G., Murphy, K., and Kaelbling, L. P. (2004). Representing hierarchical POMDPs as
DBNs for multi-scale robot localization. In ICRA-04.
33. Smyth, P., Heckerman, D., and Jordan, M. I. (1997). Probabilistic independence networks for hidden Markov probability models. Neural Computation, 9(2), 227–269.
34. Roweis, S. T. and Ghahramani, Z. (1999). A unifying review of linear Gaussian models. Neural Computation, 11(2), 305–345.
35. Binder, J., Koller, D., Russell, S. J., and Kanazawa, K. (1997a). Adaptive probabilistic networks with hidden variables. Machine Learning, 29, 213–244.
36. Ghahramani, Z. (1998). Learning dynamic Bayesian networks. In Adaptive Processing of Sequences and Data Structures, pp. 168–197.
37. Friedman, N., Murphy, K., and Russell, S. J. (1998). Learning the structure of dynamic probabilistic networks. In UAI-98.
Stuart J. Russell
Artificial Intelligence: A Modern Approach Upper Saddle River, NJ 2010