Representation of Undirected Graphical Models
This page’s Markdown is generated via pandoc from LaTeX. If you feel more comfortable with a LaTeX layout, please check here. The original TeX file is also available here.
1. Review
There are several important concepts and theorems introduced in the last lecture about Directed Graphical Models.
Local independence: in a directed graph, each variable is independent of its nondescendants given its parents.
Global independence: the global independences are given by d-separation. Note that the terms “global” and “local” are just conventional labels; there is no need to read too much into them.
A fully connected DAG $G$ is an I-map of any distribution, since $I(G) = \emptyset \subseteq I(P)$ for any $P$.
Minimal I-map: A DAG $G$ is a minimal I-map of $P$ if the removal of even a single edge from $G$ renders it not an I-map.
A distribution may have several minimal I-maps.
P-map: A DAG $G$ is a perfect map (p-map) of a distribution $P$ if $I(G) = I(P)$.
Note that not every distribution has a DAG as a perfect map. Here is an example: a distribution whose only independences are the two statements shown in the figure. Each of the two candidate BNs captures one of the statements, but each also wrongly asserts an additional independence that the distribution does not satisfy.

It is impossible for a DAG to capture both independences at the same time. The main reason is that a directed model (sometimes) encodes extra independences together with the ones we want. Thus, there is a portion of the space of distributions that we cannot encode with a DGM. That motivates another type of graphical model: undirected graphical models, a.k.a. Markov Random Fields.
2. Undirected Graphical Models
UGMs are very similar to DGMs in structure, but directed and undirected edges encode relationships differently. A directed edge encodes a causal relationship between nodes, while an undirected edge captures a pairwise relationship between nodes, representing correlation, a rough affinity.
Many things can be modeled as a UGM, such as a photo, where each pixel can be a node; a game of Go, where the grid board maps intuitively onto nodes; or even a social network, as shown in Figure 2.
3. Representation
Definition: An undirected graphical model represents a distribution $P(x_1, \dots, x_n)$ defined by an undirected graph $H$, and a set of positive potential functions $\psi_c$ associated with the cliques of $H$, s.t.

$$P(x_1, \dots, x_n) = \frac{1}{Z} \prod_{c \in C} \psi_c(\mathbf{x}_c)$$

where $Z$ is known as the partition function:

$$Z = \sum_{x_1, \dots, x_n} \prod_{c \in C} \psi_c(\mathbf{x}_c)$$
The potential function can be understood as a contingency function of its arguments, assigning a “pre-probabilistic” score to their joint configuration. We call this form of distribution in the equation above a Gibbs distribution, as in Definition 4.3 of the Koller textbook. The potential function is what the Koller textbook calls a factor.
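To make the definition concrete, here is a minimal sketch in Python, assuming a hypothetical chain $A - B - C$ with cliques $\{A, B\}$ and $\{B, C\}$; the potential values are made up:

```python
# A minimal sketch of a Gibbs distribution, assuming a hypothetical
# chain A - B - C with cliques {A, B} and {B, C}. All numbers are
# made-up "compatibility" scores, not probabilities.
import itertools

import numpy as np

psi_ab = np.array([[30.0, 5.0], [1.0, 10.0]])   # psi_AB(a, b)
psi_bc = np.array([[100.0, 1.0], [1.0, 10.0]])  # psi_BC(b, c)

# Partition function: sum of the product of clique potentials over
# every joint configuration of the binary variables.
Z = sum(psi_ab[a, b] * psi_bc[b, c]
        for a, b, c in itertools.product([0, 1], repeat=3))

def p(a, b, c):
    """Joint probability P(a, b, c) = psi_AB(a, b) * psi_BC(b, c) / Z."""
    return psi_ab[a, b] * psi_bc[b, c] / Z

# The normalized scores sum to one by construction.
total = sum(p(*x) for x in itertools.product([0, 1], repeat=3))
print(total)  # 1.0 (up to floating-point error)
```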
Definition: For $G = \{V, E\}$, a complete subgraph (clique) is a subgraph $G' = \{V' \subseteq V, E' \subseteq E\}$ such that the nodes in $V'$ are fully interconnected. A (maximal) clique is a complete subgraph s.t. any superset $V'' \supset V'$ is not complete.
Interpretation of Clique Potentials
The model implies $X \perp Z \mid Y$. This independence statement implies (by definition) that the joint must factorize as:

$$P(X, Y, Z) = P(Y)\, P(X \mid Y)\, P(Z \mid Y)$$

We can write this as

$$P(X, Y, Z) = P(X, Y)\, P(Z \mid Y)$$

or

$$P(X, Y, Z) = P(X \mid Y)\, P(Z, Y)$$
However, we cannot have all potentials be marginals and cannot have all potentials be conditionals.
The positive clique potentials can only be thought of as general “compatibility”, “goodness” or “happiness” functions over their variables, but not as probability distributions.
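A small numeric check makes this point: on the same kind of hypothetical chain $A - B - C$ (again with made-up numbers), the true clique marginal $P(A, B)$ generally differs from the normalized potential $\psi_{AB}$, so potentials cannot be read as marginals:

```python
# Numeric check that clique potentials are not marginals, on a
# hypothetical chain A - B - C with made-up potentials.
import itertools

import numpy as np

psi_ab = np.array([[30.0, 5.0], [1.0, 10.0]])
psi_bc = np.array([[100.0, 1.0], [1.0, 10.0]])

joint = np.zeros((2, 2, 2))
for a, b, c in itertools.product([0, 1], repeat=3):
    joint[a, b, c] = psi_ab[a, b] * psi_bc[b, c]
joint /= joint.sum()  # divide by the partition function Z

marginal_ab = joint.sum(axis=2)          # the true marginal P(A, B)
normalized_psi = psi_ab / psi_ab.sum()   # the potential rescaled to sum to 1
print(np.allclose(marginal_ab, normalized_psi))  # False: not a marginal
```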
Example UGM — using max cliques
Here we’ll use an example to show a UGM. We can factorize the graph into its two max cliques:

$$P'(A, B, C, D) = \frac{1}{Z}\, \psi_{ABD}(A, B, D)\, \psi_{BCD}(B, C, D)$$

This lets us represent $P(A, B, C, D)$ as two 3D tables instead of one 4D table.
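The table representation is easy to see in code. Below is a sketch assuming the hypothetical max cliques $\{A, B, D\}$ and $\{B, C, D\}$ from the factorization above, with random positive potentials:

```python
# Two 3D tables instead of one 4D table, assuming hypothetical max
# cliques {A, B, D} and {B, C, D} over binary variables.
import numpy as np

rng = np.random.default_rng(0)
psi_abd = rng.uniform(0.1, 1.0, size=(2, 2, 2))  # psi(A, B, D)
psi_bcd = rng.uniform(0.1, 1.0, size=(2, 2, 2))  # psi(B, C, D)

# Unnormalized joint: p~(a, b, c, d) = psi_abd[a, b, d] * psi_bcd[b, c, d]
unnorm = np.einsum('abd,bcd->abcd', psi_abd, psi_bcd)
Z = unnorm.sum()       # partition function
joint = unnorm / Z     # the full 4D table, recovered from two 3D tables

# With k states per node the two 3D tables cost 2 * k^3 entries versus
# k^4 for the full table -- a real saving once k > 2.
print(joint.shape, float(joint.sum()))  # (2, 2, 2, 2) 1.0
```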
Using subcliques
In this example, the distribution factorizes over the subcliques, here the edges of the graph:

$$P''(A, B, C, D) = \frac{1}{Z} \prod_{ij \in E} \psi_{ij}(X_i, X_j) = \frac{1}{Z}\, \psi_{AB}(A, B)\, \psi_{AD}(A, D)\, \psi_{BD}(B, D)\, \psi_{BC}(B, C)\, \psi_{CD}(C, D)$$
Example UGM — canonical representation
A canonical representation of such a graph uses the max cliques, the subcliques, and the singleton nodes together:

$$P(A, B, C, D) = \frac{1}{Z}\, \psi_{ABD}(A, B, D)\, \psi_{BCD}(B, C, D)\, \psi_{AB}(A, B)\, \psi_{AD}(A, D)\, \psi_{BD}(B, D)\, \psi_{BC}(B, C)\, \psi_{CD}(C, D)\, \psi_A(A)\, \psi_B(B)\, \psi_C(C)\, \psi_D(D)$$
4. Independence properties
Global independence
Definition: A set of nodes $Z$ separates $X$ and $Y$ in $H$, denoted $\text{sep}_H(X : Y \mid Z)$, if there is no active path between any node $x \in X$ and $y \in Y$ given $Z$. The global independences associated with $H$ are defined as:

$$I(H) = \{(X \perp Y \mid Z) : \text{sep}_H(X : Y \mid Z)\}$$
In Figure 3, $B$ separates $A$ and $C$ if every path from a node in $A$ to a node in $C$ passes through a node in $B$; this is written $\text{sep}_H(A : C \mid B)$. A probability distribution satisfies the global Markov property if for any disjoint $A, B, C$ such that $B$ separates $A$ and $C$, $A$ is independent of $C$ given $B$.
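Because separation is purely graph-theoretic, it is easy to check mechanically. Below is a minimal sketch assuming a hypothetical adjacency-list graph: $B$ separates $A$ and $C$ exactly when no node of $C$ is reachable from a node of $A$ after deleting the nodes of $B$:

```python
# A minimal separation check on a hypothetical adjacency-list graph:
# B separates A and C iff no node of C is reachable from a node of A
# once the nodes of B are deleted.
from collections import deque

def separates(adj, A, B, C):
    """Return True if node set B separates node sets A and C in adj."""
    blocked = set(B)
    seen = {a for a in A if a not in blocked}
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        if u in C:
            return False  # found an active path from A to C
        for v in adj[u]:
            if v not in blocked and v not in seen:
                seen.add(v)
                queue.append(v)
    return True

# Chain X1 - X2 - X3: X2 separates X1 and X3, the empty set does not.
adj = {'X1': ['X2'], 'X2': ['X1', 'X3'], 'X3': ['X2']}
print(separates(adj, {'X1'}, {'X2'}, {'X3'}))  # True
print(separates(adj, {'X1'}, set(), {'X3'}))   # False
```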
Local independence
Definition: For each node $X_i \in V$, there is a unique Markov blanket of $X_i$, denoted $\text{MB}_{X_i}$, which is the set of neighbors of $X_i$ in the graph (those that share an edge with $X_i$).

Definition: The local Markov independencies associated with $H$ are:

$$I_\ell(H) = \{(X_i \perp V \setminus \{X_i\} \setminus \text{MB}_{X_i} \mid \text{MB}_{X_i}) : \forall i\}$$
In other words, $X_i$ is independent of the rest of the nodes in the graph given its immediate neighbors.
Note that, based on the local independence:

$$P(X_i \mid V \setminus \{X_i\}) = P(X_i \mid \text{MB}_{X_i})$$
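As a quick numeric sanity check of this identity, the sketch below builds a hypothetical binary chain $X_1 - X_2 - X_3 - X_4$ from random pairwise potentials and verifies that $P(X_2 \mid X_1, X_3, X_4)$ does not depend on $X_4$, since $\text{MB}_{X_2} = \{X_1, X_3\}$:

```python
# Numeric check of the local Markov property on a hypothetical binary
# chain X1 - X2 - X3 - X4: MB(X2) = {X1, X3}, so the conditional
# P(X2 | X1, X3, X4) must not depend on X4.
import numpy as np

rng = np.random.default_rng(1)
psi12, psi23, psi34 = (rng.uniform(0.1, 1.0, size=(2, 2)) for _ in range(3))

# Joint over (x1, x2, x3, x4) from the three edge potentials.
unnorm = np.einsum('ab,bc,cd->abcd', psi12, psi23, psi34)
joint = unnorm / unnorm.sum()

# P(X2 | X1, X3, X4): normalize over the X2 axis for each (x1, x3, x4).
cond = joint / joint.sum(axis=1, keepdims=True)

# The x4 = 0 and x4 = 1 slices of the conditional coincide.
print(np.allclose(cond[..., 0], cond[..., 1]))  # True
```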
Soundness and completeness of global Markov property
Definition: A UG $H$ is an I-map for a distribution $P$ if $I(H) \subseteq I(P)$, i.e., $P$ entails $I(H)$.
Definition: $P$ is a Gibbs distribution over $H$ if it can be represented as

$$P(x_1, \dots, x_n) = \frac{1}{Z} \prod_{c \in C} \psi_c(\mathbf{x}_c)$$
Theorem (soundness): If $P$ is a Gibbs distribution over $H$, then $H$ is an I-map of $P$.
Theorem (completeness): If $X$ and $Z$ are not separated given $Y$ in $H$ ($\neg\, \text{sep}_H(X : Z \mid Y)$), then $X$ and $Z$ are dependent given $Y$ in some distribution $P$ that factorizes over $H$.
The proofs of both theorems are available in the Koller textbook.
Other Markov properties
For directed graphs, we defined I-maps in terms of local Markov properties, and derived global independence. For undirected graphs, we defined I-maps in terms of global Markov properties, and will now derive local independence.
The pairwise Markov independencies associated with the UG $H = \{V, E\}$ are

$$I_p(H) = \{(X \perp Y \mid V \setminus \{X, Y\}) : \{X, Y\} \notin E\}$$
For example, Figure 5 shows such a pairwise independence: two non-adjacent nodes are independent given all of the remaining nodes.
Relationship between local and global Markov properties
Theorem: For any Markov network $H$ and any distribution $P$, we have that if $P \models I_\ell(H)$ then $P \models I_p(H)$.

Theorem: For any Markov network $H$ and any distribution $P$, we have that if $P \models I(H)$ then $P \models I_\ell(H)$.

Theorem: Let $P$ be a positive distribution. If $P \models I_p(H)$, then $P \models I(H)$.
The following three statements are equivalent for a positive distribution $P$:

$$P \models I_\ell(H) \iff P \models I_p(H) \iff P \models I(H)$$
The above equivalence relies on the positivity assumption of $P$. For nonpositive distributions, there are examples that satisfy one of these properties but not the stronger property.
Perfect maps
Definition: A Markov network $H$ is a perfect map for $P$ if for any $X, Y, Z$ we have that

$$\text{sep}_H(X : Z \mid Y) \iff P \models (X \perp Z \mid Y)$$
Note that, just like with DGMs, not every distribution has a perfect map as a UGM. For example, the independences of the v-structure $X \rightarrow Z \leftarrow Y$ ($X \perp Y$ but not $X \perp Y \mid Z$) cannot be captured by any UGM.
Exponential Form
Constraining clique potentials to be positive could be inconvenient (e.g., the interactions between a pair of atoms can be either attractive or repulsive). We represent a clique potential in an unconstrained form using a real-valued “energy” function $\phi_c(\mathbf{x}_c)$:

$$\psi_c(\mathbf{x}_c) = \exp\{-\phi_c(\mathbf{x}_c)\}$$

Thus, this gives the joint distribution an additive structure:

$$P(\mathbf{x}) = \frac{1}{Z} \exp\Big\{-\sum_{c \in C} \phi_c(\mathbf{x}_c)\Big\} = \frac{1}{Z} \exp\{-H(\mathbf{x})\}$$

where $H(\mathbf{x}) = \sum_{c \in C} \phi_c(\mathbf{x}_c)$ is called the “free energy”.
The exponential ensures that the distribution is positive. In physics, this is called the “Boltzmann distribution”. In statistics, this is called a log-linear model (as the Koller textbook introduces it).
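To illustrate, here is a minimal sketch of the energy-based form with made-up energies on a single two-node clique; negative energies encode attraction (higher compatibility) and positive ones repulsion:

```python
# A sketch of the energy-based form on a single hypothetical two-node
# clique: energies may take any sign, yet exp(-energy) is positive.
import numpy as np

phi = np.array([[-1.0, 2.0], [2.0, -1.0]])  # energy phi(x1, x2)

psi = np.exp(-phi)   # induced potential, automatically positive
Z = psi.sum()        # partition function (single clique, so H = phi)
p = psi / Z          # Boltzmann distribution p(x) = exp(-H(x)) / Z

print(p)        # agreeing states (0,0) and (1,1) get the most mass
print(p.sum())  # 1.0
```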