\documentclass[10pt,twocolumn,letterpaper]{article}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{verbatim} % provides the comment environment used below
\begin{document}
\section{Foo}

\begin{comment}
# Scratch code: probabilities of intersections and unions of events via the
# chain rule and inclusion-exclusion.  P(...) and Event.condition are
# placeholders for a probability model; ut is the utool library.
import itertools as it
import operator as op
from functools import reduce
import numpy as np
import utool as ut


def prob_isect(events):
    """ Computes the probability of an intersection using the chain rule """
    assert len(events) != 0
    if len(events) == 1:
        A = events[0]
        return P(A)
    elif len(events) == 2:
        A, B = events
        return P(A | B) * P(B)
    else:
        A = events[0]
        rest = events[1:]
        A_given_rest = A.condition(rest)
        return P(A_given_rest) * prob_isect(rest)


def prob_union(events):
    """ Computes P(union_i A_i) for events A_i using inclusion-exclusion """
    def prod(items):
        return reduce(op.mul, items, 1.0)

    def sign_bit(k):
        # (-1) ** (k - 1): +1 for odd k, -1 for even k
        return 2 * (k % 2) - 1

    return sum(
        sign_bit(k) * sum(prob_isect(subset)
                          for subset in it.combinations(events, k))
        for k in range(1, len(events) + 1)
    )


class Event(ut.NiceRepr):
    def __init__(a, label):
        parts = label.split('|')
        a.name = parts[0]
        a.given = ','.join(parts[1:])

    def __nice__(a):
        return a.label()

    def split(a):
        return Event(a.name), Event(a.given)

    def label(a):
        if a.given:
            return a.name + '|' + a.given
        else:
            return a.name

    def __repr__(a):
        return a.__str__()

    def __eq__(a, b):
        return a.label() == b.label()

    def __hash__(a):
        return hash(a.__nice__())

    def __or__(a, b):
        # Event('A') | Event('B') builds the conditional event A|B
        given_parts = a.given.split(',') + [b.name] + b.given.split(',')
        new_given = ','.join([g for g in given_parts if g])
        return Event(a.name + '|' + new_given)


rand = np.random.rand
events = list(map(Event, ['A', 'B', 'C']))
marg = marginals = {A: rand() for A in events}
# Amount of space not taken up by any known event
complement = rand() * (1 - max(marginals.values()))
# Amount of space in overlapping regions
total_overlap = complement + sum(marginals.values()) - 1
cond = conditionals = {}
for A, B in it.combinations(events, 2):
    overlap = rand()
    # What probability mass is already accounted for
    accounted = sum([cond[A | C] * marg[C] for C in events if A | C in cond])
    cond[A | B] = (1 - accounted) * overlap
# Fill in the opposite direction using Bayes' rule
for key, val in list(conditionals.items()):
    A, B = key.split()
    cond[B | A] = val * marg[B] / marg[A]
print('cond = %s' % (ut.repr4(cond),))

# Special case where P(intersection of any k events) depends only on k
a = [float('nan'), .1, .01, .01]
for n in range(1, len(a)):
    P_any_out_c = sum(ut.choose(n, k) * a[k] for k in range(1, n + 1))
    print('n = %r' % (n,))
    print('P_any_out_c = %r' % (P_any_out_c,))
\end{comment}

Biological network reconstruction: Sachs et al., ``Causal protein-signaling
networks derived from multiparameter single-cell data,'' \emph{Science}, 2005.

% Lecture 1
Major operations, given a joint distribution $P(X_1, X_2, X_3, \ldots, X_n)$:

\textbf{Conditioning: reduction} -- observing a variable assigns it a value,
which eliminates all assignments that are not consistent with the observation.
Reducing on $X_2$ by observing its value to be a constant $x$ gives
$P(X_1, X_2=x, X_3, \ldots, X_n)$.

\textbf{Conditioning: re-normalization} -- takes an unnormalized measure and
divides it by its sum, turning it into a normalized probability distribution:
$P(X_1, X_3, \ldots, X_n \mid X_2=x)$.

\textbf{Marginalization} -- takes a probability distribution over a large set
of variables and produces a probability distribution over a subset of the
variables, by summing all entries in the probability distribution table (PDT)
that share a particular value of the remaining variables. Marginalizing over
$X_2$:
\[
\sum_{x \in \mathrm{Val}(X_2)} P(X_1, X_2=x, \ldots, X_n) = P(X_1, X_3, \ldots, X_n).
\]
Summations can be pushed inside an expression as long as they are not pushed
past terms involving the summed variable.
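The three operations above can be illustrated on a small joint table. The
sketch below (commented out, like the other scratch code in these notes)
stores $P(X_1, X_2, X_3)$ as a dense numpy array; the shapes and values are
arbitrary and for illustration only.

\begin{comment}
# Minimal sketch: reduction, re-normalization, and marginalization on a
# joint probability table P(X1, X2, X3) stored as a dense numpy array.
import numpy as np

joint = np.random.rand(2, 3, 2)   # axes: X1, X2, X3 (made-up cardinalities)
joint /= joint.sum()              # normalize so it is a valid PDT

# Reduction: observe X2 = 1, keeping only consistent entries (unnormalized)
reduced = joint[:, 1, :]

# Re-normalization: divide by the sum to get P(X1, X3 | X2=1)
conditional = reduced / reduced.sum()

# Marginalization: sum out X2 to get P(X1, X3)
marginal = joint.sum(axis=1)

assert np.isclose(marginal.sum(), 1.0)
\end{comment}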
%---------------------------------
% Lecture 2
Factors include unnormalized measures, probabilities, and conditional
probability distributions. They are the fundamental building blocks for
defining high-dimensional distributions. A factor is a function
$\phi(X_1, X_2, X_3) \rightarrow \mathbb{R}$; the set of arguments the factor
takes is its \textbf{scope}.

Operations on factors:

\textbf{Factor product}: $\phi_1(A, B) \cdot \phi_2(B, C) = \phi_{12}(A, B, C)$.

\textbf{Factor marginalization / reduction}: sum a variable out of the scope,
or restrict the factor to the entries consistent with an observed value.
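A discrete factor can be stored as a table indexed by assignments to its
scope; the factor product then multiplies the entries that agree on the shared
variables. The sketch below uses a plain dict representation with made-up
scopes and values; \texttt{factor\_product} is an illustrative helper, not a
library function.

\begin{comment}
# Minimal sketch of the factor product for discrete factors.
# A factor is (scope, table), where table maps an assignment tuple
# (one value per scope variable, in scope order) to a real number.
import itertools as it


def factor_product(scope1, table1, scope2, table2, domains):
    """ Returns (scope, table) for phi12 = phi1 * phi2 """
    scope = list(scope1) + [v for v in scope2 if v not in scope1]
    table = {}
    for assignment in it.product(*[domains[v] for v in scope]):
        values = dict(zip(scope, assignment))
        key1 = tuple(values[v] for v in scope1)
        key2 = tuple(values[v] for v in scope2)
        # Multiply the entries that agree on the shared variables
        table[assignment] = table1[key1] * table2[key2]
    return scope, table


# phi1(A, B) and phi2(B, C) over binary variables (values are illustrative)
domains = {'A': [0, 1], 'B': [0, 1], 'C': [0, 1]}
phi1 = {(a, b): 1.0 + a + b for a in [0, 1] for b in [0, 1]}
phi2 = {(b, c): 1.0 + 2 * b + c for b in [0, 1] for c in [0, 1]}
scope12, phi12 = factor_product(['A', 'B'], phi1, ['B', 'C'], phi2, domains)
# phi12 has scope (A, B, C) with one entry per joint assignment
\end{comment}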
%---------------------------------
% Lecture 3
A Bayes net defines a legal distribution: $P$ is a product of CPDs, the CPDs
are non-negative, and one must show that the product sums to 1.

%---------------------------------
% Lecture 5
When can $X$ influence $Y$?
\begin{itemize}
  \item When $X$ is a parent of $Y$ or a child of $Y$:
    $X \rightarrow Y$, \ $X \leftarrow Y$.
  \item When $X$ is an ancestor of $Y$ or a descendant of $Y$:
    $X \rightarrow W \rightarrow Y$, \ $X \leftarrow W \leftarrow Y$.
  \item When there is a common cause $W$ of $X$ and $Y$:
    $X \leftarrow W \rightarrow Y$.
  \item ONLY EXCEPTION: when $X$ and $Y$ have a common effect $W$ (two causes
    with a joint effect, a v-structure): $X \rightarrow W \leftarrow Y$.
\end{itemize}

\textbf{Active trail:} Given nothing, a trail $X_1 - \cdots - X_k$ is active if
it has no v-structures $X_{i-1} \rightarrow X_i \leftarrow X_{i+1}$. Influence
cannot flow through observed variables. Given $Z$, a trail $X_1 - \cdots - X_k$
is active given $Z$ if, for every v-structure
$X_{i-1} \rightarrow X_i \leftarrow X_{i+1}$ on the trail, $X_i$ or one of its
descendants is observed (i.e.\ is in $Z$), and no other node on the trail is
in $Z$.

%---------------------------------
% Lecture 6
For random variables $X$ and $Y$, $P \models X \perp Y$ if
$P(X, Y) = P(X)\,P(Y)$, or equivalently $P(X \mid Y) = P(X)$ and
$P(Y \mid X) = P(Y)$.

Conditional independence: for sets of random variables $X$, $Y$, $Z$,
$P \models (X \perp Y \mid Z)$ if $P(X, Y \mid Z) = P(X \mid Z)\,P(Y \mid Z)$,
or equivalently $P(X \mid Y, Z) = P(X \mid Z)$ and
$P(Y \mid X, Z) = P(Y \mid Z)$, or, using factors,
$P(X, Y, Z) \propto \phi_1(X, Z)\,\phi_2(Y, Z)$.

%---------------------------------
$X$ and $Y$ are d-separated given $Z$, written
$\text{d-sep}_G(X, Y \mid Z)$, if there is no active trail between $X$ and $Y$
given $Z$. If $P$ factorizes over $G$ and $\text{d-sep}_G(X, Y \mid Z)$, then
$P$ satisfies $(X \perp Y \mid Z)$. Any node is d-separated from its
non-descendants given its parents.

I-maps (independence maps): $I(G)$ is the set of independencies implied by
$G$, $\{(X \perp Y \mid Z) : \text{d-sep}_G(X, Y \mid Z)\}$. If $P$ factorizes
over $G$, then $G$ is an I-map for $P$ (the independencies read off $G$ hold
in $P$ regardless of the parameters). Conversely, if $G$ is an I-map for $P$,
then $P$ factorizes over $G$.

Chain rule for probabilities:
\begin{align*}
P(A_n, \ldots, A_1) &= P(A_n \mid A_{n-1}, \ldots, A_1)\,P(A_{n-1}, \ldots, A_1) \\
P(A_2, A_1) &= P(A_2 \mid A_1)\,P(A_1) \\
P(A_4, A_3, A_2, A_1) &= P(A_4 \mid A_3, A_2, A_1)\,P(A_3 \mid A_2, A_1)\,P(A_2 \mid A_1)\,P(A_1)
\end{align*}

% ---- Gibbs Sampling
For some (Gibbs) distribution $P_\Phi(X_1, \ldots, X_n)$: the factors could
come from a directed or an undirected model; either way they are just factors.
Each complete assignment is a state in a Markov chain. Transition model given
a starting state $x$:

\begin{comment}
# P - a joint probability model over X_1, ..., X_n (placeholder object) that
# can sample one variable conditioned on an assignment to the others.
def gibbs_sample(P):
    # Start with an arbitrary assignment X_list = [x_1, x_2, ..., x_n]
    X_list = P.random_state()
    for i in range(len(X_list)):
        # Sample a value of X_i given the current values of the rest
        Xs_sans_i = X_list[:i] + X_list[i + 1:]
        x_i = P.sample(i, given=Xs_sans_i)
        # Set x_i and move on to the next variable
        X_list[i] = x_i
    return X_list


class JointDistri(object):
    def sample(self, i, given):
        # The chain rule lets us compute the numerator by multiplying all the
        # factors together (linear in the number of factors).  We get the
        # denominator by summing X_i out of the numerator (linear in the
        # number of values of X_i), so this update is always tractable:
        #   P(X_i | Xs_sans_i) = P(X_i, Xs_sans_i) / P(Xs_sans_i)
        pass
\end{comment}
\[
x_i \sim P_\Phi(X_i \mid x_{-i})
\]

%------------------
\textbf{Difference between probability and likelihood.}
Given a conclusion $A$ (the name is unknown) and an observation $B$ (the query
descriptors): probability attempts to determine the conclusion given the
evidence, $\Pr(A \mid B)$; likelihood supposes a conclusion and reports how
likely you were to have seen that evidence, $\Pr(B \mid A)$. To convert
between the two you must know the marginal probabilities of the conclusion and
of the evidence.

%--------
\textbf{Probability identities}
% http://math.arizona.edu/~jwatkins/a-basics.pdf
% https://en.wikipedia.org/wiki/Probability_axioms
% https://en.wikipedia.org/wiki/Inclusion%E2%80%93exclusion_principle

Notation: $P(A, B)$ denotes $P(A \cap B)$, i.e.\ $P(A \text{ and } B)$;
$P(A \cup B)$ is $P(A \text{ or } B)$. The complement of $A$ is written $A^c$
(also $\lnot A$).

Bayes' rule: $P(A \mid B) = P(B \mid A)\,P(A) / P(B)$.

Complement rule: $P(A) + P(A^c) = 1$.

De Morgan's laws: $(A \cup B)^c = A^c \cap B^c$ and
$(A \cap B)^c = A^c \cup B^c$.

Product and addition rules: $P(A \cap B) = P(A \mid B)\,P(B)$ and
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

Law of total probability (for a partition $\{B_n\}$):
\begin{itemize}
  \item $P(A) = \sum_{b} P(A, B=b)$
  \item $P(A) = \sum_{n} P(A, B_n) = \sum_{n} P(A \mid B_n)\,P(B_n)$
  \item $P(A \mid C) = \sum_{n} P(A \mid C \cap B_n)\,P(B_n \mid C)$
  \item $P(A \mid C) = \sum_{n} P(A \mid C \cap B_n)\,P(B_n)$
    \quad (if $C$ is independent of each $B_n$)
\end{itemize}

Bonferroni inequality: $P(A \cup B) \leq P(A) + P(B)$.

Conditional probability: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$.

General inclusion/exclusion, where $P(\bigcup_{i=1}^n A_i)$ is the probability
that at least one $A_i$ occurs:
\begin{itemize}
  \item $P(\bigcup_{i=1}^n A_i)
    = \sum_{i=1}^n P(A_i)
    - \sum_{1 \leq i < j \leq n} P(A_i \cap A_j)
    + \sum_{1 \leq i < j < k \leq n} P(A_i \cap A_j \cap A_k)
    - \cdots + (-1)^{n-1} P(\bigcap_{i=1}^n A_i)$
  \item $P(\bigcup_{i=1}^n A_i)
    = \sum_{k=1}^n (-1)^{k-1}
      \sum_{1 \leq i_1 < \cdots < i_k \leq n} P(\bigcap_{\ell=1}^k A_{i_\ell})$
  \item Equivalently, summing over every nonempty subset $J$ of indices:
    $P(\bigcup_{i=1}^n A_i)
    = \sum_{\emptyset \neq J \subseteq \{1, \ldots, n\}}
      (-1)^{|J| - 1}\, P(\bigcap_{j \in J} A_j)$
\end{itemize}
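As a concrete instance of the general formula, for $n = 3$ it expands to the
familiar three-event case:
\begin{align*}
P(A_1 \cup A_2 \cup A_3)
  &= P(A_1) + P(A_2) + P(A_3) \\
  &\quad - P(A_1 \cap A_2) - P(A_1 \cap A_3) - P(A_2 \cap A_3) \\
  &\quad + P(A_1 \cap A_2 \cap A_3).
\end{align*}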
Probability chain rule (just an iterated version of the conditional
probability rule):
\begin{itemize}
  \item $P(\bigcap_{i=1}^n A_i) = P(A_n, \ldots, A_1)$, the probability that
    all of the $A_i$ occur
  \item $P(A_n, \ldots, A_1)
    = P(A_n \mid A_{n-1}, \ldots, A_1)\,P(A_{n-1}, \ldots, A_1)$
  \item $P(A_n \mid A_{n-1}, \ldots, A_1)
    = P(A_n, \ldots, A_1) / P(A_{n-1}, \ldots, A_1)$
  \item $P(A_1, A_2, A_3) = P(A_1 \mid A_2, A_3)\,P(A_2, A_3)
    = P(A_1 \mid A_2, A_3)\,P(A_2 \mid A_3)\,P(A_3)$
\end{itemize}

Conditional independence: $A$ and $B$ are conditionally independent given $C$
iff $P(A, B \mid C) = P(A \mid C)\,P(B \mid C)$, or equivalently
$P(A \mid B, C) = P(A \mid C)$.

$A$ and $B$ are independent iff $P(A) = P(A \mid B)$, equivalently
$P(A \cap B) = P(A)\,P(B)$.

If $A$ and $B$ are mutually exclusive, then $P(A \cup B) = P(A) + P(B)$.

Maximum a posteriori (MAP): $f(\theta \mid x) = f(x \mid \theta)\,g(\theta) / Z$
with $Z = \sum_{\nu \in \Theta} f(x \mid \nu)\,g(\nu)$. See also: Bayes
estimators. Improper priors only seem to arise when there are infinitely many
possibilities; here we restrict attention to a finite set of possibilities
(see the sketch at the end of these notes).

Conditional ``lensing'' property: pick a variable $A_n$ and hold it constant
(derived from a conversation with Chuck). Lensing on $A_n$:
$P(A_1 \mid A_2, \ldots, A_n)
 = P(A_2, \ldots, A_{n-1} \mid A_1, A_n)\,P(A_1 \mid A_n) / Z$, where
$Z = \sum_{A_1} P(A_2, \ldots, A_{n-1} \mid A_1, A_n)\,P(A_1 \mid A_n)$.
More simply, holding $C$ constant,
$P(A \mid B, C)
 = P(B \mid A, C)\,P(A \mid C) / \sum_{A'} P(B \mid A', C)\,P(A' \mid C)$,
and since $\sum_{A'} P(B \mid A', C)\,P(A' \mid C) = P(B \mid C)$, this is
$P(A \mid B, C) = P(B \mid A, C)\,P(A \mid C) / P(B \mid C)$: Bayes' rule with
everything conditioned on $C$. Dropping $C$ recovers ordinary Bayes' rule,
$P(A \mid B) = P(B \mid A)\,P(A) / \sum_{a} P(B \mid A=a)\,P(A=a)
 = P(B \mid A)\,P(A) / P(B)$.

Marginalizing inside a conditional:
$P(A, B \mid C) = \sum_{d} P(A, B, D=d \mid C)
 = \sum_{d} P(A, B \mid C, D=d)\,P(D=d \mid C)$;
replacing $P(D=d \mid C)$ with $P(D=d)$ requires $D$ to be independent of $C$.
Likewise $P(A, B, D \mid C) = P(A, B \mid C, D)\,P(D \mid C)$.

Special case where $P(\bigcap_{i \in J} A_i)$ depends only on $k = |J|$; write
this common value as $a_k$. Then
\begin{equation}
P\Bigl(\bigcup_{i=1}^n A_i\Bigr) = \sum_{k = 1}^n (-1)^{k-1} \binom{n}{k}\, a_k.
\end{equation}
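For the finite-possibilities case above, a minimal sketch of MAP estimation
over a finite parameter set $\Theta$ follows, kept in a comment block like the
other scratch code in these notes; the prior, likelihood, and observation are
made up for illustration.

\begin{comment}
# Minimal sketch: MAP estimation over a finite parameter set Theta.
# f(x | theta) is the likelihood, g(theta) is the prior; the coin-flip model
# and the numbers below are illustrative placeholders.
from math import comb

thetas = [0.2, 0.5, 0.8]                   # finite Theta: candidate coin biases
prior = {0.2: 0.25, 0.5: 0.50, 0.8: 0.25}  # g(theta)


def likelihood(x, theta):
    """ f(x | theta): probability of seeing x heads in 10 flips """
    return comb(10, x) * theta ** x * (1 - theta) ** (10 - x)


x = 7                                      # observed number of heads
Z = sum(likelihood(x, nu) * prior[nu] for nu in thetas)
posterior = {t: likelihood(x, t) * prior[t] / Z for t in thetas}
theta_map = max(posterior, key=posterior.get)
print('posterior = %r' % (posterior,))
print('theta_map = %r' % (theta_map,))
\end{comment}
\end{document}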