# Review Procedure Outline In this chapter we define an algorithm with "a human in the loop" to group a large set of unidentified annotations by individual. This is done by viewing annotations as nodes in a graph, drawing edges between matching individuals, and then inspecting the resulting connected components. Our one-vs-many algorithm searches for candidate edges and a learned classifier predicts the probability that each edge is a match. Automatic decisions are made edges with probability above a threshold. A human reviewer decides which (if any) of the remaining suggestions are correct. An inference mechanism uses the graph structure to determine which candidate edges need to be reviewed. A priority mechanism determines the order in which candidate edges are presented to a user. Together the inference and priority mechanism reduce the total number of manual reviews needed. Because human reviewers will occasionally make mistakes, the inference mechanism includes a redundancy criteria that detects and corrects these mistakes. A termination criteria determines when identification is complete. ## Introduction ### Definitions and Terms * The review procedure is formalized around the notion of a **decision graph**. * A decision graph is `G = (V, E)`. * The set of nodes `V` is the set of annotations. * The set of edges `E = E_p ∪ E_n ∪ E_i` is composed of three disjoint sets: * `E_p` is the set of positive edges between comparable annotations of the same animal. * `E_n` is the set of negative edges between comparable annotations between of different animals. * `E_i` is the set of incomparable edges between annotations that are not-comparable. * Individuals are identified by **positive connected components (PCC)**. * Each PCC is a connected component in a subgraph `G_p = (V, E_p) ⊆ G`. ### Identification Criteria * There are several criteria that a decision graph can satisfy. * Minimal Satisfaction Criteria: * Consistency: A PCC is consistent iff it contains no negative edges. * Completeness: For every pair of PCCs there is at least one negative edge between them. * Redundancy: * K-positive-redundancy: * A PCC is "k-positive-redundant" if it is consistent and it contains no cut-sets involving fewer than `k` positive edges. * This is the equivalent to "k-edge-connectedness". * Note: A PCC that is k-positive-redundant is also (k-1)-positive-redundant (for `k ≥ 2`). * A graph has this property if all PCCs have this property. * K-negative-redundancy: * A pair of PCCs is "k-negative-redundant" if there are `k` negative edges between them. * A graph has this property if all pairs of PCCs have this property. * For a graph to be "k-negative-redundant", it requires O(k N^2) negative decisions (where N is the number of individuals). * `1`-negative-redundancy is equivalent to completeness. * K-redundancy is tied to the number of decision mistakes that must be made in order for part of the graph to appear consistent or complete. Specifically, for a PCC or pair of PCCs to both contain a mistake and be k-redundant then at least `k` mistakes must be made. * Termination Criteria * Given parameters `kp` and `kn` * The review procedure terminates if the graph is consistent, complete, `(kp)`-positive-redundancy, and `(kn)`-negative-redundancy. * Note: * Negative edges do not contribute to the identification grouping, they only ensure that the positive PCCs are complete. * It will often be impractical to enforce that a graph must be complete in order to terminate. * At some point it will be necessary to infer that the remaining edges between PCCs are negative. This point must be chosen such that there is a high probability that the unreviewed pairs of PCCS are negative. ### Procedure Overview The review procedure is to construct a decision graph that satisfies the identification criteria. * All annotations to be identified are used as nodes. * An algorithm initializes a set of candidate edges. * An automatic classifier predicts a label for each edges and any confident decision is added to the decision graph. * The remaining candidate edges are prioritized and manually reviewed. * After a certain number of reviews candidate edges are recomputed using updated information. * This repeats until the termination criteria is satisfied. ### Discussion Outline * We first consider a simplified version of this procedure. We assume that review errors do not occur. In this context we: * Describe how candidate edges are chosen. * Describe how edges are automatically classified. * Define a dynamic priority-based ordering for remaining candidates that results in the minimum number of user reviews. * We then consider the practical case, where both users and algorithms make errors. In this context we: * Define and contrast inconsistencies versus mistakes. * We compare this context to the simplified context. * Describe an inconsistency recovery algorithm. * Describe a mistake discovery algorithm. * Discuss the implications of early stopping (stopping before all reviews are complete). * Discuss the implications of when it is not possible satisfy all identification criteria using the candidate edges. ## Review Procedure in a simplified context ### Simplifying assumptions: 1. Classifier thresholds are set such that no errors are made. 2. Users do not make errors. 3. A subset of the candidate edges can satisfy the identification criteria (i.e. candidate edges are computed only once). 4. Redundancy parameters `kp` and `kn` are set to `0`. ### Candidate edge search: * Run the one-vs-many algorithm. * Use each annotation as a query and the entire set of annotations as the database. * The result is a ranked list of database annotations * For each ranked list add the top `N` (we use `N=5`) query-result pairs to the set of candidate edges. ### Automatic classification: * Predict the probabilities `P(match-state)` and `P(photobomb-state)` for each candidate edge using the one-vs-one classifiers. * Automatically add an edge to the decision graph if we are confident in its matching state and we are confident that it is not a photobomb. * Given a `state-threshold` for each `state in {match, no-match, not-comp}`, automatically add an edge to the appropriate decision edge set if: * `P(state) > state-threshold`, and * `P.argmax() == state`, and * `P(photobomb) < photobomb-threshold`. ### Prioritizing edges for user review: * To satisfy minimal identification criteria we observe: * Each PCC needs enough reviews to be connected. * Therefore edges within existing PCCs do not need to be reviewed. * Only one negative edge between any two PCCs needs to be reviewed. * Therefore edges between PCCs with at least one other negative review do not need to be reviewed. * The minimum number of reviews is achieved by reviewing as many positive edges (between existing PCCs) as possible before reviewing any negative edges. See [Appendix] for the proof. * This scheme prevents review of redundant edges which would increase the amount of work required by the user. * Edges to review are stored in a priority queue. * The data structure is a hybrid of a dict and a max-heap. * The initial priority of each edge is `P(match)` --- the likelihood that the edge is positive. * The PCCS in `(V, E_p)` are dynamically maintained. * Dynamic connectivity data structure (Levels of Euler Tour Forests) [Holm-Poly-2001]. * Euler Tours are maintained in a binary search tree with fast split and join capabilities. [Sun-2016] * Edges are popped from the priority queue and presented to the user. In each case the user can make one of three decisions: * Reviewing an edge as `no-match` between two existing PCCs, indicates that the PCCs are different individuals. The priority of other edges between those two PCCs can be set to 0. * Reviewing an edge as `match` between two existing PCCs they are merged into a single PCC. The priority of edges within that PCC can now be set to 0. The priority of edges between the new PCC and all other PCCs that it is connected to via `no-match` edges are also set to 0. * Reviewing an edge as `not-comp` only sets its own priority to 0 and does not influence any other edge. * Identification is complete when the priority queue contains only no edges with positive priority. ## Review Procedure in a practical context Removing the assumptions in the simplified procedure has three consequences: 1. Some mistakes will pass thresholds. 2. Users will make errors. 3. The one-vs-many algorithm might not add all necessary edges to the decision graph. ### Contrasting inconsistency and mistakes * **Inconsistency** - A PCC containing more than zero negative edges. * **Mistake** - A review-state that differs from the groundtruth state. * Inconsistency Implications: * An inconsistency implies that a mistake is present, but it does not determine which edge contains the mistake. * Inconsistencies can be used to detect that mistakes exist. * Inconsistencies can only be present if a cycle exists in PCC (otherwise the PCC would simply be split in two). * If a graph contains inconsistent reviews then it necessarily contains redundant reviews. * Mistake Discovery: * Mistakes can be present without inconsistencies. * Carefully selecting redundant edges to review can cause inconsistencies and therefore expose the existence of mistakes. ### Comparison with the simplified context * Candidate edge selection is the same, except it is no longer guaranteed that it is possible to satisfy the termination criteria with these edges. * Some necessary positive and negative edges may be missing. * The likelihood that these missing edges are negative is high. * Therefore, unconnected PCCs can be said to have implicit negative edges. * This may cause a small number of false negatives. * Automatic decisions are the same except now mistakes can occur and possibly produce inconsistencies. * These inconsistencies are easily detected. * Detecting an inconsistency implies a mistake was made. * Resolving the inconsistency can correct a mistake. * The review priority algorithm is modified with an algorithm for resolving inconsistencies (details in [section]). * Discovering mistakes requires redundant reviews. * The simplified review priority algorithm avoids reviewing redundant edges. * This implies that it is impossible for inconsistencies to be added and therefore to discover mistakes. * The review priority algorithm is modified with an algorithm for discovering mistakes. (details in [section]). * The user will stop before all termination criteria has been satisfied. * In practice the user will stop reviewing early. * It is important that we are confident and have discovered as many PCCs as possible. It is less important that we verify all PCCs are indeed different. * This might cause false negatives, but does not cause serious practical problems because the probability of these unreviewed edges match is low. ### Inconsistency recovery * An inconsistent state can be detected as any PCC that contains a negative edge in the decision graph. * Within an inconsistent PCC find all edges labeled as `no-match`. * For each no-match edges with endpoints `(s, t)` run a weighted min-st cut using the number of times an edge was reviewed as the weight. * If the weight of the no-match edge is greater than the sum of the weights of the cut edges append the no-match edge to a list of candidate edges otherwise append the cut edges. * Stack and return the list of all the candidate edges. * The review priority algorithm is modified by increase the priority of these flagged edges by `2`. This causes flagged edges to move to the top of the queue and be resolved before new reviews are considered. ### Mistake discovery * To discover mistakes we must review redundant edges. * We should choose a small number to minimize the number of reviews. * We should choose edges that are likely to discover an inconsistency if one exists. * Observe that in the simplified algorithm a long chain of edges would be linked as a PCC. However, just a single error can cause the endpoints of this chain to be different individuals. By choosing an algorithm which would compare these "far away" nodes we can discover errors or be more certain that the PCC is indeed matched correctly. * Specifically we need to augment the minimal set of edges with a small set of redundant edges that is likely to expose any errors. * This is an edge-augmentation problem. * We use k-edge-connected augmentation because it has an intuitive interpretation. * For a PCC to be contain an undetected error, it must contain at least `k` internal errors. * Therefore as `k` increases the probability that there are undetected errors decreases. * See [Appendix] for details. * The review priority algorithm is modified such that on the modification of a PCC the all internal edges have their priority set to 0 except those that are the solution of the diameter augmentation instance. * Adding certain redundant edges to PCCs with mistakes will cause inconsistencies which can then be resolved with the Inconsistency Recovery Algorithm. ### Discussion of early stopping * In practice we do not review all negative edges between PCCs. * There are too many reviews to be made: `choose(len(pccs), 2)`. * Reviewing negative edges not add to the knowledge of who individuals are, it increases the confidence that the grouping is correct. * Because pairs with high `P(match)` are reviewed first, the probability that remaining edges contain a significant number of positive matches is small. Therefore we expect only a small amount of error if we assume unreviewed edges are different. ### Discussion of missing candidate edges * In practice we ignore matching edges that failed to be added to the review graph. * The reasoning is similar to why we don't review all negative edges, the probability that any particular edge not in the decision graph is actually between correct individuals is low. * In the case where perfect information is desired it is possible to simply review all edges in the priority queue and then add edges between annotations in pairs of unreviewed PCCs until all pairs of PCCs have a negative edge between them. ### Recomputing candidate edges * The one-vs-many algorithm is re-invoked with new name labels. (Note the nearest neighbor matching returns the `K` nearest neighbors not belonging to the same individual.) (Note this requires a bit of reworking in the actual pipeline and requires certain caches to be disabled). * A candidate match from one-vs-many that appears after rank (5) has a low probability of being correct. * After a "re-match" criteria has been met, the one-vs-many algorithm is run again to generate new candidate edges. * Nearest neighbors are not allowed to come from individuals in the same PCC (or potentially edges between negative PCCs) * If no positive matches are found in an annot's ranked list then one of the following cases has occurred: * There are no more matches to be found (i.e. all remaining comparable edges are negative) * A correct match exists further down in the ranked list. * A re-query would find this edge (due to improved LNBNN scores) * The matching algorithm simply fails for this pair * New merge edges may be found * New merge edges may be found * Details of "re-match" criteria. * After a `threshold` number of reviews, * After a `threshold` number of negative reviews, * Remaining review probability falls bellow `threshold` -------------- # Appendix ## Proof of optimality * Prioritizing review of matching edges (as measured by `P(match)`) minimizes the number of reviews * Let `pccs = { ... { ... a ... } ...}` be the true annotation clusters. * For each `pcc in pccs` at least `len(pcc) - 1` reviews will need to be made to connect the annotations. Thus the minimum number of positive reviews is: `sum(len(pcc) - 1 for pcc in pccs)`. * To ensure that there are no merge cases there must be at least one negative review between each pair of PCCs. Thus the minimum number of negative reviews needed is `choose(len(pccs), 2)`. (Note: `choose` is the binomial coefficient) * Because positive and negative edges are disjoint the minimum total number of reviews required is: `sum(len(pcc) - 1 for pcc in pccs) + choose(len(pccs), 2)`. * A second negative review between components or a second positive edge within a component is mutually exclusive from the set of minimal reviews and thus strictly increases the total number of reviews. * When reviewing a positive edge it is always possible to determine if it would be redundant with another positive edge that exists. (test if the edge adds a cycle to a spanning tree of the PCC). * On the other hand when reviewing a negative edge it is not always possible to determine if it is redundant with some other negative edge that exists. This is because redundant negative edges are defined between positive PCCs. (all positive PCCs must be determined before you can guarantee that any two arbitrary pairs of negative edges are not redundant). * Reviewing all positive edges first guarantees that all redundant edges are never encountered. * Therefore it is always better to review edges more likely to be positive first. ## Edge Augmentation * There are many types of edges augmentation * d-diameter augmentation * k-vertex-connected augmentation * k-edge-connected augmentation ## Augmentation K-edge-connected: * Add a minimum number of edges such that a graph is k-connected. * Polynomial time for unweighted case. * Edge-Connectivity Augmentation Problems [Watanabe-Nakamura-87] * `O(k * min(k, |V|) * |V|^4 * (k|V| + |E|))` * `O(k^3 * V^5 + k * V^4 * E)` * For 2-connected unweighted and undirected: * `O(|V| + |E|)` * For 2-connected weighted and undirected: * `O(|V|^2)` approximation algorithm * Weighted case is NP-hard * On a Smallest Augmentation to k-Edge-connect a Graph [Watanabe-Nakamura-84] * ## Augmentation Diameter-D: * Given target diameter d, graph `G=(V, E)` and candidate non-edges `~E`, find the smallest set of non-edges `E' ⊆ ~E` such that the diameter of the augmented graph `G'=(V, E ∪ E')` is at most `E'`. * In general this problem is NP-hard, but in the unit-cost unit-length case there is an approximation algorithm with approximation ratio O(log(choose(|V|, 2)) * log(d)). * The basic idea of the approximation algorithm is to: * Reformulate the problem as "Restricted Diameter-d" which adds the constraint that any new path between nodes that violate the diameter constraint in the original graph must contain either exactly 1 or exactly 2 consecutive edges in `~E`. It turns out that any solution to "Restricted Diameter-d" is a solution to "Diameter-d". * Construct an integer linear program (ILP) that solves "Restricted Diameter-d". For the details of this ILP see [Dodis-99]. * Solve a relaxed version of the ILP to obtain fractional values. * Either apply a randomized rounding scheme to achieve an integral solution OR define an instance of hitting set using the fractional values and then approximately solve the hitting set problem using a greedy approach. ## Automatic exemplar selection * Extract PCC for each individual. * Use maximum weight minimum set cover to select a subset of annotations. * Create covering sets by thresholding match probabilities on edges in the PCC * Use the one-vs-many algorithm to rank each annotation in the PCC against the PCC * Augment the PCC with the top 5 results from the one-vs-many algorithm. * Predict one-vs-one match probabilities for all edges in the PCC * Cut edges under a threshold * Each annotation's covering set is defined by its neighbors. * Adjust threshold to find optimal cover with a specific number of exemplars. * The optimal cover is the new set of exemplars. ------------- # New Idea Scrap ## Exemplars for Redundancy Criterion * Use exemplars for termination criterion * If we enforce that there are only 4-5 exemplars per viewpoint, then we can afford to augment each PCC into a complete graph and force the user to predict on that graph. # TODO: * ensure that the part where one-vs-many is rerun is incorporated * come up with a careful and clear motivation * come up with a very short and concise process summary * The other intro is fine for a thesis paper, but for a conference paper it has to get to the technical part quicker.