* Ideally, we would have a graph containing a diverse and representative set of positive, negative, and incomparable edges.
* Often we are just given a set of annotations (or, worse, images) with name labels.
* A tree of positive edges can be added to each name in these datasets.
* Automatic viewpoint classifications can be used to guess whether a pair is incomparable (see the bootstrap sketch after this list).
* Each edge is given a user-id, a categorical confidence, and a timestamp, identifying who did the review and their confidence level (see the edge-attribute sketch below).
* The confidence levels are "unspecified", "guessing", "pretty sure", and "absolutely sure".
* The user-id can store an algorithm name or a human username.
* For a human, the confidence is self-reported and defaults to "pretty sure".
* For an algorithm that uses image information (like the pairwise classifier), the confidence is "unspecified", meaning that the scores it produces should be considered instead.
* For an algorithm that uses heuristics (like the algorithm that converts name labels to a tree), the confidence is "guessing".
* To curate a dataset, the pairwise classifier can be trained using the heuristically generated edges.
* Edges are then prioritized for review based on their hardness (see the prioritization sketch below).
* The redundancy criteria are disabled in this case, but inconsistency recovery is not.
* The user then starts to correct errors and the resulting inconsistencies.
* Whenever a user makes a review, the confidence on the edge is updated, which prevents it from being shown again.
* GZMaster had no labeled incomparable cases, and none could be inferred from viewpoint.
* PZ_PB_RF_TRAIN has no labeled incomparable cases, and a few can be inferred from viewpoint.
* When marking an existing positive edge as negative (while the other positive edges still exist), you should predict edges within and between the two new PCCs C and D to make sure that the remaining guessed edges do not contain more of the same error; this prevents cycling between splitting and merging (see the re-verification sketch below).
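
A minimal sketch of the per-edge review metadata described above. The class and field names (`EdgeReview`, `decision`, the algorithm user-ids) are assumptions for illustration, not the actual implementation:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import IntEnum


class Confidence(IntEnum):
    """Categorical confidence attached to each reviewed edge."""
    UNSPECIFIED = 0      # image-based algorithms; defer to their scores
    GUESSING = 1         # heuristic algorithms (e.g. name-label trees)
    PRETTY_SURE = 2      # default for human reviewers
    ABSOLUTELY_SURE = 3


@dataclass
class EdgeReview:
    """Review metadata stored on each edge.

    user_id holds either a human username or an algorithm name
    (e.g. 'pairwise_clf' or 'name_heuristic'; both are hypothetical).
    """
    user_id: str
    decision: str  # 'positive' | 'negative' | 'incomparable'
    confidence: Confidence = Confidence.PRETTY_SURE
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
```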
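
The bootstrap heuristics might look like the following sketch. It assumes annotations expose hypothetical `aid`, `name`, and `viewpoint` attributes, that candidate pairs come from elsewhere (e.g. a ranking algorithm), and that a chain is used as the per-name spanning tree; the left/right opposite-viewpoint rule is likewise only one possible choice:

```python
import networkx as nx


def bootstrap_edges(annots, candidate_pairs):
    """Seed a review graph from name labels and viewpoint guesses."""
    graph = nx.Graph()
    graph.add_nodes_from(a.aid for a in annots)

    # One spanning tree of positive edges per name. A chain is the
    # simplest spanning tree; any tree over the name's annotations works.
    by_name = {}
    for a in annots:
        by_name.setdefault(a.name, []).append(a)
    for group in by_name.values():
        for u, v in zip(group, group[1:]):
            graph.add_edge(u.aid, v.aid, decision='positive',
                           user_id='name_heuristic',
                           confidence='guessing')

    # Guess that pairs seen from opposite viewpoints are incomparable.
    opposite = {('left', 'right'), ('right', 'left')}
    for a, b in candidate_pairs:
        if ((a.viewpoint, b.viewpoint) in opposite
                and not graph.has_edge(a.aid, b.aid)):
            graph.add_edge(a.aid, b.aid, decision='incomparable',
                           user_id='viewpoint_heuristic',
                           confidence='guessing')
    return graph
```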
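
One way to implement hardness-based prioritization, assuming each edge stores a `probs` mapping of per-class probabilities from the pairwise classifier (the attribute name and the `1 - max` hardness measure are assumptions):

```python
import heapq


def hardness(probs):
    """Hardness of an edge from its class probabilities.

    `probs` maps 'positive' / 'negative' / 'incomparable' to a
    probability; 1 - max means the edges the classifier is least
    sure about sort first.
    """
    return 1.0 - max(probs.values())


def review_queue(graph):
    """Max-heap of unreviewed edges ordered by hardness.

    Edges whose confidence was set by a user review are skipped,
    which is what prevents a reviewed edge from being shown again.
    """
    heap = []
    for u, v, data in graph.edges(data=True):
        if data.get('confidence') in ('pretty sure', 'absolutely sure'):
            continue  # already human-reviewed; do not show again
        heapq.heappush(heap, (-hardness(data['probs']), (u, v)))
    return heap
```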
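
Finally, a sketch of the re-verification step from the last bullet: after a positive edge is flipped to negative, find the two new PCCs and collect the pairs whose predictions should be refreshed. The function name is hypothetical, and in practice already human-reviewed pairs would be filtered out before queueing:

```python
import itertools as it
import networkx as nx


def pairs_to_recheck(graph, edge):
    """Pairs to re-predict after a positive edge (u, v) turns negative.

    The positive subgraph no longer contains (u, v), so u and v fall
    into the two new PCCs C and D. Returns all within-C, within-D,
    and between-(C, D) pairs so the remaining guessed edges can be
    checked for more of the same error.
    """
    u, v = edge
    pos_edges = [(a, b) for a, b, d in graph.edges(data='decision')
                 if d == 'positive']
    pos = graph.edge_subgraph(pos_edges)
    C = nx.node_connected_component(pos, u) if u in pos else {u}
    D = nx.node_connected_component(pos, v) if v in pos else {v}

    within = [p for cc in (C, D) for p in it.combinations(sorted(cc), 2)]
    between = [(a, b) for a in C for b in D]
    return within + between
```

Feeding these pairs back through the classifier and the review queue is what breaks the split/merge cycle: if the same labeling error seeded several guessed edges, the re-predictions surface them all at once instead of one per iteration.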