\chapter{INTRODUCTION}\label{chap:intro}

\section{IMAGE-BASED IDENTIFICATION APPLIED TO POPULATION ECOLOGY}

Population ecology relies on estimating the number of individual animals that inhabit an area~\cite{krebs_ecological_1999}. Estimating a population size is done in two phases: data collection and analysis. Data are collected as sets of \glossterm{sighting} and \glossterm{resighting} observations. A sighting is the first observation of an individual, and a resighting is a subsequent observation of a previously sighted individual. The observed data are then analyzed using software such as ``program MARK''~\cite{white_program_1999, schwarz_jolly_seber_2006} or Wildbook, which apply statistical models such as the Lincoln-Petersen index~\cite{seber_estimation_1982}, the Jolly-Seber model~\cite{jolly_explicit_1965, seber_note_1965}, or other related models~\cite{cormack_estimates_1964, chao_estimating_1987, kenneth._h._pollock_statistical_1990}. For an ecologist, recording that an individual has been observed is simple, but determining whether that observation is a sighting or a resighting can be challenging. This requires the ecologist to identify the individual by comparing it against all other observations in the data set.

Current methods to estimate a population size are limited by the data collection phase~\cite{sundaresan_network_2007, rubenstein_ecology_2010}. The statistical population models require an observation sample size that grows with the size of the population being studied~\cite{seber_estimation_1982}. As the number of observations increases, so does the difficulty of determining identity. Thus, the scope of a population study is limited by the number of raw observations that can be made and by the rate at which individual identity can be determined within a set of observations. Overcoming these limitations is of particular importance to wildlife preservation because population statistics are necessary to guide conservation decisions~\cite{rubenstein_behavioral_1998}.

Consider images as a source of sight-resight observations. There are numerous advantages. Many observations can be made rapidly and simultaneously due to the simplicity and availability of cameras. Recording an observation is as cheap and simple as taking a picture. Camera traps can be employed for autonomous data collection. In a wildlife conservancy or national park, observations can be crowd-sourced by gathering images from safari tourists and citizen scientists. Images can be accumulated and stored in a large dynamic dataset of observations that could grow by thousands of images each day. However, the challenge of identifying the individuals in the images remains. Manual methods are infeasible due to the rapid rate at which images can be collected. Therefore, we must turn towards computer-vision-based methods.

This \thesis{} develops the foundation of the image analysis component of the ``Image-Based Ecological Information System'' (IBEIS). The purpose of this system is to gain ecological insight from images using computer vision. We focus on estimating the size of a population of animals as just one example of ecological insight that might be gained from images.
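For concreteness, the simplest of these sight-resight models, the Lincoln-Petersen index, estimates a population size from a single pair of sampling sessions (the notation here is the standard one, not specific to this \thesis{}):
\[
\hat{N} = \frac{n_1 n_2}{m_2},
\]
where $n_1$ is the number of individuals sighted in the first session, $n_2$ is the number of individuals observed in the second session, and $m_2$ is the number of those second-session observations that are resightings. For example, if $50$ individuals are sighted on the first day and $20$ of the $60$ individuals observed on the second day are resightings, then $\hat{N} = (50 \cdot 60) / 20 = 150$. The accuracy of such an estimate hinges on correctly deciding, for every observation, whether it is a sighting or a resighting.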
Thus, we come to the core problem addressed in this \thesis{}: image-based identification of individual animals.

\section{CHALLENGES OF ANIMAL IDENTIFICATION}\label{sec:challenges}

In animal identification we are given a database of images. This database may initially be empty. Each image is cropped to a bounding box around an animal of interest and labeled with that animal's identity. For a new query image, the goal is to determine if any other images of the individual are in the database. If the query is matched, it is added to the database as a resighting of that individual. If the query is not matched, then it is added as a new individual.

In this work we focus on identifying individuals of species with distinguishing textures. Examples include zebras, giraffes, humpback whales, lionfish, nautiluses, hyenas, whale sharks, wildebeest, wild dogs, jaguars, cheetahs, leopards, frogs, toads, snails, and seals. The primary species that we will consider in this \thesis{} are plains zebras and Grévy's zebras, but we will maintain a secondary focus on Masai giraffes and humpback whales.

The difficulty of animal identification depends on the distinctiveness of the visual patterns that distinguish an individual from others of its species. In addition, the images we identify are collected ``in the wild'' and therefore contain occlusions, distracting features, and variations in viewpoint and image quality. This section will present several examples to illustrate the challenges faced in animal identification. The discussion will begin with the challenges posed by the primary and secondary species. Then problems common to all species will be described. These will be illustrated using plains zebras because they are the most challenging species considered in this \thesis{}.

\subsection{Distinguishing textures of each species}

The plains zebra --- shown in~\cref{fig:PlainsFigure} --- is challenging to visually identify because individuals have relatively few distinguishing texture features. For most plains zebras, the majority of distinctive information lies in a small area on the front shoulder. \Cref{fig:HardCaseFigure} illustrates that the patterns that distinguish two individuals can be subtle, even when the features are clearly visible. The matching difficulty greatly increases when features are partially occluded, the viewpoint changes, or the image quality is poor.

In contrast, Masai giraffes and Grévy's zebras, shown in~\cref{fig:GirMasaiFigure} and~\cref{fig:GrevysFigure} respectively, have an abundance of distinctive features. Distinctive textures that are unique to each individual are spread across the entire body of a Masai giraffe. For a Grévy's zebra there is a high density of distinguishing information above both the front and back legs, as well as a moderate density of distinctive textures along the side of the body. The high density of distinctive textures in Masai giraffes and Grévy's zebras increases the likelihood that the same distinctive features can be seen from different viewpoints. Even so, the problem is still difficult due to ``in the wild'' conditions such as animal pose, occlusion, and image quality.

There are some species, like humpback whales, in which some individuals have distinguishing textures while others lack them entirely. This means that only a subset of humpback whales can be identified with the texture-based techniques that we consider in this \thesis{}.
However, other cues --- like the shape of the notches along the trailing edge of the fluke --- can be used to distinguish between individuals. The work of Weideman and Jablons~\cite{jablons_identifying_2016} addresses identifying humpback whales using trailing edge shape features. The example in~\cref{fig:HumpbackFig} illustrates individual humpback whales with and without distinctive textures.

\PlainsFigure{}
\HardCaseFigure{}
\GirMasaiFigure{}
\GrevysFigure{}
\HumpbackFig{}
\FloatBarrier{}

\subsection{Viewpoint and pose}

One of the most difficult challenges faced in the animal identification problem is viewpoint. Animals are seen in a variety of poses and viewpoints, which can cause distinctive features to appear distorted. The patterns on the left and right sides of an animal are almost always asymmetric. Therefore, matches can only be established between overlapping viewpoints, and only if the features visible from those viewpoints are distinctive. Some viewpoints, such as the backs of plains zebras, lack distinguishing information, as shown in~\cref{fig:BacksFigure}. The effect of pose and viewpoint variation can be seen in~\cref{fig:ThreeSixtyFigure} and~\cref{fig:PoseFigure}.

\BacksFigure{}
\ThreeSixtyFigure{}
\PoseFigure{}
\FloatBarrier{}

\subsection{Occluders and distractors}

Because images of animals are often taken ``in the wild'', other objects in the image can act as \glossterm{occluders} or \glossterm{distractors}. Objects such as grass, bushes, trees, or other animals can act as occluders by partially obscuring the features that distinguish one individual from another. Nearby animals can also be distracting because their features may match different animals in the database. Distractors may also come from non-animal features, as when multiple pictures are taken against the same background while animals move through the same field of view. Several examples of occluders and distractors are illustrated in~\cref{fig:OccludeFigure}.

\OccludeFigure{}
\FloatBarrier{}

\subsection{Image quality}

Image quality is influenced by lighting, shadows, the camera used, image resolution, and the size of the animal in the image. Outdoor images naturally have large variations in illumination. Different cameras can produce visual differences between images of the same object. Images taken out of focus, from far away, or with an unsteady camera can cause animals to appear blurred. The effects of outdoor shadow and illumination are illustrated in~\cref{fig:IlluminationFigure}. \Cref{fig:QualityFigure} illustrates five categories of image quality that will be described later in~\cref{sub:viewqual}.

\IlluminationFigure{}
\QualityFigure{}
\FloatBarrier{}

\subsection{Aging and injuries}

The appearance of an individual changes over time due to aging and other factors, including injuries. An example of the difference between a juvenile and an adult zebra is shown in~\cref{fig:AgeFigure}. An example of how injuries can both remove distinctive features and add new ones is shown in~\cref{fig:GashFigure}.

\AgeFigure{}
\GashFigure{}
\FloatBarrier{}

\section{THE GREAT ZEBRA COUNT}\label{sec:introgzc}

To further illustrate the problems addressed in this \thesis{}, we consider the ``Great Zebra Count'' (\GZC{}), held at Nairobi National Park on March 1\st{} and 2\nd{}, $2015$~\cite{rubenstein_great_2015}.
This event was designed with two purposes in mind: (1) to involve citizens in the scientific data collection effort, thereby increasing their interest in conservation, and (2) to determine the number of plains zebras and Masai giraffes in the park.

\subsection{Data collection}

Volunteer participants --- each with his or her own camera --- arrived by car at the park. Some cars had more than one photographer. Each car was assigned a route to drive through the park. We attached a GPS dongle to each car to record time and location throughout the drive. Correlating this with the time stamp on each image (after adding a correction offset for each camera) allowed us to determine the geolocation of each image. Each photographer was given instructions guiding them toward taking quality images of the left sides of the animals they saw. When the cars returned --- some after just an hour or two, others after the whole day --- the images were copied from the cameras, a small sample of each photographer's images was immediately processed to illustrate what we would do with the data, and the entire set of images was stored for further processing. The result of this crowd-sourced collection event was a $\SI{48}{\giga\byte}$ dataset consisting of $9406$ images.

\subsection{Data processing}\label{subsec:introdataprocess}

After the event, the entire collection of images was processed using a preliminary version of the system in order to generate the final count. The preliminary system followed the workflow of:
\begin{enumin}
\item \occurrence{} grouping,
\item animal detection,
\item viewpoint and quality labeling,
\item \intraoccurrence{} matching,
\item \vsexemplar{} identification,
\item consistency checks, and
\item population estimation.
\end{enumin}
Here, we provide a brief overview of each step involved in the processing of the \GZC{} image data, and then we describe the challenges that arose.

\subsubsection{Occurrence grouping}

The images were first divided into \glossterm{\occurrences{}} --- a standard term defined by the Darwin Core~\cite{wieczorek_darwin_2012} to denote a collection of evidence (\eg{} images) that an organism exists within a defined location and time frame. In the scope of this application, an \occurrence{} is a cluster of images taken within a small window of time and space. Images are grouped into \occurrences{} using the GPS and time data; details are provided in~\cref{app:occurgroup}, and a simplified sketch is given at the end of this subsubsection.

These computed \occurrences{} are valuable measurements for multiple components of the IBEIS software. At its core, an \occurrence{} describes \wquest{when} a group of animals was seen and \wquest{where} that group was seen. From this starting point, other algorithms can address questions such as: \wquest{how many} animals there were, \wquest{who} an animal is, \wquest{who else} an animal is with, and \wquest{where else} these animals have been seen.

Furthermore, there are computational and algorithmic benefits to first grouping images into \occurrences{}. One benefit is that an \occurrence{} can be used as a semantic processing unit to distribute manageable chunks of work to users of the system. Another is that \occurrences{} can be used to improve the results of identification. Typically, there will be only a few individuals within an \occurrence{}, and it is not uncommon for each individual to be photographed multiple times and from multiple viewpoints. This redundancy in images will be exploited in \cref{chap:graphid}.
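To make the grouping step concrete, the following is a minimal sketch that greedily chains time-sorted images into \occurrences{}. The thresholds and names here are hypothetical illustrations; the actual clustering procedure and parameters are those described in~\cref{app:occurgroup}.

\begin{verbatim}
from dataclasses import dataclass
from math import radians, sin, cos, asin, sqrt

# Hypothetical thresholds; the actual values and clustering
# procedure are described in the appendix.
MAX_GAP_SECONDS = 10 * 60   # at most 10 minutes between images ...
MAX_GAP_KM = 1.0            # ... and at most 1 km between them

@dataclass
class Image:
    timestamp: float  # seconds since epoch, camera-offset corrected
    lat: float
    lon: float

def gps_distance_km(a: Image, b: Image) -> float:
    """Great-circle (haversine) distance between two images."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat))
         * sin(dlon / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))  # Earth radius ~6371 km

def group_into_occurrences(images):
    """Greedily chain time-sorted images into occurrences."""
    occurrences = []
    for im in sorted(images, key=lambda im: im.timestamp):
        prev = occurrences[-1][-1] if occurrences else None
        if (prev is not None
                and im.timestamp - prev.timestamp <= MAX_GAP_SECONDS
                and gps_distance_km(prev, im) <= MAX_GAP_KM):
            occurrences[-1].append(im)   # extend current occurrence
        else:
            occurrences.append([im])     # start a new occurrence
    return occurrences
\end{verbatim}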
\subsubsection{Animal detection}

Before matching begins, each image is cropped to focus on a particular animal and to remove background distractors. A detection algorithm localizes animals within the images. Each verified detection generates an \glossterm{\annot{}} --- a bounding box around a single animal in an image. An example illustrating detection of plains zebras is shown in~\cref{fig:DetectFigure}. In the \GZC{}, each detection was manually verified before becoming an \annot{}, but recent work introduces an automatic verification mechanism and reduces the need for complete manual review. The details of the detection algorithm are beyond the scope of this \thesis{} and are described in the work of Parham~\cite{parham_photographic_2015, parham_detecting_2016}.

\DetectFigure{}

\subsubsection{Viewpoint and quality labeling}\label{sub:viewqual}

When determining the number of animals in a population, it is important to account for factors that can lead to over-counting. If two \annots{} of the same individual are not matched, then that individual will be counted twice. This can happen due to factors such as viewpoint and quality. For example, one \annot{} showing only the left side of an animal and another \annot{} showing only the right side of the same animal cannot be matched because they are \glossterm{incomparable}. Two \annots{} are comparable when they share regions with distinguishing patterns that can be put in correspondence. Viewpoint is the primary reason that two \annots{} are not comparable. However, other factors like low image quality and heavy occlusion can corrupt distinguishing patterns, rendering the \annot{} unidentifiable --- not comparable with any other \annot{}. We must define what it means for two \annots{} to be comparable before we can estimate a population size.

Determining if an individual can be identified is analogous to the notion of a marked individual~\cite{seber_estimation_1982}. For an \annot{} to be identifiable, the patterns that distinguish it from the rest of the population must be clear and visible; otherwise the \annot{} may fail to find, or be incomparable with, its potential matches. This means an \annot{} is only identifiable if
\begin{enumin}
\item the image quality is high enough, and
\item it has a viewpoint that is comparable to all potential matches.
\end{enumin}

To address this challenge, we label each \annot{} with one of $5$ discrete quality labels and one of $8$ discrete viewpoint labels. The quality labels we define are: \qualJunk{}, \qualPoor{}, \qualOk{}, \qualGood{}, and \qualExcellent{}. The \qualJunk{} label is given to \annots{} that almost certainly cannot be identified, and the \qualPoor{} label is given to \annots{} that will likely be unidentifiable for a computer vision algorithm. The \qualGood{} and \qualExcellent{} labels are given to clear, well-illuminated \annots{} with little to no occlusion, with \qualExcellent{} being reserved for the best of the best. All other \annots{} are labeled as \qualOk{}. The viewpoint labels we define are: \vpFront{}, \vpFrontLeft{}, \vpLeft{}, \vpBackLeft{}, \vpBack{}, \vpBackRight{}, \vpRight{}, and \vpFrontRight{}. Note that additional viewpoint labels like \vpUp{} and \vpDown{} may be necessary for animals such as lionfish or turtles. However, the $8$ labels we use are sufficient for animals like zebras and giraffes because they are most commonly seen in upright positions.
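As a concrete illustration of how these labels gate identification, consider the following minimal sketch. The enum names and the rule that only adjacent compass viewpoints share visible surface are our simplifying assumptions for illustration, not the system's exact policy.

\begin{verbatim}
from enum import Enum

class Quality(Enum):
    JUNK = 0
    POOR = 1
    OK = 2
    GOOD = 3
    EXCELLENT = 4

# The eight compass viewpoints in circular order, so that adjacency
# corresponds to distance around the circle.
VIEWPOINTS = ["front", "front-left", "left", "back-left",
              "back", "back-right", "right", "front-right"]

def viewpoints_comparable(vp1: str, vp2: str) -> bool:
    """Assume two viewpoints share visible surface only if they are
    at most one step apart on the compass circle (a simplification)."""
    i, j = VIEWPOINTS.index(vp1), VIEWPOINTS.index(vp2)
    step = abs(i - j)
    return min(step, len(VIEWPOINTS) - step) <= 1

def identifiable(quality: Quality, viewpoint: str, candidate_vps) -> bool:
    """An annotation is usable for identification only if its quality
    is at least OK and its viewpoint overlaps a potential match."""
    return (quality.value >= Quality.OK.value
            and any(viewpoints_comparable(viewpoint, vp)
                    for vp in candidate_vps))
\end{verbatim}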
In an effort to ensure that all \annots{} used in the \GZC{} were comparable, we did not include any \annot{} labeled with a \qualJunk{} or \qualPoor{} quality. To account for limitations in the initial ranking algorithm, we also excluded \annots{} that were not labeled with a \vpLeft{} or \vpFrontLeft{} viewpoint. All labelings of viewpoint and quality were generated manually. Since then, we have trained viewpoint and quality classifiers using this manual data. Automatic detection of quality and viewpoint is discussed in the work of Parham~\cite{parham_photographic_2015}.

\subsubsection{Matching within each \occurrence{}}

Animals often have multiple redundant views within an \occurrence{}, each of which can be the same as, better than, or complementary to other views. The images in~\cref{fig:OccurrenceComplementFigure} illustrate redundant and complementary views of an individual in an \occurrence{}. Merging all of an individual's views is a challenge, but also potentially an advantage, as we can exploit redundancy to better handle missing features, subtle viewpoint changes, and occlusions. We exploit this redundancy to gain the benefit of complementary views by matching all \annots{} within an \occurrence{} in a process called \glossterm{\intraoccurrence{} matching}. In the \GZC{}, each \annot{} was queried against all other \annots{} in its \occurrence{}, returning a ranked list of candidate matches. The person running the software made the final decisions about which \annots{} match. Details about the ranking algorithm are given in~\cref{chap:ranking}.

The result of \intraoccurrence{} matching is a set of \glossterm{\encounters{}}. \Aan{\encounter{}} is a group of \annots{} that were matched within an \occurrence{}. Each \encounter{} is either (1) the first sighting of an individual or (2) a resighting. The task now becomes to determine which of these is the case by identifying each \encounter{} against a \masterdatabase{}.

\OccurrenceComplementFigure{}

\subsubsection{Matching against the \masterdatabase{}}

To determine if \aan{\encounter{}} is a new sighting or a resighting of an individual, it is matched against the \masterdatabase{} in a process called \glossterm{\vsexemplar{} matching}. Before matching begins, the \masterdatabase{} is prepared for search. For each \name{} in the \masterdatabase{}, a subset of \glossterm{\exemplar{}} \annots{} is chosen to represent the appearance of that individual. The \exemplars{} are indexed using a search data structure. After the \masterdatabase{} has been prepared, the ranking algorithm issues a subset of the \encounter{}'s \annots{} as a query. The result is a ranked list of \exemplars{} that are visually similar to the \encounter{}. The top \exemplars{} in the ranked list are used as candidate matches. Then, the candidate matches are reviewed, and the \encounter{} is either merged into an existing \mastername{} or added to the \masterdatabase{} as a new \mastername{}.

\subsubsection{Consistency checks}

When merging \encounters{} into the \masterdatabase{}, it is possible that mistakes were made. Two error cases commonly occur.
\begin{enumln}
\item A \glossterm{split case} occurs when a set of \annots{} from two or more different animals is incorrectly labeled with the same \name{}. The main cause of this error is matches between distracting features, which make the \annots{} appear visually similar.
\item A \glossterm{merge case} occurs when two sets of \annots{} from the same animal are incorrectly labeled with different \names{}.
This is caused by algorithm or human error in which a query \encounter{} was not correctly matched to the database \exemplars{}.
\end{enumln}
These errors usually occur because the query and database \annots{} have a low degree of \emph{comparability} (\eg{} differences in viewpoint or low quality). Of course, if no visual overlap exists between the two sets --- such as one set exclusively from the left side and another exclusively from the right --- nothing can be done. This is why an animal must be seen from a predetermined view in order to be counted; in the \GZC{} this was the left side.

In the \GZC{}, suspect individuals were flagged for split checks using various heuristics, such as the number of \annots{} in the \name{} or the apparent speed of the animal's movement as computed from GPS and time data. To check a flagged individual, we used the ranking algorithm to search for pairs of \annots{} with low matching scores that belong to the flagged \name{}. Low similarity between two \annots{} within a \name{} suggested that an error had occurred. These low-scoring results were then manually reviewed. When breaking apart split cases, care was taken to account for the fact that right and left images should not match. Likewise, care was taken to ensure that an intermediate \annot{} linking two otherwise disjoint \annots{} had enough information to establish the link.

Merge checks issued all \exemplars{} as queries against all other \exemplars{}. High similarity between two different \names{} suggested that a match was missed. These high-scoring results were manually reviewed. More sophisticated error detection and recovery will be discussed in \Cref{sec:incon}.

\subsubsection{Population estimation}

The final step of the \GZC{} workflow was to estimate the number of animals in the park. Using the identification algorithm, we determined which \annots{} were sightings and which were resightings. Because we were using a preliminary version of the system, we were conservative in defining when an animal was sighted: we used only the left and front-left \annots{} with quality labels of \qualOk{}, \qualGood{}, or \qualExcellent{}. Each individual that met these criteria was counted as a sighting. If a sighted individual had an \annot{} from both days, then we counted that individual as resighted.

\subsection{Processing challenges}

Our experience with the Great Zebra Count highlighted a number of challenges that must be addressed if this system is to be applied in future events. These challenges include the number of manual reviews required, the detection of and recovery from manual errors, and the overall lack of a systematic identification framework.

Perhaps the greatest challenge faced during the \GZC{} was the considerable amount of time required to manually verify identification results. It can take several seconds to manually verify whether a pair of \annots{} is a correct match, even when the results are presented in a ranked list. This task is illustrated in~\cref{fig:RankFigure}. Requiring the manual verification of each result is untenable for a system that accepts thousands of new images a day.

The lack of a systematic approach to identification meant that whenever two \annots{} were matched, the \name{} labels of all \annots{} with those \names{} were changed. This made it difficult to tease apart errors when they occurred.
Furthermore, manual errors (likely caused by fatigue from the large number of manual reviews) resulted in numerous identification errors that could not be detected and resolved until the end of the process. Reviews of results were also done in order of matching scores regardless of previous decisions, causing the manual reviewer to inefficiently re-review redundant results involving the same individuals. Additionally, no stopping criterion for reviews was defined, resulting in an ad hoc approach to determining when all matches had been found.

Motivated by these observations, we seek to develop a semi-automatic approach to animal identification. This approach should be governed by a system that reduces the number of manual reviews, is able to detect and recover from errors, and determines when to stop searching for new matches.

\RankFigure{}

\section{APPROACH}

The problem addressed in this \thesis{} is to identify individual animals ``in the wild'' and to count the individuals in a population. We are given a set of images containing \annots{} of the same species. The images are collected in an uncontrolled environment and likely contain imaging challenges such as occlusion, distracting features, viewpoint variations, pose variations, and quality variations. Furthermore, the images may be collected either over many years or over just a few days, as in the \GZC{}. Each \annot{} is labeled with time, GPS, quality, and viewpoint. We may also be given an initial partial \name{} labeling of the \annots{} --- \eg{} in the case where we identify a new set of \annots{} against a previously identified set --- but this need not be the case.
We want to label each \annot{} with a \glossterm{\name{}} that uniquely identifies the individual. In other words, our task is to label all \annots{} from the same individual with the same \name{} and to give \annots{} from different individuals different \names{}. After this is complete, the resulting database will contain the information needed to estimate the size of the population using techniques from sight-resight statistics.

The first step of the identification process is a ranking algorithm. The inputs to the algorithm are a single query \annot{} and a set of database \annots{}. Sparse patch-based features are localized in all \annots{}, and a descriptor vector is extracted for each feature. The descriptors of the database \annots{} are indexed for fast nearest neighbor search. We then find a set of matches in the database for each descriptor in the query \annot{}. The matches are scored based on visual similarity, distinctiveness within the database, and likelihood of belonging to the foreground. Matches are combined across multiple \exemplar{} \annots{} to produce a matching score for each \name{} in the database, resulting in a ranked list of results for each query.

We then extend the ranking algorithm by developing a classifier able to automatically review its results. First, we construct a pairwise feature that captures the relationships between two \annots{} using local feature correspondences and global properties such as time and GPS. Then, we learn a classifier to predict whether a pair of \annots{} --- \ie{} a result in the ranked list --- is correct or incorrect.

In the final part of our approach, we place the problem of animal identification in a graph framework able to systematically guide the identification process. Each \annot{} becomes a vertex in a graph, and labeled edges between \annots{} represent how they are related. Using the graph framework, we are able to detect and recover from errors by taking advantage of the multiple images of each individual.

We evaluate the ranking, verification, and graph identification algorithms by performing experiments on two main databases of plains zebras and Grévy's zebras. Some additional experiments are also performed on databases of Masai giraffes and humpback whales. First, the ranking experiments test the algorithm's ability to find potential matches of an individual animal across large periods of time, different viewpoints, different database sizes, and different numbers of \exemplars{}. Then, the verification experiments test the extent to which the correct results from the ranking algorithm can be separated from the incorrect results using our learned classifier. Finally, the graph identification experiments demonstrate the algorithm's ability to reduce the number of required manual reviews and to recover from errors. We determine the configuration of each algorithm that works best for identifying each species.
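To make the ranking step concrete, the following is a minimal sketch of scoring \names{} by descriptor nearest neighbors, in the spirit of the description above. It omits the distinctiveness and foreground weights and all spatial verification, so it illustrates the idea rather than the system's implementation; all names in the sketch are ours.

\begin{verbatim}
from collections import defaultdict
from scipy.spatial import cKDTree

def rank_names(query_descs, db_descs, db_name_of_desc, k=5):
    """Score each database name against one query annotation.

    query_descs:     (m, d) array of query descriptors
    db_descs:        (n, d) array of database descriptors (n > k)
    db_name_of_desc: length-n array mapping descriptor index -> name id
    """
    tree = cKDTree(db_descs)
    # Ask for k+1 neighbors; the extra one acts as a "background"
    # distance that normalizes the scores of the first k matches.
    dists, idxs = tree.query(query_descs, k=k + 1)
    scores = defaultdict(float)
    for row_d, row_i in zip(dists, idxs):
        background = row_d[-1]
        for d, i in zip(row_d[:-1], row_i[:-1]):
            # A match contributes evidence proportional to how much
            # closer it is than the background distance.
            scores[db_name_of_desc[i]] += background - d
    # Ranked list of (name, score) pairs, best match first.
    return sorted(scores.items(), key=lambda item: -item[1])
\end{verbatim}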
\section{ORGANIZATION}

This \thesis{} is organized as follows.

\Cref{chap:relatedwork} describes related work. The focus is on the details of techniques used in the system, while an overview is given for work that is indirectly related.

\Cref{chap:ranking} describes the ranking algorithm for identifying individual animals, one \annot{} at a time, against a database of \exemplars{}. This chapter includes an experimental evaluation of the ranking algorithm. This is the algorithm that was used in the \GZC{}.

\Cref{chap:pairclf} addresses the problem of semi-automatic verification of results from the ranking algorithm.

\Cref{chap:graphid} combines the ranking and verification algorithms into a semi-automatic framework that detects and corrects errors while reducing the number of manual reviews.

\Cref{chap:conclusion} concludes this \thesis{} and summarizes its contributions.