\chapter{INTRODUCTION}\label{chap:intro}

\section{IMAGE-BASED IDENTIFICATION APPLIED TO POPULATION ECOLOGY}

Population ecology relies on estimating the number of individual animals that inhabit an area~\cite{krebs_ecological_1999}. Estimating a population size is done in two phases: data collection and analysis. Data are collected as sets of \glossterm{sighting} and \glossterm{resighting} observations. A sighting is the first observation of an individual, and a resighting is a subsequent observation of a previously sighted individual. The observed data are then analyzed using software such as ``program MARK''~\cite{white_program_1999, schwarz_jolly_seber_2006} or Wildbook, which apply statistical models such as the Lincoln-Petersen index~\cite{seber_estimation_1982}, the Jolly-Seber model~\cite{jolly_explicit_1965, seber_note_1965}, or other related models~\cite{cormack_estimates_1964, chao_estimating_1987, kenneth._h._pollock_statistical_1990}. For an ecologist, recording that an individual has been observed is simple, but determining whether that observation is a sighting or a resighting can be challenging. This requires the ecologist to identify the individual by comparing it against all other observations in the data set.

Current methods to estimate a population size are limited by the data collection phase~\cite{sundaresan_network_2007, rubenstein_ecology_2010}. The statistical population models require an observation sample size that grows with the size of the population being studied~\cite{seber_estimation_1982}. As the number of observations increases, so does the difficulty of determining identity. Thus, the scope of a population study is limited by the number of raw observations that can be made and by the rate at which individual identity can be determined within a set of observations. Overcoming these limitations is of particular importance to wildlife preservation because population statistics are necessary to guide conservation decisions~\cite{rubenstein_behavioral_1998}.

Consider images as a source of sight-resight observations. There are numerous advantages. Many observations can be made rapidly and simultaneously due to the simplicity and availability of cameras. Recording an observation is as cheap and simple as taking a picture. Camera traps can be employed for autonomous data collection. In a wildlife conservancy or national park, observations can be crowd-sourced by gathering images from safari tourists and citizen scientists. Images can be accumulated and stored in a large dynamic dataset of observations that could grow by thousands of images each day. However, the challenge of identifying the individuals in the images remains. Manual methods are infeasible due to the rapid rate at which images can be collected. Therefore, we must turn towards computer-vision-based methods.

This \thesis{} develops the foundation of the image analysis component of the ``Image-Based Ecological Information System'' (IBEIS). The purpose of this system is to gain ecological insight from images using computer vision. We focus on estimating the size of a population of animals as just one example of ecological insight that might be gained from images.
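For concreteness, the simplest of these sight-resight models, the Lincoln-Petersen index, estimates a population size from a single pair of sampling sessions (the notation here is the standard one, not specific to this \thesis{}):
\[
\hat{N} = \frac{n_1 n_2}{m_2},
\]
where $n_1$ is the number of individuals sighted in the first session, $n_2$ is the number of individuals observed in the second session, and $m_2$ is the number of those second-session observations that are resightings. For example, if $50$ individuals are sighted on the first day and $20$ of the $60$ individuals observed on the second day are resightings, then $\hat{N} = (50 \cdot 60) / 20 = 150$. The accuracy of such an estimate hinges on correctly deciding, for every observation, whether it is a sighting or a resighting.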
Thus, we come to the core problem addressed in this \thesis{}: image-based identification of individual animals.

\section{CHALLENGES OF ANIMAL IDENTIFICATION}\label{sec:challenges}

In animal identification we are given a database of images. This database may initially be empty. Each image is cropped to a bounding box around an animal of interest and labeled with that animal's identity. For a new query image, the goal is to determine if any other images of the individual are in the database. If the query is matched, it is added to the database as a resighting of that individual. If the query is not matched, then it is added as a new individual.

In this work we focus on identifying individuals of species with distinguishing textures. Examples include zebras, giraffes, humpback whales, lionfish, nautiluses, hyenas, whale sharks, wildebeest, wild dogs, jaguars, cheetahs, leopards, frogs, toads, snails, and seals. The primary species that we will consider in this \thesis{} are plains zebras and Grévy's zebras, but we will maintain a secondary focus on Masai giraffes and humpback whales.

The difficulty of animal identification depends on the distinctiveness of the visual patterns that distinguish an individual from others of its species. In addition, the images we identify are collected ``in the wild'' and therefore contain occlusions, distracting features, and variations in viewpoint and image quality. This section will present several examples to illustrate the challenges faced in animal identification. The discussion will begin with the challenges posed by the primary and secondary species. Then problems common to all species will be described. These will be illustrated using plains zebras because they are the most challenging species considered in this \thesis{}.

\subsection{Distinguishing textures of each species}

The plains zebra --- shown in~\cref{fig:PlainsFigure} --- is challenging to visually identify because individuals have relatively few distinguishing texture features. For most plains zebras, the majority of distinctive information lies in a small area on the front shoulder. \Cref{fig:HardCaseFigure} illustrates that the patterns that distinguish two individuals can be subtle, even when the features are clearly visible. The matching difficulty greatly increases when features are partially occluded, the viewpoint changes, or the image quality is poor.

In contrast, Masai giraffes and Grévy's zebras, shown in~\cref{fig:GirMasaiFigure} and~\cref{fig:GrevysFigure} respectively, have an abundance of distinctive features. Distinctive textures that are unique to each individual are spread across the entire body of a Masai giraffe. For a Grévy's zebra there is a high density of distinguishing information above both the front and back legs, as well as a moderate density of distinctive textures along the side of the body. The high density of distinctive textures in Masai giraffes and Grévy's zebras increases the likelihood that the same distinctive features can be seen from different viewpoints. Even so, the problem is still difficult due to ``in the wild'' conditions such as animal pose, occlusion, and image quality.

There are some species, like humpback whales, in which some individuals have distinguishing textures while others lack them entirely. This means that only a subset of humpback whales can be identified with the texture-based techniques that we consider in this \thesis{}.
However, other cues --- like the shape of the notches along the trailing edge of the fluke --- can be used to distinguish between individuals. The work of Weideman and Jablons~\cite{jablons_identifying_2016} addresses identifying humpback whales using trailing edge shape features. The example in~\cref{fig:HumpbackFig} illustrates individual humpback whales with and without distinctive textures.

\PlainsFigure{}
\HardCaseFigure{}
\GirMasaiFigure{}
\GrevysFigure{}
\HumpbackFig{}
\FloatBarrier{}

\subsection{Viewpoint and pose}

One of the most difficult challenges faced in the animal identification problem is viewpoint. Animals are seen in a variety of poses and viewpoints, which can cause distinctive features to appear distorted. The patterns on the left and right sides of an animal are almost always asymmetric. Therefore, matches can only be established between overlapping viewpoints, and only if the features visible from those viewpoints are distinctive. Some viewpoints, such as the backs of plains zebras, lack distinguishing information, as shown in~\cref{fig:BacksFigure}. The effect of pose and viewpoint variation can be seen in~\cref{fig:ThreeSixtyFigure} and~\cref{fig:PoseFigure}.

\BacksFigure{}
\ThreeSixtyFigure{}
\PoseFigure{}
\FloatBarrier{}

\subsection{Occluders and distractors}

Because images of animals are often taken ``in the wild'', other objects in the image can act as \glossterm{occluders} or \glossterm{distractors}. Objects such as grass, bushes, trees, or other animals can act as occluders by partially obscuring the features that distinguish one individual from another. Nearby animals can also be distracting because their features may match different animals in the database. Distractors may also come from non-animal features, as when multiple pictures are taken against the same background while animals move through the same field of view. Several examples of occluders and distractors are illustrated in~\cref{fig:OccludeFigure}.

\OccludeFigure{}
\FloatBarrier{}

\subsection{Image quality}

Image quality is influenced by lighting, shadows, the camera used, image resolution, and the size of the animal in the image. Outdoor images naturally have large variations in illumination. Different cameras can produce visual differences between images of the same object. Images taken out of focus, from far away, or with an unsteady camera can cause animals to appear blurred. The effects of outdoor shadow and illumination are illustrated in~\cref{fig:IlluminationFigure}. \Cref{fig:QualityFigure} illustrates five categories of image quality that will be described later in~\cref{sub:viewqual}.

\IlluminationFigure{}
\QualityFigure{}
\FloatBarrier{}

\subsection{Aging and injuries}

The appearance of an individual changes over time due to aging and other factors, including injuries. An example of the difference between a juvenile and an adult zebra is shown in~\cref{fig:AgeFigure}. An example of how injuries can both remove distinctive features and add new ones is shown in~\cref{fig:GashFigure}.

\AgeFigure{}
\GashFigure{}
\FloatBarrier{}

\section{THE GREAT ZEBRA COUNT}\label{sec:introgzc}

To further illustrate the problems addressed in this \thesis{}, we consider the ``Great Zebra Count'' (\GZC{}), held at Nairobi National Park on March 1\st{} and 2\nd{}, $2015$~\cite{rubenstein_great_2015}.
This event was designed with two purposes in mind: (1) to involve citizens in the scientific data collection effort, thereby increasing their interest in conservation, and (2) to determine the number of plains zebras and Masai giraffes in the park.

\subsection{Data collection}

Volunteer participants --- each with his or her own camera --- arrived by car at the park. Some cars had more than one photographer. Each car was assigned a route to drive through the park. We attached a GPS dongle to each car to record time and location throughout the drive. Correlating this with the time stamp on each image (after adding a correction offset for each camera) allowed us to determine the geolocation of each image. Each photographer was given instructions guiding them toward taking quality images of the left sides of the animals they saw. When the cars returned --- some after just an hour or two, others after the whole day --- the images were copied from the cameras, a small sample of each photographer's images was immediately processed to illustrate what we would do with the data, and the entire set of images was stored for further processing. The result of this crowd-sourced collection event was a $\SI{48}{\giga\byte}$ dataset consisting of $9406$ images.

\subsection{Data processing}\label{subsec:introdataprocess}

After the event, the entire collection of images was processed using a preliminary version of the system in order to generate the final count. The preliminary system followed the workflow of:
\begin{enumin}
\item \occurrence{} grouping,
\item animal detection,
\item viewpoint and quality labeling,
\item \intraoccurrence{} matching,
\item \vsexemplar{} identification,
\item consistency checks, and
\item population estimation.
\end{enumin}
Here, we provide a brief overview of each step involved in the processing of the \GZC{} image data, and then we describe the challenges that arose.

\subsubsection{Occurrence grouping}

The images were first divided into \glossterm{\occurrences{}} --- a standard term defined by the Darwin Core~\cite{wieczorek_darwin_2012} to denote a collection of evidence (\eg{} images) that an organism exists within a defined location and time frame. In the scope of this application, an \occurrence{} is a cluster of images taken within a small window of time and space. Images are grouped into \occurrences{} using the GPS and time data; details are provided in~\cref{app:occurgroup}, and a simplified sketch is given at the end of this subsubsection.

These computed \occurrences{} are valuable measurements for multiple components of the IBEIS software. At its core, an \occurrence{} describes \wquest{when} a group of animals was seen and \wquest{where} that group was seen. From this starting point, other algorithms can address questions such as: \wquest{how many} animals there were, \wquest{who} an animal is, \wquest{who else} an animal is with, and \wquest{where else} these animals have been seen.

Furthermore, there are computational and algorithmic benefits to first grouping images into \occurrences{}. One benefit is that an \occurrence{} can be used as a semantic processing unit to distribute manageable chunks of work to users of the system. Another is that \occurrences{} can be used to improve the results of identification. Typically, there will be only a few individuals within an \occurrence{}, and it is not uncommon for each individual to be photographed multiple times and from multiple viewpoints. This redundancy in images will be exploited in \cref{chap:graphid}.
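To make the grouping step concrete, the following is a minimal sketch that greedily chains time-sorted images into \occurrences{}. The thresholds and names here are hypothetical illustrations; the actual clustering procedure and parameters are those described in~\cref{app:occurgroup}.

\begin{verbatim}
from dataclasses import dataclass
from math import radians, sin, cos, asin, sqrt

# Hypothetical thresholds; the actual values and clustering
# procedure are described in the appendix.
MAX_GAP_SECONDS = 10 * 60   # at most 10 minutes between images ...
MAX_GAP_KM = 1.0            # ... and at most 1 km between them

@dataclass
class Image:
    timestamp: float  # seconds since epoch, camera-offset corrected
    lat: float
    lon: float

def gps_distance_km(a: Image, b: Image) -> float:
    """Great-circle (haversine) distance between two images."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat))
         * sin(dlon / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))  # Earth radius ~6371 km

def group_into_occurrences(images):
    """Greedily chain time-sorted images into occurrences."""
    occurrences = []
    for im in sorted(images, key=lambda im: im.timestamp):
        prev = occurrences[-1][-1] if occurrences else None
        if (prev is not None
                and im.timestamp - prev.timestamp <= MAX_GAP_SECONDS
                and gps_distance_km(prev, im) <= MAX_GAP_KM):
            occurrences[-1].append(im)   # extend current occurrence
        else:
            occurrences.append([im])     # start a new occurrence
    return occurrences
\end{verbatim}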
\subsubsection{Animal detection}

Before matching begins, each image is cropped to focus on a particular animal and to remove background distractors. A detection algorithm localizes animals within the images. Each verified detection generates an \glossterm{\annot{}} --- a bounding box around a single animal in an image. An example illustrating detection of plains zebras is shown in~\cref{fig:DetectFigure}. In the \GZC{}, each detection was manually verified before becoming an \annot{}, but recent work introduces an automatic verification mechanism and reduces the need for complete manual review. The details of the detection algorithm are beyond the scope of this \thesis{} and are described in the work of Parham~\cite{parham_photographic_2015, parham_detecting_2016}.

\DetectFigure{}

\subsubsection{Viewpoint and quality labeling}\label{sub:viewqual}

When determining the number of animals in a population, it is important to account for factors that can lead to over-counting. If two \annots{} of the same individual are not matched, then that individual will be counted twice. This can happen due to factors such as viewpoint and quality. For example, one \annot{} showing only the left side of an animal and another \annot{} showing only the right side of the same animal cannot be matched because they are \glossterm{incomparable}. Two \annots{} are comparable when they share regions with distinguishing patterns that can be put in correspondence. Viewpoint is the primary reason that two \annots{} are not comparable. However, other factors like low image quality and heavy occlusion can corrupt distinguishing patterns, rendering the \annot{} unidentifiable --- not comparable with any other \annot{}. We must define what it means for two \annots{} to be comparable before we can estimate a population size.

Determining if an individual can be identified is analogous to the notion of a marked individual~\cite{seber_estimation_1982}. For an \annot{} to be identifiable, the patterns that distinguish it from the rest of the population must be clear and visible; otherwise the \annot{} may fail to find, or be incomparable with, its potential matches. This means an \annot{} is only identifiable if
\begin{enumin}
\item the image quality is high enough, and
\item it has a viewpoint that is comparable to all potential matches.
\end{enumin}

To address this challenge, we label each \annot{} with one of $5$ discrete quality labels and one of $8$ discrete viewpoint labels. The quality labels we define are: \qualJunk{}, \qualPoor{}, \qualOk{}, \qualGood{}, and \qualExcellent{}. The \qualJunk{} label is given to \annots{} that almost certainly cannot be identified, and the \qualPoor{} label is given to \annots{} that will likely be unidentifiable for a computer vision algorithm. The \qualGood{} and \qualExcellent{} labels are given to clear, well-illuminated \annots{} with little to no occlusion, with \qualExcellent{} being reserved for the best of the best. All other \annots{} are labeled as \qualOk{}. The viewpoint labels we define are: \vpFront{}, \vpFrontLeft{}, \vpLeft{}, \vpBackLeft{}, \vpBack{}, \vpBackRight{}, \vpRight{}, and \vpFrontRight{}. Note that additional viewpoint labels like \vpUp{} and \vpDown{} may be necessary for animals such as lionfish or turtles. However, the $8$ labels we use are sufficient for animals like zebras and giraffes because they are most commonly seen in upright positions.
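As a concrete illustration of how these labels gate identification, consider the following minimal sketch. The enum names and the rule that only adjacent compass viewpoints share visible surface are our simplifying assumptions for illustration, not the system's exact policy.

\begin{verbatim}
from enum import Enum

class Quality(Enum):
    JUNK = 0
    POOR = 1
    OK = 2
    GOOD = 3
    EXCELLENT = 4

# The eight compass viewpoints in circular order, so that adjacency
# corresponds to distance around the circle.
VIEWPOINTS = ["front", "front-left", "left", "back-left",
              "back", "back-right", "right", "front-right"]

def viewpoints_comparable(vp1: str, vp2: str) -> bool:
    """Assume two viewpoints share visible surface only if they are
    at most one step apart on the compass circle (a simplification)."""
    i, j = VIEWPOINTS.index(vp1), VIEWPOINTS.index(vp2)
    step = abs(i - j)
    return min(step, len(VIEWPOINTS) - step) <= 1

def identifiable(quality: Quality, viewpoint: str, candidate_vps) -> bool:
    """An annotation is usable for identification only if its quality
    is at least OK and its viewpoint overlaps a potential match."""
    return (quality.value >= Quality.OK.value
            and any(viewpoints_comparable(viewpoint, vp)
                    for vp in candidate_vps))
\end{verbatim}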
In an effort to ensure that all \annots{} used in the \GZC{} were comparable, we did not include any \annot{} labeled with a \qualJunk{} or \qualPoor{} quality. To account for limitations in the initial ranking algorithm, we also excluded \annots{} that were not labeled with a \vpLeft{} or \vpFrontLeft{} viewpoint. All labelings of viewpoint and quality were generated manually. Since then, we have trained viewpoint and quality classifiers using this manual data. Automatic detection of quality and viewpoint is discussed in the work of Parham~\cite{parham_photographic_2015}.

\subsubsection{Matching within each \occurrence{}}

Animals often have multiple redundant views within an \occurrence{}, each of which can be the same as, better than, or complementary to other views. The images in~\cref{fig:OccurrenceComplementFigure} illustrate redundant and complementary views of an individual in an \occurrence{}. Merging all of an individual's views is a challenge, but also potentially an advantage, as we can exploit redundancy to better handle missing features, subtle viewpoint changes, and occlusions. We exploit this redundancy to gain the benefit of complementary views by matching all \annots{} within an \occurrence{} in a process called \glossterm{\intraoccurrence{} matching}. In the \GZC{}, each \annot{} was queried against all other \annots{} in its \occurrence{}, returning a ranked list of candidate matches. The person running the software made the final decisions about which \annots{} match. Details about the ranking algorithm are given in~\cref{chap:ranking}.

The result of \intraoccurrence{} matching is a set of \glossterm{\encounters{}}. \Aan{\encounter{}} is a group of \annots{} that were matched within an \occurrence{}. Each \encounter{} is either (1) the first sighting of an individual or (2) a resighting. The task now becomes to determine which of these is the case by identifying each \encounter{} against a \masterdatabase{}.

\OccurrenceComplementFigure{}

\subsubsection{Matching against the \masterdatabase{}}

To determine if \aan{\encounter{}} is a new sighting or a resighting of an individual, it is matched against the \masterdatabase{} in a process called \glossterm{\vsexemplar{} matching}. Before matching begins, the \masterdatabase{} is prepared for search. For each \name{} in the \masterdatabase{}, a subset of \glossterm{\exemplar{}} \annots{} is chosen to represent the appearance of that individual. The \exemplars{} are indexed using a search data structure. After the \masterdatabase{} has been prepared, the ranking algorithm issues a subset of the \encounter{}'s \annots{} as a query. The result is a ranked list of \exemplars{} that are visually similar to the \encounter{}. The top \exemplars{} in the ranked list are used as candidate matches. Then, the candidate matches are reviewed, and the \encounter{} is either merged into an existing \mastername{} or added to the \masterdatabase{} as a new \mastername{}.

\subsubsection{Consistency checks}

When merging \encounters{} into the \masterdatabase{}, it is possible that mistakes were made. Two error cases commonly occur.
\begin{enumln}
\item A \glossterm{split case} occurs when a set of \annots{} from two or more different animals is incorrectly labeled with the same \name{}. The main cause of this error is matches between distracting features, which make the \annots{} appear visually similar.
\item A \glossterm{merge case} occurs when two sets of \annots{} from the same animal are incorrectly labeled with different \names{}.
This is caused by algorithm or human error in which a query \encounter{} was not correctly matched to the database \exemplars{}.
\end{enumln}
These errors usually occur because the query and database \annots{} have a low degree of \emph{comparability} (\eg{} differences in viewpoint or low quality). Of course, if no visual overlap exists between the two sets --- such as one set exclusively from the left side and another exclusively from the right --- nothing can be done. This is why an animal must be seen from a predetermined view in order to be counted; in the \GZC{} this was the left side.

In the \GZC{}, suspect individuals were flagged for split checks using various heuristics, such as the number of \annots{} in the \name{} or the apparent speed of the animal's movement as computed from GPS and time data. To check a flagged individual, we used the ranking algorithm to search for pairs of \annots{} with low matching scores that belong to the flagged \name{}. Low similarity between two \annots{} within a \name{} suggested that an error had occurred. These low-scoring results were then manually reviewed. When breaking apart split cases, care was taken to account for the fact that right and left images should not match. Likewise, care was taken to ensure that an intermediate \annot{} linking two otherwise disjoint \annots{} had enough information to establish the link.

Merge checks issued all \exemplars{} as queries against all other \exemplars{}. High similarity between two different \names{} suggested that a match was missed. These high-scoring results were manually reviewed. More sophisticated error detection and recovery will be discussed in \Cref{sec:incon}.

\subsubsection{Population estimation}

The final step of the \GZC{} workflow was to estimate the number of animals in the park. Using the identification algorithm, we determined which \annots{} were sightings and which were resightings. Because we were using a preliminary version of the system, we were conservative in defining when an animal was sighted: we used only the left and front-left \annots{} with quality labels of \qualOk{}, \qualGood{}, or \qualExcellent{}. Each individual that met these criteria was counted as a sighting. If a sighted individual had an \annot{} from both days, then we counted that individual as resighted.

\subsection{Processing challenges}

Our experience with the Great Zebra Count highlighted a number of challenges that must be addressed if this system is to be applied in future events. These challenges include the number of manual reviews required, the detection of and recovery from manual errors, and the overall lack of a systematic identification framework.

Perhaps the greatest challenge faced during the \GZC{} was the considerable amount of time required to manually verify identification results. It can take several seconds to manually verify whether a pair of \annots{} is a correct match, even when the results are presented in a ranked list. This task is illustrated in~\cref{fig:RankFigure}. Requiring the manual verification of each result is untenable for a system that accepts thousands of new images a day.

The lack of a systematic approach to identification meant that whenever two \annots{} were matched, the \name{} labels of all \annots{} with those \names{} were changed. This made it difficult to tease apart errors when they occurred.
Furthermore, manual errors (likely caused by fatigue from the large number of manual reviews) resulted in numerous identification errors that could not be detected and resolved until the end of the process. Reviews of results were also done in order of matching scores regardless of previous decisions, causing the manual reviewer to inefficiently re-review redundant results involving the same individuals. Additionally, no stopping criterion for reviews was defined, resulting in an ad hoc approach to determining when all matches had been found.

Motivated by these observations, we seek to develop a semi-automatic approach to animal identification. This approach should be governed by a system that reduces the number of manual reviews, is able to detect and recover from errors, and determines when to stop searching for new matches.

\RankFigure{}

\section{APPROACH}

The problem addressed in this \thesis{} is to identify individual animals ``in the wild'' and to count the individuals in a population. We are given a set of images containing \annots{} of the same species. The images are collected in an uncontrolled environment and likely contain imaging challenges such as occlusion, distracting features, viewpoint variations, pose variations, and quality variations. Furthermore, the images may be collected either over many years or over just a few days, as in the \GZC{}. Each \annot{} is labeled with time, GPS, quality, and viewpoint. We may also be given an initial partial \name{} labeling of the \annots{} --- \eg{} in the case where we identify a new set of \annots{} against a previously identified set --- but this need not be the case.
We want to label each \annot{} with a \glossterm{\name{}} that uniquely identifies the individual. In other words, our task is to label all \annots{} from the same individual with the same \name{} and to give \annots{} from different individuals different \names{}. After this is complete, the resulting database will contain the information needed to estimate the size of the population using techniques from sight-resight statistics.

The first step of the identification process is a ranking algorithm. The inputs to the algorithm are a single query \annot{} and a set of database \annots{}. Sparse patch-based features are localized in all \annots{}, and a descriptor vector is extracted for each feature. The descriptors of the database \annots{} are indexed for fast nearest neighbor search. We then find a set of matches in the database for each descriptor in the query \annot{}. The matches are scored based on visual similarity, distinctiveness within the database, and likelihood of belonging to the foreground. Matches are combined across multiple \exemplar{} \annots{} to produce a matching score for each \name{} in the database, resulting in a ranked list of results for each query.

We then extend the ranking algorithm by developing a classifier able to automatically review its results. First, we construct a pairwise feature that captures the relationships between two \annots{} using local feature correspondences and global properties such as time and GPS. Then, we learn a classifier to predict whether a pair of \annots{} --- \ie{} a result in the ranked list --- is correct or incorrect.

In the final part of our approach, we place the problem of animal identification in a graph framework able to systematically guide the identification process. Each \annot{} becomes a vertex in a graph, and labeled edges between \annots{} represent how they are related. Using the graph framework, we are able to detect and recover from errors by taking advantage of the multiple images of each individual.

We evaluate the ranking, verification, and graph identification algorithms by performing experiments on two main databases of plains zebras and Grévy's zebras. Some additional experiments are also performed on databases of Masai giraffes and humpback whales. First, the ranking experiments test the algorithm's ability to find potential matches of an individual animal across large periods of time, different viewpoints, different database sizes, and different numbers of \exemplars{}. Then, the verification experiments test the extent to which the correct results from the ranking algorithm can be separated from the incorrect results using our learned classifier. Finally, the graph identification experiments demonstrate the algorithm's ability to reduce the number of required manual reviews and to recover from errors. We determine the configuration of each algorithm that works best for identifying each species.
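To make the ranking step concrete, the following is a minimal sketch of scoring \names{} by descriptor nearest neighbors, in the spirit of the description above. It omits the distinctiveness and foreground weights and all spatial verification, so it illustrates the idea rather than the system's implementation; all names in the sketch are ours.

\begin{verbatim}
from collections import defaultdict
from scipy.spatial import cKDTree

def rank_names(query_descs, db_descs, db_name_of_desc, k=5):
    """Score each database name against one query annotation.

    query_descs:     (m, d) array of query descriptors
    db_descs:        (n, d) array of database descriptors (n > k)
    db_name_of_desc: length-n array mapping descriptor index -> name id
    """
    tree = cKDTree(db_descs)
    # Ask for k+1 neighbors; the extra one acts as a "background"
    # distance that normalizes the scores of the first k matches.
    dists, idxs = tree.query(query_descs, k=k + 1)
    scores = defaultdict(float)
    for row_d, row_i in zip(dists, idxs):
        background = row_d[-1]
        for d, i in zip(row_d[:-1], row_i[:-1]):
            # A match contributes evidence proportional to how much
            # closer it is than the background distance.
            scores[db_name_of_desc[i]] += background - d
    # Ranked list of (name, score) pairs, best match first.
    return sorted(scores.items(), key=lambda item: -item[1])
\end{verbatim}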
\section{ORGANIZATION}

This \thesis{} is organized as follows.

\Cref{chap:relatedwork} describes related work. The focus is on the details of techniques used in the system, while an overview is given for work that is indirectly related.

\Cref{chap:ranking} describes the ranking algorithm for identifying individual animals, one \annot{} at a time, against a database of \exemplars{}. This chapter includes an experimental evaluation of the ranking algorithm. This is the algorithm that was used in the \GZC{}.

\Cref{chap:pairclf} addresses the problem of semi-automatic verification of results from the ranking algorithm.

\Cref{chap:graphid} combines the ranking and verification algorithms into a semi-automatic framework that detects and corrects errors while reducing the number of manual reviews.

\Cref{chap:conclusion} concludes this \thesis{} and summarizes its contributions.