<a href="https://colab.research.google.com/github/Shea-Fyffe/transforming-personality-scales/blob/main/tutorials/few-shot-learning-with-transformers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

---
# Few-Shot Learning with Transformers (GPT-3)
---

This code is written in **Python** as an illustration of *few-shot* learning, which occurs when few labeled training examples are available (see [Ruder, 2017](https://ruder.io/transfer-learning/)). When taking a standard approach to text classification with few labeled examples, transformer architectures commonly used for text classification (e.g., *BERT*; [Devlin et al., 2019](https://arxiv.org/abs/1810.04805)) suffer inconsistent performance ([Zhang et al., 2021](http://arxiv.org/abs/2006.05987)). To overcome this, researchers may choose to "freeze" encoder layers (e.g., [Chronopoulou et al., 2019](https://doi.org/10.18653/v1/N19-1213)); however, merely reframing the classification task to better align with a transformer's source task seems to be a more viable option ([Brown et al., 2020](https://arxiv.org/abs/2005.14165)).

By reframing a classification task into a *language modeling* task, transformers seem to better cope with a small number of training examples (e.g., [Chronopoulou et al., 2019](https://doi.org/10.18653/v1/N19-1213); [Schick & Schütze, 2021](https://arxiv.org/abs/2009.07118)). In a language modeling task, a model is trained to predict the next word in a sequence of words. This task is nearly universal in transformer pretraining, so much so that it allows large decoder models (e.g., *GPT-3*; [Brown et al., 2020](https://arxiv.org/abs/2005.14165)), which are most often used for language generation tasks, to perform text classification tasks. We demonstrate this approach by using GPT-3 to perform few-shot classification. We provide a baseline by comparing this approach to a standard approach to text classification.
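
To make the reframing concrete, the sketch below turns a handful of labeled examples and one unlabeled query into a single prompt whose most likely next token is the label. The `Text:`/`Category:` layout follows the default template defined later in this notebook; the function name `build_few_shot_prompt` and the example pairs are illustrative assumptions, not part of the notebook's class.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Format labeled examples and one query as a next-token prediction task."""
    blocks = [
        f"Text: {text.strip()}\nCategory: {label.strip()}\n---\n"
        for text, label in examples
    ]
    # The prompt stops right after "Category:" so the model's next token is the label.
    return "".join(blocks) + f"Text: {query.strip()}\nCategory:"


# Illustrative (text, label) pairs
examples = [
    ("I keep things tidy.", "conscientiousness"),
    ("I laugh a lot.", "extraversion"),
]
prompt = build_few_shot_prompt(examples, "I joke around a lot.")
print(prompt)
```

Because the prompt ends immediately after `Category:`, a language model that continues the text is effectively performing classification.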

<br>

#### Things to Remember before Beginning
- You will need to register for an API key on OpenAI's website [here](https://beta.openai.com/). There are also several open source versions available; however, they've yet to achieve GPT-3's level of performance.
- We recommend only using the method illustrated here when researchers have **fewer than 40 examples per label.** Given how GPT-3's *Completions API* works, this method can become quite expensive for those with little $$ to their names (e.g., poor graduate students like me). For those with more than 40 examples (or even 20) but fewer than 100, it may be worth using a *fine-tuning* approach with GPT-3. Read more about that approach [here](https://platform.openai.com/docs/guides/fine-tuning).
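
The cost concern comes from the fact that every request resends all of the few-shot examples. The sketch below gives a rough feel for how prompt size grows. The tokens-per-example figure, the instruction length, and the price are placeholder assumptions (not OpenAI's actual pricing), and `estimate_prompt_tokens` is a hypothetical helper.

```python
def estimate_prompt_tokens(k_per_label: int, n_labels: int, n_test_docs: int,
                           tokens_per_example: int = 20,
                           instruction_tokens: int = 30) -> int:
    """Rough total of prompt tokens sent across all requests (one request per test document)."""
    # Each request carries the instruction, every few-shot example, and one query.
    tokens_per_request = instruction_tokens + (k_per_label * n_labels + 1) * tokens_per_example
    return tokens_per_request * n_test_docs


# Hypothetical price per 1,000 tokens; check OpenAI's pricing page for the real figure.
ASSUMED_PRICE_PER_1K_TOKENS = 0.02

for k in (5, 20, 40):
    total = estimate_prompt_tokens(k_per_label=k, n_labels=5, n_test_docs=200)
    cost = total / 1000 * ASSUMED_PRICE_PER_1K_TOKENS
    print(f"k={k:>2}: ~{total:,} prompt tokens, ~${cost:.2f}")
```

Under these assumed values, the larger settings of `k` would also push a single prompt past the 2,048-token default limit used in this notebook, so cost is not the only constraint.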

### Libraries

Colab comes with a large number of Python libraries pre-installed. However, `openai` and `transformers` are not among them; both can be installed using the code below.

In [None]:
#@title RUN: Installing OpenAI and Transformer Libraries
%%capture
! pip install openai
! pip install transformers

In [None]:
#@title RUN: Loading Libraries
# GPT3 related libraries
import openai
from transformers import GPT2TokenizerFast

# Data management libraries
import numpy as np
import pandas as pd
from collections import defaultdict
from google.colab import drive # optional for getting data

# General utility libraries
import os
import sys
import time # for sleeping between requests
import requests # for downloading url
from io import StringIO # for downloading data from url
from typing import Dict, List, Union # for type hinting
from sklearn.metrics import classification_report # for model evaluation

#### Mounting Google Drive
It is often a good idea to allow Colab to mount (or connect to) your Google Drive. This allows you to easily save models or, as we demonstrate, import data. By default, Colab's working directory is `/content/`, so we can place our Google Drive root directory within this folder. If you've changed your working directory, you can use `os.getcwd()` to see the current one.

In [None]:
#@title RUN: Connect to Drive *Optional*
# Connect the current working directory to a user's Google Drive account
drive.mount(os.getcwd() + '/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Classes and Functions

Here we define a class and several class functions that will be used to train and extract classifications from an instance of `GPT-3`.

In [None]:
#@markdown ## RUN: Model Class
class FewShotGPT3:
    """
    A Few-shot learning class for the transformer GPT-3
    """
    def __init__(self, data, api_token: str, model: str = 'davinci', max_token_length = 2048, instruction_template = None, context_template_function = None):
        """
        Initializing few shot model.

        Arguments
        ---------
        data: a Pandas DataFrame with text and labels that will be used for training.
        api_token: a string representing API token from beta.openai.com
        model: a string of GPT-3 model to be used (e.g., ada, babbage, curie, davinci) ('davinci' by default).
        max_token_length: a number representing the max token length of the model (2048 by default).
        instruction_template: a string to be used as instruction in the prompt (if None, one is auto-generated based on labels).
        context_template_function: a function to be used to form few shot context string (None by default).


        """    
        def default_instruction_template_fn(x: List[str]) -> str:
            """
             Default instruction template function.
            """
            instruction = f"Please classify a piece of text into the following categories: {', '.join(x)}."
            return f"{instruction.strip()}\n\n"

        def default_context_template_fn(text, labels = None) -> str:
            """
            Default context template function. 
            
            Used for example query as well.
            """
            if labels is None:
                text = text.replace("\n", " ").strip()
                return f"Text: {text}\nCategory:"
            context = []
            for ti, li in zip(text, labels):
                context.append("Text: {text}\nCategory: {label}\n---\n".format(
                    text=ti.replace("\n", " ").strip(),
                    label=li.replace("\n", " ").strip(),
                ))
            return "".join(context)

        openai.api_key = os.environ["OPENAI_API_KEY"] = self.api_key = api_token
        self.model = model
        self.tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
        self.max_token_len = max_token_length
        
        self.raw_results = []
        self.raw_prediction_results = []

        data[data.columns[1]] = self.format_labels(data[data.columns[1]])
        self.training_data = data
        self.instruction_template = instruction_template if instruction_template is not None else default_instruction_template_fn(list(self.label_to_index.values()))
        self.context_template_fn = context_template_function if context_template_function is not None else default_context_template_fn


    def __str__(self, verbose = False):
        """
        Custom print method 
        """
        if verbose:
            return str(self.__class__) + '\n'+ '\n'.join(('{} = {}'.format(item, self.__dict__[item]) for item in self.__dict__))

        print_res = "GPT-3 architecture: %s\nTraining data imported: %s\n" % (self.model, hasattr(self, "training_data"))
        if hasattr(self, "training_data"):
            label_str = [k + "(" + v + ")" for k, v in self.label_to_index.items()]
            print_res = print_res + "Training data size: %d\nText col: %s\nLabel col: %s\nLabel Mapping:\n%s" % (len(self.training_data),
                                                                                                              self.training_data.columns[0], 
                                                                                                              self.training_data.columns[1],
                                                                                                              '\n'.join(label_str))
        return print_res
    
    def format_labels(self, labels: List[str], ignore_case: bool = True, sort: bool = True) -> List[str]:
        """
        Format and tokenize labels.

        This function will always map labels to a single number (e.g., 0, 1, 2, 3, and so forth) 
        
        Arguments
        ---------
        labels: a list of labels
        ignore_case: a boolean flag to treat labels as case-insensitive (optional, default: True)
        sort: a boolean specifying if labels should be sorted alphabetically before recoding (optional, default: True)

        Returns
        -------
        List[str]
            a list of labels mapped to integer strings.

        The label maps are stored on the model rather than returned:
        `label_to_index` (original label -> new label), `index_to_label`
        (new label -> original label), and `token_to_index` (GPT-3 token id -> new label).
        """
        labels = [label.replace("\n", " ").strip() for label in labels]  
        if ignore_case:
            labels = [label.lower() for label in labels]  
        unique_labels = list(set(labels))
        if sort:
            unique_labels.sort()
        label_to_index = {k: str(i) for i, k in enumerate(unique_labels)}
        index_to_label = {str(i): k for i, k in enumerate(unique_labels)}
        token_to_index = {self.tokenizer.encode(" " + str(i))[0]: str(i) for i, _ in enumerate(unique_labels)}
        
        labels_out = []
        for j in labels:
            labels_out.append(label_to_index[j])
        self.label_to_index = label_to_index
        self.index_to_label = index_to_label
        self.token_to_index = token_to_index
        self.num_labels = len(label_to_index)
        
        return labels_out

    def build_prompt(self, query_doc: str, add__instructions: bool = True) -> str:
        """
        Internal method for building prompt.

        Arguments
        --------- 
        query_doc: a string containing the text document to be classified.
        add__instructions: a boolean determining whether to add instructions to the prompt before prediction (True by default).

        """
        if not hasattr(self, "few_shot_data"):
           raise AttributeError("Few shot examples have not been selected. Use `select_few_shot()` before proceeding.")
        instructions = self.instruction_template if add__instructions else ""
        context = self.context_template_fn(self.few_shot_data[self.few_shot_data.columns[0]],
                                           self.few_shot_data[self.few_shot_data.columns[1]])
        query = self.context_template_fn(query_doc)
        prompt = instructions + context + query
        n_total_tokens = len(self.tokenizer.encode(prompt))
        if n_total_tokens > self.max_token_len:
            # Exception() does not accept keyword arguments, so pass the message positionally
            raise ValueError(
                f"The prompt contains {n_total_tokens} tokens, which is above the {self.max_token_len} token limit. "
                "Please consider setting add__instructions = False, selecting fewer few shot examples (by using select_few_shot()), "
                "or changing the context_template_fn."
            )
        return prompt  


    def select_few_shot(self, few_shot_k: int = 1,  select_indices: List[int] = None, seed: int = 42, shuffle = True):
        """
        Method to select (i.e., subsample) training examples for few shot classification.

        Arguments
        ---------
        few_shot_k: a number of random examples per class to select for few shot learning (1 by default).
        select_indices: an optional list of numeric indices in training data for few shot examples to be used instead of random sampling from training data (None by default).
        seed: a number of random seed for sub-sampling a few examples (k) (42 by default).
        shuffle: a boolean to shuffle rows in data using random seed (True by default).
        """
        if not hasattr(self, "training_data"):
            raise AttributeError("Training data not yet loaded. Pass a DataFrame via `data` when initializing the model.")
        
        few_shot_data = self.training_data
        
        if select_indices is not None:
            # indices are 1-based (1 = first row), so subtract 1 and skip out-of-range values
            few_shot_data = few_shot_data.iloc[[i - 1 for i in select_indices if 1 <= i <= len(self.training_data)]]
        else:
            few_shot_data = few_shot_data.groupby(few_shot_data.columns[1], group_keys=False).apply(lambda x: x.sample(n=int(few_shot_k), random_state = seed))
        
        if shuffle:
            few_shot_data = few_shot_data.sample(frac=1, random_state = seed).reset_index(drop=True)

        self.few_shot_data = few_shot_data
        return print('Few shot examples selected successfully.')


    def predict(self, test: List[str], request_delay: int = 1, **kwargs):
        """
        Method to predict labels of unlabelled text documents.
        
        Arguments
        ---------
        test: a list of text documents to be predicted.
        request_delay: a number determining the time (in seconds) to wait between calls to API (1 by default).
        kwargs: additional keyword arguments to pass to `openai.Completion.create()`

        Returns
        -------
        None
            Parsed predictions are stored in `self.prediction_results` and raw API responses in `self.raw_results`.
            If a request fails, the raw responses collected so far are returned instead.
        """
        
        # reinitialize results every prediction
        self.raw_results = []
        self.prediction_results = []

        default_kwargs = {
            "engine": self.model,
            "temperature": 0.0,
            "logprobs": self.num_labels,
            "max_tokens": 1,
        }

        completion_args = { **default_kwargs, **kwargs }

        for test_doc in test:
            time.sleep(request_delay)
            err_message = {}
            completion_args['prompt'] = self.build_prompt(test_doc)
            try:
                completion_resp  = openai.Completion.create(**completion_args)
            except TypeError as err:
                err_message = {
                    'plain language message':'A test document may be blank or a number.\n' +
                                             'See error message below...\n===\n',
                    'error message': err,
                    'error class': err.__class__,
                }
            except openai.error.InvalidRequestError as err:
                err_message = {
                    'plain language message':'API Request to OpenAI was invalid.\n' +
                                             'May result from issue with API key or model id.\n' +
                                             'See error message below...\n===\n',
                    'error message': err,
                    'error class': err.__class__,
                }
            except openai.error.RateLimitError as err:
                err_message = {
                    'plain language message':'API Requests are being made too fast!\n' +
                                            'Please increase the request_delay argument and try again.\n' +
                                            'See error message below...\n===\n',
                    'error message': err,
                    'error class': err.__class__,
                }
            except Exception as err:
                err_message = {
                    'plain language message':'Some other error occurred.\n' +
                                             'See error message below...\n===\n',
                    'error message': err,
                    'error class': err.__class__,
                }
            if err_message:
                for key,value in err_message.items():
                    print(key, ":", value)
                print("Attempting to return partial data...")
                return self.raw_results
            
            self.raw_results.append(completion_resp)
            preds = self.extract_prediction(completion_resp)
            preds['text'] = test_doc
            self.prediction_results.append(preds)
        return print("Predictions complete! Please see <model>.prediction_results")


    def extract_prediction(self, x):
        """
        Internal method for extracting predictions from GPT-3 Completion API
        """ 
        xi = x['choices'][0]['logprobs']['top_logprobs'][0]
        token_probs = defaultdict(float, {k: float() for k in self.token_to_index.keys()})
        for token, logp in sorted(xi.items()):
            token_probs[self.tokenizer.encode(token)[0]] += np.exp(logp)
        label_probs = {
            self.index_to_label[self.token_to_index[token]]: prob for token, prob in token_probs.items()
            if token in self.token_to_index.keys()
        }
        # Fill in the probability for the special 'unknown_label_p' label--which are predictions that weren't specified 
        label_probs['unknown_label_p'] = 0.0
        if sum(label_probs.values()) < 1.0:
            label_probs['unknown_label_p'] = 1.0 - sum(label_probs.values())
        label_probs["predicted_label"] = max(label_probs, key=label_probs.get)
        return label_probs


    def output_predictions(self, output_file: bool = False, output_file_name: str = "prediction-results.csv"):
        """
        Internal method to output test predictions to a csv file.
        
        Arguments
        ---------
        output_file: a boolean specifying results should be written to a .csv file (False by default).
        output_file_name: a string of a csv file path to write predictions to ('prediction-results.csv' by default).
        
        Returns
        -------
        pd.DataFrame
            A dataframe of prediction data.
        """
        if not hasattr(self, "prediction_results"):
           raise AttributeError("Predictions not yet created. Use `predict()` before proceeding.")
        out_data = pd.DataFrame(self.prediction_results)
        col = out_data.pop('text')
        out_data.insert(0, col.name, col)
        if output_file:
            out_data.to_csv(output_file_name, index=False)
            return print(f"file output to: {output_file_name}")
        return out_data

In [None]:
#@markdown ## RUN: Helper Functions

# import data
def import_data(csv_path: str, text_col: str = "text", label_col: str = "label",  enc: str = 'latin1', sort_labels: bool = True):
    """
    Function to import a csv of text documents with labels for few shot training.

    Arguments
    ---------
    csv_path: a string identifying the csv file path (or url).
    text_col: a string of the column name in csv containing text documents ('text' by default).
    label_col: a string of the column name containing labels ('label' by default).
    enc: File encoding to be used  ('latin1' by default).
    sort_labels: a boolean that is currently unused by this function (True by default).
    
    Returns
    -------
    pd.DataFrame
        A dataframe of text and labels (unless label_col is None).
    """
    if (csv_path.startswith("http")):
        res = requests.get(csv_path,
                           headers= {'User-Agent': 'Mozilla/5.0',
                                     "X-Requested-With": "XMLHttpRequest"})
        csv_path = StringIO(res.text)
    df = pd.read_csv(csv_path, encoding = enc)
    subset_cols = [text_col]
    if label_col is not None:
         subset_cols.append(label_col)
    return df[subset_cols]

# get api key
def get_api_key(file_path: str = "/content/drive/MyDrive/Colab Notebooks/docs/gpt-3-api-key.txt"):
    """
    Helper function to retrieve OpenAI api key from txt file.

    Arguments
    ---------
    file_path: a string identifying txt file storing an OPENAI API key.

    """
    if os.path.exists(file_path):
        # use a context manager so the file is closed after reading
        with open(file_path) as f:
            return f.readline().strip()
    return ""

# Compute evaluation metrics
def evaluate_model(actual: List[str], predicted: List[str], label_values = None, **kwargs):
    """
    Calculate evaluation metrics on test data (given labels are available).

    A helper function that returns model evaluation metrics. 
    
    Arguments
    ---------
    actual: list of actual labels.
    predicted: list of predicted labels.
    label_values: a list of *unique ordered* labels (derives from actual labels by default).
    kwargs: additional keyword arguments to pass to ``sklearn.metrics.classification_report()``.

    Returns
    -------
    dict
      summary of the precision, recall, F1 score for each class
    """
    if label_values is not None:
        kwargs.update({'target_names': label_values})
    else:
        # sort so the names line up with scikit-learn's sorted label order
        kwargs.update({'target_names': sorted(set(actual))})
        
    res = classification_report(y_true = actual, y_pred = predicted, output_dict = True, **kwargs)

    class_level = {k: res.get(k, None) for k in res.keys() if k in kwargs['target_names']}
    overall = {k: res.get(k, None)for k in res.keys() if k not in kwargs['target_names']}
    return {'overall' : pd.DataFrame(overall), 'by_label': pd.DataFrame(class_level)}

---
## Defining Parameters
---

In [None]:
#@markdown ## RUN: Entering API Key
# this can be stored as an environmental variable (ideal when using a local machine)
# openai.api_key = os.getenv("OPENAI_API_KEY")
API_KEY = get_api_key()

In [None]:
#@markdown ## RUN: Defining Number of Few Shot Examples
# here, we define a global variable for the number of examples (per label) to use during training
FEWSHOTK = 5

---
## Importing and Formatting Data
---
There are several ways to import training data (see our [tutorial]()). Importantly, the training data should be a `csv` (or url to a csv), and can be imported using the `import_data()` function.

By default, the `import_data` function assumes that the text is found in a column labeled `text` and the labels are found in the `label` column. However, this can be modified by changing the `text_col` and `label_col` arguments when calling the function.
<br>

#### Importing Training Data from Online Repository

While there are several ways to import data into Colab ([see here](https://colab.research.google.com/notebooks/io.ipynb)), the most intuitive way is to use the project's code repository url:

```python
# Assign the online data repository to a url so it does not have to be repeated later
repository_data_url = "https://raw.githubusercontent.com/Shea-Fyffe/transforming-personality-scales/main/data/text-classification/"

training_data = import_data(repository_data_url + "train-data.csv", text_col = 'docs', label_col = 'labels')
```
<br>

#### Importing Training Data from Local Repository

You can also upload a local `.csv` file. You can do this by:
- Visiting the project url above and clicking the `download file` button (top right in project repository)
- Clicking the ***Files*** pane in Colab (the folder icon on the left in Colab)
- Clicking the ***Upload to session storage*** icon (left-most icon in Colab)
- Selecting the local data file you would like to use (e.g., `.csv`,`.tsv`)

We demonstrate examples below.
<br>

#### Examples: Importing Data
```python
# If your csv file (e.g., train-data.csv) contains text data in the column 'docs' and labels in the column 'labels'
>>> training_data = import_data("train-data.csv", text_col = 'docs', label_col = 'labels')

# If your csv file (e.g., my-data.csv) contains text data in the column 'text_examples' and labels in the column 'classes'
>>> training_data = import_data("my-data.csv", text_col = 'text_examples', label_col = 'classes')
```

In [None]:
#@markdown #RUN: Import Training Data
# Assign the online data repository to a url so it doesn't have to be repeated later
repository_data_url = 'https://raw.githubusercontent.com/Shea-Fyffe/transforming-personality-scales/main/data/text-classification/'

# Import the training data
training_data = import_data(repository_data_url + "train-data.csv")

# View the first several rows of the training data
training_data.head()

|   | text | label |
|---|------|-------|
| 0 | I rarely feel depressed. | neuroticism |
| 1 | I always know what I am doing. | conscientiousness |
| 2 | I do not put my mind on the task at hand. | conscientiousness |
| 3 | I keep things tidy. | conscientiousness |
| 4 | I laugh a lot. | extraversion |


---
### Initializing GPT-3 Model
---
We've created a class `FewShotGPT3` that will serve as the model. When initially calling the class, two things will need to be defined:

- `data` which is the training data (imported above using `import_data()`). The `data` argument must be a DataFrame with two columns (i.e., text documents and labels)
- `api_token` which is your OpenAI API token ([login here to view your api token](https://openai.com/api/)). While we've stored our api token above using the `API_KEY` object, you can manually enter it when initializing the model (see examples below).

There are several other arguments that can be seen by calling `print(FewShotGPT3.__init__.__doc__)`.
<br>

#### Examples: Initializing GPT-3 Model
```python
## Common use-case
# To initialize a few shot model in the most common case
>>> few_shot_model = FewShotGPT3(data = training_data, api_token = API_KEY)

## Manual use-case
# If we wanted to manually input our api key and training data
training_data = pd.DataFrame(data =
    {
    'text':["document 1", "document 2", "document 3", "document 4"],
    'label': ["label a", "label b", "label b", "label a"],
    })
>>> few_shot_model = FewShotGPT3(data = training_data, api_token = 'a_fake_api_key_abc123')

## Varying architectures
# To initialize a few shot model object with 'ada'
>>> few_shot_model = FewShotGPT3(data = training_data, api_token = API_KEY, model = 'ada')

# To initialize a few shot model object with 'curie'
>>> few_shot_model = FewShotGPT3(data = training_data, api_token = API_KEY, model = 'curie')

## Additional cases
# changing instruction text for prompt 
>>> few_shot_model = FewShotGPT3(data = training_data, api_token = API_KEY, instruction_template = "Classify this text document.\n\n")

```

In [None]:
#@markdown ## RUN: Initialize Few Shot Model
few_shot_model = FewShotGPT3(data = training_data, api_token = API_KEY)

---
## Inspecting Model
---
The `few_shot_model` object, which is an *instance* of our `FewShotGPT3` class, stores several useful things after it's been initialized. Most are automatically derived from the training data. Some of the more important attributes are:
+ **label_to_index**: The labels identified from the training data mapped to a numeric code (assigned alphabetically by default). Thus, when classifying the Big Five, *agreeableness* = "0", *conscientiousness* = "1", *extraversion* = "2", and so forth.
+ **token_to_index**: GPT *tokenizes* text before prediction, converting it into a series of token ids, each of which is a row index in GPT's pre-trained vocabulary. This attribute maps the token id of each numeric code back to that code. Words are often tokenized into sub-word units, which can lead to complications (especially when two different labels begin with the same tokens). Thus, we map labels to a numeric representation (e.g., "0", "1", "2", "3") to avoid this problem.
+ **num_labels**: The number of labels identified.
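
As a minimal sketch of the label recoding described above, the snippet below reproduces the clean, sort, and enumerate steps in plain Python without a tokenizer. The function name `encode_labels` and the sample labels are illustrative assumptions; the notebook's own logic lives in `format_labels()`.

```python
from typing import Dict, List, Tuple


def encode_labels(labels: List[str]) -> Tuple[List[str], Dict[str, str]]:
    """Map each label to a short numeric string, ordered alphabetically."""
    # Normalize whitespace and case so "Extraversion " and "extraversion" match
    cleaned = [label.replace("\n", " ").strip().lower() for label in labels]
    label_to_index = {label: str(i) for i, label in enumerate(sorted(set(cleaned)))}
    return [label_to_index[label] for label in cleaned], label_to_index


raw_labels = ["Neuroticism", "extraversion", "agreeableness ", "extraversion"]
encoded, mapping = encode_labels(raw_labels)
print(encoded)
print(mapping)
```

Each code is a short numeric string, which sidesteps the problem of two label names sharing their first sub-word token.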

In [None]:
#@markdown ## RUN: Inspect Model
# We can get an overview of the model using the print function
print(few_shot_model)

# Look at your first tokens to verify they are unique
few_shot_model.label_to_index

GPT-3 architecture: davinci
Training data imported: True
Training data size: 733
Text col: text
Label col: label
Label Mapping:
agreeableness(0)
conscientiousness(1)
extraversion(2)
neuroticism(3)
openness(4)


{'agreeableness': '0',
 'conscientiousness': '1',
 'extraversion': '2',
 'neuroticism': '3',
 'openness': '4'}

---
## Selecting Few Shot Examples
---
Since this is an illustration of *Few-Shot* learning, we can call the `select_few_shot()` function, which either:

**(A)** randomly samples `few_shot_k` examples *per* label. For example, if you have five labels, setting `few_shot_k = 2` will create a few shot dataset of size 10 in your model object. Different random seeds can be used via the `seed` argument.

**(B)** allows one to pick particular examples using the `select_indices` argument. For example, if you would like to use the 1st, 3rd, 4th, 10th, and 15th cases in your `training data` as few shot examples, you would pass `select_indices = [1, 3, 4, 10, 15]` to the `select_few_shot()` function.

We provide examples of various use cases below.
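
Option (A) amounts to stratified random sampling. The sketch below illustrates the idea with the standard library's `random` module rather than the pandas `groupby(...).sample(...)` call used inside the class; the function name `sample_k_per_label` and the toy rows are assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def sample_k_per_label(rows: List[Tuple[str, str]], k: int = 1, seed: int = 42) -> List[Tuple[str, str]]:
    """Randomly draw k (text, label) rows for every label, then shuffle the result."""
    rng = random.Random(seed)
    by_label: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))
    selected = []
    for label in sorted(by_label):
        # rng.sample raises ValueError if a label has fewer than k rows
        selected.extend(rng.sample(by_label[label], k))
    # Shuffle so examples of the same label are not grouped together in the prompt
    rng.shuffle(selected)
    return selected


# Toy data: 6 rows for each of 3 labels
rows = [(f"item {i}", label) for i in range(6) for label in ("a", "b", "c")]
few_shot = sample_k_per_label(rows, k=2)
print(len(few_shot))
```

Fixing the seed makes the selection reproducible, which matters because few-shot results can vary with the examples chosen.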

#### Examples: Selecting Few Shot Examples
```python
## Common use-cases
# Randomly select 1 item per label
>>> few_shot_model.select_few_shot()

# Randomly select 2 items per label
>>> few_shot_model.select_few_shot(few_shot_k = 2)

# Randomly select 5 items per label
>>> few_shot_model.select_few_shot(5)

## Manual selection of examples
# use 5th, 9th, 20th, 30th cases in training data for few shot learning
>>> few_shot_model.select_few_shot(select_indices = [5, 9, 20, 30])

# Get every 4th case (1st, 5th, 9th, ...) in training data for few shot learning
>>> indx = list(range(1, len(few_shot_model.training_data) + 1))
>>> fourth_indx = indx[0::4]
>>> few_shot_model.select_few_shot(select_indices = fourth_indx)
```

In [None]:
#@markdown ## RUN:  Select Few Shot Examples
#The select_few_shot method will update our model object by adding a few shot dataset
few_shot_model.select_few_shot(FEWSHOTK)
# You can check the newly created few shot dataset by typing in `.few_shot_data` after your model object
few_shot_model.few_shot_data

Few shot examples selected successfully.


|   | text | label |
|---|------|-------|
| 0 | I make a fool of myself. | 1 |
| 1 | I feel crushed by setbacks. | 3 |
| 2 | I show my feelings. | 0 |
| 3 | Others perceive that I understand things quickly. | 4 |
| 4 | I joke around a lot. | 2 |
| 5 | I do not plan ahead. | 1 |
| 6 | I amuse my friends. | 2 |
| 7 | I look down on others. | 0 |
| 8 | I enjoy examining myself and my life. | 4 |
| 9 | I want things to proceed according to plan. | 1 |


---
## Importing Testing Data
---
Again, we can use the repository url that was specified earlier. Since our "test set" is not really a test set (given labels are present), we will import the labels for model evaluation later on.

In [None]:
#@markdown ## RUN:  Import Test Data
test_data = import_data(repository_data_url + "test-data.csv", label_col='label')
# convert test text to a list for the predict function
test_text = test_data['text'].tolist()
# see first 5 cases
test_data.head()

|   | text | label |
|---|------|-------|
| 0 | I avoid imposing my will on others. | agreeableness |
| 1 | I rarely put people under pressure. | agreeableness |
| 2 | I am out for my own personal gain. | agreeableness |
| 3 | I show my feelings when I am happy. | extraversion |
| 4 | I have a strong personality. | extraversion |


#### Data Used for Prediction

The `predict()` method builds a prompt from `few_shot_data` for each text document passed via `predict(test = ...)`. If `few_shot_data` has *not* been created by calling `select_few_shot()`, the method raises an `AttributeError` asking you to select few shot examples first. A prompt that exceeds the model's token limit also raises an error, in which case fewer few shot examples should be selected.

Test cases can also be specified manually via the `test` argument:
```
# Instead of predicting the test data, predict manually entered text
two_new_test_docs = ['I enjoy playing group sports.', 'When getting things done, I like to boss people around.']
>>> few_shot_model.predict(test = two_new_test_docs)
```


---
## Predicting Labels of Test Cases
---
Since both training and testing data have been loaded, we can now classify the test cases. GPT-3's *Completions API* simplifies the training process by training the model and predicting test cases concurrently. **One limitation to note**, however, is that the Completions API **may only predict one test case at a time.** Thus, the `predict()` function will loop through each test example.
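
The one-document-at-a-time loop can be sketched as follows. The `fake_classifier` function is a stand-in for an API call and is not a real model; `predict_sequentially` is a hypothetical helper that shows the pause between requests and the early stop that keeps partial results when a request fails.

```python
import time
from typing import Callable, List


def predict_sequentially(docs: List[str], classify_one: Callable[[str], str],
                         request_delay: float = 1.0) -> List[str]:
    """Classify documents one at a time, pausing between calls."""
    results = []
    for doc in docs:
        time.sleep(request_delay)  # stay under the API's rate limit
        try:
            results.append(classify_one(doc))
        except Exception as err:
            # Stop early and keep whatever has been collected so far
            print(f"Stopped at {doc!r}: {err}")
            break
    return results


def fake_classifier(doc: str) -> str:
    """Stand-in for an API call; NOT a real model."""
    if not doc.strip():
        raise ValueError("empty document")
    return "2" if "friends" in doc else "1"


docs = ["I amuse my friends.", "I plan ahead.", " ", "never reached"]
preds = predict_sequentially(docs, fake_classifier, request_delay=0.0)
print(preds)
```

Returning the partial list means earlier, already paid-for predictions are not lost when a later request fails.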

Additionally, with the exception of the `test` and `request_delay` arguments, the `predict()` function allows for arguments to be passed directly to `openai.Completion.create()`(i.e., the Completions API). To see a list of additional arguments, visit the [Completions API documentation](https://platform.openai.com/docs/api-reference/completions). We provide several examples below.
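
Keyword arguments passed to `predict()` override the method's defaults through a dictionary merge. The sketch below isolates that step, using the same default keys that appear in `predict()`; the helper name `merge_completion_args` is an assumption made for illustration.

```python
def merge_completion_args(model: str, num_labels: int, **kwargs) -> dict:
    """Combine default request settings with user-supplied overrides."""
    defaults = {
        "engine": model,
        "temperature": 0.0,
        "logprobs": num_labels,
        "max_tokens": 1,
    }
    # Later keys win, so anything in kwargs replaces the matching default
    return {**defaults, **kwargs}


args = merge_completion_args("davinci", 5, temperature=0.1, logprobs=1)
print(args)
```

Because the default model is stored under the `engine` key, an override has to use that same key to replace it.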


#### Examples: Customizing GPT-3 Predictions
```python
## Common use-cases
# import test data
>>> test_data = import_data("test-data.csv", label_col='label')
>>> test_text = test_data['text'].tolist()
>>> predictions = few_shot_model.predict(test = test_text)

## Additional Examples
# increasing the temperature
#|- Not recommended (for classification), however, this could be used in cases ...
#|- where one would like to see possible confounding labels.
>>> few_shot_model.predict(test = test_text, temperature = 0.10)

# Use a different version of Davinci
>>> few_shot_model.predict(test = test_text, model = 'davinci-instruct-beta')

# Have model return only top predicted label
>>> few_shot_model.predict(test = test_text, logprobs = 1)
```

### Run Predictions
We will now run the predict method.

In [None]:
#@markdown ## RUN:  Predict Test Cases
few_shot_model.predict(test_text)

Predictions complete! Please see <model>.prediction_results


---
## Inspecting and Outputting Predictions
---

#### Calculating Label Probabilities
Raw prediction data is stored as a list of dictionaries (one per test case) in the model object's `.results` attribute. The `index` of the dataframe generated by the example code block below represents the position in the sequence that GPT-3 generated. The token it selected is in the second column and its log probability is in the third. In the `top_logprobs` column you can view the candidate tokens GPT-3 selected among. Because log probabilities are at most zero, the most likely candidate is the one with the highest logprob (closest to zero) in the `top_logprobs` cell; with a temperature of 0, this is the selected token.

```
>>> pd.DataFrame(few_shot_model.results[0]["choices"][0]["logprobs"])

```
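
As a small illustration of how to read a single `top_logprobs` cell, the sketch below uses made-up values; the candidate tokens shown (label indices with a leading space) are an assumption about how labels are encoded. Log probabilities are at most zero, so values closer to zero indicate more likely tokens.

```python
import math

# Hypothetical `top_logprobs` entry for one generated position:
# candidate token -> log probability (values are made up).
top_logprobs = {" 3": -0.22, " 1": -1.90, " 4": -3.10}

# The most likely candidate has the highest (least negative) log probability.
best_token = max(top_logprobs, key=top_logprobs.get)

# exp() converts a log probability back to a probability between 0 and 1.
probabilities = {token: math.exp(lp) for token, lp in top_logprobs.items()}

print(best_token, round(probabilities[best_token], 3))
```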

We provide an internal function (i.e., `extract_predictions()`) to calculate label probabilities. This returns probability estimates for each label. Given that we weight our label tokens prior to classification (multi-class classification), GPT-3 usually picks among the labels specified. However, in cases where the generated tokens differ from the labels presented, we offer an `'unknown_label'`.
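
One way such label probabilities could be derived from candidate-token log probabilities is sketched below. This is not the `extract_predictions()` implementation: `label_probabilities`, the `token_to_label` mapping, and the candidate values are hypothetical, and pooling unmatched tokens under `'unknown_label'` is an assumption about how the fallback works.

```python
import math


def label_probabilities(top_logprobs, token_to_label, unknown="unknown_label"):
    """Sum candidate-token probabilities per label.

    Tokens missing from `token_to_label` are pooled under `unknown`.
    """
    probs = {}
    for token, logprob in top_logprobs.items():
        label = token_to_label.get(token.strip(), unknown)
        probs[label] = probs.get(label, 0.0) + math.exp(logprob)
    return probs


# Hypothetical mapping from generated tokens to label names.
token_to_label = {"0": "openness", "1": "conscientiousness"}
# Made-up candidates; " the" matches no label and falls into the unknown bucket.
candidates = {" 0": math.log(0.7), " 1": math.log(0.2), " the": math.log(0.1)}

probs = label_probabilities(candidates, token_to_label)
predicted = max(probs, key=probs.get)
print(predicted, probs)
```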

#### Outputting Predictions as Object
Predictions can be extracted for additional purposes (e.g., model evaluation) using the `output_predictions()` method, and assigning the result to a new object (see examples below).

#### Outputting Predictions to File
Predictions can be output to a `csv` file using the `output_predictions()` function. **Importantly**, one must set `output_file = True` and name the output file using the `output_file_name` argument. Here are some examples:

<br>

#### Examples: Outputting Predictions
```python
## Common use-cases
# assigning prediction data to a new object, then outputting a file
>>> test_predictions = few_shot_model.output_predictions()
>>> test_predictions.to_csv("test-preds.csv")
# alternatively, using the output_file flag
>>> few_shot_model.output_predictions(output_file = True, output_file_name = "test-preds.csv")
```

In [None]:
#@markdown ## RUN: Output Predictions to CSV File
few_shot_model.output_predictions(output_file = True, output_file_name = f'few-shot-{FEWSHOTK}-results.csv')

file output to: few-shot-5-results.csv


---
## Evaluating the Model
---

When *ground truth* test labels are available (e.g., the *'label'* column of the `test_data` dataset), the `evaluate_model()` function can be used to calculate model evaluation metrics.

**Note:** The `predicted` argument takes the model's predictions and the `actual` argument takes the ground truth labels.
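
For readers unfamiliar with per-label metrics, the sketch below computes precision, recall, F1, and support from two label lists. It is a plain-Python illustration, not the `evaluate_model()` implementation; `per_label_metrics` and the four example labels are hypothetical.

```python
def per_label_metrics(actual, predicted):
    """Return {label: {"precision", "recall", "f1-score", "support"}}."""
    pairs = list(zip(actual, predicted))
    metrics = {}
    for label in sorted(set(actual) | set(predicted)):
        tp = sum(a == label and p == label for a, p in pairs)  # true positives
        fp = sum(a != label and p == label for a, p in pairs)  # false positives
        fn = sum(a == label and p != label for a, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = {"precision": precision, "recall": recall,
                          "f1-score": f1, "support": tp + fn}
    return metrics


# Made-up labels for illustration.
actual = ["agreeableness", "agreeableness", "extraversion", "extraversion"]
predicted = ["agreeableness", "extraversion", "extraversion", "extraversion"]

scores = per_label_metrics(actual, predicted)
# A macro average is the unweighted mean of the per-label scores.
macro_f1 = sum(m["f1-score"] for m in scores.values()) / len(scores)
print(scores["extraversion"], macro_f1)
```

A weighted average differs from the macro average only in that each label's score is weighted by its support (the number of true cases for that label).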

In [None]:
#@markdown ## RUN: Evaluate Model
# retrieve predictions as an object so they can be compared with the ground truth labels
test_predictions = few_shot_model.output_predictions()
print(test_data['label'])
test_predictions["predicted_label"] = test_predictions["predicted_label"].replace(few_shot_model.index_to_label)
eval_metrics = evaluate_model(actual = test_data["label"], predicted = test_predictions["predicted_label"]) 
# Print Results
eval_metrics

0          agreeableness
1          agreeableness
2          agreeableness
3           extraversion
4           extraversion
             ...        
114        agreeableness
115    conscientiousness
116    conscientiousness
117          neuroticism
118          neuroticism
Name: label, Length: 119, dtype: object


{'overall':            accuracy   macro avg  weighted avg
 precision  0.512605    0.579812      0.578634
 recall     0.512605    0.505756      0.512605
 f1-score   0.512605    0.503291      0.505618
 support    0.512605  119.000000    119.000000,
 'by_label':             openness  conscientiousness  agreeableness  extraversion  \
 precision   0.385965           0.625000       0.666667      0.571429   
 recall      0.880000           0.400000       0.347826      0.380952   
 f1-score    0.536585           0.487805       0.457143      0.457143   
 support    25.000000          25.000000      23.000000     21.000000   
 
            neuroticism  
 precision     0.650000  
 recall        0.520000  
 f1-score      0.577778  
 support      25.000000  }