I think geowatch is onto a really nice way of phrasing model training as fit -> predict -> evaluate.

TODO: netharn-ish plugins for lightning

- [x] Callbacks for drawing batches and dumping tensorboard to pngs via matplotlib / seaborn (see the callback sketch below).
- [ ] I want to implement netharn-style logging: instead of global loggers, each instance of the Trainer gets its own logging object, because avoiding globals is nice (see the logging sketch below).
- [ ] Directory structure - I'm making , currently have . I need something to manage the directory structure as well. Probably wherever there is a "version_x" folder I might make a "recent" symlink to the most recent run with that "name" - which is another thing I'd like control over (see the symlink sketch below).

The key idea is that you can phrase a machine learning problem as a sequence of bash commands. For example, consider this toy problem:

```bash
# Location of this experiment
DATA_DPATH=$HOME/data/work/toy_change
mkdir -p $DATA_DPATH
cd $DATA_DPATH

# Generate toy datasets
kwcoco toydata vidshapes8-multispectral --bundle_dpath $DATA_DPATH/vidshapes_train
kwcoco toydata vidshapes4-multispectral --bundle_dpath $DATA_DPATH/vidshapes_vali
kwcoco toydata vidshapes2-multispectral --bundle_dpath $DATA_DPATH/vidshapes_test

DATA_DPATH=$HOME/data/work/toy_change
python -m geowatch.tasks.fusion.fit \
    --train_dataset=$DATA_DPATH/vidshapes_train/data.kwcoco.json \
    --vali_dataset=$DATA_DPATH/vidshapes_vali/data.kwcoco.json \
    --test_dataset=$DATA_DPATH/vidshapes_test/data.kwcoco.json \
    --package_fpath=deployed.pt \
    --max_epochs=1 \
    --max_steps=1 --gpus 1  # [**train_hyperparams]

DATA_DPATH=$HOME/data/work/toy_change
python -m geowatch.tasks.fusion.predict \
    --test_dataset=$DATA_DPATH/vidshapes_test/data.kwcoco.json \
    --package_fpath=deployed.pt \
    --thresh=0.0605 --gpus 1 \
    --pred_dataset=$DATA_DPATH/vidshapes_test_pred/pred.kwcoco.json  # [**pred_hyperparams]

# jq .images[0] $DATA_DPATH/vidshapes_test_pred/pred.kwcoco.json
kwcoco show $DATA_DPATH/vidshapes_test_pred/pred.kwcoco.json --gid 1 --channels B1

DATA_DPATH=$HOME/data/work/toy_change
python -m geowatch.tasks.fusion.evaluate \
    --true_dataset=$DATA_DPATH/vidshapes_test/data.kwcoco.json \
    --pred_dataset=$DATA_DPATH/vidshapes_test_pred/pred.kwcoco.json \
    --eval_dpath=$DATA_DPATH/vidshapes_test_pred_eval  # [**eval_hyperparams]

# tree $DATA_DPATH --filelimit 5 -L 2
# tree $DATA_DPATH/vidshapes_test_pred
```

See the built-in help for each step:

```bash
python -m geowatch.tasks.fusion.fit --help
python -m geowatch.tasks.fusion.predict --help
python -m geowatch.tasks.fusion.evaluate --help
```

### Notes

There are parts of netharn that could be ported to lightning:

The logging stuff
- [x] loss curves (odd they aren't in tensorboard)

The auto directory structure
- [x] save multiple checkpoints
- [ ] delete them intelligently (see the ModelCheckpoint sketch below)

The run management
- [ ] The netharn/cli/manage_runs.py

The auto-deploy files
- [x] Use Torch 1.9 Packages instead of Torch-Liberator (see the torch.package sketch below)

Automated dynamics / plugins?
- [x] Rename --dataset argument to --datamodule
- [ ] Rename KWCocoVideoDataModule to ChangeDataModule
- [ ] Need to figure out how to connect configargparse with ray.tune (sketch below)
- [ ] Distributed Training:
  - [ ] How to do DistributedDataParallel (sketch below)
    - [ ] On one machine
    - [ ] On multiple machines
- [ ] Add Data Modules (skeleton sketch below):
  - [ ] SegmentationDataModule
  - [ ] ClassificationDataModule
  - [ ] DetectionDataModule
  - [ ] DataModule

### Need to consolidate code in scheduler, inference, and repackage to get into a flow where models are trained and evaluated
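### Sketches

A minimal sketch of the batch-drawing callback idea. It assumes recent pytorch-lightning hook signatures and a batch that is an `(images, targets)` pair of tensors; the real KWCoco batches are dicts, so the unpacking line would change.

```python
import pathlib
import matplotlib
matplotlib.use("Agg")  # headless backend: render straight to files
import matplotlib.pyplot as plt
import pytorch_lightning as pl


class DrawBatchCallback(pl.Callback):
    """Periodically render a training batch to a PNG next to the logs."""

    def __init__(self, every_n_batches=100):
        self.every_n_batches = every_n_batches

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        if batch_idx % self.every_n_batches != 0:
            return
        images, _ = batch  # assumption: (images, targets) tuple of tensors
        dpath = pathlib.Path(trainer.log_dir or ".") / "batch_pngs"
        dpath.mkdir(parents=True, exist_ok=True)
        fig, ax = plt.subplots()
        # draw the first channel of the first item in the batch
        ax.imshow(images[0, 0].detach().cpu().numpy())
        ax.set_title(f"epoch={trainer.current_epoch} batch={batch_idx}")
        fig.savefig(dpath / f"batch_{trainer.global_step:08d}.png")
        plt.close(fig)
```

It would be registered like any other callback: `pl.Trainer(callbacks=[DrawBatchCallback()])`.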
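A minimal sketch of the per-instance logging idea, using only the standard library. Instantiating `logging.Logger` directly sidesteps the global `getLogger` registry, so each Trainer can own its logger; `make_run_logger` is a hypothetical helper name.

```python
import logging
import pathlib


def make_run_logger(run_dpath, name="train"):
    """Build a Logger bound to one training run, not the global registry.

    Constructing logging.Logger directly (rather than logging.getLogger)
    means each Trainer instance owns its own object, so two concurrent
    runs never share handlers or interleave messages.
    """
    run_dpath = pathlib.Path(run_dpath)
    run_dpath.mkdir(parents=True, exist_ok=True)
    logger = logging.Logger(f"{name}:{run_dpath.name}", level=logging.DEBUG)
    fh = logging.FileHandler(run_dpath / "log.txt")
    fh.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(fh)
    logger.addHandler(logging.StreamHandler())  # also echo to the console
    return logger
```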
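A sketch of the "recent" symlink idea, assuming the lightning-style `<name>/version_<n>/` layout and using directory mtime as the recency key.

```python
import pathlib


def link_most_recent(name_dpath):
    """Point a 'recent' symlink at the newest version_* run directory."""
    name_dpath = pathlib.Path(name_dpath)
    versions = sorted(
        name_dpath.glob("version_*"),
        key=lambda p: p.stat().st_mtime,
    )
    if not versions:
        return None
    link = name_dpath / "recent"
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(versions[-1].name)  # relative link within the name dir
    return link
```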
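For "save multiple checkpoints / delete them intelligently", lightning's built-in `ModelCheckpoint` callback already covers part of the intelligent deletion: `save_top_k` keeps the k best checkpoints by a monitored metric and removes the rest. The `"val_loss"` metric name is an assumption about what the LightningModule logs.

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

# Keep the 3 best checkpoints by validation loss and prune the rest.
checkpoint_cb = ModelCheckpoint(
    monitor="val_loss",   # assumption: the module logs this metric
    mode="min",
    save_top_k=3,
    save_last=True,       # also keep a rolling "last.ckpt"
)
trainer = pl.Trainer(max_epochs=10, callbacks=[checkpoint_cb])
```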
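For "Use Torch 1.9 Packages instead of Torch-Liberator", a sketch of the `torch.package` workflow. The intern/extern patterns are assumptions about which code should travel inside the package versus be supplied by the consumer's environment.

```python
from torch import package


def save_package(model, package_fpath="deployed.pt"):
    """Serialize a model together with its code via torch.package (torch >= 1.9)."""
    with package.PackageExporter(package_fpath) as exporter:
        exporter.intern("geowatch.**")  # assumption: bundle project code
        exporter.extern("torch.**")     # rely on the consumer's torch install
        exporter.save_pickle("model", "model.pkl", model)


def load_package(package_fpath="deployed.pt"):
    importer = package.PackageImporter(package_fpath)
    return importer.load_pickle("model", "model.pkl")
```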
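For the DistributedDataParallel items, lightning drives DDP through Trainer flags rather than manual process-group code. The argument names below are for roughly pytorch-lightning >= 1.6; the `--gpus` style used in the toy example above corresponds to older releases, which spelled this `gpus=2, accelerator="ddp"`.

```python
import pytorch_lightning as pl

# One machine, 2 GPUs: lightning spawns one process per device.
trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp")

# Multiple machines: launch the same script on every node, with
# MASTER_ADDR, MASTER_PORT, and NODE_RANK set in each environment.
trainer = pl.Trainer(accelerator="gpu", devices=2, num_nodes=2, strategy="ddp")
```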
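For connecting configargparse with ray.tune, one plausible pattern is to let configargparse own the defaults (from the CLI and config files) and let tune override a chosen subset per trial. This sketch uses ray's older function-trainable API (`tune.run` / `tune.report`); the `--lr` and `--batch_size` parameters and the stand-in loss are illustrations only.

```python
import configargparse
from ray import tune


def build_parser():
    # configargparse mirrors argparse but can also read config files
    parser = configargparse.ArgumentParser()
    parser.add_argument("--lr", type=float, default=1e-3)
    parser.add_argument("--batch_size", type=int, default=8)
    return parser


def train_with_config(config):
    # start from the file/CLI defaults, then let ray.tune override them
    args, _ = build_parser().parse_known_args([])
    for key, value in config.items():
        setattr(args, key, value)
    loss = (args.lr - 1e-2) ** 2  # stand-in for a real training run
    tune.report(loss=loss)


if __name__ == "__main__":
    tune.run(
        train_with_config,
        config={"lr": tune.loguniform(1e-4, 1e-1)},
        num_samples=4,
    )
```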
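For the proposed data modules, each one would presumably be a thin `LightningDataModule` over the corresponding torch dataset. A skeleton of a hypothetical SegmentationDataModule:

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader


class SegmentationDataModule(pl.LightningDataModule):
    """Skeleton for one of the proposed task-specific data modules."""

    def __init__(self, train_dataset, vali_dataset, batch_size=8):
        super().__init__()
        self.train_dataset = train_dataset
        self.vali_dataset = vali_dataset
        self.batch_size = batch_size

    def train_dataloader(self):
        return DataLoader(self.train_dataset, batch_size=self.batch_size, shuffle=True)

    def val_dataloader(self):
        return DataLoader(self.vali_dataset, batch_size=self.batch_size)
```

ClassificationDataModule and DetectionDataModule would follow the same shape, differing mainly in the datasets and collate functions they wrap.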