# Hi-C scaffolding for chromosome-length genome

Hi-C data was generated by [Phase Genomics](https://phasegenomics.com/) using gonad tissue from Mf1 that was flash frozen immediately after dissection.

## Hi-C data QC

Hi-C data was examined using [HiC-Pro](https://github.com/nservant/HiC-Pro) (v3.0.0). The configuration file for running HiC-Pro was copied to the working directory as ```config-hicpro_mfran.txt``` and the following lines were modified:

```
BOWTIE2_IDX_PATH = /data/prj/urchin/red-urchin-genome/hi-c/hic-pro_0810/input/
REFERENCE_GENOME = mfran-v1
GENOME_SIZE = /data/prj/urchin/red-urchin-genome/hi-c/hic-pro_0810/input/chromosome_sizes.tbl
GENOME_FRAGMENT = /data/prj/urchin/red-urchin-genome/hi-c/hic-pro_0810/input/mfran_dpnii.bed
LIGATION_SITE = GATCGATC
```

See 1_Mf1-hicpro.sh for the run script. The plots generated can be found in "HiC-Pro_output".

## Hi-C scaffolding with Juicer and 3D-DNA

The method used for Hi-C scaffolding followed the steps outlined in [Dudchenko et al. 2017](https://science.sciencemag.org/content/356/6333/92.full) and the ["Genome Assembly Cookbook"](http://aidenlab.org/assembly/manual_180322.pdf) by the Aiden Lab at the Baylor College of Medicine.

Prior to Hi-C scaffolding, fragments shorter than 15 Kbp were removed from the preliminary assembly using the script [removesmall.pl](https://github.com/drtamermansour/p_asteroides/blob/master/scripts/removesmalls.pl).

```
perl ./removesmall.pl 15000 ../../Mfran_genome-v1_no-mt.fa > ./draft_15kb+.fa
```

This removed 161 scaffolds totaling 1,658,831 bp (1.66 Mb), or 0.2% of the draft assembly.

[Juicer](https://github.com/aidenlab/juicer) (v1.6) was then run (see 2_Mf1-juicer.sh), and the file ```merged_nodups.txt``` generated by Juicer was used as input to generate the new scaffolds, assembly file, and hic file with [3D-DNA](https://github.com/aidenlab/3d-dna) (V201008).

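The length filter applied before scaffolding (dropping sequences shorter than 15 Kbp and tallying what was removed) can be sketched in Python as below. This is not the removesmall.pl script itself, only a minimal illustration that assumes a plain FASTA input and assumes that sequences exactly at the threshold are kept. The function names `read_fasta` and `filter_by_length` are hypothetical and are not part of this repository.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(
    records: Iterable[Tuple[str, str]], min_len: int
) -> Tuple[List[Tuple[str, str]], int, int]:
    """Keep records with len(seq) >= min_len (assumed threshold behavior).

    Returns (kept_records, number_removed, bases_removed).
    """
    kept = []
    removed_n = 0
    removed_bp = 0
    for header, seq in records:
        if len(seq) >= min_len:
            kept.append((header, seq))
        else:
            removed_n += 1
            removed_bp += len(seq)
    return kept, removed_n, removed_bp


# Toy in-memory example; a real run would iterate over an open FASTA file
# and use min_len=15000.
toy = [">scaffold_1", "ACGTACGT", "ACGT", ">scaffold_2", "ACGT"]
kept, removed_n, removed_bp = filter_by_length(read_fasta(toy), min_len=10)
print([h for h, _ in kept], removed_n, removed_bp)
```

Returning the removed count and removed bases alongside the kept records makes it easy to report figures like the scaffold count and base total quoted above without a second pass over the file.
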
## Manual review with Juicebox

The Hi-C map generated by 3D-DNA clearly showed 21 distinct chromosomes (see included image), but the fasta output was written as a single chromosome. [Juicebox Assembly Tools](https://github.com/aidenlab/Juicebox/wiki/Download) (JBAT) was used to manually split the assembly into the 21 chromosome-length scaffolds. The ".assembly" file was exported from JBAT and used in 3D-DNA's post-review script (see 4_Mf1-post-review.sh) to generate the final assembly's fasta and hic files.

*Analysis completed by Jennifer Polinski*