# Synteny analysis of long- vs short-lived sea urchins

Analysis completed by Kate Castellano

Purpose: Look for chromosomal rearrangements in the long- versus short-lived sea urchins to identify novel rearrangements associated with longevity/negligible senescence.

Summary: Pairwise synteny comparisons were inferred between *M. franciscanus* and *S. purpuratus*, *M. franciscanus* and *L. variegatus*, *M. franciscanus* and *L. pictus*, and *L. variegatus* and *L. pictus* using MCscan. *S. purpuratus* version 5.0 gene annotations were obtained from Echinobase (https://www.echinobase.org), version 3.0 of the *L. variegatus* gene annotations from Echinobase, and version 2.0 of the *L. pictus* gene annotations from NCBI. LAST (v 1445) was used for genome-wide alignments of coding regions, followed by filtering of tandem duplications and weak hits. Linkage clustering into syntenic blocks and visualization were performed with the MCscan Python workflow (https://github.com/tanghaibao/jcvi/wiki/MCscan-(Python-version)). Microsynteny of the Hox cluster was also visualized with MCscan modules.

# *Mesocentrotus franciscanus* versus *Lytechinus variegatus*

MCscan (Python version)
- program to download: https://github.com/tanghaibao/jcvi
- MCscan manual/workflow: https://github.com/tanghaibao/jcvi/wiki/MCscan-%28Python-version%29#dependencies
## Reformat files for MCscan

*L. variegatus* (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/018/143/015/GCF_018143015.1_Lvar_3.0/)

reformat gff file with MCscan

reformat CDS file with MCscan
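The two reformatting steps above are not written out as commands in these notes. A minimal sketch using `jcvi.formats.gff bed` and `jcvi.formats.fasta format` from the jcvi MCscan (Python version) wiki is below. The input file names are placeholders, and `--type=mRNA --key=Name --primary_only` are the wiki's example options, assumed here to yield the `XM_` transcript IDs that appear in the blocks file further down. Setting `DRY_RUN=1` prints the commands instead of running them.

```bash
# run CMD... : execute the command, or only print it when DRY_RUN=1
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# reformat_for_mcscan GFF CDS_FASTA PREFIX -> PREFIX.bed and PREFIX.cds
# (hypothetical helper; options are the jcvi wiki's example values)
reformat_for_mcscan() {
  local gff=$1 cds=$2 prefix=$3
  # GFF -> BED: one (primary) mRNA per gene, named by the GFF Name attribute
  run python -m jcvi.formats.gff bed --type=mRNA --key=Name --primary_only "$gff" -o "$prefix.bed"
  # clean up the CDS FASTA into the format MCscan expects
  run python -m jcvi.formats.fasta format "$cds" "$prefix.cds"
}
```

For example, `reformat_for_mcscan Lvar_genomic.gff.gz Lvar_cds.fna.gz Lvariegatus` (hypothetical file names) would write `Lvariegatus.bed` and `Lvariegatus.cds`.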

edit headers of the CDS file to match column 4 of the bed file (otherwise you will get errors when you run MCscan)

    
```
## Manually remove noncoding (ncRNA), miscRNA and rRNA sequences
# keep a record only if its header contains none of the three tags
# (index() is case-sensitive, so "miscrna" matches only that exact lowercase string)
awk '/^>/ {P = index($0,"ncRNA")==0 && index($0,"miscrna")==0 && index($0,"rRNA")==0} P' Lvariegatus.cds > Lvariegatus_filtered.cds
mv Lvariegatus_filtered.cds Lvariegatus.cds

## Edit headers to contain only the transcript ID
# remove everything from the beginning of the header up to "transcript_id="; add the ">" back
sed -i 's/.*transcript_id=/>/' Lvariegatus.cds
# remove everything from "]" to the end of the line
sed -i 's/].*//' Lvariegatus.cds
```
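Because MCscan matches CDS sequences to BED entries by name, a quick check after the header edits can catch mismatches before running LAST. This is a sketch (the function name is hypothetical): it compares the first word of each FASTA header with column 4 of the BED file and reports how many IDs appear in only one of the two files.

```bash
# check_cds_bed CDS_FASTA BED : count IDs found in only one of the two files
check_cds_bed() {
  local cds=$1 bed=$2 tmp
  tmp=$(mktemp -d)
  # CDS IDs: header text after ">" up to the first whitespace
  grep '^>' "$cds" | sed 's/^>//; s/[[:space:]].*//' | LC_ALL=C sort -u > "$tmp/cds_ids"
  # BED IDs: column 4 (BED is tab-separated)
  cut -f4 "$bed" | LC_ALL=C sort -u > "$tmp/bed_ids"
  echo "in CDS only: $(LC_ALL=C comm -23 "$tmp/cds_ids" "$tmp/bed_ids" | awk 'END {print NR}')"
  echo "in BED only: $(LC_ALL=C comm -13 "$tmp/cds_ids" "$tmp/bed_ids" | awk 'END {print NR}')"
  rm -r "$tmp"
}
```

Non-zero counts flag IDs to inspect; running `comm -23` or `comm -13` on the same sorted lists prints them.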

*M. franciscanus*

convert gtf to gff file
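The notes do not say which tool converted the *M. franciscanus* GTF to GFF. One option, assumed here, is gffread, which writes GFF3 unless `-T` is given. Since the next step filters on a feature type (`--type` in `jcvi.formats.gff bed`), the sketch also tabulates column-3 feature types so the right value (for example `mRNA` vs `transcript`) can be chosen. Function names are hypothetical; `DRY_RUN=1` prints the conversion command instead of running it.

```bash
# run CMD... : execute the command, or only print it when DRY_RUN=1
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# gtf_to_gff IN.gtf OUT.gff3 (assumes gffread is on PATH)
gtf_to_gff() { run gffread "$1" -o "$2"; }

# feature_types GFF : count each column-3 feature type, skipping comment lines
feature_types() {
  awk -F'\t' '!/^#/ && NF >= 9 {n[$3]++} END {for (t in n) print t, n[t]}' "$1" | LC_ALL=C sort
}
```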

reformat gff file with MCscan

reformat CDS file with MCscan

## Run MCscan pairwise synteny analysis and visualize - *M. franciscanus* vs *L. variegatus*

Run MCscan
This calls LAST to do the comparison and filters the LAST output to remove tandem duplications and weak hits.
A single linkage clustering is performed on the LAST output to cluster anchors into synteny blocks.
At the end of the run, you'll see the summary statistics of the synteny blocks.
- input: Mfranciscanus.cds, Mfranciscanus.bed, Lvariegatus.cds, Lvariegatus.bed
- output:
- script: 1_MCscan_synteny_MfranvsLvar.sh

Analyze if synteny is 1:1

- input: Mfranciscanus.Lvariegatus.anchors
- output: 2_MCscan_depth_MfranvsLvar.log

    
```
Genome Mfranciscanus depths:
Depth 0: 3,148 of 22,306 (14.1%)
Depth 1: 15,494 of 22,306 (69.5%)
Depth 2: 3,367 of 22,306 (15.1%)
Depth 3: 247 of 22,306 (1.1%)
Depth 4: 46 of 22,306 (0.2%)
Depth 5: 4 of 22,306 (0.0%)
Genome Lvariegatus depths:
Depth 0: 5,355 of 33,669 (15.9%)
Depth 1: 27,330 of 33,669 (81.2%)
Depth 2: 971 of 33,669 (2.9%)
Depth 3: 13 of 33,669 (0.0%)
```

Mfranciscanus vs Lvariegatus syntenic depths: 1:2 pattern

- script: 2_MCscan_depth_MfranvsLvar.sh
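The pattern call above comes from the depth histogram. As a quick cross-check, the sketch below re-tallies a saved depth log (the function name is hypothetical) and reports, for each genome, the share of genes covered by two or more syntenic regions in the other genome. For the log above it gives 16.4% for Mfranciscanus and 2.9% for Lvariegatus, in line with the 1:2 pattern.

```bash
# multi_depth_share LOG : per genome, % of genes at depth >= 2
multi_depth_share() {
  awk '
    /^[ \t]*Genome .* depths:/ { g = $2; order[++k] = g; next }
    /^[ \t]*Depth [0-9]+:/ {
      d = $2; sub(/:/, "", d)      # "2:" -> 2
      c = $3; gsub(/,/, "", c)     # "3,367" -> 3367
      t = $5; gsub(/,/, "", t)     # total genes in this genome
      total[g] = t
      if (d + 0 >= 2) multi[g] += c
    }
    END {
      for (i = 1; i <= k; i++) {
        g = order[i]
        printf "%s\t%.1f%%\n", g, 100 * multi[g] / total[g]
      }
    }
  ' "$1"
}
```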

Visualize Macrosynteny

Create the seqids file. It should contain one line per species, each line a comma-separated list of that species' chromosome IDs.
1. count the length of each *M. franciscanus* chromosome and order them from largest to smallest
2. sort and remove the ">" character
3. add the ordered IDs to the seqids file
```
# first species: ">" (re)creates seqids; second species: ">>" appends a second line
awk '{print $1}' Mfran_chr_length_sort.txt | uniq | paste -d, -s > seqids
awk '{print $1}' Lvariegatus.bed | uniq | paste -d, -s >> seqids
```
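Steps 1 and 2 above produce `Mfran_chr_length_sort.txt`, but the commands are not in the notes. A minimal awk sketch, assuming a genome FASTA as input (the file name is a placeholder), sums sequence lengths per record, drops the `>`, and sorts largest first. The optional second argument is a hypothetical regex for keeping only chromosome-level sequences.

```bash
# chr_lengths_sorted FASTA [ID_REGEX] : "ID<TAB>length", longest first
chr_lengths_sorted() {
  local fasta=$1 pattern=${2:-.}
  awk -v pat="$pattern" '
    # on each header: emit the previous record, then start a new one (ID without ">")
    /^>/ { if (id != "" && id ~ pat) print id "\t" len; id = substr($1, 2); len = 0; next }
    { len += length($0) }
    END { if (id != "" && id ~ pat) print id "\t" len }
  ' "$fasta" | sort -t "$(printf '\t')" -k2,2nr
}
```

For example, `chr_lengths_sorted Mfran_genome.fa > Mfran_chr_length_sort.txt` would produce a file that the `awk '{print $1}' ... | paste -d, -s` line above can read.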
Create the simple file
- input: Mfranciscanus.Lvariegatus.anchors
- output: Mfranciscanus.Lvariegatus.anchors.simple and Mfranciscanus.Lvariegatus.anchors.new
- script: MCscan_mkSimpleFile_MfranvsLvar.sh

create the layout file (see below)

    
```
# y, xstart, xend, rotation, color, label, va,  bed
.6,     .1,    .8,       0, #f1b6da, Mfranciscanus, top, Mfranciscanus.bed
.4,     .1,    .8,       0, #4dac26, Lvariegatus, top, Lvariegatus.bed
# edges
e, 0, 1, Mfranciscanus.Lvariegatus.anchors.simple
```
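The same two-track layout is rewritten for each comparison below. A small generator (hypothetical name) can fill in the labels, BED files and `.simple` file from the two species prefixes, reusing the positions and colours of the layout above.

```bash
# make_layout TOP BOTTOM : two-track karyotype layout for one species pair
make_layout() {
  local top=$1 bottom=$2
  cat <<EOF
# y, xstart, xend, rotation, color, label, va,  bed
.6,     .1,    .8,       0, #f1b6da, $top, top, $top.bed
.4,     .1,    .8,       0, #4dac26, $bottom, top, $bottom.bed
# edges
e, 0, 1, $top.$bottom.anchors.simple
EOF
}
```

For example, `make_layout Mfranciscanus Lpictus > layout` reproduces the *L. pictus* layout used later. The *S. purpuratus* layouts differ (label `Spurpuratus`, `va` set to `bottom`), so they would still need a manual edit.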

Run macrosynteny analysis
- input: seqids, layout
- output: MfranvsLvar_karyotype.pdf
- script: 4_MCscan_macrosynteny_MfranvsLvar.sh
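The numbered scripts (`1_...`, `2_...`, `MCscan_mkSimpleFile_...`, `4_...`) are referenced but not included. A sketch of what one pair's run could look like, using the commands from the jcvi MCscan (Python version) wiki: `catalog ortholog` (LAST plus filtering and clustering), `synteny depth --histogram`, `synteny screen --simple`, and `graphics.karyotype`. The `--minspan=30` value is the wiki's example, not necessarily the one used here; `DRY_RUN=1` prints the commands instead of running them.

```bash
# run CMD... : execute the command, or only print it when DRY_RUN=1
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# mcscan_pair A B : expects A.bed, A.cds, B.bed, B.cds, seqids and layout
# in the working directory (hypothetical helper)
mcscan_pair() {
  local a=$1 b=$2
  # 1. LAST alignment, filtering and clustering -> A.B.anchors
  run python -m jcvi.compara.catalog ortholog "$a" "$b" --no_strip_names
  # 2. syntenic depth histogram (1:1 check)
  run python -m jcvi.compara.synteny depth --histogram "$a.$b.anchors"
  # 3. .simple file for the karyotype plot
  run python -m jcvi.compara.synteny screen --minspan=30 --simple "$a.$b.anchors" "$a.$b.anchors.new"
  # 4. macrosynteny karyotype plot
  run python -m jcvi.graphics.karyotype seqids layout
}
```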

## Hox Clusters - *M. franciscanus* vs *L. variegatus*

Get blocks for the Hox cluster

Get the line numbers for the first and last Hox genes

    
```
# get the line numbers for the first and last Hox gene
grep -n "Mfran_g4479" Mfranciscanus.Lvariegatus.i1.blocks
# 10306:Mfran_g4479       XM_041602940.1
grep -n "Mfran_g4498" Mfranciscanus.Lvariegatus.i1.blocks
# 10315:Mfran_g4498       XM_041604911.1

# pull out the lines of the Hox cluster (+ 2 genes on either end)
sed -n '10304,10317p' Mfranciscanus.Lvariegatus.i1.blocks > Mfran_hox.blocks
```
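The line numbers above were looked up by hand. A sketch that automates the lookup and the 2-line padding on each side is below (the function name is hypothetical). Unlike `grep -n`, it matches the gene ID exactly in column 1, so an ID such as `Mfran_g4479` cannot also match a longer ID that starts with it.

```bash
# extract_block_range BLOCKS FIRST_GENE LAST_GENE [PAD]
extract_block_range() {
  local blocks=$1 first=$2 last=$3 pad=${4:-2} start end
  # line number of the first exact column-1 match for each gene
  start=$(awk -v g="$first" '$1 == g {print NR; exit}' "$blocks")
  end=$(awk -v g="$last" '$1 == g {print NR; exit}' "$blocks")
  if [ -z "$start" ] || [ -z "$end" ]; then echo "gene not found" >&2; return 1; fi
  # widen the interval by PAD lines on each side, not going below line 1
  start=$((start - pad)); [ "$start" -lt 1 ] && start=1
  end=$((end + pad))
  sed -n "${start},${end}p" "$blocks"
}
```

`extract_block_range Mfranciscanus.Lvariegatus.i1.blocks Mfran_g4479 Mfran_g4498 2 > Mfran_hox.blocks` should select the same range as the manual `sed -n` command (lines 10304 to 10317), provided the first column-1 matches are the lines grep found.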

Edit the layout file (hoxblock.layout)

Run microsynteny analysis
- input: Mfran_hox.blocks, hoxblock.layout, Mfranciscanus_Lvariegatus.bed
- output: Mfran_hox.pdf
- script: 6_MCscan_microsynteny_chr_MfranvsLvar.sh
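The microsynteny script is not shown either. Following the jcvi wiki, the `.i1.blocks` file searched above comes from one MCscan iteration over the lifted anchors, and the plot takes the block subset, a merged BED of both species, and a block layout. A sketch under those assumptions (function names hypothetical; `DRY_RUN=1` prints the commands instead of running them):

```bash
# run CMD... : execute the command, or only print it when DRY_RUN=1
run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

# make_i1_blocks A B : one MCscan iteration over A's genes -> A.B.i1.blocks
make_i1_blocks() {
  run python -m jcvi.compara.synteny mcscan "$1.bed" "$1.$2.lifted.anchors" --iter=1 -o "$1.$2.i1.blocks"
}

# plot_blocks A B BLOCKS LAYOUT : merge the two BED files, then draw BLOCKS
# (e.g. Mfran_hox.blocks -> Mfran_hox.pdf, per the input/output listed above)
plot_blocks() {
  local a=$1 b=$2 blocks=$3 layout=$4
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "cat $a.bed $b.bed > ${a}_${b}.bed"
  else
    cat "$a.bed" "$b.bed" > "${a}_${b}.bed"
  fi
  run python -m jcvi.graphics.synteny "$blocks" "${a}_${b}.bed" "$layout"
}
```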

# *Mesocentrotus franciscanus* versus *Lytechinus pictus*

The *M. franciscanus* files are the same files edited above.

## Reformat *L. pictus* files for MCscan

*L. pictus* (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/018/143/015/GCF_018143015.1_Lpictus_3.0/)

reformat gff file with MCscan

reformat CDS file with MCscan

edit headers of the CDS file to match column 4 of the bed file (otherwise you will get errors when you run MCscan)

    
```
## Manually edit header - remove everything from the beginning of the header up to transcript_id=; add the ">" back
sed -i 's/.*transcript_id=/>/g' Lpictus.cds
```

## Run MCscan pairwise synteny analysis and visualize - *M. franciscanus* vs *L. pictus*

Run MCscan
This calls LAST to do the comparison and filters the LAST output to remove tandem duplications and weak hits.
A single linkage clustering is performed on the LAST output to cluster anchors into synteny blocks.
At the end of the run, you'll see the summary statistics of the synteny blocks.
- input: Mfranciscanus.cds, Mfranciscanus.bed, Lpictus.cds, Lpictus.bed
- output:
- script: 1_MCscan_synteny_MfranvsLpictus.sh

Analyze if synteny is 1:1

- input: Mfranciscanus.Lpictus.anchors
- output: 2_MCscan_depth_MfranvsLpictus.log

    
```
Genome Mfranciscanus depths:
Depth 0: 3,515 of 22,306 (15.8%)
Depth 1: 17,828 of 22,306 (79.9%)
Depth 2: 930 of 22,306 (4.2%)
Depth 3: 33 of 22,306 (0.1%)
Genome Lpictus depths:
Depth 0: 5,737 of 28,631 (20.0%)
Depth 1: 21,633 of 28,631 (75.6%)
Depth 2: 1,227 of 28,631 (4.3%)
Depth 3: 34 of 28,631 (0.1%)
```

Mfranciscanus vs Lpictus syntenic depths: 1:1 pattern

- script: 2_MCscan_depth_MfranvsLpictus.sh

Visualize Macrosynteny

Create the seqids file. It should contain one line per species, each line a comma-separated list of that species' chromosome IDs.
1. count the length of each *L. pictus* chromosome and order them from largest to smallest
2. sort and remove the ">" character
3. add the ordered IDs to the seqids file
```
    
awk '{print $1}' Mfran_chr_length_sort.txt | uniq | paste -d, -s > seqids
awk '{print $1}' Lpictus_chr_length_sort.txt | uniq | paste -d, -s >> seqids
  
```
Create the simple file
- input: Mfranciscanus.Lpictus.anchors
- output: Mfranciscanus.Lpictus.anchors.simple and Mfranciscanus.Lpictus.anchors.new
- script: MCscan_mkSimpleFile_MfranvsLpictus.sh

create the layout file (see below)

    
```
# y, xstart, xend, rotation, color, label, va,  bed
.6,     .1,    .8,       0, #f1b6da, Mfranciscanus, top, Mfranciscanus.bed
.4,     .1,    .8,       0, #4dac26, Lpictus, top, Lpictus.bed
# edges
e, 0, 1, Mfranciscanus.Lpictus.anchors.simple
```

Run macrosynteny analysis
- input: seqids, layout
- output: MfranvsLpictus_karyotype.pdf
- script: 4_MCscan_macrosynteny_MfranvsLpictus.sh

# *Mesocentrotus franciscanus* versus *Strongylocentrotus purpuratus*

The *M. franciscanus* files are the same files edited above.

## Reformat *S. purpuratus* files for MCscan

*S. purpuratus*

reformat gff file with MCscan

reformat CDS file with MCscan

edit headers of the CDS file to match column 4 of the bed file (otherwise you will get errors when you run MCscan)

    
```
## Manually edit header - remove everything from the beginning of the header up to transcript_id=; add the ">" back
sed -i 's/.*transcript_id=/>/g' Spurp.cds
```

## Run MCscan pairwise synteny analysis and visualize - *M. franciscanus* vs *S. purpuratus*

Run MCscan
This calls LAST to do the comparison and filters the LAST output to remove tandem duplications and weak hits.
A single linkage clustering is performed on the LAST output to cluster anchors into synteny blocks.
At the end of the run, you'll see the summary statistics of the synteny blocks.
- input: Mfranciscanus.cds, Mfranciscanus.bed, Spurp.cds, Spurp.bed
- output:
- script: 1_MCscan_synteny_MfranvsSpurp.sh

Analyze if synteny is 1:1

- input: Mfranciscanus.Spurp.anchors
- output: 2_MCscan_depth_MfranvsSpurp.log

    
```
Genome Mfranciscanus depths:
Depth 0: 747 of 22,306 (3.3%)
Depth 1: 20,811 of 22,306 (93.3%)
Depth 2: 737 of 22,306 (3.3%)
Depth 3: 11 of 22,306 (0.0%)
Genome Spurp depths:
Depth 0: 2,015 of 29,585 (6.8%)
Depth 1: 27,486 of 29,585 (92.9%)
Depth 2: 84 of 29,585 (0.3%)
```

Mfranciscanus vs Spurp syntenic depths: 1:1 pattern

- script: 2_MCscan_depth_MfranvsSpurp.sh

Visualize Macrosynteny

Create the seqids file. It should contain one line per species, each line a comma-separated list of that species' chromosome IDs.
1. count the length of each *S. purpuratus* chromosome and order them from largest to smallest
2. sort and remove the ">" character
3. add the ordered IDs to the seqids file
```
    
awk '{print $1}' Mfran_chr_length_sort.txt | uniq | paste -d, -s > seqids
awk '{print $1}' Spurp_chr_length_sort.txt | uniq | paste -d, -s >> seqids
  
```
Create the simple file
- input: Mfranciscanus.Spurp.anchors
- output: Mfranciscanus.Spurp.anchors.simple and Mfranciscanus.Spurp.anchors.new
- script: MCscan_mkSimpleFile_MfranvsSpurp.sh

create the layout file (see below)

    
```
# y, xstart, xend, rotation, color, label, va,  bed
.6,     .1,    .8,       0, #f1b6da, Mfranciscanus, top, Mfranciscanus.bed
.4,     .1,    .8,       0, #4dac26, Spurp, top, Spurp.bed
# edges
e, 0, 1, Mfranciscanus.Spurp.anchors.simple
```

Run macrosynteny analysis
- input: seqids, layout
- output: MfranvsSpurp_karyotype.pdf
- script: 4_MCscan_macrosynteny_MfranvsSpurp.sh

# *Lytechinus pictus* versus *Lytechinus variegatus*

No reformatting is required since this has been completed for both species above.

## Run MCscan pairwise synteny analysis and visualize - *L. variegatus* vs *L. pictus*

Run MCscan
This calls LAST to do the comparison and filters the LAST output to remove tandem duplications and weak hits.
A single linkage clustering is performed on the LAST output to cluster anchors into synteny blocks.
At the end of the run, you'll see the summary statistics of the synteny blocks.
- input: Lvariegatus.cds, Lvariegatus.bed, Lpictus.cds, Lpictus.bed
- output:
- script: 1_MCscan_synteny_LvarvsLpictus.sh

Analyze if synteny is 1:1

- input: Lvariegatus.Lpictus.anchors
- output: 2_MCscan_depth_LvarvsLpictus.log

    
```
Genome Lvariegatus depths:
Depth 0: 1,577 of 33,669 (4.7%)
Depth 1: 31,863 of 33,669 (94.6%)
Depth 2: 209 of 33,669 (0.6%)
Depth 3: 20 of 33,669 (0.1%)
Genome Lpictus depths:
Depth 0: 1,809 of 28,631 (6.3%)
Depth 1: 23,516 of 28,631 (82.1%)
Depth 2: 3,267 of 28,631 (11.4%)
Depth 3: 39 of 28,631 (0.1%)
```

Lvariegatus vs Lpictus syntenic depths: 2:1 pattern

- script: 2_MCscan_depth_LvarvsLpictus.sh

Visualize Macrosynteny

Create the seqids file (one line per species, each a comma-separated list of chromosome IDs). This file was already made in previous comparisons, so it was copied from those analyses.

Create the simple file
- input: Lvariegatus.Lpictus.anchors
- output: Lvariegatus.Lpictus.anchors.simple and Lvariegatus.Lpictus.anchors.new
- script: 3_MCscan_mkSimpleFile_MfranvsLpictus.sh

create the layout file (see below)

    
```
# y, xstart, xend, rotation, color, label, va,  bed
.6,     .1,    .8,       0, #f1b6da, Lvariegatus, top, Lvariegatus.bed
.4,     .1,    .8,       0, #4dac26, Lpictus, top, Lpictus.bed
# edges
e, 0, 1, Lvariegatus.Lpictus.anchors.simple
```

Run macrosynteny analysis
- input: seqids, layout
- output: LvarvsLpictus_karyotype.pdf
- script: 4_MCscan_macrosynteny_LvarvsLpictus.sh

# *Strongylocentrotus purpuratus* versus *Lytechinus variegatus*

No reformatting is required since this has been completed for both species above.

## Run MCscan pairwise synteny analysis and visualize - *S. purpuratus* vs *L. variegatus*

Run MCscan
This calls LAST to do the comparison and filters the LAST output to remove tandem duplications and weak hits.
A single linkage clustering is performed on the LAST output to cluster anchors into synteny blocks.
At the end of the run, you'll see the summary statistics of the synteny blocks.
- input: Lvariegatus.cds, Lvariegatus.bed, Spurp.cds, Spurp.bed
- output:
- script: 1_MCscan_synteny_SpurpvsLvar.sh

Analyze if synteny is 1:1

- input: Spurp.Lvariegatus.anchors
- output: 2_MCscan_depth_SpurpvsLvar.log

    
```
Genome Spurp depths:
Depth 0: 5,186 of 29,585 (17.5%)
Depth 1: 19,627 of 29,585 (66.3%)
Depth 2: 4,384 of 29,585 (14.8%)
Depth 3: 308 of 29,585 (1.0%)
Depth 4: 80 of 29,585 (0.3%)
Genome Lvariegatus depths:
Depth 0: 5,037 of 33,669 (15.0%)
Depth 1: 26,363 of 33,669 (78.3%)
Depth 2: 2,120 of 33,669 (6.3%)
Depth 3: 149 of 33,669 (0.4%)
```

Spurp vs Lvariegatus syntenic depths: 1:2 pattern

- script: 2_MCscan_depth_SpurpvsLvar.sh

Visualize Macrosynteny

Create the seqids file (one line per species, each a comma-separated list of chromosome IDs). This file was already made in previous comparisons, so it was copied from those analyses.

Create the simple file
- input: Spurp.Lvariegatus.anchors
- output: Spurp.Lvariegatus.anchors.simple and Spurp.Lvariegatus.anchors.new
- script: 3_MCscan_mkSimpleFile_SpurpvsLVar.sh

create the layout file (see below)

    
```
# y, xstart, xend, rotation, color, label, va,  bed
.6,     .1,    .8,       0, #f1b6da, Spurpuratus, top, Spurp.bed
.4,     .1,    .8,       0, #4dac26, Lvariegatus, bottom, Lvariegatus.bed
# edges
e, 0, 1, Spurp.Lvariegatus.anchors.simple
```

Run macrosynteny analysis
- input: seqids, layout
- output: SpurpvsLvar_karyotype.pdf
- script: 4_MCscan_macrosynteny_SpurpvsLvar.sh

# *Strongylocentrotus purpuratus* versus *Lytechinus pictus*

No reformatting is required since this has been completed for both species above.

## Run MCscan pairwise synteny analysis and visualize - *S. purpuratus* vs *L. pictus*

Run MCscan
This calls LAST to do the comparison and filters the LAST output to remove tandem duplications and weak hits.
A single linkage clustering is performed on the LAST output to cluster anchors into synteny blocks.
At the end of the run, you'll see the summary statistics of the synteny blocks.
- input: Lpictus.cds, Lpictus.bed, Spurp.cds, Spurp.bed
- output:
- script: 1_MCscan_synteny_SpurpvsLvar.sh

Analyze if synteny is 1:1

- input: Spurp.Lpictus.anchors
- output: Spurp.Lpictus.anchors

    
```
Genome Spurp depths:
Depth 0: 5,186 of 29,585 (17.5%)
Depth 1: 19,627 of 29,585 (66.3%)
Depth 2: 4,384 of 29,585 (14.8%)
Depth 3: 308 of 29,585 (1.0%)
Depth 4: 80 of 29,585 (0.3%)
Genome Lpictus depths:
Depth 0: 5,037 of 33,669 (15.0%)
Depth 1: 26,363 of 33,669 (78.3%)
Depth 2: 2,120 of 33,669 (6.3%)
Depth 3: 149 of 33,669 (0.4%)
```

Spurp vs Lpictus syntenic depths: 1:2 pattern

- script: 2_MCscan_depth_SpurpvsLvar.sh

Visualize Macrosynteny

Create the seqids file (one line per species, each a comma-separated list of chromosome IDs). This file was already made in previous comparisons, so it was copied from those analyses.

Create the simple file
- input: Spurp.Lpictus.anchors
- output: Spurp.Lpictus.anchors.simple and Spurp.Lpictus.anchors.new
- script: 3_MCscan_mkSimpleFile_SpurpvsLVar.sh

create the layout file (see below)

    
```
# y, xstart, xend, rotation, color, label, va,  bed
.6,     .1,    .8,       0, #f1b6da, Spurpuratus, top, Spurp.bed
.4,     .1,    .8,       0, #4dac26, Lpictus, bottom, Lpictus.bed
# edges
e, 0, 1, Spurp.Lpictus.anchors.simple
```

Run macrosynteny analysis
- input: seqids, layout
- output: SpurpvsLvar_karyotype.pdf
- script: 4_MCscan_macrosynteny_SpurpvsLvar.sh

# Create karyotype image containing all four species, *Strongylocentrotus purpuratus*, *Mesocentrotus franciscanus*, *Lytechinus pictus*, and *Lytechinus variegatus*

Each link is still a pairwise comparison, but the goal is to visualize all four species together. The files from each of the following comparisons were used and came from the analyses described above:

- *Strongylocentrotus purpuratus* versus *Mesocentrotus franciscanus*
- *Mesocentrotus franciscanus* versus *Lytechinus pictus*
- *Lytechinus pictus* versus *Lytechinus variegatus*

## Visualize all four species - *S. purpuratus*, *M. franciscanus*, *L. pictus* and *L. variegatus*

Visualize Macrosynteny

Create the seqids file

Create the simple file. These files were already created:
- Mfranciscanus.Spurp.anchors.simple
- Mfranciscanus.Lpictus.anchors.simple
- Lvariegatus.Lpictus.anchors.simple

create the layout file (see below)

    
```
# y, xstart, xend, rotation, color, label, va,  bed
.7,     .1,     .8,     0,      , Spurp, top, Spurp.bed
.5,     .1,     .8,     0,      , Mfran, top, Mfranciscanus.bed
.3,     .1,     .8,     0,      , Lpictus, bottom, Lpictus.bed
.1,     .1,     .8,     0,      , Lvar, bottom, Lvariegatus.bed
# edges
e, 0, 1, Mfranciscanus.Spurp.anchors.simple
e, 1, 2, Mfranciscanus.Lpictus.anchors.simple
e, 2, 3, Lvariegatus.Lpictus.anchors.simple
```
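The four-track layout above follows a simple pattern: tracks evenly spaced from y = .7 down to .1, colours left blank, labels above the top half of the tracks and below the bottom half, and one edge per adjacent pair. A sketch of a generator for that pattern, with a hypothetical interface (`label:bed` arguments, then `--`, then one `.simple` file per adjacent pair):

```bash
# make_stack_layout LABEL:BED... -- SIMPLE...
make_stack_layout() {
  local n=0 i=0 arg y va label bed simple
  # count the tracks (arguments before "--")
  for arg in "$@"; do
    [ "$arg" = "--" ] && break
    n=$((n + 1))
  done
  echo "# y, xstart, xend, rotation, color, label, va,  bed"
  while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
    label=${1%%:*}
    bed=${1#*:}
    # evenly spaced y positions from .7 (first track) down to .1 (last track)
    y=$(awk -v i="$i" -v n="$n" 'BEGIN { y = (n > 1) ? 0.7 - 0.6 * i / (n - 1) : 0.5; printf "%.2g", y }')
    # labels above the first half of the tracks, below the second half
    if [ $((2 * i)) -lt "$n" ]; then va=top; else va=bottom; fi
    echo "$y,     .1,     .8,     0,      , $label, $va, $bed"
    i=$((i + 1))
    shift
  done
  [ "$#" -gt 0 ] && shift  # drop the "--" separator
  echo "# edges"
  i=0
  for simple in "$@"; do
    echo "e, $i, $((i + 1)), $simple"
    i=$((i + 1))
  done
}
```

Calling `make_stack_layout Spurp:Spurp.bed Mfran:Mfranciscanus.bed Lpictus:Lpictus.bed Lvar:Lvariegatus.bed -- Mfranciscanus.Spurp.anchors.simple Mfranciscanus.Lpictus.anchors.simple Lvariegatus.Lpictus.anchors.simple > layout` gives the same tracks and edges as above, with y written as 0.7, 0.5, 0.3 and 0.1.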

Run macrosynteny analysis
- input: seqids, layout
- output: Spurp_Mfran_Lpictus_Lvar_karyotype
- script: 4_MCscan_macrosynteny_Spurp_Mfran_Lpictus_Lvar