# Synteny analysis of long- vs short-lived sea urchins

Analysis completed by Kate Castellano

Purpose: Look for chromosomal rearrangements in the long- versus short-lived sea urchins to identify novel rearrangements associated with longevity/negligible senescence.

Summary: Pairwise synteny was inferred between *M. franciscanus* and *S. purpuratus*, *M. franciscanus* and *L. variegatus*, *M. franciscanus* and *L. pictus*, and *L. variegatus* and *L. pictus* using MCscan. Version 5.0 of the *S. purpuratus* gene annotations and version 3.0 of the *L. variegatus* gene annotations were obtained from Echinobase (https://www.echinobase.org); version 2.0 of the *L. pictus* gene annotations was obtained from NCBI. LAST (v 1445) was used for genome-wide alignment of coding regions, followed by filtering of tandem duplications and weak hits. Linkage clustering into syntenic blocks and visualization were performed with the MCscan Python workflow (https://github.com/tanghaibao/jcvi/wiki/MCscan-(Python-version)). Microsynteny visualization of the Hox cluster was also completed with MCscan modules.

# *Mesocentrotus franciscanus* versus *Lytechinus variegatus*

MCscan (python version)
program to download: https://github.com/tanghaibao/jcvi
MCscan manual/workflow: https://github.com/tanghaibao/jcvi/wiki/MCscan-%28Python-version%29#dependencies
## Reformat files for MCscan
  1. L. variegatus (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/018/143/015/GCF_018143015.1_Lvar_3.0/)
    1. reformat gff file with MCscan
      • input: GCF_018143015.1_Lvar_3.0_genomic.gff
      • output: Lvariegatus.bed
      • script: convert_gff2bed_MCscan.sh
    2. reformat CDS file with MCscan
      • input: GCF_018143015.1_Lvar_3.0_rna_from_genomic.fna
      • output: Lvariegatus.cds
      • script: reformatCDS_MCscan.sh
    3. edit headers of the CDS file to match column 4 of the bed file (otherwise you will get errors when you run MCscan)

        
    ## Manually remove noncoding, miscRNA and rRNA sequences
    awk '/^>/ {P=index($0,"ncRNA")==0} {if(P) print}' Lvariegatus.cds > Lvariegatus_test.cds
    awk '/^>/ {P=index($0,"miscrna")==0} {if(P) print}' Lvariegatus_test.cds > Lvariegatus_test2.cds
    awk '/^>/ {P=index($0,"rRNA")==0} {if(P) print}' Lvariegatus_test2.cds > Lvariegatus_test3.cds
    # replace the original CDS file with the filtered one, then remove the intermediates
    rm Lvariegatus.cds
    mv Lvariegatus_test3.cds Lvariegatus.cds
    rm Lvariegatus_test.cds Lvariegatus_test2.cds

    ## Edit header to contain only the transcript ID
    # remove everything from the beginning of the header up to transcript_id=; add the ">" back
    sed -i 's/.*transcript_id=/>/g' Lvariegatus.cds
    # remove everything after "]"
    sed -i 's/].*//g' Lvariegatus.cds
  2. M. franciscanus
    1. convert gtf to gff file
      • input: Mfran_braker-CORRECTED.gtf
      • output: Mfran_braker-CORRECTED.gff and gff2gtf.log
      • script: gff2gtf.sh
    2. reformat gff file with MCscan
      • input: Mfran_braker-CORRECTED.gtf
      • output: Mfranciscanus.bed
      • script: convert_gff2bed_MCscan.sh
    3. reformat CDS file with MCscan
      • input: Mfran_braker-transcripts_FINAL.fa
      • output: Mfranciscanus.cds
      • script: reformatCDS_MCscan.sh
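
The filtering and header-trimming steps for the *L. variegatus* CDS file can also be expressed as a single pass. The sketch below is a Python alternative to the awk/sed commands above, not the script used in this analysis. It assumes NCBI-style headers that carry a `[transcript_id=...]` field; the function name `clean_cds` and the output file name in the usage comment are hypothetical.

```python
import re

# Substrings used by the awk filters above to flag non-coding records.
NONCODING_TAGS = ("ncRNA", "miscrna", "rRNA")


def clean_cds(lines, tags=NONCODING_TAGS):
    """Yield FASTA lines with flagged records dropped and headers trimmed."""
    keep = True
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A record is dropped when its header contains any flagged substring.
            keep = not any(tag in line for tag in tags)
            if keep:
                match = re.search(r"transcript_id=([^\]\s]+)", line)
                tokens = line[1:].split()
                # Assumption: fall back to the first header token if the field is absent.
                ident = match.group(1) if match else (tokens[0] if tokens else "")
                yield ">" + ident
        elif keep:
            yield line


# Example usage (output name is hypothetical):
# with open("Lvariegatus.cds") as src, open("Lvariegatus.clean.cds", "w") as out:
#     out.writelines(row + "\n" for row in clean_cds(src))
```

As with the awk commands, matching is by substring, so an mRNA whose header happens to contain one of the tags (for example in a product name) would also be dropped.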
## Run MCscan pairwise synteny analysis and visualize - *M. franciscanus* vs *L. variegatus*
  1. Run MCscan
    This calls LAST to do the comparison and filters the LAST output to remove tandem duplications and weak hits.
    Single-linkage clustering is then performed on the filtered LAST output to cluster anchors into synteny blocks.
    At the end of the run, the summary statistics of the synteny blocks are printed.

  2. Analyze if synteny is 1:1
  3. Visualize Macrosynteny
    1. create the seqids file: this file should list the chromosome IDs, separated by commas, with each species on its own line
      1. count length of M. franciscanus chromosomes and order largest to smallest
        • input: Mfran_genome_final.fa
        • output: Mfran_chr_length.txt
        • script: countSequenceLength.py and run_countSequenceLength.py
      2. sort by length (largest first) and remove the ">" character

        sort -k 2 -nr Mfran_chr_length.txt | sed 's/>//g' > Mfran_chr_length_sort.txt
        rm Mfran_chr_length.txt

      3. add ordered IDs to the seqids file

      awk '{print $1}' Mfran_chr_length_sort.txt | uniq | paste -d, -s > seqids
      awk '{print $1}' Lvariegatus.bed | uniq | paste -d, -s >> seqids
        
      
    2. Create the simple file
      • input: Mfranciscanus.Lvariegatus.anchors
      • output: Mfranciscanus.Lvariegatus.anchors.simple and Mfranciscanus.Lvariegatus.anchors.new
      • script: MCscan_mkSimpleFile_MfranvsLvar.sh
    3. create the layout file (see below)
          
      # y, xstart, xend, rotation, color, label, va,  bed
      .6,     .1,    .8,       0, #f1b6da, Mfranciscanus, top, Mfranciscanus.bed
      .4,     .1,    .8,       0, #4dac26, Lvariegatus, top, Lvariegatus.bed
      # edges
       e, 0, 1, Mfranciscanus.Lvariegatus.anchors.simple
          
      
    4. Run macrosynteny analysis
      • input: seqids, layout
      • output: MfranvsLvar_karyotype.pdf
      • script: 4_MCscan_macrosynteny_MfranvsLvar.sh
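
The seqids step above (order chromosomes by length, then write one comma-separated line per species) can be sketched in Python as follows. This is an illustration, not the scripts used here: it assumes the length table has the sequence name in column 1 and the length in column 2, and the function names are hypothetical. Unlike `uniq`, `bed_chromosomes` also drops non-adjacent repeats.

```python
def ordered_ids(length_lines):
    """Return sequence IDs sorted by length, largest first.

    Assumption: each line looks like '>chr1<whitespace>12345'.
    """
    records = []
    for line in length_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        records.append((fields[0].lstrip(">"), int(fields[1])))
    records.sort(key=lambda rec: rec[1], reverse=True)
    return [name for name, _ in records]


def bed_chromosomes(bed_lines):
    """Return chromosome IDs from column 1 of a BED file, in first-seen order."""
    seen = []
    for line in bed_lines:
        chrom = line.split("\t")[0].strip()
        if chrom and chrom not in seen:
            seen.append(chrom)
    return seen


def seqids_text(*id_lists):
    """Render one comma-separated line of IDs per species."""
    return "".join(",".join(ids) + "\n" for ids in id_lists)


# Example usage with the file names from the steps above:
# with open("Mfran_chr_length.txt") as a, open("Lvariegatus.bed") as b:
#     text = seqids_text(ordered_ids(a), bed_chromosomes(b))
# with open("seqids", "w") as out:
#     out.write(text)
```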
## Hox Clusters - *M. franciscanus* vs *L. variegatus*
  1. Get blocks for the Hox cluster
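
The commands used for this step are not recorded here. As a hedged sketch, the MCscan wiki builds a blocks file with `jcvi.compara.synteny mcscan` from the reference bed file and the lifted anchors, and then keeps only the rows covering the region of interest. The `--iter=1` value is the wiki's example, and `slice_blocks` with its gene-ID arguments is a hypothetical helper; the actual Hox gene IDs are not given in this document.

```python
import sys


def blocks_cmd(ref, other, iterations=1):
    """Build the jcvi command that writes <ref>.<other>.i<N>.blocks."""
    return [
        sys.executable, "-m", "jcvi.compara.synteny", "mcscan",
        f"{ref}.bed", f"{ref}.{other}.lifted.anchors",
        f"--iter={iterations}",
        "-o", f"{ref}.{other}.i{iterations}.blocks",
    ]


def slice_blocks(block_lines, first_gene, last_gene):
    """Keep rows from first_gene through last_gene (column 1, inclusive)."""
    keep, out = False, []
    for line in block_lines:
        gene = line.split("\t")[0].strip()
        if gene == first_gene:
            keep = True
        if keep:
            out.append(line)
        if keep and gene == last_gene:
            break
    return out


# Example usage (requires jcvi; gene IDs are placeholders):
# import subprocess
# subprocess.run(blocks_cmd("Mfranciscanus", "Lvariegatus"), check=True)
# with open("Mfranciscanus.Lvariegatus.i1.blocks") as fh:
#     hox_rows = slice_blocks(fh, "FIRST_HOX_GENE_ID", "LAST_HOX_GENE_ID")
```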
# *Mesocentrotus franciscanus* versus *Lytechinus pictus*

*M. franciscanus* files are the same files edited above.

## Reformat *L. pictus* files for MCscan
  1. L. pictus (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/018/143/015/GCF_018143015.1_Lpictus_3.0/)
    1. reformat gff file with MCscan
      • input:
      • output: Lpictus.bed
      • script: convert_gff2bed_MCscan.sh
    2. reformat CDS file with MCscan
      • input: lytpic2.0.all.maker.transcripts.LPI.fasta
      • output: Lpictus.cds
      • script: reformatCDS_MCscan.sh
    3. edit headers of the CDS file to match column 4 of the bed file (otherwise you will get errors when you run MCscan)

        
    ##Manually edit header - remove everything from the beginning of the header up to transcript_id=; add the ">" back
        sed -i 's/.*transcript_id=/>/g' Lpictus.cds
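
Because MCscan fails when the CDS headers and column 4 of the bed file disagree, it can help to compare the two sets of IDs before running it. The following is a minimal sketch of such a check, not part of the original workflow; the function names are hypothetical, and it assumes a tab-separated bed file with the feature name in column 4.

```python
def fasta_ids(fasta_lines):
    """Collect the first token of every FASTA header (without '>')."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def bed_names(bed_lines):
    """Collect column 4 (feature name) of every BED line."""
    names = set()
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            names.add(fields[3])
    return names


def id_mismatches(fasta_lines, bed_lines):
    """Return (IDs only in the CDS file, IDs only in the bed file), both sorted."""
    cds, bed = fasta_ids(fasta_lines), bed_names(bed_lines)
    return sorted(cds - bed), sorted(bed - cds)


# Example usage with the file names from the steps above:
# with open("Lpictus.cds") as cds, open("Lpictus.bed") as bed:
#     only_cds, only_bed = id_mismatches(cds, bed)
# print(len(only_cds), "IDs only in CDS;", len(only_bed), "IDs only in bed")
```

Two empty lists indicate that every CDS record has a matching bed entry and vice versa.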
      
    
## Run MCscan pairwise synteny analysis and visualize - *M. franciscanus* vs *L. pictus*
  1. Run MCscan
    This calls LAST to do the comparison and filters the LAST output to remove tandem duplications and weak hits.
    Single-linkage clustering is then performed on the filtered LAST output to cluster anchors into synteny blocks.
    At the end of the run, the summary statistics of the synteny blocks are printed.

  2. Analyze if synteny is 1:1
  3. Visualize Macrosynteny
    1. create the seqids file: this file should list the chromosome IDs, separated by commas, with each species on its own line
      1. count length of L. pictus chromosomes and order largest to smallest
        • input: GCA_015342785.1_UCSD_Lpic_2.0_genomic_CHRONLY_editHeader.fna
        • output: Lpictus_chr_length.txt
        • script: countSequenceLength.py and run_countSequenceLength.py
      2. sort by length (largest first) and remove the ">" character

        sort -k 2 -nr Lpictus_chr_length.txt | sed 's/>//g' > Lpictus_chr_length_sort.txt
        rm Lpictus_chr_length.txt

      3. add ordered IDs to the seqids file

      awk '{print $1}' Mfran_chr_length_sort.txt | uniq | paste -d, -s > seqids
      awk '{print $1}' Lpictus_chr_length_sort.txt | uniq | paste -d, -s >> seqids
        
      
    2. Create the simple file
      • input: Mfranciscanus.Lpictus.anchors
      • output: Mfranciscanus.Lpictus.anchors.simple and Mfranciscanus.Lpictus.anchors.new
      • script: MCscan_mkSimpleFile_MfranvsLpictus.sh
    3. create the layout file (see below)
          
      # y, xstart, xend, rotation, color, label, va,  bed
      .6,     .1,    .8,       0, #f1b6da, Mfranciscanus, top, Mfranciscanus.bed
      .4,     .1,    .8,       0, #4dac26, Lpictus, top, Lpictus.bed
      # edges
       e, 0, 1, Mfranciscanus.Lpictus.anchors.simple
          
      
    4. Run macrosynteny analysis
      • input: seqids, layout
      • output: MfranvsLpictus_karyotype.pdf
      • script: 4_MCscan_macrosynteny_MfranvsLpictus.sh
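
The wrapper scripts named in the steps above are not reproduced in this document. As a hedged sketch, the list below maps each step to the corresponding command from the MCscan wiki, wrapped in Python so the file names follow from the two species prefixes. The options `--no_strip_names` and `--minspan=30` are the wiki's example values and are assumptions here, as are the function names.

```python
import subprocess
import sys

PY = sys.executable  # run jcvi with the current Python interpreter


def ortholog_cmd(a, b):
    """Step 1: LAST comparison, filtering and clustering (needs a/b .bed and .cds)."""
    return [PY, "-m", "jcvi.compara.catalog", "ortholog", a, b, "--no_strip_names"]


def depth_cmd(a, b):
    """Step 2: synteny depth histogram, used to judge whether synteny is 1:1."""
    return [PY, "-m", "jcvi.compara.synteny", "depth", "--histogram", f"{a}.{b}.anchors"]


def simple_cmd(a, b, minspan=30):
    """Step 3.2: write <a>.<b>.anchors.simple and <a>.<b>.anchors.new."""
    anchors = f"{a}.{b}.anchors"
    return [PY, "-m", "jcvi.compara.synteny", "screen",
            f"--minspan={minspan}", "--simple", anchors, anchors + ".new"]


def karyotype_cmd(seqids="seqids", layout="layout"):
    """Step 3.4: draw the macrosynteny karyotype from seqids and layout."""
    return [PY, "-m", "jcvi.graphics.karyotype", seqids, layout]


def run_pair(a, b):
    """Run the search, depth check and simple-file steps for one species pair."""
    for cmd in (ortholog_cmd(a, b), depth_cmd(a, b), simple_cmd(a, b)):
        subprocess.run(cmd, check=True)


# Example usage (requires jcvi and LAST to be installed):
# run_pair("Mfranciscanus", "Lpictus")
# subprocess.run(karyotype_cmd(), check=True)  # after seqids and layout exist
```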
# *Mesocentrotus franciscanus* versus *Strongylocentrotus purpuratus*

*M. franciscanus* files are the same files edited above.

## Reformat *S. purpuratus* files for MCscan
  1. S. purpuratus
    1. reformat gff file with MCscan
      • input: sp5_0_GCF_top21chr.gff3
      • output: Spurp.bed
      • script: convert_gff2bed_MCscan.sh
    2. reformat CDS file with MCscan
      • input: sp5_0_GCF_CDS.fa
      • output: Spurp.cds
      • script: reformatCDS_MCscan.sh
    3. edit headers of the CDS file to match column 4 of the bed file (otherwise you will get errors when you run MCscan)

        
    ##Manually edit header - remove everything from the beginning of the header up to transcript_id=; add the ">" back
        sed -i 's/.*transcript_id=/>/g' Spurp.cds
      
    
## Run MCscan pairwise synteny analysis and visualize - *M. franciscanus* vs *S. purpuratus*
  1. Run MCscan
    This calls LAST to do the comparison and filters the LAST output to remove tandem duplications and weak hits.
    Single-linkage clustering is then performed on the filtered LAST output to cluster anchors into synteny blocks.
    At the end of the run, the summary statistics of the synteny blocks are printed.

  2. Analyze if synteny is 1:1
  3. Visualize Macrosynteny
    1. create the seqids file: this file should list the chromosome IDs, separated by commas, with each species on its own line
      1. count length of S. purpuratus chromosomes and order largest to smallest
        • input:
        • output: Spurp_chr_length.txt
        • script: countSequenceLength.py and run_countSequenceLength.py
      2. sort by length (largest first) and remove the ">" character

        sort -k 2 -nr Spurp_chr_length.txt | sed 's/>//g' > Spurp_chr_length_sort.txt
        rm Spurp_chr_length.txt

      3. add ordered IDs to the seqids file

      awk '{print $1}' Mfran_chr_length_sort.txt | uniq | paste -d, -s > seqids
      awk '{print $1}' Spurp_chr_length_sort.txt | uniq | paste -d, -s >> seqids
        
      
    2. Create the simple file
      • input: Mfranciscanus.Spurp.anchors
      • output: Mfranciscanus.Spurp.anchors.simple and Mfranciscanus.Spurp.anchors.new
      • script: MCscan_mkSimpleFile_MfranvsSpurp.sh
    3. create the layout file (see below)
          
      # y, xstart, xend, rotation, color, label, va,  bed
      .6,     .1,    .8,       0, #f1b6da, Mfranciscanus, top, Mfranciscanus.bed
      .4,     .1,    .8,       0, #4dac26, Spurp, top, Spurp.bed
      # edges
       e, 0, 1, Mfranciscanus.Spurp.anchors.simple
          
      
    4. Run macrosynteny analysis
      • input: seqids, layout
      • output: MfranvsSpurp_karyotype.pdf
      • script: 4_MCscan_macrosynteny_MfranvsSpurp.sh
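
Before building the simple file, it can be useful to inspect the anchors file directly, for example to confirm the number and size of the synteny blocks reported at the end of the MCscan run. The sketch below assumes the anchors layout as I understand it from jcvi output (blocks delimited by lines starting with `#`, then whitespace-separated gene pairs followed by a score); the function names are hypothetical.

```python
def read_blocks(anchor_lines):
    """Split an .anchors file into blocks of (gene_a, gene_b) pairs."""
    blocks, current = [], []
    for line in anchor_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A '#' line closes the previous block and opens a new one.
            if current:
                blocks.append(current)
            current = []
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # skip malformed rows
        current.append((fields[0], fields[1]))
    if current:
        blocks.append(current)
    return blocks


def block_summary(blocks):
    """Count blocks and anchors, and report the largest block size."""
    sizes = [len(block) for block in blocks]
    return {"blocks": len(sizes), "anchors": sum(sizes), "largest": max(sizes, default=0)}


# Example usage with the anchors file from this comparison:
# with open("Mfranciscanus.Spurp.anchors") as fh:
#     print(block_summary(read_blocks(fh)))
```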
# *Lytechinus pictus* versus *Lytechinus variegatus*

No reformatting is required since this has been completed for both species above.

## Run MCscan pairwise synteny analysis and visualize - *L. variegatus* vs *L. pictus*
  1. Run MCscan
    This calls LAST to do the comparison and filters the LAST output to remove tandem duplications and weak hits.
    Single-linkage clustering is then performed on the filtered LAST output to cluster anchors into synteny blocks.
    At the end of the run, the summary statistics of the synteny blocks are printed.

  2. Analyze if synteny is 1:1
  3. Visualize Macrosynteny
    1. create the seqids file: this file should list the chromosome IDs, separated by commas, with each species on its own line. The ID lists were already made in previous comparisons, so they were copied from the previous analyses.
    2. Create the simple file
      • input: Lvariegatus.Lpictus.anchors
      • output: Lvariegatus.Lpictus.anchors.simple and Lvariegatus.Lpictus.anchors.new
      • script: 3_MCscan_mkSimpleFile_MfranvsLpictus.sh
    3. create the layout file (see below)
          
      # y, xstart, xend, rotation, color, label, va,  bed
      .6,     .1,    .8,       0, #f1b6da, Lvariegatus, top, Lvariegatus.bed
      .4,     .1,    .8,       0, #4dac26, Lpictus, top, Lpictus.bed
      # edges
      e, 0, 1, Lvariegatus.Lpictus.anchors.simple
          
      
    4. Run macrosynteny analysis
      • input: seqids, layout
      • output: LvarvsLpictus_karyotype.pdf
      • script: 4_MCscan_macrosynteny_LvarvsLpictus.sh
# *Strongylocentrotus purpuratus* versus *Lytechinus variegatus*

No reformatting is required since this has been completed for both species above.

## Run MCscan pairwise synteny analysis and visualize - *S. purpuratus* vs *L. variegatus*
  1. Run MCscan
    This calls LAST to do the comparison and filters the LAST output to remove tandem duplications and weak hits.
    Single-linkage clustering is then performed on the filtered LAST output to cluster anchors into synteny blocks.
    At the end of the run, the summary statistics of the synteny blocks are printed.

  2. Analyze if synteny is 1:1
  3. Visualize Macrosynteny
    1. create the seqids file: this file should list the chromosome IDs, separated by commas, with each species on its own line. The ID lists were already made in previous comparisons, so they were copied from the previous analyses.
    2. Create the simple file
      • input: Spurp.Lvariegatus.anchors
      • output: Spurp.Lvariegatus.anchors.simple and Spurp.Lvariegatus.anchors.new
      • script: 3_MCscan_mkSimpleFile_SpurpvsLVar.sh
    3. create the layout file (see below)
          
      # y, xstart, xend, rotation, color, label, va,  bed
      .6,     .1,    .8,       0, #f1b6da, Spurpuratus, top, Spurp.bed
      .4,     .1,    .8,       0, #4dac26, Lvariegatus, bottom, Lvariegatus.bed
      # edges
      e, 0, 1, Spurp.Lvariegatus.anchors.simple
          
      
    4. Run macrosynteny analysis
      • input: seqids, layout
      • output: SpurpvsLvar_karyotype.pdf
      • script: 4_MCscan_macrosynteny_SpurpvsLvar.sh
# *Strongylocentrotus purpuratus* versus *Lytechinus pictus*

No reformatting is required since this has been completed for both species above.

## Run MCscan pairwise synteny analysis and visualize - *S. purpuratus* vs *L. pictus*
  1. Run MCscan
    This calls LAST to do the comparison and filters the LAST output to remove tandem duplications and weak hits.
    Single-linkage clustering is then performed on the filtered LAST output to cluster anchors into synteny blocks.
    At the end of the run, the summary statistics of the synteny blocks are printed.

  2. Analyze if synteny is 1:1
  3. Visualize Macrosynteny
    1. create the seqids file: this file should list the chromosome IDs, separated by commas, with each species on its own line. The ID lists were already made in previous comparisons, so they were copied from the previous analyses.
    2. Create the simple file
      • input: Spurp.Lpictus.anchors
      • output: Spurp.Lpictus.anchors.simple and Spurp.Lpictus.anchors.new
      • script: 3_MCscan_mkSimpleFile_SpurpvsLVar.sh
    3. create the layout file (see below)
          
      # y, xstart, xend, rotation, color, label, va,  bed
      .6,     .1,    .8,       0, #f1b6da, Spurpuratus, top, Spurp.bed
      .4,     .1,    .8,       0, #4dac26, Lpictus, bottom, Lpictus.bed
      # edges
      e, 0, 1, Spurp.Lpictus.anchors.simple
          
      
    4. Run macrosynteny analysis
      • input: seqids, layout
      • output: SpurpvsLvar_karyotype.pdf
      • script: 4_MCscan_macrosynteny_SpurpvsLvar.sh
# Create karyotype image containing all four species, *Strongylocentrotus purpuratus*, *Mesocentrotus franciscanus*, *Lytechinus pictus*, and *Lytechinus variegatus*

This is still only a set of pairwise comparisons, but the goal is to visualize all four species together. The files from each of the following comparisons were used and came from the analyses described above:

## Run MCscan pairwise synteny analysis and visualize - *S. purpuratus* vs *L. pictus*
  1. Visualize Macrosynteny
    1. create the seqids file: this file should list the chromosome IDs, separated by commas, with each species on its own line. The ID lists were already made in previous comparisons, so they were copied from the previous analyses.
    2. Create the simple file
      These files were already created:
      • Mfranciscanus.Spurp.anchors.simple
      • Mfranciscanus.Lpictus.anchors.simple
      • Lvariegatus.Lpictus.anchors.simple
    3. create the layout file (see below)

      # y, xstart, xend, rotation, color, label, va,  bed
      .7,     .1,     .8,     0,      , Spurp, top, Spurp.bed
      .5,     .1,     .8,     0,      , Mfran, top, Mfranciscanus.bed
      .3,     .1,     .8,     0,      , Lpictus, bottom, Lpictus.bed
      .1,     .1,     .8,     0,      , Lvar, bottom, Lvariegatus.bed
      # edges
      e, 0, 1, Mfranciscanus.Spurp.anchors.simple
      e, 1, 2, Mfranciscanus.Lpictus.anchors.simple
      e, 2, 3, Lvariegatus.Lpictus.anchors.simple

    4. Run macrosynteny analysis
      • input: seqids, layout
      • output: Spurp_Mfran_Lpictus_Lvar_karyotype
      • script: 4_MCscan_macrosynteny_Spurp_Mfran_Lpictus_Lvar
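
Since every layout file in this document follows the same pattern (one row per track, then one `e` row per pair of adjacent tracks), the file can be generated rather than edited by hand. The sketch below is an illustration under that assumption; `layout_text` is a hypothetical helper, and the default `xstart`/`xend` values are the ones used in the layouts above.

```python
def layout_text(tracks, edges, xstart=".1", xend=".8"):
    """Render a karyotype layout file as text.

    tracks: list of dicts with keys y, label, va, bed and optional color.
    edges:  list of (track_i, track_j, simple_file) tuples (0-based track indices).
    """
    lines = ["# y, xstart, xend, rotation, color, label, va,  bed"]
    for track in tracks:
        row = [str(track["y"]), xstart, xend, "0",
               track.get("color", ""), track["label"], track["va"], track["bed"]]
        lines.append(", ".join(row))
    lines.append("# edges")
    for i, j, simple in edges:
        # Each edge must connect two tracks that exist in the layout.
        if not (0 <= i < len(tracks) and 0 <= j < len(tracks)):
            raise ValueError(f"edge ({i}, {j}) refers to a missing track")
        lines.append(f"e, {i}, {j}, {simple}")
    return "\n".join(lines) + "\n"


# Example: the four-species layout described above.
tracks = [
    {"y": 0.7, "label": "Spurp", "va": "top", "bed": "Spurp.bed"},
    {"y": 0.5, "label": "Mfran", "va": "top", "bed": "Mfranciscanus.bed"},
    {"y": 0.3, "label": "Lpictus", "va": "bottom", "bed": "Lpictus.bed"},
    {"y": 0.1, "label": "Lvar", "va": "bottom", "bed": "Lvariegatus.bed"},
]
edges = [
    (0, 1, "Mfranciscanus.Spurp.anchors.simple"),
    (1, 2, "Mfranciscanus.Lpictus.anchors.simple"),
    (2, 3, "Lvariegatus.Lpictus.anchors.simple"),
]
four_species_layout = layout_text(tracks, edges)
```

Writing `four_species_layout` to a file named `layout` would give an input equivalent to the hand-written layout, with `0.7` in place of `.7`.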