# Prepare files to make a gene density karyotype image for M. franciscanus
Analysis completed by Kate Castellano
Purpose: examine genome-wide gene density
- Split the genome into 550 kb windows and count the number of genes in each window
Multiple window sizes were tried (10 kb, 50 kb, 100 kb), but 550 kb was chosen to match Figure 1B of the P. lividus genome paper: https://www.cell.com/cell-genomics/pdf/S2666-979X(23)00061-7.pdf
- Get the length of each chromosome (done previously for the synteny analysis; see detailed notes)
- Python script: 1_countSequenceLength.py (do not edit or run this script directly; it defines a reusable function that the wrapper script below calls)
- Command to run: python3 2_run_countSequenceLength.py (edit this file to set your input and output file names)
- input: genome file: Mfran_genome_FINAL.fa
- output: text file with Scaffold name and length: Mfran_chr_length.txt
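1_countSequenceLength.py is not reproduced in these notes. As a rough illustration of what such a length counter does, here is a minimal Python sketch (the names `sequence_lengths` and `write_lengths` are hypothetical, not the author's functions). It sums the sequence characters of each FASTA record, so wrapped sequence lines are handled, and writes one tab-separated name/length line per scaffold. This sketch keeps only the first word of each header and already drops the ">", so the sed clean-up below would be a no-op on its output.

```python
from typing import Dict, Iterable, Optional


def sequence_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {scaffold name: sequence length} for FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Scaffold name = first word of the header, without the ">".
            header = line[1:].split()
            name = header[0] if header else ""
            lengths[name] = 0
        elif name is not None:
            # Sequence lines may be wrapped; add up every line of the record.
            lengths[name] += len(line)
    return lengths


def write_lengths(fasta_path: str, out_path: str) -> None:
    """Write one 'name<TAB>length' line per scaffold (hypothetical helper)."""
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        for name, length in sequence_lengths(fasta).items():
            out.write(f"{name}\t{length}\n")


if __name__ == "__main__":
    demo = [">HiC_scaffold_1 extra words", "ACGT", "AC", ">HiC_scaffold_2", "GGG"]
    print(sequence_lengths(demo))  # {'HiC_scaffold_1': 6, 'HiC_scaffold_2': 3}
    # write_lengths("Mfran_genome_FINAL.fa", "Mfran_chr_length.txt")
```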
- Format the chromosome length file for bedtools
#Remove the ">" symbol before the scaffold names (edits the file in place)
sed -i 's/>//g' Mfran_chr_length.txt
#Add a column of "0" values, then reorder the columns with awk so the 0 sits between the scaffold name and its length
sed "s/$/\t0/" Mfran_chr_length.txt | awk -v OFS="\t" '{print $1,$3,$2}' > Mfran_chr_length_forBedtools.txt
#The final output should contain the scaffold ID followed by its range, from 0 to the chromosome length (tab-separated), for example:
HiC_scaffold_1 0 46325628
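The sed/awk pipeline above only reshapes columns. A hedged Python equivalent, assuming the length file holds whitespace-separated `name length` lines (the function name `to_bed_genome` is illustrative only), makes the transformation explicit: strip a leading ">" from the name and insert 0 as the start coordinate.

```python
from typing import Iterable, List


def to_bed_genome(lines: Iterable[str]) -> List[str]:
    """Turn 'name length' lines into tab-separated 'name 0 length' BED lines."""
    bed: List[str] = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # Drop a leading ">" (what the sed command removes) and use 0 as the start.
        name = fields[0].lstrip(">")
        bed.append(f"{name}\t0\t{fields[1]}")
    return bed


if __name__ == "__main__":
    print(to_bed_genome([">HiC_scaffold_1\t46325628"]))
    # ['HiC_scaffold_1\t0\t46325628']
```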
- Convert the gene annotation file to BED format (using awk) and sort it with bedtools sort
- input: annotation file: Mfran_braker-FINAL.gtf
- output: Mfran_braker_transcriptsSort.bed
- script: 3_convertAnnotationFile.sh
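3_convertAnnotationFile.sh is not shown here, so the following is only a minimal Python sketch of a GTF-to-BED conversion under stated assumptions: it keeps `transcript` feature lines (suggested by the output file name, but an assumption), takes the name from a `transcript_id` attribute when present and otherwise uses the whole attribute column, and sorts by chromosome name and then start, as `bedtools sort` does by default. The detail that matters most is the coordinate shift: GTF is 1-based with inclusive ends, BED is 0-based and half-open, so only the start is decremented.

```python
import re
from typing import Iterable, List, Tuple

BedRecord = Tuple[str, int, int, str]


def gtf_to_bed(lines: Iterable[str], feature: str = "transcript") -> List[BedRecord]:
    """Convert GTF lines of one feature type to sorted (chrom, start, end, name) records."""
    records: List[BedRecord] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature:
            continue  # keep only the requested feature type (assumed: transcript)
        chrom, start, end = cols[0], int(cols[3]), int(cols[4])
        # Prefer a transcript_id attribute; otherwise use the raw attribute column.
        match = re.search(r'transcript_id "([^"]+)"', cols[8])
        name = match.group(1) if match else cols[8].strip()
        # GTF: 1-based, inclusive end. BED: 0-based, half-open. Only the start shifts.
        records.append((chrom, start - 1, end, name))
    # Sort by chromosome name (lexicographic), then by start position.
    records.sort(key=lambda r: (r[0], r[1]))
    return records


if __name__ == "__main__":
    demo = [
        "HiC_scaffold_2\tAUGUSTUS\ttranscript\t101\t200\t.\t+\t.\tg2.t1",
        'HiC_scaffold_1\tAUGUSTUS\texon\t11\t20\t.\t+\t.\ttranscript_id "g1.t1"; gene_id "g1";',
        'HiC_scaffold_1\tAUGUSTUS\ttranscript\t11\t50\t.\t+\t.\ttranscript_id "g1.t1"; gene_id "g1";',
    ]
    for rec in gtf_to_bed(demo):
        print(*rec, sep="\t")
```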
- Make windows with bedtools makewindows, sort them with bedtools sort, and count the number of genes within each window (bedtools map)
- input: chromosome length file: Mfran_chr_length_sort.txt, plus the sorted gene annotation file from Step 3 (Mfran_braker_transcriptsSort.bed)
- output: sorted windows file (Mfran_genome_FINAL_550kb.windowsSort.bed) and the number of genes in each window (Mfran_genecount_550kb.bed)
- script: 4_makeWindows_geneDensity.sh
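Conceptually, `bedtools makewindows` tiles each chromosome into fixed-size windows (the last one truncated at the chromosome end), and `bedtools map`, assuming the script uses a count operation, reports how many features overlap each window, so a gene spanning a window boundary is counted in both windows. `bedtools map` also expects sorted inputs, which is why the windows are sorted first. The Python sketch below reproduces that logic for illustration only: the function names are hypothetical, and the per-window scan is quadratic, which bedtools avoids by sweeping through sorted files. With 550 kb windows, a 46,325,628 bp chromosome such as HiC_scaffold_1 yields 84 full windows plus one partial window.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Window = Tuple[str, int, int]


def make_windows(chrom_lengths: Dict[str, int], size: int = 550_000) -> List[Window]:
    """Tile each chromosome into [start, end) windows; the last one is truncated."""
    windows: List[Window] = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, size):
            windows.append((chrom, start, min(start + size, length)))
    return windows


def count_genes(
    windows: Iterable[Window], genes: Iterable[Sequence]
) -> List[Tuple[str, int, int, int]]:
    """Count genes overlapping each window; genes are (chrom, start, end, ...) records."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for gene in genes:
        chrom, start, end = gene[0], int(gene[1]), int(gene[2])
        by_chrom.setdefault(chrom, []).append((start, end))
    counts = []
    for chrom, w_start, w_end in windows:
        # Half-open intervals overlap when each one starts before the other ends.
        n = sum(
            1
            for g_start, g_end in by_chrom.get(chrom, [])
            if g_start < w_end and g_end > w_start
        )
        counts.append((chrom, w_start, w_end, n))
    return counts


if __name__ == "__main__":
    print(len(make_windows({"HiC_scaffold_1": 46_325_628})))  # 85
    windows = make_windows({"chr1": 1_200_000})
    genes = [("chr1", 100, 200), ("chr1", 549_900, 550_100), ("chr2", 0, 10)]
    for row in count_genes(windows, genes):
        print(*row, sep="\t")
```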
- Plot gene density onto the chromosomes using RIdeogram
Tutorial: https://cran.r-project.org/web/packages/RIdeogram/vignettes/RIdeogram.html#:~:text=RIdeogram%20is%20a%20R%20package,genome%2Dwide%20data%20on%20idiograms.
- I run this step in RStudio on my local computer, so I transfer Mfran_genecount_550kb.bed and Mfran_chr_length_forBedtools_sort.txt to it
- See the R Markdown file: MfranciscanusRIdeogram_Gene Density.rmd
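RIdeogram's `ideogram()` takes a karyotype table and an optional `overlaid` table as data frames. In the package vignette the karyotype columns are Chr, Start, End (plus optional centromere columns) and the gene-density example uses Chr, Start, End, Value. Assuming that layout, a small Python sketch (hypothetical helper names) can add these headers to the two transferred files so they read straight into R data frames. It also treats a "." placeholder, which `bedtools map` writes as its default null value, as zero.

```python
from typing import Iterable, List


def to_rideogram_density(bed_lines: Iterable[str]) -> List[str]:
    """Add a Chr/Start/End/Value header to 4-column gene-count BED lines."""
    rows = ["Chr\tStart\tEnd\tValue"]
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, start, end, count = fields[:4]
        # Treat the bedtools map null placeholder "." as zero genes.
        value = "0" if count == "." else count
        rows.append(f"{chrom}\t{start}\t{end}\t{value}")
    return rows


def to_rideogram_karyotype(genome_lines: Iterable[str]) -> List[str]:
    """Add a Chr/Start/End header to 3-column chromosome range lines."""
    rows = ["Chr\tStart\tEnd"]
    for line in genome_lines:
        fields = line.split()
        if len(fields) >= 3:
            rows.append("\t".join(fields[:3]))
    return rows


if __name__ == "__main__":
    print("\n".join(to_rideogram_karyotype(["HiC_scaffold_1\t0\t46325628"])))
    print("\n".join(to_rideogram_density(["HiC_scaffold_1\t0\t550000\t12"])))
```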