Analysis of unmapped reads

The unmapped reads were independently analyzed using an alignment free and an alignment based approach

The alignment free approach

download kraken2 standard database (https://benlangmead.github.io/aws-indexes/k2)
note the database should contain human sequence
process unmapped reads with kraken2

kraken2 --db {input.kraken2db}  --unclassified-out {params.outprefix}  --report {output.report} --paired --use-names  {input.unmapped_1} {input.unmapped_2}

The alignment based approach

As rRNA dominates bacterial reads, first extract rRNA reads with sortMeRNA

extract rRNA reads, follow https://github.com/sortmerna/sortmerna

sortmerna --paired_in -ref rRNA.fasta --idx-dir sortmerna-index --reads unmapped/${sample_id}_1.fastq.gz --reads unmapped/${sample_id}_2.fastq.gz --fastx --workdir output/microbe-rRNA-split/${sample_id} --index 0 --out2 --other --threads 20

map rRNA reads to SLIVA rRNA

bowtie2 --no-unal -p 4 -1 ${fastq_1} -2 ${fastq_2}  --un-conc-gz output/rRNA/unmapped/${sample_id}_%.fastq.gz --no-discordant --end-to-end -x rRNA/bowtie2-index/rRNA | samtools view -b > ${output}

count rRNA reads

# -s means strand of the RNA-seq library. Take a look at metadata.txt
bin/count-rRNA.py -m 0 -b $bam -s $lib -c output/rRNA/count/control-MAPQ0/${sample_id}.txt --stats output/rRNA/count/control-MAPQ0/${sample_id}.stat