bam, v2.0.0, bead barcodes, molecular barcodes, bam tags
v2.0.0 does not generate a BAM file in which reads are associated with bead or molecular barcodes. However, the association can be generated by merging two of the pipeline’s intermediate outputs (r1-db and r2-db). The output is in parquet format, where each row is a read and carries its read ID and bead barcode sequence.
Merging r1-db and r2-db involves running a module named gen-merged-r1-r2-db in the curioseeker singularity container, following the steps below:
- Find the two intermediate files in the work folder for step PRIMARY_JOINED:CURIOSEEKER_GEN_GENE_BARCODE_UMI_DB in
${root_output_dir}/work/${step_hash}
The beginning of the ${step_hash} of this folder can be identified in this file:
${root_output_dir}/results/pipeline_info/execution_trace_${date}_${time}.txt
Below is an example of the beginning of ${step_hash} (yellow box) for PRIMARY_JOINED:CURIOSEEKER_GEN_GENE_BARCODE_UMI_DB (red box) in file execution_trace_${date}_${time}.txt
Once you have identified the value boxed in yellow, look for this folder: ${root_output_dir}/work/${step_hash}
Note that the value in yellow is only the beginning of the step_hash; use tab completion to find the full path.
- Copy your samplesheet.csv to this folder.
- Make sure that the ${Sample_ID}-r1-db, ${Sample_ID}-r2-db, and the samplesheet.csv file for your sample of interest can be located in this folder.
- Run the command below (in the same folder):
singularity exec ${path_to_curioseekerv2_singularity_container} \
curio-seeker-pipeline \
gen-merged-r1-r2-db \
--samplesheet="${path_to_samplesheet}" \
--sample=${sample_id}
${path_to_curioseekerv2_singularity_container}: you can find this path in the nextflow.config file (curioseeker-2.0.0/nextflow.config), as defined by the parameter curio_seeker_singularity.
${path_to_samplesheet}: path to the samplesheet.csv you used to process this sample
${sample_id}: Sample_ID used for processing this sample
Example Command:
singularity exec /home/.singularity/curio-seeker-singularity:v2.0.0.sif \
curio-seeker-pipeline \
gen-merged-r1-r2-db \
--samplesheet=/mnt/seeker/work/99/387e66e5dd67a13f/samplesheet.20240115_Mouse_spleen.csv \
--sample=Mouse_spleen
- After a successful run, a folder named ${Sample_ID}-r1-r2-merged will be created in the same work folder. It contains chunked parquet files in which each row is a read. The read ID is in column read1_id and the bead barcode is in column BM. Only rows where r1_proper_structure_matched == True and XS == Assigned should be included.
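The row filter described above can be sketched in Python. This is only an illustration, not part of the pipeline: the function names are hypothetical, the `*.parquet` chunk extension is an assumption, and the code assumes r1_proper_structure_matched is stored as a boolean (or the string "True") and XS as a string. Loading the chunks uses the third-party pyarrow package, which is imported only when `load_rows` is called.

```python
from pathlib import Path

# Column names taken from the article.
COLUMNS = ["read1_id", "BM", "r1_proper_structure_matched", "XS"]


def is_usable_read(row: dict) -> bool:
    """Keep rows with r1_proper_structure_matched == True and XS == Assigned."""
    # Assumption: the flag is a boolean, or the string "True".
    matched = row.get("r1_proper_structure_matched") in (True, "True")
    return matched and row.get("XS") == "Assigned"


def read_to_barcode(rows) -> dict:
    """Map read ID (read1_id) to bead barcode (BM) for usable rows only."""
    return {row["read1_id"]: row["BM"] for row in rows if is_usable_read(row)}


def load_rows(merged_dir):
    """Yield one dict per read from every parquet chunk in merged_dir.

    Assumption: chunks end in ".parquet". Requires pyarrow.
    """
    import pyarrow.parquet as pq  # third-party, imported lazily

    for chunk in sorted(Path(merged_dir).glob("*.parquet")):
        table = pq.read_table(str(chunk), columns=COLUMNS)
        yield from table.to_pylist()
```

For example, `read_to_barcode(load_rows("Mouse_spleen-r1-r2-merged"))` would build the read-to-bead-barcode lookup for the sample used in the example command. For large samples, processing one chunk at a time may be preferable to holding every read in a single dictionary.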
Troubleshooting:
If the above command gives this error: Invalid value for '--samplesheet': Path 'samplesheet.csv' does not exist, include the --bind flag as shown below to fix the issue.
singularity exec \
--bind ${root_samplesheet_folder} \
${path_to_curioseekerv2_singularity_container} \
curio-seeker-pipeline \
gen-merged-r1-r2-db \
--samplesheet="${path_to_samplesheet}" \
--sample=${sample_id}
Example Command:
singularity exec --bind /mnt/ /home/.singularity/curio-seeker-singularity:v2.0.0.sif \
curio-seeker-pipeline \
gen-merged-r1-r2-db \
--samplesheet=/mnt/seeker/work/99/387e66e5dd67a13f/samplesheet.20240115_Mouse_spleen.csv \
--sample=Mouse_spleen
Here, the --bind flag allows mounting of a directory (/mnt/) from the host machine into the container, enabling access to the content of the directory by the container.
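The manual lookup of the work folder and the assembly of the command, with or without --bind, could be scripted along these lines. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and the trace file is assumed to be tab-separated with `hash` and `name` columns (the default Nextflow trace layout), where `hash` holds the beginning of the step hash, such as 99/387e66.

```python
import csv
import shlex
from pathlib import Path

PROCESS = "PRIMARY_JOINED:CURIOSEEKER_GEN_GENE_BARCODE_UMI_DB"


def find_work_dirs(trace_file, work_root, process=PROCESS):
    """Return work folders whose hash prefix is listed for `process`."""
    matches = []
    with open(trace_file, newline="") as handle:
        # Assumption: tab-separated trace with "hash" and "name" columns.
        for record in csv.DictReader(handle, delimiter="\t"):
            if process not in (record.get("name") or ""):
                continue
            prefix = (record.get("hash") or "").strip()  # e.g. "99/387e66"
            if prefix:
                # Same idea as tab completion: expand the prefix on disk.
                hits = sorted(Path(work_root).glob(prefix + "*"))
                matches.extend(p for p in hits if p.is_dir())
    return matches


def build_command(container, samplesheet, sample_id, bind_dir=None):
    """Assemble the gen-merged-r1-r2-db command as an argument list."""
    args = ["singularity", "exec"]
    if bind_dir:
        # Only needed when the container cannot see the samplesheet path.
        args += ["--bind", str(bind_dir)]
    args += [
        str(container),
        "curio-seeker-pipeline",
        "gen-merged-r1-r2-db",
        f"--samplesheet={samplesheet}",
        f"--sample={sample_id}",
    ]
    return args
```

`print(shlex.join(build_command(container, samplesheet, "Mouse_spleen", bind_dir="/mnt/")))` shows the command for review before running it. To execute it from the work folder, the list can be passed to `subprocess.run(args, cwd=work_dir, check=True)`.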