Where can I find the bam file where reads are associated with bead barcodes and/or molecular barcodes?

bam, v2.0.0, bead barcodes, molecular barcodes, bam tags

v2.0.0 does not generate a bam file where reads are associated with bead or molecular barcodes. However, the association can be generated by merging two of the pipeline’s intermediate outputs (r1-db and r2-db). The output is in parquet format, with one row per read; each row carries the read ID and the bead barcode sequence.

Merging r1-db and r2-db involves running a module named gen-merged-r1-r2-db in the curioseeker singularity container, following the steps below:

Find the two intermediate files in the work folder for step PRIMARY_JOINED:CURIOSEEKER_GEN_GENE_BARCODE_UMI_DB in
```
${root_output_dir}/work/${step_hash}
```
The beginning of the ${step_hash} of this folder can be identified in this file:
```
${root_output_dir}/results/pipeline_info/execution_trace_${date}_${time}.txt
```
Below is an example of the beginning of ${step_hash} (yellow box) for PRIMARY_JOINED:CURIOSEEKER_GEN_GENE_BARCODE_UMI_DB (red box) in file execution_trace_${date}_${time}.txt

Once you have identified the value boxed in yellow, look for this folder:
```
${root_output_dir}/work/${step_hash}
```
Note that the value in yellow is only the beginning of the step_hash; use tab completion to find the full path.
Copy your samplesheet.csv to this folder.
Make sure that ${Sample_ID}-r1-db, ${Sample_ID}-r2-db, and the samplesheet.csv file for your sample of interest are all present in this folder.
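
The folder lookup described above can also be scripted. The following is a minimal sketch, not part of the pipeline: find_step_dirs is a hypothetical helper name, and it assumes the execution trace is tab-separated with a header row that contains columns named hash and name (the usual Nextflow trace layout). If your trace file differs, adjust the column names accordingly.

```shell
#!/usr/bin/env bash
# Sketch: print the work folder(s) of a pipeline step listed in a Nextflow trace file.
# Assumption: tab-separated trace with a header row containing "hash" and "name".
# find_step_dirs is a hypothetical helper, not a curio-seeker-pipeline command.
find_step_dirs() {
  local trace_file="$1" work_dir="$2" step_name="$3"
  awk -F'\t' -v step="$step_name" '
    # Locate the hash and name columns from the header row.
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "hash") h = i
        if ($i == "name") n = i
      }
      next
    }
    # Keep rows whose task name starts with the requested step name.
    h && n && index($n, step) == 1 { print $h }
  ' "$trace_file" | while IFS= read -r prefix; do
    # The trace stores only the beginning of the hash; the glob completes it.
    for dir in "$work_dir"/"$prefix"*/; do
      if [ -d "$dir" ]; then
        printf '%s\n' "${dir%/}"
      fi
    done
  done
}
```

For example, `find_step_dirs "${root_output_dir}/results/pipeline_info/execution_trace_${date}_${time}.txt" "${root_output_dir}/work" "PRIMARY_JOINED:CURIOSEEKER_GEN_GENE_BARCODE_UMI_DB"` would print one folder per matching task. If several samples were processed in the same run, more than one folder may be listed, so check which one holds the files for your sample.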

Run the following command in the same folder:

singularity exec ${path_to_curioseekerv2_singularity_container} \
curio-seeker-pipeline \
gen-merged-r1-r2-db \
--samplesheet="${path_to_samplesheet}" \
--sample=${sample_id}

${path_to_curioseekerv2_singularity_container}: you can find this path in the nextflow.config file (curioseeker-2.0.0/nextflow.config) as defined by parameter curio_seeker_singularity.

${path_to_samplesheet}: path to the samplesheet.csv you used to process this sample

${sample_id}: Sample_ID used for processing this sample
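
Before running the command, it can help to confirm that the inputs listed above are actually in place. The following is a small sketch under stated assumptions: check_merge_inputs is a hypothetical helper (not part of curio-seeker-pipeline), and because this article does not say whether r1-db and r2-db are files or directories, the check only tests that each path exists.

```shell
#!/usr/bin/env bash
# Sketch: report which expected inputs are missing before running gen-merged-r1-r2-db.
# check_merge_inputs is a hypothetical helper, not a curio-seeker-pipeline command.
check_merge_inputs() {
  local dir="$1" sample_id="$2" samplesheet="$3"
  local missing=0 path
  for path in "$dir/${sample_id}-r1-db" "$dir/${sample_id}-r2-db" "$samplesheet"; do
    # -e accepts files and directories alike; the on-disk form is not assumed here.
    if [ ! -e "$path" ]; then
      echo "missing: $path" >&2
      missing=1
    fi
  done
  # Exit status 0 means everything was found, 1 means at least one path is missing.
  return "$missing"
}
```

Run from the work folder, a call such as `check_merge_inputs . "${sample_id}" "${path_to_samplesheet}"` prints any missing path to stderr and returns a non-zero status if something is absent.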

Example Command:

singularity exec /home/.singularity/curio-seeker-singularity:v2.0.0.sif \
curio-seeker-pipeline \
gen-merged-r1-r2-db \
--samplesheet=/mnt/seeker/work/99/387e66e5dd67a13f/samplesheet.20240115_Mouse_spleen.csv \
--sample=Mouse_spleen

After a successful run, a folder named ${Sample_ID}-r1-r2-merged will be created in the same work folder. It contains chunked parquet files in which each row is a read. The read ID is in column read1_id and the bead barcode is in column BM. When using these files, keep only rows where r1_proper_structure_matched == True and XS == Assigned.

Troubleshooting:

If the above command gives the error "Invalid value for '--samplesheet': Path 'samplesheet.csv' does not exist", include the --bind flag as shown below to fix the issue.

singularity exec \
--bind ${root_samplesheet_folder} \
${path_to_curioseekerv2_singularity_container} \
curio-seeker-pipeline \
gen-merged-r1-r2-db \
--samplesheet="${path_to_samplesheet}" \
--sample=${sample_id}

Example Command:

singularity exec --bind /mnt/ /home/.singularity/curio-seeker-singularity:v2.0.0.sif \
curio-seeker-pipeline \
gen-merged-r1-r2-db \
--samplesheet=/mnt/seeker/work/99/387e66e5dd67a13f/samplesheet.20240115_Mouse_spleen.csv \
--sample=Mouse_spleen

Here, the --bind flag allows mounting of a directory (/mnt/) from the host machine into the container, enabling access to the content of the directory by the container.
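
If you are unsure which directory to bind, one option is to derive it from the samplesheet path. The sketch below is only an illustration: samplesheet_bind_dir is a hypothetical helper, and it assumes that binding the folder that directly contains the samplesheet is enough. If other paths also need to be visible inside the container, a higher-level directory (such as /mnt/ in the example above) may be the better choice.

```shell
#!/usr/bin/env bash
# Sketch: print the absolute host directory that contains a samplesheet,
# as a candidate value for --bind.
# samplesheet_bind_dir is a hypothetical helper, not a singularity or pipeline command.
samplesheet_bind_dir() {
  local samplesheet="$1"
  # cd + pwd -P gives an absolute path with symlinks resolved; the subshell
  # keeps the caller's working directory unchanged.
  (cd "$(dirname "$samplesheet")" && pwd -P)
}
```

The result can then be passed to the flag, for example `--bind "$(samplesheet_bind_dir "${path_to_samplesheet}")"`.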