Where can I find the bam file where reads are associated with bead barcodes and/or molecular barcodes?

bam, v2.0.0, bead barcodes, molecular barcodes, bam tags

v2.0.0 does not generate a BAM file in which reads are associated with bead or molecular barcodes. However, the association can be generated by merging two of the pipeline's intermediate outputs (r1-db and r2-db). The output is in parquet format, with one read per row; the read ID and bead barcode sequence are defined for each row.


Merging r1-db and r2-db involves running a module named gen-merged-r1-r2-db in the curioseeker singularity container, following the steps below:

  1. Find the two intermediate files in the work folder for step PRIMARY_JOINED:CURIOSEEKER_GEN_GENE_BARCODE_UMI_DB in

    ${root_output_dir}/work/${step_hash}

    The beginning of the ${step_hash} of this folder can be identified in this file:

     ${root_output_dir}/results/pipeline_info/execution_trace_${date}_${time}.txt  

    Below is an example of the beginning of ${step_hash} (yellow box) for PRIMARY_JOINED:CURIOSEEKER_GEN_GENE_BARCODE_UMI_DB (red box) in the file execution_trace_${date}_${time}.txt:



    Once you have identified the value boxed in yellow, look for this folder:
    ${root_output_dir}/work/${step_hash}
    Note that the value in yellow is only the beginning of the step_hash; use tab completion in the shell to find the full path.
  2. Copy your samplesheet.csv to this folder. 
  3. Make sure that the ${Sample_ID}-r1-db, ${Sample_ID}-r2-db, and the samplesheet.csv file for your sample of interest can be located in this folder.
  4. Run the following command in the same folder:
    singularity exec ${path_to_curioseekerv2_singularity_container} \
    curio-seeker-pipeline \
    gen-merged-r1-r2-db \
    --samplesheet="${path_to_samplesheet}" \
    --sample=${sample_id}

    ${path_to_curioseekerv2_singularity_container}: path to the container, defined by the parameter curio_seeker_singularity in the nextflow.config file (curioseeker-2.0.0/nextflow.config)

    ${path_to_samplesheet}: path to the samplesheet.csv you used to process this sample

    ${sample_id}: Sample_ID used for processing this sample

    Example Command: 

    singularity exec /home/.singularity/curio-seeker-singularity:v2.0.0.sif \
    curio-seeker-pipeline \
    gen-merged-r1-r2-db \
    --samplesheet=/mnt/seeker/work/99/387e66e5dd67a13f/samplesheet.20240115_Mouse_spleen.csv \
    --sample=Mouse_spleen
  5. After a successful run, a folder named ${Sample_ID}-r1-r2-merged will be created in the same work folder. It contains chunked parquet files in which each row is a read. The read ID is in column read1_id and the bead barcode is in column BM. Only rows with r1_proper_structure_matched == True and XS == Assigned should be included.
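
The filters in step 5 can be applied with a few lines of Python to extract the read-to-barcode association. The following is a minimal sketch and not part of the Curio Seeker pipeline. It assumes that pandas and a parquet engine (pyarrow or fastparquet) are installed, that the chunk files end in .parquet, that r1_proper_structure_matched is stored as a boolean, and that XS is stored as a string. The function names are hypothetical.

```python
import sys
from pathlib import Path

# Column names taken from step 5 above.
COLUMNS = ["read1_id", "BM", "r1_proper_structure_matched", "XS"]


def select_assigned_reads(df):
    """Return read1_id and BM for the rows that pass both filters from step 5."""
    # Assumption: r1_proper_structure_matched holds booleans, XS holds strings.
    keep = (df["r1_proper_structure_matched"] == True) & (df["XS"] == "Assigned")  # noqa: E712
    return df.loc[keep, ["read1_id", "BM"]].reset_index(drop=True)


def load_merged_reads(merged_dir):
    """Read every parquet chunk under merged_dir and keep only the usable reads."""
    import pandas as pd  # third-party; also needs pyarrow or fastparquet

    # Assumption: chunk files carry a .parquet extension.
    chunks = sorted(Path(merged_dir).rglob("*.parquet"))
    if not chunks:
        raise FileNotFoundError(f"no parquet files found under {merged_dir}")
    # Filter chunk by chunk so that discarded rows are not held in memory.
    frames = [select_assigned_reads(pd.read_parquet(c, columns=COLUMNS)) for c in chunks]
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python read_barcodes.py ${Sample_ID}-r1-r2-merged > read_barcodes.tsv
    reads = load_merged_reads(sys.argv[1])
    reads.to_csv(sys.stdout, sep="\t", index=False)
```

If the column types in your output differ from these assumptions (for example, if the flag is stored as text), adjust the comparison in select_assigned_reads accordingly.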

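Step 1 can also be scripted instead of reading the trace file by eye. The sketch below assumes that the execution trace is the standard tab-separated Nextflow trace with columns named hash and name; this may not hold if the trace fields were customized. The helper names find_hash_prefixes and resolve_work_dirs are hypothetical.

```python
import csv
from pathlib import Path

PROCESS = "PRIMARY_JOINED:CURIOSEEKER_GEN_GENE_BARCODE_UMI_DB"


def find_hash_prefixes(trace_path, process=PROCESS):
    """Return the abbreviated hashes of all tasks whose name starts with `process`."""
    with open(trace_path, newline="") as handle:
        # Assumption: tab-separated file with a header row containing "hash" and "name".
        reader = csv.DictReader(handle, delimiter="\t")
        return [
            row["hash"].strip()
            for row in reader
            if (row.get("name") or "").strip().startswith(process)
        ]


def resolve_work_dirs(root_output_dir, hash_prefix):
    """Expand an abbreviated hash into full work folders, like shell tab completion."""
    work = Path(root_output_dir) / "work"
    return sorted(p for p in work.glob(hash_prefix + "*") if p.is_dir())
```

A call such as resolve_work_dirs(root_output_dir, prefix) for each prefix returned by find_hash_prefixes(trace_file) lists the candidate folders. When several samples were processed in one run, more than one task can match, so check which folder holds the r1-db and r2-db files for your sample.
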
Troubleshooting:

If the above command gives this error: Invalid value for '--samplesheet': Path 'samplesheet.csv' does not exist, include the --bind flag as shown below to fix the issue.

singularity exec \
--bind ${root_samplesheet_folder} \
${path_to_curioseekerv2_singularity_container} \
curio-seeker-pipeline \
gen-merged-r1-r2-db \
--samplesheet="${path_to_samplesheet}" \
--sample=${sample_id}

Example Command: 

singularity exec --bind /mnt/ /home/.singularity/curio-seeker-singularity:v2.0.0.sif \
curio-seeker-pipeline \
gen-merged-r1-r2-db \
--samplesheet=/mnt/seeker/work/99/387e66e5dd67a13f/samplesheet.20240115_Mouse_spleen.csv \
--sample=Mouse_spleen

Here, the --bind flag mounts a directory (/mnt/) from the host machine into the container, giving the container access to the contents of that directory.
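
If you run the merge for several samples, it can help to assemble the command in a script so that the --bind option always lands before the container path. The following is a minimal sketch that uses only the options shown above; build_merge_command and run_merge are hypothetical helper names, and the work folder must be the one identified in step 1.

```python
import shlex
import subprocess


def build_merge_command(container, samplesheet, sample_id, bind_dir=None):
    """Assemble the gen-merged-r1-r2-db call as an argument list."""
    cmd = ["singularity", "exec"]
    if bind_dir is not None:
        # Options to `singularity exec` must come before the container path.
        cmd += ["--bind", str(bind_dir)]
    cmd += [
        str(container),
        "curio-seeker-pipeline",
        "gen-merged-r1-r2-db",
        f"--samplesheet={samplesheet}",
        f"--sample={sample_id}",
    ]
    return cmd


def run_merge(work_dir, container, samplesheet, sample_id, bind_dir=None):
    """Run the merge inside work_dir and raise if the command fails."""
    cmd = build_merge_command(container, samplesheet, sample_id, bind_dir)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, cwd=work_dir, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.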