CURIOSEEKER_GEN_GENE_BARCODE_UMI_DB step failed with the following error. How do I resolve it?

GEN_GENE_BARCODE_UMI_DB, CURIOSEEKER_GEN_GENE_BARCODE_UMI_DB, pyarrow

Failure mode.

Process name: NF:CURIOSEEKER:PRIMARY_JOINED:CURIOSEEKER_GEN_GENE_BARCODE_UMI_DB

pyarrow.lib.ArrowNotImplementedError: Unsupported cast from string to null using function cast_null

Answer

This is a sporadic error that will have a permanent fix in V3 of the Curio Seeker Bioinformatics Pipeline. There are two ways to fix the error in the meantime, both involving re-running the pipeline.

Solution 1

Re-run the pipeline from scratch.

Remove the entire work folder ${root_output_dir}/work/ and the entire results folder ${root_output_dir}/results/.
Re-trigger the pipeline without the -resume option.
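
The clean-up in Solution 1 could be scripted roughly as follows. This is a minimal Python sketch, not part of the pipeline: the function name remove_previous_run is hypothetical, and it assumes the work and results folders sit directly under the root output directory as described above. The deletion is permanent, so check the path before running it.

```python
import shutil
from pathlib import Path


def remove_previous_run(root_output_dir):
    """Delete the work/ and results/ folders under root_output_dir, if present.

    Hypothetical helper for illustration; returns the folders it removed.
    """
    removed = []
    for name in ("work", "results"):
        folder = Path(root_output_dir) / name
        if folder.is_dir():
            shutil.rmtree(folder)  # permanently deletes the folder and its contents
            removed.append(folder)
    return removed
```

After the folders are removed, start the pipeline with your usual command, leaving out -resume.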

Solution 2

Alternatively, to avoid re-running the pipeline completely from scratch, you can remove only the work folders for the steps CURIOSEEKER_READ1_DB and CURIOSEEKER_READ2_DB and resume the pipeline from these steps.

This can be done as follows:

Find the associated work folders for these two steps at ${root_output_dir}/work/${step_hash}. The beginning of the ${step_hash} of this folder can be identified in this file: ${root_output_dir}/results/pipeline_info/execution_trace_${date}_${time}.txt.

The screenshot below is an example of the execution_trace_${date}_${time}.txt file showing the beginning of the ${step_hash} for CURIOSEEKER_READ1_DB and CURIOSEEKER_READ2_DB.

Once you have identified the two work folders (from step 1), delete them from ${root_output_dir}/work/. Note that the value in blue is only the beginning of the ${step_hash}; use ‘tab’ completion to find the full path.
Resume the pipeline with -resume.
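
The lookup in Solution 2 can also be done programmatically instead of by eye. The Python sketch below is an illustration under assumptions, not an official tool: it assumes the trace file is tab-separated with the default Nextflow columns named hash and name, that the hash column holds the beginning of the work folder path relative to ${root_output_dir}/work/, and that matching the step name as a substring of the name column is specific enough. The function name find_work_dirs is hypothetical.

```python
import csv
from pathlib import Path

# Steps whose work folders should be removed before resuming.
STEPS = ("CURIOSEEKER_READ1_DB", "CURIOSEEKER_READ2_DB")


def find_work_dirs(trace_file, work_root, steps=STEPS):
    """Return the work folders matching the hash prefixes listed for `steps`.

    Assumes a tab-separated trace file with `hash` and `name` columns.
    """
    matches = []
    with open(trace_file, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            if any(step in row["name"] for step in steps):
                # The trace holds only the beginning of the hash, so expand
                # it with a wildcard, much like tab completion in a shell.
                candidates = Path(work_root).glob(row["hash"] + "*")
                matches.extend(sorted(p for p in candidates if p.is_dir()))
    return matches
```

Print the returned paths and confirm they are the expected two folders before deleting them (for example with shutil.rmtree), then resume the pipeline with -resume.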