CURIOSEEKER_GEN_GENE_BARCODE_UMI_DB step failed with the following error. How do I resolve it?

GEN_GENE_BARCODE_UMI_DB, CURIOSEEKER_GEN_GENE_BARCODE_UMI_DB, pyarrow

Failure mode. 

Process name: NF:CURIOSEEKER:PRIMARY_JOINED:CURIOSEEKER_GEN_GENE_BARCODE_UMI_DB

pyarrow.lib.ArrowNotImplementedError: Unsupported cast from string to null using function cast_null

 

Answer

This is a sporadic error that will be permanently fixed in V3 of the Curio Seeker Bioinformatics Pipeline. In the meantime, there are two workarounds, both of which involve re-running the pipeline.


Solution 1

Re-run the pipeline from scratch. 

  1. Remove the entire work folder ${root_output_dir}/work/ and results folder ${root_output_dir}/results/
  2. Re-trigger the pipeline without the -resume option.
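
The cleanup in step 1 can be sketched in Python as follows. This is a minimal sketch, not part of the pipeline: the function name remove_run_folders is hypothetical, and it assumes the work and results folders sit directly under the root output directory, as described above. It deletes data permanently, so check the path before running it.

```python
import os
import shutil


def remove_run_folders(root_output_dir):
    """Delete work/ and results/ under root_output_dir and return the paths removed."""
    removed = []
    for name in ("work", "results"):
        path = os.path.join(root_output_dir, name)
        # Skip folders that are already gone so the call is safe to repeat.
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(path)
    return removed


# Example call (hypothetical path, replace with your own root output directory):
# remove_run_folders("/path/to/root_output_dir")
```

After the folders are removed, start the pipeline again with the same command as before, leaving out the -resume option.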

Solution 2

Alternatively, to avoid re-running the pipeline from scratch, remove the work folders for the CURIOSEEKER_READ1_DB and CURIOSEEKER_READ2_DB steps and resume the pipeline from those steps.


This can be done as follows: 


  1. Find the work folders for these two steps under ${root_output_dir}/work/${step_hash}. The beginning of each ${step_hash} is listed in this file: ${root_output_dir}/results/pipeline_info/execution_trace_${date}_${time}.txt.

The screenshot below is an example of the execution_trace_${date}_${time}.txt file showing the beginning of the ${step_hash} for CURIOSEEKER_READ1_DB and CURIOSEEKER_READ2_DB.


  2. Once you have identified the two work folders in step 1, locate them at ${root_output_dir}/work/${step_hash} and delete both. Note that the value in blue is only the beginning of the ${step_hash}; press Tab to auto-complete the full path.
  3. Resume the pipeline with -resume.
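
The lookup and deletion described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not part of the pipeline: the helper names find_step_work_dirs and remove_dirs are hypothetical, and the trace file is assumed to be tab-separated with "hash" and "name" columns, which is the layout of a default Nextflow trace report. The sketch matches steps by substring, so review the listed folders before deleting anything.

```python
import csv
import glob
import os
import shutil

# Steps whose work folders should be removed (taken from the instructions above).
STEPS = ("CURIOSEEKER_READ1_DB", "CURIOSEEKER_READ2_DB")


def find_step_work_dirs(trace_file, work_dir, steps=STEPS):
    """Return the work folders whose hash prefix the trace lists for the given steps."""
    matches = []
    with open(trace_file, newline="") as handle:
        # Assumption: tab-separated trace with "hash" and "name" header columns.
        for row in csv.DictReader(handle, delimiter="\t"):
            name = row.get("name") or ""
            if any(step in name for step in steps):
                # The trace holds only the beginning of the hash (e.g. "ab/12cd34");
                # the glob expands it to the full folder, like Tab completion.
                prefix = (row.get("hash") or "").strip()
                if prefix:
                    pattern = os.path.join(work_dir, prefix + "*")
                    matches.extend(sorted(glob.glob(pattern)))
    return matches


def remove_dirs(paths, dry_run=True):
    """Print the folders by default; delete them only when dry_run is False."""
    for path in paths:
        if dry_run:
            print("would remove", path)
        else:
            shutil.rmtree(path)
    return paths
```

A cautious way to use it is to call remove_dirs(find_step_work_dirs(trace_file, work_dir)) first, confirm that only the CURIOSEEKER_READ1_DB and CURIOSEEKER_READ2_DB folders are listed, and then repeat the call with dry_run=False before resuming the pipeline.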