GEN_GENE_BARCODE_UMI_DB, CURIOSEEKER_GEN_GENE_BARCODE_UMI_DB, pyarrow
Failure mode.
Process name: NF:CURIOSEEKER:PRIMARY_JOINED:CURIOSEEKER_GEN_GENE_BARCODE_UMI_DB
pyarrow.lib.ArrowNotImplementedError: Unsupported cast from string to null using function cast_null
Answer
This is a sporadic error that will have a permanent fix in V3 of the Curio Seeker Bioinformatics Pipeline. There are two ways to fix the error in the meantime, both involving re-running the pipeline.
Solution 1
Re-run the pipeline from scratch.
- Remove the entire work folder ${root_output_dir}/work/ and results folder ${root_output_dir}/results/
- Re-trigger the pipeline without the -resume option.
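The clean-up in Solution 1 can be sketched in Python as shown below. This is a minimal sketch, not part of the pipeline: the function name clean_run_dirs and its dry_run parameter are hypothetical, and it assumes the work and results folders sit directly under the root output directory as described above. With dry_run left at True it only reports which folders exist, so you can check the paths before deleting anything.

```python
import shutil
from pathlib import Path


def clean_run_dirs(root_output_dir, dry_run=True):
    """Return the existing work/ and results/ folders under root_output_dir.

    Hypothetical helper: the folders are deleted only when dry_run is False.
    """
    root = Path(root_output_dir)
    targets = [root / "work", root / "results"]
    existing = [p for p in targets if p.is_dir()]
    if not dry_run:
        for p in existing:
            shutil.rmtree(p)  # irreversible: removes the folder and everything in it
    return existing
```

Call clean_run_dirs(root_output_dir) first to review the returned paths, then call it again with dry_run=False, and finally re-trigger the pipeline without -resume.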
Solution 2
To avoid re-running the pipeline completely from scratch, you can instead remove the work folders for the steps CURIOSEEKER_READ1_DB and CURIOSEEKER_READ2_DB and resume the pipeline from these steps.
This can be done as follows:
- Find the work folders associated with these two steps. Each one is located at ${root_output_dir}/work/${step_hash}. The beginning of each ${step_hash} is listed in this file: ${root_output_dir}/results/pipeline_info/execution_trace_${date}_${time}.txt.
The screenshot below is an example of the execution_trace_${date}_${time}.txt file showing the beginning of ${step_hash} for CURIOSEEKER_READ1_DB and CURIOSEEKER_READ2_DB.
- Once you have identified the two work folders in the previous step, locate them under ${root_output_dir}/work/ and delete them. Note that the value highlighted in blue in the screenshot is only the beginning of ${step_hash}; use tab completion to find the full path.
- Resume the pipeline with -resume.
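The look-up and deletion in Solution 2 can also be scripted. The sketch below is illustrative only and makes some assumptions: that the execution trace file is tab-separated with a header row containing the columns hash and name (the default layout of a Nextflow trace file), and that each hash value has the form prefix/partial_hash. The function names find_step_work_dirs and delete_dirs are hypothetical.

```python
import csv
import shutil
from pathlib import Path

STEPS = ("CURIOSEEKER_READ1_DB", "CURIOSEEKER_READ2_DB")


def find_step_work_dirs(trace_file, work_dir, steps=STEPS):
    """Return the work folders whose trace entry belongs to one of `steps`."""
    work = Path(work_dir)
    found = []
    with open(trace_file, newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            # Assumed name format: "NF:...:CURIOSEEKER_READ1_DB (optional tag)"
            process = (row.get("name") or "").split(" ")[0].split(":")[-1]
            if process not in steps:
                continue
            prefix, _, partial = (row.get("hash") or "").strip().partition("/")
            if not prefix or not partial:
                continue  # skip anything that is not "prefix/partial_hash"
            parent = work / prefix
            if parent.is_dir():
                # The trace holds only the beginning of the hash, so expand it.
                found.extend(
                    sorted(p for p in parent.glob(partial + "*") if p.is_dir())
                )
    return found


def delete_dirs(dirs, dry_run=True):
    """Print each folder and delete it only when dry_run is False."""
    for d in dirs:
        print(("would delete " if dry_run else "deleting ") + str(d))
        if not dry_run:
            shutil.rmtree(d)
```

Run find_step_work_dirs on the trace file and the work folder, check that exactly the two expected folders are returned, and then pass the result to delete_dirs with dry_run=False before resuming the pipeline with -resume.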