SOP: Quality Assurance & Best Practices for Running CWL Workflows on Cavatica

Version: 1.0
Date: 2026-01-15
Team: BTI-BFX-Engineering


Table of Contents

  1. Purpose
  2. Scope
  3. Guiding Principles
  4. Pre-Run Task Preparation (Cavatica-Specific)
    4.1 File Inputs
    4.2 CWL Workflow Validation
    4.3 Versioning Requirements
    4.4 CWL Runtime Settings
  5. Workflow Design Standards
    5.1 Required Validation Steps
    5.2 Output Contract
    5.3 Logging Requirements
    5.4 Workflow I/O Documentation
  6. Task Planning & Execution on Cavatica
    6.1 Small-Batch Validation
    6.2 Peer Review
    6.3 Criteria Before Full Launch
  7. Preventing Reruns
  8. Preventing Task Deletion
  9. Exporting Data Safely
  10. Documentation Requirements
  11. Continuous Improvement
  12. Roles & Responsibilities
  13. Appendices

1. Purpose

This SOP defines standards and procedures for designing, validating, and launching CWL workflows on Cavatica and for exporting their data, with the goals of reducing task deletions and reruns and increasing workflow reliability.


2. Scope

This SOP applies to all CWL workflows executed on Cavatica.


3. Guiding Principles

  • Reproducibility
  • Validation before execution
  • Predictable outputs
  • Immutability
  • Traceability

4. Pre-Run Task Preparation (Cavatica-Specific)

4.1 File Inputs

  • Validate file types and metadata
  • Confirm file IDs
  • Verify references
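The file checks above can be scripted before a task is drafted. The following is a minimal sketch in Python, not a Cavatica API call. The allowed extensions are taken from the examples in section 5.4; the required metadata keys (sample_id, case_id) and the function name check_input are illustrative assumptions to be replaced with the project's own schema.

```python
# Allowed input extensions, taken from the examples in section 5.4.
ALLOWED_EXTENSIONS = (".fastq.gz", ".fq.gz", ".bam", ".vcf.gz")
# Hypothetical required metadata keys; replace with the project's schema.
REQUIRED_METADATA = ("sample_id", "case_id")


def check_input(name, metadata):
    """Return a list of problems found for one input file (empty if none)."""
    problems = []
    if not name.endswith(ALLOWED_EXTENSIONS):
        problems.append(f"{name}: unexpected file type")
    for key in REQUIRED_METADATA:
        # A key that is absent or empty counts as missing.
        if not metadata.get(key):
            problems.append(f"{name}: missing metadata field '{key}'")
    return problems
```

An empty list means the file passed; any other result should block task creation until the input is fixed.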

4.2 CWL Workflow Validation (Before Deploying to Cavatica)

  • Use cwltool --validate
  • Validate input schema
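The cwltool --validate step can be wrapped so that it runs the same way for every workflow. This is a minimal sketch in Python that assumes cwltool is installed locally and on PATH; the function names are illustrative. It only checks the CWL document itself and does not replace a check of the actual input values.

```python
import shutil
import subprocess


def build_validate_command(cwl_path):
    """Return the cwltool command line that validates one CWL document."""
    return ["cwltool", "--validate", cwl_path]


def validate_cwl(cwl_path):
    """Run cwltool --validate and return (passed, combined_output)."""
    if shutil.which("cwltool") is None:
        return False, "cwltool is not installed or not on PATH"
    result = subprocess.run(
        build_validate_command(cwl_path),
        capture_output=True,
        text=True,
    )
    # cwltool exits with a non-zero status when validation fails.
    return result.returncode == 0, result.stdout + result.stderr
```

Keeping the combined output makes it possible to attach the validation result to the peer-review record.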

4.3 Versioning Requirements

  • Document CWL version, Docker digest, reference bundle
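One way to enforce this requirement is to refuse to record a run whose Docker image is referenced by tag only. The sketch below is an assumption about how such a check could look, not an existing team tool; is_pinned and version_record are hypothetical names, and the record fields simply mirror the three items listed above.

```python
import re

# An image reference counts as pinned when it ends in @sha256:<64 hex chars>.
_DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned(image):
    """Return True if the image reference includes a full sha256 digest."""
    return _DIGEST.search(image) is not None


def version_record(cwl_version, image, reference_bundle):
    """Build the version record for a run, rejecting unpinned images."""
    if not is_pinned(image):
        raise ValueError(f"Docker image is not pinned by digest: {image}")
    return {
        "cwl_version": cwl_version,
        "docker": image,
        "reference": reference_bundle,
    }
```

A tag such as v2.4.0 can be moved to a different image later, whereas a digest cannot, which is why the record is rejected without one.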

4.4 CWL Runtime Settings

  • Set resource requirements
  • Avoid hard-coded paths
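Both settings can be screened with a simple text scan of the CWL file before review. The sketch below is a heuristic only: it looks for the ResourceRequirement class name and for absolute paths under a short list of directory prefixes. The prefix list and function names are assumptions and should be adjusted to the environments the team actually uses.

```python
import re

# Hypothetical top-level directories that suggest a hard-coded absolute path.
SUSPECT_PREFIXES = ("home", "mnt", "Users", "data", "scratch")

# Match "/<prefix>/..." when it is not part of a longer path or an image name.
# \x22 and \x27 are the double and single quote characters.
_ABS_PATH = re.compile(
    r"(?<![\w/.])/(?:" + "|".join(SUSPECT_PREFIXES) + r")/[^\s,\]\)\x22\x27]*"
)


def find_hard_coded_paths(cwl_text):
    """Return (line_number, path) pairs for suspicious absolute paths."""
    hits = []
    for lineno, line in enumerate(cwl_text.splitlines(), start=1):
        for match in _ABS_PATH.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


def declares_resources(cwl_text):
    """Return True if the document mentions a ResourceRequirement at all."""
    return "ResourceRequirement" in cwl_text
```

A hit is a prompt for a human to look at the line, not proof of a defect; paths under other prefixes will not be flagged.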

5. Workflow Design Standards

5.1 Required Validation Steps

  • Input metadata validation
  • Reference integrity checks
  • File existence checks
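A common file existence failure is a primary file staged without its index. The sketch below checks a list of file names for missing index companions, following the .bam{,.bai} and .vcf.gz{,.tbi} conventions listed in section 5.4. The function name and the decision to work on names only (rather than on platform file IDs) are assumptions made for illustration.

```python
# Index suffix expected next to each primary file type (see section 5.4).
COMPANION_SUFFIX = {".bam": ".bai", ".vcf.gz": ".tbi"}


def missing_companions(file_names):
    """Return the index file names that should exist but are not listed."""
    present = set(file_names)
    missing = []
    for name in sorted(present):
        for suffix, index_suffix in COMPANION_SUFFIX.items():
            if name.endswith(suffix) and name + index_suffix not in present:
                missing.append(name + index_suffix)
    return missing
```

For example, a batch containing a.bam, a.bam.bai, and b.vcf.gz would be reported as missing b.vcf.gz.tbi.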

5.2 Output Contract (Cavatica)

  • Define final outputs
  • Checksums
  • Naming conventions
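The checksum part of the output contract can be produced with the Python standard library. This is a minimal sketch that assumes outputs have been collected into one local directory and that SHA-256 is the agreed algorithm; both are assumptions, as the SOP does not name an algorithm.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(output_dir):
    """Map each file's path relative to output_dir to its SHA-256 digest."""
    root = Path(output_dir)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```

Reading in chunks keeps memory use flat for large BAM or VCF files.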

5.3 Logging Requirements

  • Structured logs
  • Summary log
  • Docker stdout/stderr
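Structured logs are easiest to search when each line is a single JSON object. The sketch below shows one possible approach with Python's logging module; the field names (time, level, step, message) and the helper names are illustrative assumptions, not a mandated log schema.

```python
import json
import logging


class JsonLineFormatter(logging.Formatter):
    """Format each log record as one JSON object per line."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            # "step" is optional and supplied through the `extra` argument.
            "step": getattr(record, "step", None),
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def make_logger(name="workflow"):
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonLineFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A call such as make_logger().info("alignment finished", extra={"step": "align"}) then emits one JSON line that carries the step name alongside the message.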

5.4 Workflow I/O Documentation (File Types + Globs + Paths)

  • Document expected input file extensions (e.g., .fastq.gz, .bam{,.bai}, .vcf.gz{,.tbi}, .json/.tsv)
  • Document expected output file extensions and where they land (e.g., outputs/**, logs/**, qc/**, checksums/**)
  • Include canonical glob patterns for discovery/validation (e.g., inputs/**/*.{fastq,fq}.gz, outputs/**/*.vcf.gz{,.tbi})
  • List potential/allowed project paths (Cavatica project folders, mounted reference locations) and prohibit hard-coded absolute paths
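The glob patterns above use shell-style brace groups such as {,.tbi}, which Python's pathlib and glob modules do not expand on their own. The sketch below expands simple, non-nested brace groups first and then applies each resulting pattern. The function names are illustrative, and the code assumes the files are available on a local file system.

```python
import re
from pathlib import Path

# Matches one brace group that contains no nested braces, e.g. {fastq,fq}.
_BRACE = re.compile(r"\{([^{}]*)\}")


def expand_braces(pattern):
    """Expand non-nested {a,b} groups into a list of plain glob patterns."""
    match = _BRACE.search(pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[: match.start()], pattern[match.end():]
    results = []
    for option in match.group(1).split(","):
        results.extend(expand_braces(head + option + tail))
    return results


def find_matches(root, pattern):
    """Return sorted relative paths of files under root matching pattern."""
    base = Path(root)
    found = set()
    for plain in expand_braces(pattern):
        for path in base.glob(plain):
            if path.is_file():
                found.add(path.relative_to(base).as_posix())
    return sorted(found)
```

For instance, expand_braces("outputs/**/*.vcf.gz{,.tbi}") yields the two patterns outputs/**/*.vcf.gz and outputs/**/*.vcf.gz.tbi, so a VCF and its index are discovered together.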

6. Task Planning & Execution on Cavatica

6.1 Small-Batch Validation

Run 1–3 samples end-to-end before full launch.

6.2 Peer Review

Another engineer reviews inputs, versions, and references.

6.3 Criteria Before Full Launch

All validations passed, parameters confirmed.


7. Preventing Reruns

Use version-locked references, Docker digests, and validation scripts.


8. Preventing Task Deletion (Cavatica Best Practices)

  • Use dev projects for testing
  • Enforce naming conventions
  • Avoid overwriting outputs
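Naming conventions are easier to enforce when names are generated rather than typed. The sketch below builds a task name from its parts and rejects anything that does not fit the pattern. The convention shown (workflow, version, batch, and run date joined by underscores) is purely hypothetical and stands in for whatever convention the team adopts.

```python
import re
from datetime import date

# Hypothetical convention: <workflow>_v<major.minor.patch>_<batch>_<YYYY-MM-DD>
_TASK_NAME = re.compile(
    r"^[a-z0-9-]+_v\d+\.\d+\.\d+_[a-z0-9-]+_\d{4}-\d{2}-\d{2}$"
)


def build_task_name(workflow, version, batch, run_date):
    """Return a task name, or raise ValueError if it breaks the convention."""
    name = f"{workflow}_v{version}_{batch}_{run_date.isoformat()}"
    if not _TASK_NAME.match(name):
        raise ValueError(f"task name violates the naming convention: {name}")
    return name
```

Because the version, batch, and date are part of the name, two runs with different settings cannot share a name, which also helps avoid overwriting earlier outputs.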

9. Exporting Data Safely from Cavatica

Pre-Export

Validate outputs and checksums.

Post-Export

Spot-check QC and document export details.
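The pre-export and post-export checks can be tied together by comparing two checksum manifests: one recorded before export and one computed at the destination. The sketch below assumes each manifest is a dictionary mapping a relative file path to a checksum string; that structure and the function name are illustrative assumptions.

```python
def compare_manifests(expected, observed):
    """Compare two {relative_path: checksum} manifests and report differences."""
    expected_names = set(expected)
    observed_names = set(observed)
    return {
        # Files recorded before export that did not arrive.
        "missing": sorted(expected_names - observed_names),
        # Files at the destination that were never part of the export.
        "unexpected": sorted(observed_names - expected_names),
        # Files present on both sides whose checksums differ.
        "mismatched": sorted(
            name
            for name in expected_names & observed_names
            if expected[name] != observed[name]
        ),
    }
```

An export is clean only when all three lists are empty; the result can be stored with the export details as evidence.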


10. Documentation Requirements

  • README
  • Input schema
  • Output contract
  • Changelog

11. Continuous Improvement

Quarterly reviews, post-mortems.


12. Roles & Responsibilities

  Role          Responsibility
  Engineering   Workflow development
  Data Ops      QC & exports
  Leads         Approvals
  All Users     SOP compliance

13. Appendices

A. Sample Task Description Template

Workflow: WGS Alignment v2.4.0  
Commit: f1c2e7a  
Docker: quay.io/childrens-bti/wgs:v2.4.0@sha256:...  
Reference: GRCh38_refbundle_v1  
Inputs validated: Yes  
Export path: s3://bti-data/harmonization/wgs/v2.4.0/  
QC reviewer: name  
Run date: YYYY-MM-DD

B. Metadata Schema Template

(To be filled per workflow)

C. Output Contract Example

(To be added per workflow)


End of Document