SOP: Quality Assurance & Best Practices for Running CWL Workflows on Cavatica
Version: 1.0
Date: 2026-01-15
Team: BTI-BFX-Engineering
Table of Contents
- Purpose
- Scope
- Guiding Principles
- Pre-Run Task Preparation (Cavatica-Specific)
- File Inputs
- CWL Workflow Validation
- Versioning Requirements
- CWL Runtime Settings
- Workflow Design Standards
- Required Validation Steps
- Output Contract
- Logging Requirements
- Task Planning & Execution on Cavatica
- Small-Batch Validation
- Peer Review
- Criteria Before Full Launch
- Preventing Reruns
- Preventing Task Deletion
- Exporting Data Safely
- Documentation Requirements
- Continuous Improvement
- Roles & Responsibilities
- Appendices
1. Purpose
This SOP defines standards and procedures for designing, validating, launching, and exporting data from CWL workflows run on Cavatica, with a focus on reducing task deletions and reruns and improving workflow reliability.
2. Scope
This SOP applies to all CWL workflows executed on Cavatica.
3. Guiding Principles
- Reproducibility
- Validation before execution
- Predictable outputs
- Immutability
- Traceability
4. Pre-Run Task Preparation (Cavatica-Specific)
4.1 File Inputs
- Validate file types and metadata
- Confirm file IDs
- Verify references
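The input checks above can be scripted before a task is drafted. The following is a minimal sketch, assuming each input is described by a record with hypothetical `name`, `file_id`, and `metadata` keys. The allowed extensions and required metadata fields are illustrative placeholders, not Cavatica requirements.

```python
from typing import Dict, List

# Hypothetical policy values; replace with the workflow's documented inputs.
ALLOWED_EXTENSIONS = (".fastq.gz", ".bam", ".bai", ".vcf.gz", ".tbi")
REQUIRED_METADATA = ("sample_id", "experimental_strategy")


def validate_input(record: Dict) -> List[str]:
    """Return a list of problems found in one input record (empty if clean)."""
    problems = []
    name = record.get("name", "")
    if not name.endswith(ALLOWED_EXTENSIONS):
        problems.append(f"{name!r}: unexpected file extension")
    if not record.get("file_id"):
        problems.append(f"{name!r}: missing file ID")
    metadata = record.get("metadata") or {}
    for field in REQUIRED_METADATA:
        if not metadata.get(field):
            problems.append(f"{name!r}: missing metadata field {field!r}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a reviewer see every issue for a batch in a single pass.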
4.2 CWL Workflow Validation (Before Deploying to Cavatica)
- Use `cwltool --validate`
- Validate input schema
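The validation step can be wrapped so that it runs the same way for every workflow. This is a sketch only, assuming `cwltool` is installed and on `PATH`; the function names are hypothetical.

```python
import subprocess
from typing import List


def build_validate_command(cwl_path: str) -> List[str]:
    """Build the cwltool invocation that validates a CWL document without running it."""
    return ["cwltool", "--validate", cwl_path]


def validate_cwl(cwl_path: str) -> bool:
    """Return True if cwltool exits with status 0 for the document."""
    result = subprocess.run(
        build_validate_command(cwl_path),
        capture_output=True,
        text=True,
        check=False,  # inspect the return code ourselves instead of raising
    )
    if result.returncode != 0:
        # Show whatever cwltool reported so the author can fix the document.
        print(result.stderr or result.stdout)
    return result.returncode == 0
```

For example, `validate_cwl("workflow.cwl")` could gate a deployment script, with a `False` result stopping the upload.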
4.3 Versioning Requirements
- Document CWL version, Docker digest, reference bundle
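A Docker tag can be re-pointed to a different image, whereas a digest cannot, so it may help to check that every image reference is digest-pinned. A minimal sketch, with a hypothetical function name:

```python
import re

# A digest-pinned reference ends with "@sha256:" followed by 64 hex characters.
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref: str) -> bool:
    """Return True if the image reference is pinned to a sha256 digest."""
    return bool(_DIGEST_RE.search(image_ref))
```

A reference with only a tag, or with a truncated digest, is reported as not pinned.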
4.4 CWL Runtime Settings
- Set resource requirements
- Avoid hard-coded paths
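Hard-coded paths can be flagged automatically once the CWL document or job file has been loaded into Python data (for example from JSON). The sketch below is a heuristic, not a complete linter: it assumes that any string value beginning with `/` deserves review, and the function name is hypothetical.

```python
from typing import Any, Iterator, Tuple


def find_absolute_paths(node: Any, location: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (location, value) for every string value that starts with '/'."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_absolute_paths(value, f"{location}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_absolute_paths(value, f"{location}/{index}")
    elif isinstance(node, str) and node.startswith("/"):
        yield location, node
```

Each hit reports where in the document the value sits, so the author can replace it with a `File` input or a parameter.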
5. Workflow Design Standards
5.1 Required Validation Steps
- Input metadata validation
- Reference integrity checks
- File existence checks
5.2 Output Contract (Cavatica)
- Define final outputs
- Checksums
- Naming conventions
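The checksum part of the output contract can be produced as a manifest that travels with the outputs. This is a minimal sketch assuming SHA-256 as the checksum algorithm (the SOP does not name one) and a local directory of outputs; the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large outputs are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(output_dir: Path) -> Dict[str, str]:
    """Map each file under output_dir (by relative POSIX path) to its SHA-256."""
    return {
        path.relative_to(output_dir).as_posix(): sha256_of(path)
        for path in sorted(output_dir.rglob("*"))
        if path.is_file()
    }
```

Keying the manifest by relative path keeps it valid after the outputs are copied to another location.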
5.3 Logging Requirements
- Structured logs
- Summary log
- Docker stdout/stderr
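One way to get structured logs from a Python-based step is to emit one JSON object per line. The sketch below uses only the standard `logging` module; the `step` field and the function and class names are hypothetical.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "step": getattr(record, "step", None),  # hypothetical extra field
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def make_logger(name: str = "workflow") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

A step would then log with `make_logger().info("alignment finished", extra={"step": "bwa_mem"})`, and the resulting lines can be filtered by field when building the summary log.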
5.4 Workflow I/O Documentation (File Types + Globs + Paths)
- Document expected input file extensions (e.g., `.fastq.gz`, `.bam{,.bai}`, `.vcf.gz{,.tbi}`, `.json/.tsv`)
- Document expected output file extensions and where they land (e.g., `outputs/**`, `logs/**`, `qc/**`, `checksums/**`)
- Include canonical glob patterns for discovery/validation (e.g., `inputs/**/*.{fastq,fq}.gz`, `outputs/**/*.vcf.gz{,.tbi}`)
- List potential/allowed project paths (Cavatica project folders, mounted reference locations) and prohibit hard-coded absolute paths
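The documented patterns use shell-style brace groups, which Python's `pathlib` globbing does not expand. The following sketch expands the braces first and then globs a local directory tree; it assumes the files are available on a local or mounted filesystem, and the function names are hypothetical.

```python
import re
from pathlib import Path
from typing import List

_BRACE_RE = re.compile(r"\{([^{}]*)\}")


def expand_braces(pattern: str) -> List[str]:
    """Expand shell-style {a,b} groups into plain glob patterns."""
    match = _BRACE_RE.search(pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[: match.start()], pattern[match.end():]
    expanded = []
    for alternative in match.group(1).split(","):
        expanded.extend(expand_braces(head + alternative + tail))
    return expanded


def discover(root: Path, pattern: str) -> List[str]:
    """Return sorted relative paths under root that match a documented pattern."""
    found = set()
    for plain in expand_braces(pattern):
        found.update(
            p.relative_to(root).as_posix() for p in root.glob(plain) if p.is_file()
        )
    return sorted(found)
```

For instance, `expand_braces("outputs/**/*.vcf.gz{,.tbi}")` yields one pattern for the VCF and one for its index, so a missing `.tbi` shows up as a gap in the `discover` result.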
6. Task Planning & Execution on Cavatica
6.1 Small-Batch Validation
Run 1–3 samples end-to-end before full launch.
6.2 Peer Review
A second engineer reviews inputs, versions, and references.
6.3 Criteria Before Full Launch
All validations passed, parameters confirmed.
7. Preventing Reruns
Use version-locked references, Docker digests, and validation scripts.
8. Preventing Task Deletion (Cavatica Best Practices)
- Use dev projects for testing
- Enforce naming conventions
- Avoid overwriting outputs
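Overwrites can be caught before launch by comparing planned output names with what already exists in the target project. This is a sketch under the assumption that both lists of names have already been gathered (how the existing names are listed from the project is not shown); the function name is hypothetical.

```python
from typing import Iterable, List


def find_collisions(planned: Iterable[str], existing: Iterable[str]) -> List[str]:
    """Return planned output names that already exist in the target project."""
    existing_names = set(existing)
    return sorted(name for name in planned if name in existing_names)
```

An empty result means no planned output would replace an existing file; any other result is a reason to rename or to target a different folder.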
9. Exporting Data Safely from Cavatica
Pre-Export
Validate outputs and checksums.
Post-Export
Spot-check QC and document export details.
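Post-export verification can reuse a checksum manifest recorded before export. The sketch below assumes a manifest of the form `{relative_path: sha256}` and that the exported files are readable from a local or mounted directory; both the manifest format and the function name are assumptions.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def verify_export(export_dir: Path, manifest: Dict[str, str]) -> List[str]:
    """Compare exported files against a {relative_path: sha256} manifest."""
    problems = []
    for relative_path, expected in sorted(manifest.items()):
        target = export_dir / relative_path
        if not target.is_file():
            problems.append(f"missing: {relative_path}")
            continue
        # Reads the whole file at once; stream in chunks for very large files.
        actual = hashlib.sha256(target.read_bytes()).hexdigest()
        if actual != expected:
            problems.append(f"checksum mismatch: {relative_path}")
    return problems
```

An empty list can be recorded alongside the export details as evidence that the copy matched the validated outputs.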
10. Documentation Requirements
- README
- Input schema
- Output contract
- Changelog
11. Continuous Improvement
Quarterly reviews, post-mortems.
12. Roles & Responsibilities
| Role | Responsibility |
|---|---|
| Engineering | Workflow development |
| Data Ops | QC & exports |
| Leads | Approvals |
| All Users | SOP compliance |
13. Appendices
A. Sample Task Description Template
Workflow: WGS Alignment v2.4.0
Commit: f1c2e7a
Docker: quay.io/childrens-bti/wgs:v2.4.0@sha256:...
Reference: GRCh38_refbundle_v1
Inputs validated: Yes
Export path: s3://bti-data/harmonization/wgs/v2.4.0/
QC reviewer: name
Run date: YYYY-MM-DD
B. Metadata Schema Template
(To be filled per workflow)
C. Output Contract Example
(To be added per workflow)
End of Document