Complex Chromosomal Rearrangements Resolver (CCRR) is a self-contained toolkit that turns whole-genome sequencing data into an annotated catalog of complex tumor rearrangements. Packaged in a single Docker image, it installs all dependencies automatically, including six SV callers, five CNV callers, tools for purity–ploidy estimation, and a panel of complex event detectors (ShatterSeek, CTLPScanner, SeismicAmplification, AmpliconArchitect, Starfish, and gGnome via JaBbA). A single command runs the full pipeline on tumor/normal BAM files, merges the results into high-confidence consensus SVs and copy number states, infers purity and ploidy, applies the complex event detection tools, and generates publication-ready Circos and track plots.
For users who already have SV and CNV results, a companion web server (https://www.ccrr.life/) provides an interactive interface that supports one-click execution of the same event detection suite. It accepts standard VCF and segment files as input, and also allows custom JSON-formatted SV/CNV data for flexible analysis and visualization. Users receive an interactive dashboard and downloadable result summaries and figures.
The source code is freely available at https://github.com/laslk/CCRR, and the workflow runs seamlessly on any Linux or Windows host with Docker support.
wget -O ccrr1.2.zip https://www.ccrr.life/download_file/ccrr1.2.zip
unzip -q ccrr1.2.zip
Run install.py:
python install.py -sequenza -manta -delly -svaba -gridss \
-lumpy -soreca -purple -sclust -cnvkit \
-ref 'hg19&hg38'
This script will automatically download the dependency data for the tools you selected and build the Dockerfile.
-sequenza use sequenza for cn, cellularity and ploidy
-manta use manta for sv
-delly use delly for sv and cn
-svaba use svaba for sv
-gridss use gridss for sv
-lumpy use lumpy for sv
-soreca use soreca for sv
-purple use purple for cn
-sclust use sclust for cn
-cnvkit use cnvkit for cn
-ref hg19 or 'hg19&hg38'
You should select at least one SV tool and one CN tool. For the fastest run, you can use Delly alone to obtain both SV and CN.
Gurobi: Apply for a WLS Compute Server license and store it in the same directory as the Dockerfile, named gurobi.lic. For more information, visit www.gurobi.com.
Mosek: Obtain a Mosek license and store it in the same directory as the Dockerfile, named mosek.lic. For more information, visit www.mosek.com.
docker build --pull --rm --build-arg GITHUB_PAT=[GITHUB_PAT] \
--build-arg SCRIPT_DIR="/home/0.script" --build-arg TOOL_DIR="/home/1.tools" \
--build-arg DATABASE_DIR="/home/2.share" --build-arg WORK_DIR="/home/3.wd" \
-f Dockerfile -t ccrr:v1.2 .
To ensure the installation proceeds correctly, you need to provide a GitHub token GITHUB_PAT.
You can specify these four parameters: SCRIPT_DIR for the script directory (default /home/0.script); TOOL_DIR for the tool directory (default /home/1.tools); DATABASE_DIR for the database and mounted shared directory (default /home/2.share); WORK_DIR for the work directory (default /home/3.wd).
docker run -v $(pwd)/share:/home/2.share -v $(pwd)/wd:/home/3.wd -d -it --name ccrr ccrr:v1.2
docker exec -it ccrr /bin/bash
This command mounts the current directory's share folder to DATABASE_DIR inside the container, creating a shared path between the host and the container.
ccrr -mode test
This tests the environment required by the pipeline using a small bundled sample dataset, which may take up to half an hour.
nohup ccrr -mode default \
-normal [normal.bam] --normal-id [normal-id] \
-tumor [tumor.bam] --tumor-id [tumor-id] \
--genome-version hg38 -reference [hg38.fa] \
-threads 30 -g 200 >log 2>&1 &
This runs the entire process in default mode, allowing up to 30 threads for multi-threaded tasks and a memory cap of 200 GB. Processing a pair of tumor/normal BAM files, each around 106 GB, takes approximately 80 hours.
ccrr --help
Each task is identified by the -prefix option; if results for the same prefix already exist, please clear them before starting a new task.
-mode {fast,custom,default,test,clear} choose mode to run
Your input should be a matched pair of normal/tumor whole-genome sequencing BAM files and their reference genome. Supported reference genome versions are hg19 and hg38.
-prefix task id
-normal NORMAL normal bam
--normal-id NORMAL_ID Identifier for the normal sample, typically from the BAM header
-tumor TUMOR tumor bam
--tumor-id TUMOR_ID Identifier for the tumor sample, typically from the BAM header
--genome-version {hg19,hg38} Set the reference, hg19 or hg38
-reference REFERENCE reference FASTA
If not set, the default memory allocation is 8GB, which may not suffice for the memory demands of certain steps. We recommend setting it higher.
Please note that the default number of threads is 8. Some software may not support multithreading acceleration, and for others, there might be a soft cap on the number of threads that can be effectively utilized, meaning that setting a higher number of threads may not result in the expected speed-up.
-threads THREADS Set the number of processes if possible
-g G set the amount of available RAM, if possible
In custom mode, you can freely choose which software to use for generating SV and CN data. The built-in software includes:
-sequenza use sequenza for cellularity, ploidy and cn
-delly use delly for sv and cn
-manta use manta for sv
-svaba use svaba for sv
-gridss use gridss for sv
-lumpy use lumpy for sv
-soreca use soreca for sv
-sclust use sclust for cn
-purple use purple for cn
-cnvkit use cnvkit for cn
You can conveniently filter the results of each software based on quality before merging:
--manta-filter MANTA_FILTER Filter for manta
--delly-filter DELLY_FILTER Filter for delly sv
--delly-cnvsize DELLY_CNVSIZE min cnv size for delly
--svaba-filter SVABA_FILTER Filter for svaba
--gridss-filter GRIDSS_FILTER Filter for gridss
--lumpy-filter LUMPY_FILTER Filter for lumpy
When results from two different SV callers are adjacent in the genome and the distance between them is less than a specified threshold, they will be considered the same SV. The default threshold is 150bp.
--sv-threshold SV_THRESHOLD
Select the method for merging results from different SV callers. To retain only the results supported by all of the SV callers used, choose intersection; to keep all results from all SV callers without duplicates, choose union; to retain results supported by at least X tools, select x-or-more and specify the number in --sv-x. If X is not specified, the default is 3, meaning that results supported by three or more tools will be retained.
--sv-merge-method {intersection,union,x-or-more}
Choose a sv merging method:
1. 'intersection': Merges only the SVs that are identified by all SV callers.
2. 'union': Merges all SVs identified by any of the SV callers.
3. 'x-or-more': Merges SVs that are identified by at least x SV callers. If only one SV caller is provided, this parameter is irrelevant.
--sv-x {1,2,3,4,5,6}
Specify x. This argument is required when '--sv-merge-method' is set to 'x-or-more' and must not exceed the number of provided input files. default=3
If you wish to prioritize a specific SV caller, setting --sv-primary-caller will retain all of its results.
--sv-primary-caller {manta,delly,svaba,gridss,lumpy,soreca}
Specify the primary SV caller to keep all of its results.
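As a rough sketch of how these merging rules combine, the following illustration treats two calls as the same SV when both breakends fall within the threshold, then keeps clusters by the chosen method. This is not CCRR's actual implementation; `same_sv` and `merge_calls` are hypothetical helpers.

```python
# Illustrative sketch of proximity-based SV merging (not CCRR's actual code).
# Two calls from different callers count as the same SV when both breakends
# lie within `threshold` bp of each other (default 150 bp).

def same_sv(a, b, threshold=150):
    """a, b: (chrom1, pos1, chrom2, pos2, svtype) tuples."""
    return (a[4] == b[4]
            and a[0] == b[0] and a[2] == b[2]
            and abs(a[1] - b[1]) <= threshold
            and abs(a[3] - b[3]) <= threshold)

def merge_calls(calls_by_caller, method="x-or-more", x=3, threshold=150):
    """calls_by_caller: {caller_name: [sv tuples]} -> list of (sv, support)."""
    clusters = []  # each entry: [representative_sv, {supporting callers}]
    for caller, calls in calls_by_caller.items():
        for sv in calls:
            for rep in clusters:
                if same_sv(rep[0], sv, threshold):
                    rep[1].add(caller)
                    break
            else:
                clusters.append([sv, {caller}])
    n = len(calls_by_caller)
    # minimum number of supporting callers required by each merging method
    need = {"union": 1, "intersection": n, "x-or-more": x}[method]
    return [(sv, len(sup)) for sv, sup in clusters if len(sup) >= need]
```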
Setting --cn-threshold adjusts the maximum allowable distance for determining overlap among copy number change regions from different tools when merging copy number results. The default threshold is 5000bp.
--cn-threshold CN_THRESHOLD
threshold for determining cn, defaults to 5000bp
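The overlap rule can be sketched like this; it is an illustration only, assuming a simple boundary tolerance, and `same_cn_segment` is a hypothetical helper rather than a CCRR function:

```python
# Illustrative sketch of the CN merging overlap rule (not CCRR's actual code).
# Two segments from different tools are treated as the same CN change when
# both boundaries differ by no more than `threshold` bp (default 5000).

def same_cn_segment(a, b, threshold=5000):
    """a, b: (chrom, start, end) segments."""
    return (a[0] == b[0]
            and abs(a[1] - b[1]) <= threshold
            and abs(a[2] - b[2]) <= threshold)
```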
-complex COMPLEX complex rearrangement analysis
--cellularity-ploidy-tool {sequenza,purple}
Complex rearrangement analysis is conducted by default. If you only wish to obtain merged results, you can use -complex False.
You can also specify the tool used for cellularity and ploidy estimation with --cellularity-ploidy-tool, choosing between sequenza (default) and purple. This setting influences tools such as JaBbA and gGnome.
${WORK_DIR}/[task id] will serve as the working directory, retaining the output results of each part. A summary of the complex rearrangement analysis can be found in ${WORK_DIR}/[task id]/complex/summary.
Once a module is completed, it is recorded in the ${WORK_DIR}/[task id]/history file. If the process is unexpectedly interrupted, rerunning the entire process will skip the parts that have already executed successfully, according to the records in the history file, resuming from the point of interruption.
Of course, you can manually modify this file to skip any steps you wish to bypass.
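The resume behavior can be pictured with a minimal sketch; the module names and the one-completed-module-per-line history format used here are assumptions for illustration, not CCRR's exact layout:

```python
# Sketch of checkpoint-style resumption (assumed history format: one completed
# module name per line; the real file layout may differ).
import os

MODULES = ["svcall", "svmerge", "cncall", "cnmerge", "complex"]  # hypothetical names

def run_pipeline(history_path, run_module):
    done = set()
    if os.path.exists(history_path):
        with open(history_path) as fh:
            done = {line.strip() for line in fh if line.strip()}
    for module in MODULES:
        if module in done:
            continue  # already completed in a previous run; skip it
        run_module(module)
        with open(history_path, "a") as fh:
            fh.write(module + "\n")  # record completion for future resumes
```

Deleting a line from the history file (as the text above notes) would make the corresponding step run again on the next invocation.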
You can run each module step by step according to your analytical needs. For example:
Use svmerge.py to merge SV data. You can specify the output results from each SV caller as input files for merging.
-manta MANTA manta vcf result
-delly DELLY delly vcf result
-svaba SVABA svaba vcf result
-gridss GRIDSS GRIDSS vcf result
-lumpy LUMPY LUMPY vcf result
-soreca SORECA soreca result
Filter the output results of each SV caller based on quality.
--manta-filter MANTA_FILTER
Filter threshold for manta
--delly-filter DELLY_FILTER
Filter threshold for delly
--svaba-filter SVABA_FILTER
Filter threshold for svaba
--gridss-filter GRIDSS_FILTER
Filter threshold for GRIDSS
--lumpy-filter LUMPY_FILTER
Filter threshold for LUMPY
Determine thresholds, merging methods, and specify a trusted SV caller as described previously.
--threshold THRESHOLD
threshold for determination, defaults to 150bp
--merge-method {intersection,union,x-or-more}
Choose a merging method: 1. 'intersection': Merges only the SVs that are identified by all SV callers.
2. 'union': Merges all SVs identified by any of the SV callers.
3. 'x-or-more': Merges SVs that are identified by at least x SV callers. If only one SV caller is provided, this parameter is irrelevant.
--primary-caller {None,manta,delly,svaba,gridss,lumpy,soreca}
Specify the primary SV caller to keep all of its results.
-x {1,2,3,4,5,6} Specify x. This argument is required when '--merge-method' is set to 'x-or-more' and must not exceed the number of provided input files.
Set the output path and enable multi-process execution.
-o O output path
-t T Set the number of processes
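The documentation shows example invocations for consensus_cn.py and complex.py but not for svmerge.py; a plausible invocation, assembled from the options listed above, might look like the following. Bracketed values are placeholders — confirm the interface with the script's --help inside the container.

```shell
python ${SCRIPT_DIR}/svmerge.py \
    -manta [manta.vcf] -delly [delly.vcf] -gridss [gridss.vcf] \
    --threshold 150 --merge-method x-or-more -x 2 \
    -o [output_path] -t 8
```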
Use consensus_cn.py to merge CN data:
python ${SCRIPT_DIR}/consensus_cn.py \
-sclust SCLUST -delly DELLY -purple PURPLE -cnvkit CNVKIT \
-ref hg19 -gender male \
-o OUT
Parameters
-sclust SCLUST sclust cn result
-delly DELLY delly cn result
-purple PURPLE purple cn result
-cnvkit CNVKIT cnvkit cn result
-sequenza SEQUENZA sequenza cn result
--threshold THRESHOLD
threshold for determination, defaults to 5000bp
-o O output path
-ref REF hg19 or hg38
Use complex.py to analyze complex rearrangements:
python ${SCRIPT_DIR}/complex.py -prefix task_id \
--tumor-id example -sv SV -cn CN \
--genome-version hg19 -gender male \
-shatterseek -starfish -gGnome -SA -ctlpscanner \
-threads 30 -g 200
Required inputs; the formats of the SV and CN files are shown in the examples:
https://www.ccrr.life/static/examplefile/custom_sv.bed
https://www.ccrr.life/static/examplefile/custom_cn.bed
-prefix task id
--tumor-id TUMOR_ID
-sv SV sv input
-cn CN cn input
--genome-version GENOME_VERSION
Set the reference, hg19 or hg38
Select the tools for complex rearrangement analysis.
-shatterseek use shatterseek
-starfish use starfish
-gGnome use jabba and gGnome
-SA use Seismic Amplification
-ctlpscanner use CTLPscanner
AmpliconArchitect requires BAM files as input.
-AA use Amplicon Architect
-normal NORMAL normal bam
--normal-id NORMAL_ID
-tumor TUMOR tumor bam
--tumor-id TUMOR_ID
Set the available memory and number of threads.
-threads THREADS Set the number of processes if possible
-g G set the amount of available RAM, if possible
{WORK_DIR}/{PREFIX}/complex/summary.png
A visual summary of CN, SV integration, and analysis results of various complex rearrangements generated by the CCRR workflow.
The tracks, from outer to inner, display:
Chromosomes:
Shows the start and end points of chromosomal regions and the centromeres.
CN:
Regional colors indicate copy number gains (red) or losses (green);
a black solid line represents a smoothed curve showing actual copy numbers,
with a straight black line representing the default normal copy number state (CN=2).
ShatterSeek:
Highlights chromosomal shatter regions with high confidence (orange) and low confidence (yellow)
(criteria do not include statistical validation).
CTLPScanner:
Marks Chromothripsis-like Pattern areas, with region colors representing the log likelihood ratio (lg(LR) ≥ 5).
Seismic Amplification:
Indicates seismic amplification event areas (green).
Starfish:
Highlights complex genomic rearrangement areas (cyan).
gGnome:
Shows various complex event areas (details available in gGnome results).
AmpliconArchitect:
Marks ecDNA (blue), linear amplification (green), and BFB (yellow) areas
(not available on the web).
SV:
Indicates the different types of structural variation.
Merge result: {WORK_DIR}/{PREFIX}/cnmerge/consensus_cn.bed
Segment count plot: {WORK_DIR}/{PREFIX}/cnmerge/segment_count.pdf
A bar plot showing the count of copy number segments across different length intervals.
Bias and volatility plot: {WORK_DIR}/{PREFIX}/cnmerge/Bias_and_volatility_for_CN_all_ranges.pdf
This figure shows the distribution of bias and volatility for each tool across different region lengths.
Bias reflects systematic deviation from the consensus copy number, while volatility captures the magnitude of variation.
Both are length-weighted and log-scaled to allow fair comparison across tools.
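The exact formulas behind this plot are not specified here; as a purely hypothetical illustration of what length-weighted bias and volatility could look like, one might compute:

```python
# Hypothetical illustration of length-weighted bias/volatility (the exact
# formulas used by CCRR are not given; this only sketches the idea).
import math

def bias_and_volatility(segments):
    """segments: list of (length, tool_cn, consensus_cn)."""
    total = sum(length for length, _, _ in segments)
    # length-weighted mean signed deviation from the consensus -> bias
    bias = sum(length * (cn - ref) for length, cn, ref in segments) / total
    # length-weighted mean absolute deviation -> volatility
    vol = sum(length * abs(cn - ref) for length, cn, ref in segments) / total
    # log-scale both so tools with very different ranges remain comparable
    return math.log1p(abs(bias)), math.log1p(vol)
```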
Merge result: {WORK_DIR}/{PREFIX}/svmerge/sv_merge.bed
SV caller consensus (Upset plot): {WORK_DIR}/{PREFIX}/svmerge/sv_merged.pdf
An Upset plot illustrating the overlap and consensus of structural variant calls among different SV tools.
The web service supports multiple input options. You may begin with any of the following:
By clicking on "From tools," you can upload results from various structural variant analysis tools.
You have the option to directly input the corresponding files, with examples available for review by clicking on the respective 'example' links. These include:
Delly: delly_example.sv.somatic.pre.vcf
Manta: manta_example.somaticSV.vcf
Gridss: gridss_example.gripss.filtered.vcf
(processed with GRIPSS)
Lumpy: lumpy_example.gt.vcf
SvABA: svaba_example.somatic.sv.vcf
Soreca: soreca_example_unsnarl.txt
These sample files, derived from the public dataset SRR2020636, serve only as format references and hold no analytical significance.
Upload Options:
You can upload results from one to six different structural variant analysis tools. If only one file is uploaded, we will convert its format and proceed with the complex structural variant analysis. If two or more files are uploaded, they will first be merged.
Custom Data:
If you wish to use your own structural variant data, you can click on "From Custom" to upload your customized data.
The format for custom structural variant data should be as follows: each record specifies an SV type (one of DEL, DUP, h2hINV, t2tINV, TRA).
Formatting Requirements: fields are separated by tabs (\t).
The example data available via the Example link is sourced from the PCAWG consensus public structural variant data (source link), specifically from the dataset 0c0038ff-6cc4-b0b0-e050-11ac0d483d73, which can be used for demonstration analyses. You can click the "Load Example" button to load the sample file.
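A minimal validation sketch for one custom SV record follows; the column layout assumed here (chrom1, pos1, chrom2, pos2, svtype) is hypothetical — the authoritative layout is the custom_sv.bed example file linked above:

```python
# Hypothetical validator for one custom SV row. Only the allowed SV types and
# tab separation are taken from the documentation; the column order is assumed.
ALLOWED_SV_TYPES = {"DEL", "DUP", "h2hINV", "t2tINV", "TRA"}

def check_sv_row(line):
    fields = line.rstrip("\n").split("\t")  # fields must be tab-separated
    if len(fields) < 5:
        return False  # too few tab-separated columns
    return fields[4] in ALLOWED_SV_TYPES  # assumed position of the SV type
```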
By clicking on "From tools", you can upload copy number analysis results from various tools.
You have the option to directly input the corresponding files, with examples available for review by clicking on the respective "example" links. These include:
sequenza_example_segments.txt
sclust_example_iCN.seg
purple_example.cnv.somatic.tsv
delly_example.segmentation.bed
cnvkit_example_CNV_CALLS.bed
These sample files, derived from the public dataset SRR2020636, serve only as format references and hold no analytical significance.
Upload Options:
You can upload results from one to four different copy number variant analysis tools.
Custom Data:
If you wish to use your own copy number data, you can click on "From Custom" to upload your customized data.
The format for custom copy number data should be as follows:
Formatting Requirements: fields are separated by tabs (\t), without a header.
The example data available via the Example link is sourced from the PCAWG consensus public copy number variant data (source link), specifically from the dataset 0c0038ff-6cc4-b0b0-e050-11ac0d483d73, which can be used for demonstration analyses.
Click on "options" to expand the options card and customize parameters for the analysis. The parameters include:
Maximum Allowed Distance to Infer Identical SV Breakends Across Tools:
This is the maximum permissible distance for determining whether two breakpoints from different tools represent the same event when merging structural variant analysis results.
It should be an integer between 1–1000, with a default of 150.
Note: This parameter is irrelevant if you upload only one file or a custom file, as there will be no merging.
The Number of Structural Variation Callers Needed to Reach a Consensus on an SV Event:
Range of 1–6.
Maximum Allowed Distance to Infer Overlap of CN Change Regions Across Tools:
Specifies the maximum distance to consider copy number change regions from different tools as overlapping when merging CNV results. Must be 1-50000 (default: 5000). Higher values allow more relaxed merging, resulting in fewer final CNV segments. Ignored if only one file or a custom file is used.
Purity and Ploidy:
Estimated fraction of tumor cells in the sample and average total copy number across the genome. These parameters are used by gGnome to model copy number and junction balance. Leave blank to allow JaBbA to estimate them automatically.
Genome Version:
Reference human genome build used for coordinate mapping (hg19 / GRCh37 or hg38 / GRCh38).
Email (optional):
If you expect a long waiting time, you can leave your email address to receive a notification with the results page upon completion of the analysis.
Ensure that you have:
Then, click "Start". You will see a waiting page indicating that your analysis is either queued or in progress. Once the analysis is complete, you will be redirected to the results page.
This is an interactive web interface designed for exploring complex genomic rearrangement results through an intuitive Circos-based view.
The result page will automatically load the analysis results based on the files you uploaded.
If you wish to explore results interactively using your own data or view outputs from a local CCRR pipeline run, you can visit https://www.ccrr.life/customize-data, where you can upload your own .json result file.
The control panel on the left side of the interface provides key functionalities, including a genome region selector that accepts chr:start-end format; multiple regions can be specified using semicolons.
The central Circos plot offers a dynamic visualization of genome-wide CN and SV integration, with multiple inner and outer tracks showing different types of variation and complex events.
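The region syntax accepted by the control panel can be parsed as sketched below (an illustration, not CCRR's code):

```python
# Parses the "Genome Region" field format: one or more chr:start-end regions
# separated by semicolons, e.g. "chr3:57825870-130091239;chr6:10307610-122158017".
def parse_regions(text):
    regions = []
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate trailing or doubled semicolons
        chrom, span = part.split(":")
        start, end = span.split("-")
        regions.append((chrom, int(start), int(end)))
    return regions
```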
Interactive features:
Nearly all elements in the Circos plot are interactive, including:
When an element in the plot is clicked, a detailed popup appears on the right, showing:
This allows users to inspect specific regions or events in depth and trace their origin or biological relevance.
Users can personalize the visualization via the control panel:
This flexible interface supports efficient exploration of complex SV and CNV landscapes in tumor genomes or other rearrangement-rich datasets.
Create a Working Directory
mkdir ccrr1.2
cd ccrr1.2
Download CCRR
wget -O ccrr1.2.zip https://www.ccrr.life/download_file/ccrr1.2.zip
unzip -q ccrr1.2.zip
Download Dependencies and Create a Dockerfile
python install.py -sequenza -manta -delly -svaba -gridss -lumpy -soreca -purple -sclust -cnvkit -ref 'hg19&hg38'
Prepare licenses for Mosek and Gurobi
cp /path/to/gurobi.lic ./gurobi.lic
cp /path/to/mosek.lic ./mosek.lic
Prepare Input Data
We use data from the breast cancer cell line HCC1395/HCC1395BL, part of a multi-center study (DOI: 10.1186/s13059-022-02816-6). The BAM files were downloaded from: ftp://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/seqc/Somatic_Mutation_WG/data/WGS/
Files are placed at:
./share/hcc1395/WGS_FD_T_1.bam # Tumor
./share/hcc1395/WGS_FD_N_1.bam # Normal
./share/hcc1395/WGS_FD_T_1.bam.bai # Index
./share/hcc1395/WGS_FD_N_1.bam.bai # Index
The corresponding reference genome used for alignment is in
./share/data/ref/GRCh38.d1
Build Docker image
docker build --pull --rm --build-arg GITHUB_PAT=[GITHUB_PAT] --build-arg SCRIPT_DIR="/home/0.script" --build-arg TOOL_DIR="/home/1.tools" --build-arg DATABASE_DIR="/home/2.share" --build-arg WORK_DIR="/home/3.wd" -f Dockerfile -t ccrr:v1.2 .
Run a container
docker run -v $(pwd)/share:/home/2.share -v $(pwd)/wd:/home/3.wd -d -it --name ccrr ccrr:v1.2
docker exec -it ccrr /bin/bash
Here, we selected all tools except SoReCa for this analysis.
nohup ccrr -mode custom -prefix hcc1395 -normal /home/2.share/hcc1395/WGS_FD_N_1.bam --normal-id WGS_FD_N_1 -tumor /home/2.share/hcc1395/WGS_FD_T_1.bam --tumor-id WGS_FD_T_1 --genome-version hg38 -reference /home/2.share/data/ref/GRCh38.d1/GRCh38.d1.vd1.fa -cnvkit -delly -manta -lumpy -gridss -svaba -purple -sclust --cellularity-ploidy-tool sequenza -threads 30 -g 200 >log 2>&1 &
The analysis completes in about 100 hours.
In /home/3.wd/hcc1395/svmerge, you can find the SV results from individual tools, the merged SV calls sv_merged.bed, and an UpSet plot sv_merged.pdf illustrating the overlaps among the different SV call sets.
In /home/3.wd/hcc1395/cnmerge, you will find the CN analysis results from individual tools, as well as the merged consensus result consensus_cn.bed; segment_count.pdf, counting CNV segments by length; and Bias_and_volatility_for_CN_all_ranges.pdf, which shows bias (deviation from consensus) and volatility (variation across tools) across segment sizes.
In /home/3.wd/hcc1395/complex, you will find the results from the six complex rearrangement analysis tools, a summary figure summary.png, and a JSON file hcc1395circos.json for web-based visualization.
To explore the results interactively, go to https://www.ccrr.life/ and click "Customize Data" in the top-right menu to access the custom upload page.
On the Customize Data page, click "Choose a file" to select hcc1395circos.json, then click "Upload & Render". The Circos plot will be rendered after a short loading period.
Clicking on any element in the plot reveals detailed annotations and associated information.
To focus on regions where multiple tools show consensus, enter the coordinates chr3:57825870-130091239;chr6:10307610-122158017
into the "Genome Region" field in the control panel, then click "Add". This will zoom in and display a more detailed and clearer view of the selected regions.
To save the current visualization, click "Export SVG" to export it as a scalable vector graphic. If needed, click "Reset" to clear custom regions and revert the view to its default state.
We uploaded the locally generated results from Delly, Manta, Gridss, Lumpy, Purple, and CNVkit.
After uploading the files, select the reference genome as hg38, then click "Start" to begin the analysis.
The system will redirect to a waiting page. After approximately 20 minutes, it will automatically jump to the results page.