ContextScore is a filtering step for the ContextSV long-read structural variant (SV) caller that uses a Random Forest model trained on SV validation features. It assigns confidence scores to the SVs in a dataset based on coverage, genomic context, and other alignment features, then filters out low-confidence SVs to increase the precision of the final callset. Genomic context is determined from annotations produced with ANNOVAR and UCSC databases.

    conda install -c wglab -c bioconda -c conda-forge contextscore
    # Or using mamba (faster dependency resolution):
    mamba install -c wglab contextscore

ANNOVAR is required for annotations and must be installed separately.
These are the required ANNOVAR components for ContextScore:

- `--annovar`: directory containing `annotate_variation.pl` and `table_annovar.pl`
- `--annovar-db`: ANNOVAR database directory
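Before a run, it can help to confirm that the directory passed to `--annovar` actually contains both scripts. The following is a minimal sketch, not part of ContextScore: the function name `missing_annovar_scripts` is hypothetical, and only the two script names come from the description above.

```python
from pathlib import Path

# Script names taken from the --annovar description above.
REQUIRED_ANNOVAR_SCRIPTS = ("annotate_variation.pl", "table_annovar.pl")


def missing_annovar_scripts(annovar_dir):
    """Return the required ANNOVAR scripts that are absent from annovar_dir.

    Hypothetical helper for illustration; ContextScore may perform its own
    checks in a different way.
    """
    root = Path(annovar_dir)
    return [name for name in REQUIRED_ANNOVAR_SCRIPTS if not (root / name).is_file()]
```

An empty list from `missing_annovar_scripts("/path/to/annovar")` means both scripts were found; otherwise the list names what is missing.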

Example usage:

    contextscore --input input.vcf --output scored.vcf --sample-coverage 30 --buildver {hg38,hg19} --threshold 0.2 \
        --annovar /path/to/annovar --annovar-db /path/to/humandb

| File | Source | Description | Link |
|---|---|---|---|
| `cytobands_hg{19,38}.txt` | UCSC Genome Browser | Cytoband annotations for human genome builds hg19 and hg38 | UCSC hg19 / UCSC hg38 |
| `hg{19,38}_segmental_duplications.bed` | UCSC Genome Browser | Segmental duplication annotations for human genome builds hg19 and hg38 | UCSC hg19 / UCSC hg38 |
| `phastcons100way_hg{19,38}.bed` | UCSC Genome Browser | PhastCons conservation scores for human genome builds hg19 and hg38 | UCSC hg19 / UCSC hg38 |
| `simple_repeats_hg{19,38}.bed` | UCSC Genome Browser | Simple repeat annotations for human genome builds hg19 and hg38 | UCSC hg19 / UCSC hg38 |
| `fragile_sites_hg38.bed` / `fragile_sites_hg19_liftover.bed` | HumCFS | Fragile site annotations for human genome builds hg38 and hg19 (liftover) | HumCFS |
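The file names in the table use brace notation to cover both builds. As a rough sketch of how those patterns expand for a single `--buildver` value, the hypothetical helper below lists the names for one build. It assumes the braces follow shell-style expansion (so `hg{19,38}` means `hg19` or `hg38`). It says nothing about where the files are stored or how ContextScore loads them, since the table does not specify that.

```python
def annotation_files(buildver):
    """List the annotation file names from the table for one genome build.

    Hypothetical helper for illustration only; the names are expanded from
    the brace patterns in the table, assuming shell-style expansion.
    """
    if buildver not in ("hg19", "hg38"):
        raise ValueError(f"unsupported build: {buildver!r} (expected 'hg19' or 'hg38')")

    number = buildver[2:]  # "19" or "38"
    # The hg19 fragile-site file is a liftover and carries a different suffix.
    fragile = (
        "fragile_sites_hg38.bed"
        if buildver == "hg38"
        else "fragile_sites_hg19_liftover.bed"
    )
    return [
        f"cytobands_hg{number}.txt",
        f"{buildver}_segmental_duplications.bed",
        f"phastcons100way_hg{number}.bed",
        f"simple_repeats_hg{number}.bed",
        fragile,
    ]
```

Calling `annotation_files("hg19")` yields the five hg19 names, including the liftover fragile-site file.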