Usage¶
Read the sections below to run TRIBES on your own data, with a custom pipeline
Input data¶
TRIBES requires the following input files:
filename.vcf.gz- multi-sample VCF file containing sample genotypesfilename.true.rel- true pairwise relations (optional, only if a user has known relations and wants to calculate accuracy of estimated relationships)config.yaml- pipeline configuration file defining the location and name of reference data, the true relations file, the input filename and the preprocessing steps required before IBD/relatedness estimation
Refer to files inside example dataset TFCeu/ directory for correct
format for these input files.
Preparing a custom pipeline¶
A key strength of TRIBES is that is a flexible pipeline, utilizing
snakemake, to enable the user to specify which pre-processing steps
they want to include.
The following steps can used in the pipeline.
Preprocessing:¶
NM: retain only loci with with non-missing genotypesBiSnp: retain only bi-allelic SNPsBiSnpNM: combinesBiSnpandNMin a single stepMAF@<maf-threshold>: filters forMAF >= maf-threshold, e.g.MAF@0.01. MAF is determined from the reference dataAFannotation which is also added to the output inREF_AFannotation.LD: prune on LD with the reference defined inG1K_SNP_EUR(bcftools +prune -l 0.95 -w 1kb)QC: filter on quality (with bcftools:INFO/MQ>59 & INFO/MQRankSum>-2 & AVG(FORMAT/DP)>20 & AVG(FORMAT/DP)<100 & INFO/QD>15 & INFO/BaseQRankSum>-2 & INFO/SOR<1)PH: phase (using beagle) without referenceRPH: phase (using beagle) with reference defined inref_sampleconfig parameter
Examples¶
Example 1¶
For example, a user may wish to identify relationships using an unphased
input VCF. They wish to filter on allele frequency of MAF = 0.01 and
then phase the data using reference file and estimate relatedness. They
would then need to edit the config.yaml file from the example data
TFCeu directory to reflect their input VCF filename and processing
steps. Their input VCF file should be in the same TFCeu directory,
for the config.yaml file to work.
Their config.yaml file would look like this:
- rel_sample:
filename_BiSnpNM_MAF@0.01_RPH[wherefilenamerefers to the input VCF filename] - ref_dir:
../REF_G1K-EUR_0.001[where ref_dir is the location of the reference directory, which hosts the cohost used for filtering on MAF and LD, phasing and masking steps]
The user would then run TRIBES from the installation directory as in the Getting started section
./tribes -d $HOME/tribes-data/TFCeu -j <no_cpu_cores> estimate_degree
whereestimate_degree is an alias which calls TRIBES to perform
the GRM, FPI and IBD steps described under ‘IBD/Relatedness
steps’ in Preparing a custom pipeline
Example 2¶
Alternatively, a user may want to identify novel relationship, as well as confirm known relationships. They wish to pre-process the VCF to filter on MAF = 0.01 and quality metrics, then phase the data using reference, estimate relationships and compare estimated with known relationships.
Their config.yaml file would look like this:
- rel_sample:
filename_BiSnpNM_MAF@0.01_QC_RPH - ref_dir:
../REF_G1K-EUR_0.001 - rel_true:
filename.true.rel[a reference file containing known relationships,required if stepRVTis used in the pipeline]
The user would then run TRIBES from the installation directory as in the Getting started section
./tribes -d $HOME/tribes-data/TFCeu -j <no_cpu_cores> estimate_degree_vs_true
If users provide a rel_true: file in the config_yaml file, they
can call estimate_degree_vs_true which is an alias that calls
TRIBES to perform the GRM, FPI, IBD and RVT steps
described under ‘IBD/Relatedness steps’ in Preparing a custom pipeline