Multivariate W-jet Tagging

Authors : Yanou Cui, Zhenyu Han and Matthew Schwartz
Reference: arXiv:1012.2077

This package combines variables characterizing jet substructure to tag highly boosted hadronically decaying W bosons against their QCD backgrounds.

The code is written in C++ and tested with gcc. It can be downloaded here (wtag-1.00.tar.gz). The following programs are needed as prerequisites.


In addition, Pythia 8 is needed to run the included examples.

Quick start

Download the code and untar it with

tar -xvzf wtag-1.00.tar.gz

Two examples are given in the directory examples/, where test_wtag does the W-jet tagging and print_mvars outputs the variables to ASCII files. To compile and run the examples, one needs to

(1) Go to the examples/ directory, and modify the Makefile there to direct the compiler to the correct PATHs for the prerequisite programs (libraries).

(2) The examples require Pythia 8. The environment variable PYTHIA8DATA must be set when running the code outside the Pythia 8 directory. For example:

export PYTHIA8DATA=~/work/programs/pythia8142/xmldoc

(3) do


Both examples take W-jets from WW->lvqq samples and background QCD-jets from Wj->lvj samples. If they run successfully, the first example prints the signal and background tagging efficiencies to the file test_wtag.log. The second example writes the variables to the files mvars_ww.dat and mvars_wj.dat, for signal and background jets respectively.
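The PYTHIA8DATA requirement in step (2) can also be checked from one's own code before any events are generated. The following is a minimal sketch; the helper name pythia8_data_dir is hypothetical and is not part of this package or of Pythia 8. It only uses the standard std::getenv call.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of wtag or Pythia 8).
// Returns the value of the PYTHIA8DATA environment variable,
// or an empty string if the variable is not set.
std::string pythia8_data_dir() {
    const char* dir = std::getenv("PYTHIA8DATA");
    return dir ? std::string(dir) : std::string();
}
```

A main() function could call pythia8_data_dir() first and exit with a message if the returned string is empty, rather than failing later during event generation.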


The code is provided as subroutines for calculating the jet variables and performing the W-jet tagging. These are located in src/ and described below. Users should write their own main() function to call them, as shown in the examples located in examples/.

W-jet tagging. The function wtag() takes a jet found with FastJet as input and returns whether it is tagged as a W-jet. Note that the code uses weight files from Boosted Decision Trees, which were trained on jets found with the Cambridge/Aachen algorithm (jet size R=1.2). Therefore, the input jets must come from the same algorithm and jet size, unless the user redoes the training (see below).

bool wtag(ClusterSequence & clustSeq, PseudoJet &jet, double signal_eff = -1, jet_pars *jetpars_ptr = NULL)

Return value: true = tagged as W-jet, false = tagged as non-W jet
clustSeq: the ClusterSequence the jet belongs to
jet: jet to be tagged
signal_eff: signal efficiency, default = efficiency maximizing S/sqrt(B)
jetpars_ptr: pointer to jet parameters, default = NULL, using default parameters

By default, W-jet tagging is performed using a preselected signal efficiency corresponding to the maximum significance S/sqrt(B). One can choose another signal efficiency (limited to at most 50%). An example is given at examples/
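To illustrate what "the efficiency maximizing S/sqrt(B)" means, the sketch below scans candidate cuts on a classifier score and keeps the one with the largest ratio of signal efficiency to the square root of background efficiency. For fixed signal and background yields this ratio is proportional to S/sqrt(B). This is not the package's implementation, which uses preselected efficiencies; the names WorkingPoint, pass_fraction and best_working_point are hypothetical, and the scores are assumed to be larger for more signal-like jets.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical illustration, not part of the wtag package.
struct WorkingPoint {
    double cut;             // jets with score >= cut are tagged
    double signal_eff;      // fraction of signal jets tagged
    double background_eff;  // fraction of background jets tagged
    double significance;    // signal_eff / sqrt(background_eff)
};

// Fraction of scores that satisfy score >= cut.
double pass_fraction(const std::vector<double>& scores, double cut) {
    if (scores.empty()) return 0.0;
    const std::size_t n = std::count_if(
        scores.begin(), scores.end(),
        [cut](double s) { return s >= cut; });
    return static_cast<double>(n) / static_cast<double>(scores.size());
}

// Try every signal score as a candidate cut and return the one with the
// largest signal_eff / sqrt(background_eff). Cuts that leave no background
// are skipped, since the ratio is undefined there.
WorkingPoint best_working_point(const std::vector<double>& sig,
                                const std::vector<double>& bkg) {
    WorkingPoint best{0.0, 0.0, 0.0, 0.0};
    for (double cut : sig) {
        const double es = pass_fraction(sig, cut);
        const double eb = pass_fraction(bkg, cut);
        if (eb <= 0.0) continue;
        const double z = es / std::sqrt(eb);
        if (z > best.significance) best = WorkingPoint{cut, es, eb, z};
    }
    return best;
}
```

With realistic sample sizes, the skipped zero-background cuts matter little. With very small samples, the result is dominated by statistical fluctuations and should not be trusted.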

Calculating the variables. One may want to examine the variables or redo the training, for example on other processes or with different parameters. To do so, one needs to calculate the variables and write them to files suitable for TMVA training. The jet variables are stored in the class jet_mvars. To obtain the variables, first define the jet parameters

jet_pars jetpars;

jet_pars.read_jetpars() reads the filtering/pruning/trimming parameters from files. By default, they are located in the "data/" directory. To change the parameters, one could modify the files or point the program to different files (uncomment the statements above).

Then one can use jet_mvars::get_mvars(ClusterSequence &, PseudoJet &, jet_pars &) to calculate the variables. For example,

jet_mvars jetmvars;
jetmvars.get_mvars(clustSeq, jet, jetpars);

The variables can be written to a file using jet_mvars::foutput(ofstream & file), where file must already be open.
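The pattern of writing one jet's variables to an already open stream can be sketched as follows. The struct toy_mvars is a hypothetical stand-in for jet_mvars, whose actual members and output format are defined in src/. As an assumption for this sketch, foutput here accepts any std::ostream (the package's version takes an ofstream) and writes one space-separated line per jet.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for jet_mvars; the real class lives in src/.
struct toy_mvars {
    std::vector<double> values;  // one entry per jet variable

    // Write all variables of one jet as a single space-separated line.
    // The stream must already be open and writable.
    void foutput(std::ostream& out) const {
        for (std::size_t i = 0; i < values.size(); ++i) {
            if (i > 0) out << ' ';
            out << values[i];
        }
        out << '\n';
    }
};

// Convenience wrapper: format one jet's variables into a string.
std::string to_line(const toy_mvars& vars) {
    std::ostringstream buffer;
    vars.foutput(buffer);
    return buffer.str();
}
```

Because std::ofstream derives from std::ostream, an open std::ofstream can be passed to toy_mvars::foutput directly, calling it once per jet inside the event loop.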

An example is given at examples/

Last updated: Dec. 10, 2010
Contact : zhan AT

Update notes

Dec. 10, 2010: Initial version, v1.00