
Authors: Caleb Fangmeier
Date: July 31, 2018

Pileup Studies

Info added to Ntuple

In an effort to understand the effects of additional particles from pileup interactions on the performance of the electron seeding algorithm, in terms of purity and efficiency as well as fake rate, I modified the ntuple to include pileup information. I added the following fields:

float truePileup;
int inTimePileup;
vector<int> *outOfTimePileup;

It is important to understand what these numbers mean. This data comes from Monte Carlo simulation, so the number of pileup interactions is modeled to closely match the observed or expected number in a given era of detector operation. The model includes a number that represents the expectation value of the number of pileup interactions at a given time. This is the truePileup. One can imagine this number changing slowly during a fill, starting high and steadily dropping as the proton bunches are depleted.

However, the actual number of pileup interactions fluctuates from bunch-crossing to bunch-crossing according to a Poisson distribution whose mean is truePileup. The actual number of pileup events that occurred in the same bunch-crossing as the hard-scatter is referred to as inTimePileup.
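The relationship between truePileup and inTimePileup can be illustrated with a small toy. This is only a sketch and not the simulation's own code: the function names `sampleInTimePileup` and `meanInTimePileup`, and the choice of `std::mt19937` as the random engine, are assumptions made for the example.

```cpp
#include <random>

// Toy model (not the simulation code): draw the in-time pileup count for a
// single bunch-crossing from a Poisson distribution with mean truePileup.
// truePileup must be positive.
int sampleInTimePileup(double truePileup, std::mt19937 &rng) {
    std::poisson_distribution<int> poisson(truePileup);
    return poisson(rng);
}

// Average of n independent draws. For large n this should approach
// truePileup, even though individual draws scatter around it.
double meanInTimePileup(double truePileup, int n, unsigned seed) {
    std::mt19937 rng(seed);
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += sampleInTimePileup(truePileup, rng);
    }
    return sum / n;
}
```

For a truePileup of 30, individual draws typically scatter by about $\sqrt{30} \approx 5.5$ around the mean, while the average over many bunch-crossings converges to 30.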

Finally, because of the finite time resolution of the detector, results of previous (and even subsequent) bunch-crossings can be read out along with those of the bunch-crossing of interest. Therefore, outOfTimePileup contains the number of pileup interactions for each of the 12 previous and the 3 subsequent bunch-crossings.
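As a rough illustration of how the outOfTimePileup vector could be consumed, the sketch below sums the entries for the previous and the subsequent bunch-crossings separately. It assumes that the vector holds exactly 15 entries ordered from the earliest to the latest bunch-crossing, with the in-time crossing excluded. That ordering is an assumption of this example and should be checked against the ntuple producer. The names `OutOfTimeSummary` and `summarizeOutOfTime` are hypothetical.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// ASSUMPTION: 12 previous crossings come first, followed by 3 subsequent
// ones, and the in-time crossing itself is not stored in the vector.
constexpr std::size_t kNumPrevious = 12;
constexpr std::size_t kNumNext = 3;

struct OutOfTimeSummary {
    int previous;  // total pileup interactions in the 12 previous crossings
    int next;      // total pileup interactions in the 3 subsequent crossings
};

// Fills `summary` and returns true if the vector has the assumed length;
// returns false (leaving `summary` untouched) otherwise.
bool summarizeOutOfTime(const std::vector<int> &outOfTimePileup,
                        OutOfTimeSummary &summary) {
    if (outOfTimePileup.size() != kNumPrevious + kNumNext) {
        return false;
    }
    const auto split =
        outOfTimePileup.begin() + static_cast<std::ptrdiff_t>(kNumPrevious);
    summary.previous = std::accumulate(outOfTimePileup.begin(), split, 0);
    summary.next = std::accumulate(split, outOfTimePileup.end(), 0);
    return true;
}
```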

Initial results

Having added this info to the Ntuple, I made distributions of the GSF-Tracking purity, efficiency, and fake-rate as a function of inTimePileup, seen below.

fig::tracking_pur_dR|Tracking Purity using $\Delta R$ Truth matching
fig::tracking_eff_dR|Tracking Efficiency using $\Delta R$ Truth matching

It was surprising to me to see that there is virtually zero dependence on the number of PU events for either efficiency or purity. I thought there might be a bug somewhere so I did a few checks.

Check 1 - Numerator and Denominator Distributions

fig::tracking_pur_dR_num|Tracking Efficiency Numerator
fig::tracking_pur_dR_num|Tracking Purity Numerator
fig::tracking_pur_dR_den|Tracking Efficiency Denominator
fig::tracking_pur_dR_den|Tracking Purity Denominator

Looking at the individual distributions, the shape tracks the overall pileup distribution, with, as expected, a roughly constant ratio between the numerator and denominator. I don't see anything else of interest here.
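The numerator/denominator comparison amounts to a bin-by-bin ratio. A minimal sketch of that calculation is given below. It is not the plotting code used for the figures: the function name `binnedRatio` and the convention of returning NaN for empty denominator bins are choices made for this example.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Bin-by-bin ratio of two histograms with identical binning (for example,
// the numerator and denominator of an efficiency vs. inTimePileup).
// Bins with an empty denominator are returned as NaN so that they can be
// skipped when plotting.
std::vector<double> binnedRatio(const std::vector<double> &numerator,
                                const std::vector<double> &denominator) {
    if (numerator.size() != denominator.size()) {
        throw std::invalid_argument("histograms must have the same binning");
    }
    std::vector<double> ratio(numerator.size(),
                              std::numeric_limits<double>::quiet_NaN());
    for (std::size_t i = 0; i < numerator.size(); ++i) {
        if (denominator[i] > 0.0) {
            ratio[i] = numerator[i] / denominator[i];
        }
    }
    return ratio;
}
```

If the numerator and denominator both track the overall pileup distribution, the returned ratios stay roughly flat across the populated bins, which is the behavior described above.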

Check 2 - Number of seeds/tracks vs. Pileup

Next, I wanted to check if there was any relation between the number of in-time pileup events and:

  1. # Electron Seeds, both tracker and ECAL driven
  2. # Electron Seeds, only ECAL driven with HOE<0.15
  3. # GSF tracks
  4. # Sim-Vertices lacking any source tracks, the idea being that these are individual primary vertices

The following plots show these relations for the default working point of the new seeding in the $t\bar{t}$ sample.

fig::tt_new-default_n_seeds_v_PU|# of Electron Seeds vs. inTimePileup
fig::tt_new-default_n_good_seeds_v_PU|# of ECAL-Driven Electron Seeds vs. inTimePileup
fig::tt_new-default_n_tracks_v_PU|# of GSF Tracks vs. inTimePileup
fig::tt_new-default_n_pv_v_PU|# of Sim-Vertices lacking parent tracks vs. inTimePileup

Looking at the distributions, there is a pretty clear lack of correlation between any of the chosen variables and the number of in-time pileup events. However, there is an interesting feature in the top right figure. There appear to be peaks in the "# Seeds" distribution at 0, 4, and to a lesser extent 7 and 12. I couldn't think of any clear reason for this. Naively, I would expect a monotonically decreasing distribution, and certainly not something with multiple localized peaks.
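The lack of correlation is judged by eye from the plots. One way to put a number on it would be the Pearson correlation coefficient between the per-event inTimePileup and, for example, the number of seeds. The sketch below is a suggestion only and was not part of the study; the function name `pearsonCorrelation` is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Pearson correlation coefficient between two per-event quantities,
// e.g. x = inTimePileup and y = number of electron seeds.
// Returns NaN if either input has zero variance (coefficient undefined).
double pearsonCorrelation(const std::vector<double> &x,
                          const std::vector<double> &y) {
    if (x.size() != y.size() || x.size() < 2) {
        throw std::invalid_argument("need two equally sized samples, n >= 2");
    }
    const double n = static_cast<double>(x.size());
    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        meanX += x[i];
        meanY += y[i];
    }
    meanX /= n;
    meanY /= n;

    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - meanX;
        const double dy = y[i] - meanY;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if (sxx == 0.0 || syy == 0.0) {
        return std::numeric_limits<double>::quiet_NaN();
    }
    return sxy / std::sqrt(sxx * syy);
}
```

A value close to zero would support the visual impression that these quantities do not depend linearly on the number of in-time pileup events. It would not, by itself, say anything about the peaks in the seed multiplicity.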

Check 3 - Investigating source of peaks

For comparison, here are the distributions for the old seeding algorithm:

fig::tt_old-default_n_seeds_v_PU|# of Electron Seeds vs. inTimePileup
fig::tt_old-default_n_good_seeds_v_PU|# of ECAL-Driven Electron Seeds vs. inTimePileup
fig::tt_old-default_n_tracks_v_PU|# of GSF Tracks vs. inTimePileup
fig::tt_old-default_n_pv_v_PU|# of Sim-Vertices lacking parent tracks vs. inTimePileup

Things look similar. This shows that the peaks in the previous plots are not an artifact of the new seeding algorithm. In fact, the stripes seem to appear (although less sharply) even in the top left plot showing all electron seeds.

Finally, here are the plots for the Drell-Yan sample using the default working point of the new seeding. The same observations apply as above.

fig::dy_new-default_n_seeds_v_PU|# of Electron Seeds vs. inTimePileup
fig::dy_new-default_n_good_seeds_v_PU|# of ECAL-Driven Electron Seeds vs. inTimePileup
fig::dy_new-default_n_tracks_v_PU|# of GSF Tracks vs. inTimePileup
fig::dy_new-default_n_pv_v_PU|# of Sim-Vertices lacking parent tracks vs. inTimePileup

The next candidate I considered as the culprit for the "discreteness" is the number of superclusters, so I made plots of the number of seeds vs. the number of superclusters. The plots below use the default working point of the new seeding. The top row uses Drell-Yan MC, and the bottom row uses $t\bar{t}$.

  • Left: # of electron seeds vs # of superclusters
  • Right: Same, but requiring the seeds to be ECAL-Driven with the supercluster having HOE<0.15
fig::dy_new-default_n_seeds_v_n_scl|# of Electron Seeds vs. # of superclusters
fig::dy_new-default_n_good_seeds_v_n_good_scl|# of ECAL-Driven Electron Seeds vs. # of superclusters with HOE<0.15
fig::tt_new-default_n_seeds_v_n_scl|# of Electron Seeds vs. # of superclusters
fig::tt_new-default_n_good_seeds_v_n_good_scl|# of ECAL-Driven Electron Seeds vs. # of superclusters with HOE<0.15

Interestingly, it appears that for $t\bar{t}$, the discreteness is hidden in the plot without the ECAL-Driven+HOE<0.15 requirement (left), but appears clearly in the one with it (right). On the other hand, the pattern appears similarly in both plots for Drell-Yan.

So right now, I still don't have a clear answer for the cause of the discreteness, but I think it is important to understand the source since it could come from a bug somewhere.