Authors: Caleb Fangmeier Date: July 31, 2018
In an effort to understand how additional particles from pileup interactions affect the performance of the electron seeding algorithm, in terms of both purity/efficiency and fake rate, I modified the ntuple to include pileup information. I added the following fields:
```cpp
float truePileup;
int inTimePileup;
vector<int> *outOfTimePileup;
```
It is important to understand what these numbers mean. This data comes from Monte-Carlo simulation, so the number of pileup interactions is modeled to closely match the number observed (or expected) in a given era of detector operation. The model includes a number representing the expectation value for the number of pileup interactions at a given time. This is `truePileup`. One can imagine this number changing slowly during a fill: it starts high and drops steadily as the proton bunches are depleted.
However, the actual number of pileup interactions can fluctuate substantially from one bunch crossing to the next, following a Poisson distribution whose mean is `truePileup`. The actual number of pileup interactions that occurred in the same bunch crossing as the hard scatter is stored as `inTimePileup`.
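The relationship between `truePileup` and `inTimePileup` can be illustrated with a small C++ sketch that draws per-crossing pileup counts from `std::poisson_distribution`. This is not the CMS pileup-mixing code; the function names `sampleInTimePileup`, `sampleMean`, and `sampleVariance` are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: draw one in-time pileup count per bunch crossing from a
// Poisson distribution whose mean is the "true" (expected) pileup.
// Illustration only; this is not the CMS pileup-mixing code.
std::vector<int> sampleInTimePileup(double truePileup, int nCrossings,
                                    unsigned seed = 42) {
    assert(truePileup > 0.0 && nCrossings >= 0);
    std::mt19937 rng(seed);
    std::poisson_distribution<int> poisson(truePileup);
    std::vector<int> counts;
    counts.reserve(static_cast<std::size_t>(nCrossings));
    for (int i = 0; i < nCrossings; ++i) {
        counts.push_back(poisson(rng));
    }
    return counts;
}

// Sample mean of the drawn counts; approaches truePileup for large samples.
double sampleMean(const std::vector<int>& counts) {
    assert(!counts.empty());
    double sum = 0.0;
    for (int c : counts) sum += c;
    return sum / static_cast<double>(counts.size());
}

// Sample variance; for a Poisson distribution this also approaches truePileup.
double sampleVariance(const std::vector<int>& counts) {
    assert(counts.size() >= 2);
    const double mean = sampleMean(counts);
    double ss = 0.0;
    for (int c : counts) ss += (c - mean) * (c - mean);
    return ss / static_cast<double>(counts.size() - 1);
}
```

For a Poisson distribution the variance equals the mean, so with `truePileup = 40` the per-crossing count has a standard deviation of about $\sqrt{40} \approx 6.3$, and both sample statistics should land close to 40 for a large number of crossings.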
Finally, because of the finite time resolution of the detector, signals from previous (and even subsequent) bunch crossings can be read out along with those of the bunch crossing of interest. Therefore, `outOfTimePileup` contains the number of pileup interactions for each of the 12 previous and the 3 subsequent bunch crossings.
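A hypothetical helper for indexing `outOfTimePileup` by bunch-crossing offset is sketched below. It assumes the vector holds 15 entries ordered from offset -12 to -1 and then +1 to +3, with the in-time crossing excluded; that ordering is an assumption and should be checked against the code that fills the ntuple. The names `ootIndex`, `pileupAtOffset`, and `totalOutOfTimePileup` are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ASSUMED layout (hypothetical, verify against the ntuple-filling code):
// outOfTimePileup holds 15 entries for bunch-crossing offsets
// -12, -11, ..., -1, +1, +2, +3, i.e. the in-time crossing (0) is excluded.
constexpr int kPreviousBX = 12;
constexpr int kNextBX = 3;

// Map a non-zero bunch-crossing offset in [-12, +3] to a vector index.
int ootIndex(int bxOffset) {
    assert(bxOffset != 0 && bxOffset >= -kPreviousBX && bxOffset <= kNextBX);
    return bxOffset < 0 ? bxOffset + kPreviousBX       // -12 -> 0, -1 -> 11
                        : bxOffset + kPreviousBX - 1;  // +1 -> 12, +3 -> 14
}

// Number of pileup interactions in one out-of-time bunch crossing.
int pileupAtOffset(const std::vector<int>& outOfTimePileup, int bxOffset) {
    assert(outOfTimePileup.size() ==
           static_cast<std::size_t>(kPreviousBX + kNextBX));
    return outOfTimePileup[static_cast<std::size_t>(ootIndex(bxOffset))];
}

// Total pileup summed over all out-of-time bunch crossings.
int totalOutOfTimePileup(const std::vector<int>& outOfTimePileup) {
    int total = 0;
    for (int n : outOfTimePileup) total += n;
    return total;
}
```

Since the ntuple stores a `vector<int>*`, these helpers would be called with the dereferenced pointer, e.g. `pileupAtOffset(*outOfTimePileup, -1)`.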
Having added this information to the ntuple, I made distributions of the GSF-tracking purity, efficiency, and fake rate as a function of `inTimePileup`, shown below.
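One way to build such distributions is to accumulate numerator and denominator counts in bins of `inTimePileup` and take their ratio per bin. The sketch below assumes fixed-width integer bins; the struct name `RatioVsPileup` is hypothetical, and what counts as "passing" (e.g. a matched seed when measuring efficiency) is left to the caller.

```cpp
#include <cassert>
#include <vector>

// Hypothetical accumulator for a ratio (efficiency, purity, or fake rate)
// in fixed-width bins of in-time pileup.
struct RatioVsPileup {
    int nBins;
    int binWidth;
    std::vector<double> num;  // weighted count of passing candidates per bin
    std::vector<double> den;  // weighted count of all candidates per bin

    RatioVsPileup(int nBins_, int binWidth_)
        : nBins(nBins_), binWidth(binWidth_), num(nBins_, 0.0), den(nBins_, 0.0) {
        assert(nBins_ > 0 && binWidth_ > 0);
    }

    // Fill one candidate; returns false if inTimePileup is outside the binned range.
    bool fill(int inTimePileup, bool passes, double weight = 1.0) {
        if (inTimePileup < 0) return false;
        const int bin = inTimePileup / binWidth;
        if (bin >= nBins) return false;
        den[bin] += weight;
        if (passes) num[bin] += weight;
        return true;
    }

    // Ratio in one bin; returns -1 as a sentinel for empty bins.
    double ratio(int bin) const {
        assert(bin >= 0 && bin < nBins);
        return den[bin] > 0.0 ? num[bin] / den[bin] : -1.0;
    }
};
```

Keeping the numerator and denominator separately (rather than only the ratio) also makes it easy to plot the individual distributions, which is useful for the kind of cross-check described next.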
I was surprised to see virtually zero dependence on the number of pileup events for either efficiency or purity. I thought there might be a bug somewhere, so I did a few checks.
Looking at the individual numerator and denominator distributions, their shapes track the overall pileup distribution with, as expected, a roughly constant ratio between them. I don't see anything else of interest here.
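To put a number on "virtually zero dependence", one could fit a straight line to the per-bin ratio versus pileup and check whether the slope is compatible with zero. The sketch below is a plain unweighted least-squares fit (`olsSlope` is a hypothetical name); a proper check would weight each bin by its statistical uncertainty.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ordinary least-squares fit y = a + b*x; returns the slope b.
// Unweighted for simplicity: every bin counts equally regardless of its statistics.
double olsSlope(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double sx = 0.0, sy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sx += x[i];
        sy += y[i];
    }
    const double mx = sx / n, my = sy / n;
    double sxy = 0.0, sxx = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxy += (x[i] - mx) * (y[i] - my);
        sxx += (x[i] - mx) * (x[i] - mx);
    }
    assert(sxx > 0.0);  // x values must not all be identical
    return sxy / sxx;
}
```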
Next, I wanted to check if there was any relation between the number of in-time pileup events and:
The following plots show these relations for the default working point of the new seeding in the $t\bar{t}$ sample.
Looking at the distributions, there is clearly no correlation between any of the chosen variables and the number of in-time pileup events. However, there is an interesting feature in the top-right figure: there appear to be peaks in the "# Seeds" distribution at 0, 4, and, to a lesser extent, 7 and 12. I couldn't think of a clear reason for this. Naively, I would expect a monotonically decreasing distribution, and certainly not one with multiple localized peaks.
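The lack of correlation could also be quantified with the Pearson correlation coefficient between `inTimePileup` and each per-event variable. The minimal sketch below (the function name `pearson` is hypothetical) computes it directly. Note that a value near zero only excludes a linear relationship; a non-monotonic dependence can still give a coefficient of zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation coefficient between two per-event quantities,
// e.g. inTimePileup and the number of seeds in the event.
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        mx += x[i];
        my += y[i];
    }
    mx /= n;
    my /= n;
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - mx;
        const double dy = y[i] - my;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    assert(sxx > 0.0 && syy > 0.0);  // both variables must vary
    return sxy / std::sqrt(sxx * syy);
}
```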
For comparison, here are the distributions for the old seeding algorithm:
Things look similar, which shows that the peaks in the previous plots are not an artifact of the new seeding algorithm. In fact, the stripes appear (although less sharply) even in the top-left plot, which shows all electron seeds.
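To compare peak positions between the two algorithms systematically, rather than by eye, one could scan each "# Seeds" histogram for local maxima. The sketch below (hypothetical `localMaxima`) flags bins that are higher than both neighbors; with unit-width bins starting at zero, the bin index equals the number of seeds. It ignores statistical fluctuations, so a real check should also require each peak to be significant relative to the Poisson uncertainty of the neighboring bins.

```cpp
#include <cstddef>
#include <vector>

// Return the indices of strict local maxima of a 1D histogram.
// Edge bins are compared only to their single neighbor.
std::vector<std::size_t> localMaxima(const std::vector<double>& counts) {
    std::vector<std::size_t> peaks;
    const std::size_t n = counts.size();
    for (std::size_t i = 0; i < n; ++i) {
        const bool aboveLeft = (i == 0) || counts[i] > counts[i - 1];
        const bool aboveRight = (i + 1 == n) || counts[i] > counts[i + 1];
        if (aboveLeft && aboveRight) peaks.push_back(i);
    }
    return peaks;
}
```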
Finally, here are the plots for the Drell-Yan sample using the default working point of the new seeding. The same observations as above apply.
The next candidate I considered as the source of the "discreteness" was the number of superclusters, so I made plots of the number of superclusters vs. the number of seeds. The plots below use the default working point of the new seeding. The top row uses Drell-Yan MC, and the bottom row uses $t\bar{t}$.
Interestingly, for $t\bar{t}$ the discreteness is hidden in the plot without the ECAL-Driven+HOE<0.15 requirement (left) but appears clearly in the one with it (right). For Drell-Yan, on the other hand, the pattern appears similarly in both plots.
So right now, I still don't have a clear answer for the cause of the discreteness, but I think it is important to understand its source, since it could indicate a bug.