
Acceptance in OA@INAF: 2020-10-14T09:45:00Z
Title: Characterizing Signal Loss in the 21 cm Reionization Power Spectrum: A Revised Study of PAPER-64
Authors: Cheng, Carina; Parsons, Aaron R.; Kolopanis, Matthew; Jacobs, Daniel C.; Liu, Adrian; et al.
DOI: 10.3847/1538-4357/aae833
Handle: http://hdl.handle.net/20.500.12386/27791
Journal: THE ASTROPHYSICAL JOURNAL, 868

Characterizing Signal Loss in the 21 cm Reionization Power Spectrum: A Revised Study of PAPER-64

Carina Cheng1, Aaron R. Parsons1,2, Matthew Kolopanis3, Daniel C. Jacobs3, Adrian Liu1,4,15, Saul A. Kohn5, James E. Aguirre5, Jonathan C. Pober6, Zaki S. Ali1, Gianni Bernardi7,8,9, Richard F. Bradley10,11,12, Chris L. Carilli13,14, David R. DeBoer2, Matthew R. Dexter2, Joshua S. Dillon1,16, Pat Klima11, David H. E. MacMahon2, David F. Moore5, Chuneeta D. Nunhokee8, William P. Walbrugh7, and Andre Walker7

1 Astronomy Department, University of California, Berkeley, CA, USA; ccheng@berkeley.edu
2 Radio Astronomy Laboratory, University of California, Berkeley, CA, USA
3 School of Earth and Space Exploration, Arizona State University, Tempe, AZ, USA
4 Berkeley Center for Cosmological Physics, Berkeley, CA, USA
5 Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, PA, USA
6 Department of Physics, Brown University, Providence, RI, USA
7 Square Kilometre Array, Cape Town, South Africa
8 Department of Physics and Electronics, Rhodes University, South Africa
9 INAF–Istituto di Radioastronomia, Bologna, Italy
10 Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA, USA
11 National Radio Astronomy Observatory, Charlottesville, VA, USA
12 Department of Astronomy, University of Virginia, Charlottesville, VA, USA
13 National Radio Astronomy Observatory, Socorro, NM, USA
14 Cavendish Laboratory, Cambridge, UK
15 Hubble Fellow

Received 2018 June 22; revised 2018 October 9; accepted 2018 October 10; published 2018 November 15

Abstract

The Epoch of Reionization (EoR) is an uncharted era in our universe's history during which the birth of the first stars and galaxies led to the ionization of neutral hydrogen in the intergalactic medium. There are many experiments investigating the EoR by tracing the 21 cm line of neutral hydrogen. Because this signal is very faint and difficult to isolate, it is crucial to develop analysis techniques that maximize sensitivity and suppress contaminants in data. It is also imperative to understand the trade-offs between different analysis methods and their effects on power spectrum estimates. Specifically, with a statistical power spectrum detection in HERA's foreseeable future, it has become increasingly important to understand how certain analysis choices can lead to the loss of the EoR signal. In this paper, we focus on signal loss associated with power spectrum estimation. We describe the origin of this loss using both toy models and data taken by the 64-element configuration of the Donald C. Backer Precision Array for Probing the Epoch of Reionization (PAPER). In particular, we highlight how detailed investigations of signal loss have led to a revised, higher 21 cm power spectrum upper limit from PAPER-64. Additionally, we summarize errors associated with power spectrum error estimation that were previously unaccounted for. We focus on a subset of PAPER-64 data in this paper; revised power spectrum limits from the PAPER experiment are presented in a forthcoming paper by Kolopanis et al. and supersede results from previously published PAPER analyses.

Key words: dark ages, reionization, first stars – early universe – large-scale structure of universe – methods: data analysis – methods: statistical – techniques: interferometric

1. Introduction

By about one billion years after the Big Bang (z ∼ 6), the first stars and galaxies are thought to have ionized all the neutral hydrogen that dominated the baryonic matter content in the universe. This transition period, during which the first luminous structures formed from gravitational collapse and began to emit intense radiation that ionized the cold neutral gas into a plasma, is known as the Epoch of Reionization (EoR). The EoR is a relatively unexplored era in our universe's history that spans the birth of the first stars to the full reionization of the intergalactic medium (IGM). This epoch encodes important information regarding the nature of the first galaxies and the processes of structure formation. Direct measurements of the EoR would unlock powerful characteristics about the IGM, revealing connections between the matter distribution exhibited via cosmic microwave background (CMB) studies and the highly structured web of galaxies we observe today (for a review, see Barkana & Loeb 2001; Furlanetto et al. 2006; Loeb & Furlanetto 2013).

One promising technique to probe the EoR is to target the 21 cm wavelength signal that is emitted and absorbed by neutral hydrogen via its spin-flip transition (Furlanetto et al. 2006; Barkana & Loeb 2008; Morales & Wyithe 2010; Pritchard & Loeb 2010, 2012). This technique is powerful because it can be observed both spatially and as a function of redshift—that is, the wavelength of the signal reaching our telescopes can be directly mapped to a distance from where the emission originated before stretching out as it traveled through expanding space. Hence, 21 cm tomography offers a unique window into both the spatial and temporal evolution of ionization, temperature, and density fluctuations.

In addition to the first tentative detection of our Cosmic Dawn (pre-reionization era) made by the Experiment to Detect the Global EoR Signature (EDGES; Bowman & Rogers 2010; Bowman et al. 2018), there are several radio telescope experiments that have succeeded in using the 21 cm signal from hydrogen to place constraints on the brightness of the signal. Examples of experiments investigating the mean brightness temperature of the 21 cm signal relative to the CMB are the Large Aperture Experiment to Detect the Dark Ages (LEDA; Bernardi et al. 2016), the Dark Ages Radio Explorer (DARE; Burns et al. 2012), the Sonda Cosmológica de las Islas para la Detección de Hidrógeno Neutro (SCI-HI; Voytek et al. 2014), the Broadband Instrument for Global HydrOgen ReioNisation Signal (BIGHORNS; Sokolowski et al. 2015), and the Shaped Antenna measurement of the background RAdio Spectrum (SARAS; Patra et al. 2015). Radio interferometers that seek to measure statistical power spectra include the Giant Metrewave Radio Telescope (GMRT; Paciga et al. 2013), the LOw Frequency ARray (LOFAR; van Haarlem et al. 2013), the Murchison Widefield Array (MWA; Tingay et al. 2013), the 21 Centimeter Array (21CMA; Pen et al. 2004; Wu 2009), the Square Kilometre Array (SKA; Koopmans et al. 2015), and the Donald C. Backer Precision Array for Probing the Epoch of Reionization (PAPER; Parsons et al. 2010). The Hydrogen Epoch of Reionization Array (HERA), which is currently being built, is a next-generation instrument that aims to combine lessons learned from previous experiments and has a forecasted sensitivity capable of a high-significance power spectrum detection with an eventual 350 elements using current analysis techniques (Pober et al. 2014; Dillon & Parsons 2016; Liu & Parsons 2016; DeBoer et al. 2017).

The major challenge that faces all 21 cm experiments is isolating a small signal that is buried underneath foregrounds and instrumental systematics that are, when combined, four to five orders of magnitude brighter (e.g., Santos et al. 2005; Ali et al. 2008; de Oliveira-Costa et al. 2008; Jelić et al. 2008; Bernardi et al. 2009, 2010, 2013; Ghosh et al. 2011; Pober et al. 2013a; Dillon et al. 2014; Kohn et al. 2016). A clean measurement therefore requires an intimate understanding of how data analysis choices, which are often tailored to maximize sensitivity and minimize contaminants, affect power spectrum results. More specifically, it is imperative to develop techniques that ensure the accurate extraction and recovery of the EoR signal, regardless of the analysis method chosen and how much loss (of both contaminants and the EoR signal) accompanies the method. In this paper, we specifically discuss signal loss—the loss of the cosmological signal—associated with power spectrum estimation. This is an issue that is essential to investigate for a robust 21 cm power spectrum analysis and one that has motivated a revised PAPER analysis. We first approach this topic from a broad perspective and then perform a detailed case study using data from the 64-element configuration of PAPER. In this study, we use a subset of PAPER-64 data to illustrate our revised analysis methods, while a related paper, M. Kolopanis et al. (2018, in preparation), builds off of the methods in this paper to present revised PAPER-64 results for multiple redshifts and baseline types.

Finally, we also highlight several additional errors made in previous PAPER analyses, including those related to bootstrapping and error estimation. This paper accompanies the erratum of Ali et al. (2018) and adds to the growing foundation of lessons that have been documented in, for example, Paciga et al. (2013), Patil et al. (2016), and Jacobs et al. (2016) by the GMRT, LOFAR, and MWA projects, respectively. These lessons are imperative as the community as a whole moves toward higher sensitivities and potential EoR detections.

This paper is organized into four main sections. In Section 2, we use toy models to develop intuition about signal loss and its origin and subtleties. In Section 3, we present a case study using data from the PAPER-64 array, highlighting key changes from the signal loss methods used in the published result in Ali et al. (2015, hereafter A15), which previously underestimated loss. In Section 4, we summarize additional lessons learned since A15 that have shaped our revised analysis. Finally, we conclude in Section 5.

2. Signal Loss Toy Models

Signal loss refers to attenuation of the target cosmological signal in a power spectrum estimate. Certain analysis techniques can cause this loss, and if the amount of loss is not quantified accurately, it can lead to false nondetections and overly aggressive upper limits. Determining whether an analysis pipeline is lossy and, if so, estimating the amount of loss has subtle challenges but is necessary to ensure the accuracy of any result.

One type of signal loss can occur when weighting data by itself. Broadly speaking, a data set can be weighted to emphasize certain features and minimize others. For example, one flavor of weighting employed by previous PAPER analyses is inverse covariance weighting in frequency, which is a generalized version of inverse variance weighting that also takes into account frequency correlations (Liu & Tegmark 2011; Dillon et al. 2013, 2014, 2015; Liu et al. 2014a, 2014b). Using such a technique enables the down-weighting of contaminant modes that obey a different covariance structure from that of cosmological modes. However, a challenge of inverse covariance weighting is in estimating a covariance matrix that is closest to the true covariance of the data; the discrepancy between the two, as we will see, can have large impacts on signal loss. In this paper, we focus specifically on loss associated with the use of an empirically estimated covariance matrix with the "optimal quadratic estimator (QE)" formalism. This loss was significantly underestimated in the A15 analysis and is the main reason motivating a revised power spectrum result.

2.1. The QE Method

We begin with an overview of the QE formalism used for power spectrum estimation. The goal of power spectrum analysis is to produce an unbiased estimator of the EoR power spectrum in the presence of both noise and foreground emission. Prior to power spectrum estimation, the data will often have been prepared to have minimal foregrounds by some method of subtraction, so this foreground emission may appear either directly (because it was not subtracted) or as a residual of some subtraction process not in the power spectrum domain. If an accurate estimate of the total covariance of the data is known, including both the desired signal and any contaminants, then the "optimal QE" formalism provides a method of producing a minimum-variance, unbiased estimator of the desired signal, as shown in Liu & Tegmark (2011), Trott et al. (2012), Dillon et al. (2013, 2014, 2015), Liu et al. (2014a, 2014b, 2016), and Switzer et al. (2015).

Suppose that the measured visibilities for a single baseline in Jy are arranged as a data vector, x. It has length N_t N_f, where N_t is the number of time integrations and N_f is the number of frequency channels. The covariance of the data is given by

$\mathbf{C} \equiv \langle \mathbf{x}\mathbf{x}^{\dagger}\rangle = \mathbf{S} + \mathbf{U}$, (1)

where the average over an ensemble of data realizations produces the true covariance, and we further assume that it may be written as the sum of the desired cosmological signal S and other terms U.

We are interested in estimating the three-dimensional power spectrum of the EoR. Visibilities are measurements of the Fourier transform of the sky along two spatial dimensions (using the flat-sky approximation), and since we are interested in three-dimensional Fourier modes, we only need to take one Fourier transform of our visibilities along the line-of-sight dimension. We consider band powers P_α of the power spectrum of x over some range in cosmological k, where α indexes a waveband in k_∥ (a cosmological wavenumber k_∥ is the Fourier dual to frequency under the delay approximation (Parsons et al. 2012b), which is a good approximation for the short baselines that PAPER analyzes). The fundamental dependence of the covariance on the power spectrum band powers P_α is encoded as

$\mathbf{S} = \sum_{\alpha} \frac{\partial \mathbf{C}}{\partial P_{\alpha}}\, P_{\alpha} \equiv \sum_{\alpha} \mathbf{Q}_{\alpha} P_{\alpha}$, (2)

where we define $\mathbf{Q}_{\alpha} \equiv \partial \mathbf{C} / \partial P_{\alpha}$. In other words, Q_α describes the response of the covariance to a change in the power spectrum, relating a quadratic statistic of the data (the covariance) to a quadratic statistic in Fourier space (the power spectrum).

The optimal QE prescription is then to compute

$\hat{P}_{\alpha} = \sum_{\beta} (\mathbf{F}^{-1})_{\alpha\beta}\,(\hat{q}_{\beta} - \hat{b}_{\beta})$, (3)

where F is the Fisher matrix (which determines errors on the power spectrum estimate),

$F_{\alpha\beta} \equiv \tfrac{1}{2}\,\mathrm{tr}\,(\mathbf{C}^{-1}\mathbf{Q}_{\alpha}\mathbf{C}^{-1}\mathbf{Q}_{\beta})$, (4)

$\hat{q}_{\alpha}$ is the unnormalized power spectrum estimate,

$\hat{q}_{\alpha} = \tfrac{1}{2}\,\mathbf{x}^{\dagger}\mathbf{C}^{-1}\mathbf{Q}_{\alpha}\mathbf{C}^{-1}\mathbf{x}$, (5)

and $\hat{b}_{\alpha}$ is the additive bias,

$\hat{b}_{\alpha} = \tfrac{1}{2}\,\mathrm{tr}\,(\mathbf{U}\mathbf{C}^{-1}\mathbf{Q}_{\alpha}\mathbf{C}^{-1})$. (6)

The power spectrum estimator in Equation (3) is the minimum-variance (smallest error bar) estimate of the power spectrum subject to the constraint that it is also unbiased; that is, the ensemble average of the estimator is equal to its true value,

$\langle \hat{P}_{\alpha} \rangle = P_{\alpha}$ (7)

(Tegmark 1997; Bond et al. 1998).
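To make the estimator concrete, the following numpy sketch is our illustration of Equations (3)-(6), not code from the PAPER pipeline: Q_α is taken to be the outer product of a Fourier mode (a frequency-stationary toy response), and the band powers and noise level are assumed for demonstration.

```python
import numpy as np

n_freq = 20                                     # frequency channels
freqs = np.arange(n_freq)

def Q(alpha):
    # Q_alpha = dC/dP_alpha: outer product of the alpha-th Fourier mode, so
    # that C = sum_alpha Q_alpha P_alpha for a frequency-stationary signal.
    mode = np.exp(2j * np.pi * alpha * freqs / n_freq) / np.sqrt(n_freq)
    return np.outer(mode, mode.conj())

P_true = 1.0 / (1.0 + freqs)                    # assumed sloped band powers
U = 0.1 * np.eye(n_freq)                        # assumed noise covariance
C = sum(Q(a) * P_true[a] for a in range(n_freq)) + U   # Eq. (1): C = S + U
C_inv = np.linalg.inv(C)

# Draw one Gaussian data realization x with covariance C.
rng = np.random.default_rng(0)
z = (rng.normal(size=n_freq) + 1j * rng.normal(size=n_freq)) / np.sqrt(2)
x = np.linalg.cholesky(C) @ z

# q_alpha = (1/2) x^dag C^-1 Q_alpha C^-1 x                       (Eq. (5))
q = np.array([0.5 * (x.conj() @ C_inv @ Q(a) @ C_inv @ x).real
              for a in range(n_freq)])
# b_alpha = (1/2) tr(U C^-1 Q_alpha C^-1)                         (Eq. (6))
bias = np.array([0.5 * np.trace(U @ C_inv @ Q(a) @ C_inv).real
                 for a in range(n_freq)])
# F_ab = (1/2) tr(C^-1 Q_a C^-1 Q_b)                              (Eq. (4))
F = np.array([[0.5 * np.trace(C_inv @ Q(a) @ C_inv @ Q(b)).real
               for b in range(n_freq)] for a in range(n_freq)])

P_hat = np.linalg.solve(F, q - bias)            # Eq. (3): band-power estimates
# <P_hat> equals P_true in expectation; any single draw scatters around it.
```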

Intuitively, the estimator must be capable of "suppressing" or "removing" the effects of contaminants in order to obtain an unbiased estimate of the power spectrum. By construction, the subtraction of the residual foreground and noise bias accomplishes this, removing any additive bias. However, the C⁻¹ piece of Equation (5) also has the effect of suppressing residual foregrounds and noise in both the additive bias and any contributions the residuals may have to the variance.

More specifically, the effect of the weighting in Equation (5) is to project out the modes of U with a different covariance structure than S in the power spectrum estimate, and the effect of Equation (6) is to subtract out the remaining bias. Similar effects for a realistic model of the EoR and foregrounds are shown in Liu & Tegmark (2011).

If the covariance structure of the contaminants is sufficiently different from the desired power spectrum, then the linear bias term may be expected to be quite small, and it is only necessary to know C, and not U. Since the foregrounds are expected to be strongly correlated between frequencies, whereas the EoR is not, we expect different covariance structures and therefore a small linear bias. Moreover, because the linear bias is always positive and there is no multiplicative bias, the quadratic-only term will always produce an estimate that is high relative to the true value and can conservatively be interpreted as an upper limit. These considerations, and the difficulty of obtaining an estimate for U, motivate the neglect of the linear bias in the rest of this analysis.

Motivated by the desire to retain the advantageous behavior of suppressing contributions of U to estimates of the EoR power spectrum, we note that it is possible to define a modified version of the QE where Equation (5) is replaced by

$\hat{q}_{\alpha} = \tfrac{1}{2}\,\mathbf{x}^{\dagger}\mathbf{R}\mathbf{Q}_{\alpha}\mathbf{R}\mathbf{x}$, (8)

where R is a weighting matrix chosen by the data analyst. For example, inverse covariance weighting (the optimal form of QE) would set R ≡ C⁻¹, and a uniform-weighted case would use R ≡ I, the identity matrix. Again, the matrix Q_α encodes the dependence of the covariance on the power spectrum but, in practice, also does other things, including implementing a transform of the frequency-domain visibilities to k-space, taking into account cosmological scalings, and converting the visibilities from Jy to K.

With an appropriate normalization matrix M, the quantity

$\hat{\mathbf{P}} = \mathbf{M}\hat{\mathbf{q}}$ (9)

is a sensible estimate of the true power spectrum P.

To ensure that M correctly normalizes our power spectrum, one may take the expectation value of Equation (9) to obtain

$\langle \hat{P}_{\alpha} \rangle = \sum_{\beta}\Big[\tfrac{1}{2}\sum_{\gamma} M_{\alpha\gamma}\,\mathrm{tr}\,(\mathbf{R}\mathbf{Q}_{\gamma}\mathbf{R}\mathbf{Q}_{\beta})\Big] P_{\beta} + \tfrac{1}{2}\sum_{\gamma} M_{\alpha\gamma}\,\mathrm{tr}\,(\mathbf{U}\mathbf{R}\mathbf{Q}_{\gamma}\mathbf{R}) \equiv \sum_{\beta} W_{\alpha\beta} P_{\beta} + \tfrac{1}{2}\sum_{\gamma} M_{\alpha\gamma}\,\mathrm{tr}\,(\mathbf{U}\mathbf{R}\mathbf{Q}_{\gamma}\mathbf{R})$, (10)

where W_αβ are elements of a window function matrix. Considering the first term of this expression (again, we are assuming that the linear bias term is significantly suppressed; if this is not the case, we are simply assuming that we are setting a conservative upper limit), if W ends up being the identity matrix for our choices of R and M, then we recover Equation (7) for the first term, and we have an estimator that has no multiplicative matrix bias. However, Equation (7) is a rather restrictive condition, and it is possible to violate it and still have a sensible (and correctly normalized) power spectrum estimate. In particular, as long as the rows of W sum to unity, our power spectrum will be correctly normalized. Beyond this, the data analyst has a choice for M, and for simplicity throughout this paper, we choose M to be diagonal. In a preview of what is to come, we also stress that the derivation that leads to Equation (10) assumes that R and x are not correlated. If this assumption is violated, a simple application of the (now incorrect) formulae in this section can result in an improperly normalized power spectrum estimator that does not conserve power, i.e., one that has signal loss.
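The row-sum condition is easy to verify numerically. The short sketch below (our illustration, reusing the toy Q_α defined above; R ≡ I is assumed purely for brevity) chooses a diagonal M so that each row of W sums to unity.

```python
import numpy as np

n_freq = 20
freqs = np.arange(n_freq)

def Q(alpha):
    mode = np.exp(2j * np.pi * alpha * freqs / n_freq) / np.sqrt(n_freq)
    return np.outer(mode, mode.conj())

R = np.eye(n_freq)             # any analyst-chosen weighting; identity here

# Unnormalized response H_gb = (1/2) tr(R Q_g R Q_b); then W = M H (Eq. (10)).
H = np.array([[0.5 * np.trace(R @ Q(g) @ R @ Q(b)).real
               for b in range(n_freq)] for g in range(n_freq)])

M = np.diag(1.0 / H.sum(axis=1))    # diagonal M: make each row of W sum to 1
W = M @ H
assert np.allclose(W.sum(axis=1), 1.0)   # correctly normalized window rows
```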

Given the advantages of inverse covariance weighting, the question arises of how one goes about estimating C. One method is to empirically derive it from the data x itself. Similar types of weightings that are based on variance information in the data are done in Chang et al. (2010) and Switzer et al. (2015). In previous PAPER analyses, one time-averages the data to obtain

$\hat{\mathbf{C}} \equiv \widehat{\langle \mathbf{x}\mathbf{x}^{\dagger}\rangle} \approx \langle \mathbf{x}\mathbf{x}^{\dagger}\rangle_{t}$, (11)

assuming $\langle \mathbf{x} \rangle_{t} = 0$ (a reasonable assumption, since fringes average to zero over a sufficient amount of time), where $\langle\,\rangle_{t}$ denotes a finite average over time. The weighting matrix for our empirically estimated inverse covariance weighting is then R ≡ Ĉ⁻¹, where we use a hat symbol to distinguish the empirical covariance from the true covariance C.
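A minimal sketch of this empirical estimate (Equation (11)), with toy dimensions that we assume for illustration:

```python
import numpy as np

n_t, n_f = 100, 20                      # time integrations, frequency channels
rng = np.random.default_rng(1)
x = rng.normal(size=(n_t, n_f)) + 1j * rng.normal(size=(n_t, n_f))  # mock data

# C_hat = <x x^dag>_t: a finite average of outer products over time, valid
# when <x>_t ~ 0, as for fringes that average down over enough time.
C_hat = np.einsum('ti,tj->ij', x, x.conj()) / n_t
R = np.linalg.inv(C_hat)                # empirical weighting R = C_hat^-1
```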

In the next three sections, we use toy models to investigate the effects of weighting matrices on signal loss by experimenting with different matrices R and examining their impact on the resulting power spectrum estimates P̂. Our goal in experimenting with weighting is to suppress foregrounds and investigate EoR losses associated with them. We note that we purposely take a thorough and pedagogical approach to describing the toy model examples given in the next few sections. The specifics of how signal loss appears in PAPER's analysis are described in Section 3.

As a brief preview, we summarize here the findings of the following sections.

1. If the covariance matrix is estimated from the data, a strong correlation between the estimated modes and the data will in general produce an estimate of the signal power spectrum that is strongly biased low relative to the true value. In this context, this is what we call "signal loss" (Section 2.2).

2. The effect of the bias is worsened when the number of independent samples used to estimate the covariance matrix is reduced (Section 2.3).

3. The rate at which empirical eigenvectors converge to their true forms depends on the sample variance in the empirical estimate and the shape of the empirical eigenspectrum. In general, larger sample variances lead to more loss (Section 2.3).

4. Knowing these things, there are some simple ways of altering the empirical covariance matrix to decouple it from the data and produce unbiased power spectrum estimates (Section 2.4).

2.2. Empirical Inverse Covariance Weighting

Using a toy model, we will now build intuition into how weighting by the inverse of the empirically estimated covariance, Ĉ⁻¹, can give rise to signal loss. We construct a simple data set that contains visibility data with 100 time integrations and 20 frequency channels. This model represents realistic dimensions of about 1 hr of PAPER data that might be used for a power spectrum analysis. For PAPER-64 (both the A15 analysis and our new analysis), we use ∼8 hr of data (with channel widths of 0.5 MHz and integration times of 43 s), but here we scale it down with no loss of generality.

We create mock visibilities, x, and assume a nontracking, drift-scan observation. Hence, flat-spectrum sources (away from zenith) lead to measured visibilities that oscillate in time and frequency. We therefore form a mock visibility measurement of a bright foreground signal, x_FG, as a complex sinusoid that varies smoothly in time and frequency, a simplistic but realistic representation of a single bright source. We also create a mock visibility measurement of an EoR signal, x_EoR, as a complex, Gaussian random signal. A more realistic EoR signal would have a sloped power spectrum in P(k) (instead of flat, as in the case of white noise), which could be simulated by introducing frequency correlations into the mock EoR signal. However, here we treat all k-values separately, so a simplistic white noise approximation can be used. Our combined data vector is then x = x_FG + x_EoR, to which we apply different weighting schemes throughout Section 2. The three data components are shown in Figure 1.

Figure 1. Our toy model data set to which we apply different weighting schemes in order to investigate signal loss. We model a mock foreground-only visibility with a sinusoid signal that varies smoothly in time and frequency. We model a mock visibility of an EoR signal as a random Gaussian signal. We add the two together to form x = x_FG + x_EoR. Real parts are shown here.

Figure 2. Estimated covariance matrices (top row) and inverse covariance-weighted data (bottom row) for FG only (left), EoR only (middle), and FG + EoR (right). Real parts are shown here.
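The sketch below is our stand-in for this toy model: the sinusoid's amplitude and rates are assumed values, an FFT over frequency replaces the full QE machinery, and we add a crude Monte Carlo normalization (not part of the paper's pipeline) so that the remaining deficit can be read directly as signal loss.

```python
import numpy as np

n_t, n_f = 100, 20
t = np.arange(n_t)[:, None]
f = np.arange(n_f)[None, :]
rng = np.random.default_rng(2)

# Bright foreground: a complex sinusoid, smooth in time and frequency.
x_fg = 100.0 * np.exp(2j * np.pi * (0.01 * t + 0.2 * f))
# Mock EoR: complex white Gaussian noise (flat power spectrum).
x_eor = rng.normal(size=(n_t, n_f)) + 1j * rng.normal(size=(n_t, n_f))
x = x_fg + x_eor

def pspec(d, R):
    """Mean |FFT|^2 over time of R-weighted spectra (toy power spectrum)."""
    return np.mean(np.abs(np.fft.fft(d @ R.T, axis=1))**2, axis=0)

C_hat = np.einsum('ti,tj->ij', x, x.conj()) / n_t
R = np.linalg.inv(C_hat)

P_weighted = pspec(x, R)                 # empirical C_hat^-1-weighted estimate
P_eor = pspec(x_eor, np.eye(n_f))        # uniform-weighted EoR reference

# Calibrate the response of the *fixed* C_hat weighting on fresh EoR draws
# (decoupled from C_hat, hence lossless); the residual deficit on the actual
# data is then signal loss.
fresh = rng.normal(size=(500, n_f)) + 1j * rng.normal(size=(500, n_f))
M = pspec(fresh, np.eye(n_f)) / pspec(fresh, R)   # per-k normalization
loss_ratio = M * P_weighted / P_eor   # < 1 at EoR-dominated k => signal loss
```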

We compute the power spectrum of our toy model data set x using Equations (8) and (9), with R ≡ Ĉ⁻¹. Figure 2 shows the estimated covariances of our toy model data sets, along with the Ĉ⁻¹-weighted data. The foreground sinusoid is clearly visible in Ĉ_FG. The power spectrum result is shown in green in the left plot of Figure 3. Also plotted in the figure are the uniform-weighted (R ≡ I) power spectra of the individual components x_FG (blue) and x_EoR (red). As shown, our Ĉ⁻¹-weighted result successfully suppresses foregrounds, demonstrated in Figure 3 by the missing foreground peak in the weighted power spectrum estimate (green). It is also evident that our result fails to recover the EoR signal—it exhibits the correct shape, but the amplitude level is slightly low. It is this behavior that we describe as signal loss.

As discussed in Section 2.1, this behavior is not expected in the case where we use a true C⁻¹ weighting. Rather, we would obtain a nearly unbiased estimate of the power spectrum. The key difference is that since Ĉ is estimated from the data, its eigenvectors and eigenvalues are strongly coupled to the particular data realization that was used to compute it, and this coupling leads to loss.

For the case of an eigenmode that can be safely assumed to be predominantly a foreground, its presence in the true covariance matrix will result in the desired suppression via a kind of projection; whether or not it is strongly correlated with the actual data vector is irrelevant. However, in the case of an empirically estimated covariance matrix, the eigenmodes of Ĉ_EoR will both be incorrect and can be correlated with the data. If these incorrect eigenmodes are not correlated with the data, this will lead to nonminimum variance estimates but will not produce the suppression of the power spectrum amplitude seen in the left plot of Figure 3. As described in Section 3.1, however, if Ĉ_EoR is correlated with the data vector x, there is a kind of projection of power in the nonforeground modes from the resulting power spectrum estimate, thus producing an estimate that is biased low. In short, if the covariance is computed from the data itself, it carries the risk of overfitting information in the data and introducing a multiplicative bias (per k) to estimates of the signal.

The danger of an empirically estimated covariance matrix comes mostly from not being able to accurately describe the EoR-dominated eigenmodes of Ĉ, for which the EoR signal is brighter than foregrounds. In such a case, the coupling of these modes to the data realization leads to the overfitting and subtraction of the EoR signal. More specifically, the coupling between the estimated covariance and the data is anticorrelated in nature (explained in more detail in Section 3.1), which leads to loss. Misestimating Ĉ for EoR-dominated eigenmodes is therefore more harmful than for foreground-dominated modes, and since the lowest-valued eigenmodes of an eigenspectrum are typically EoR-dominated, using this part of the spectrum for weighting is most dangerous. Armed with this information, we can tweak the covariance in a simple way to suppress foregrounds and yield minimal signal loss. Recall that our toy model foreground can be perfectly described by a single eigenmode. Using the full data set's (foreground plus EoR signal) empirical covariance, we can project out the zeroth eigenmode and then take the remaining covariance to be the identity matrix. This decouples the covariance from the data for the EoR modes. The resulting power spectrum estimate for this case is shown in the right plot of Figure 3. In this case, we recover the EoR signal, demonstrating that if we can disentangle the foreground-dominated modes and EoR-dominated modes, we can suppress foregrounds with negligible signal loss, as sketched below.
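A sketch of this alteration (our rendering, under the same assumed toy shapes): keep only the strongest empirical eigenmode for projection and decouple everything else from the data.

```python
import numpy as np

def fg_projection_weight(C_hat, n_modes=1):
    """Project out the n_modes highest-eigenvalue (foreground-dominated)
    eigenmodes of the empirical covariance; weight all others equally."""
    lam, V = np.linalg.eigh(C_hat)                 # eigenvalues ascending
    fg = V[:, -n_modes:]                           # strongest eigenvectors
    return np.eye(C_hat.shape[0]) - fg @ fg.conj().T   # R: identity elsewhere

# Used in place of C_hat^-1, this R removes the sinusoid mode entirely while
# leaving the EoR-dominated modes uncoupled to the data realization, so the
# recovered EoR power is essentially lossless (right panel of Figure 3).
```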

Altering Ĉ in this way is one specific example of a regularization method for this toy model, in which we change Ĉ in a way that reduces its coupling to the data realization. There are several other simple ways to regularize Ĉ, and we will discuss some in Section 2.4.

2.3. Fringe-rate Filtering

We have shown how signal loss can arise due to the coupling of EoR-dominated eigenmodes to the data. We will next show how this effect is exacerbated by reducing the total number of independent samples in a data set.

A fringe-rate filter is an analysis technique designed to maximize sensitivity by integrating in time (Parsons et al. 2016).

Figure 3. Resulting power spectrum estimates for the toy model simulation described in Section 2.2—foregrounds only (blue), EoR only (red), and weighted FG + EoR data set (green). The power spectrum of the foregrounds peaks at a k-mode based on the frequency of the sinusoid used to create the mock foreground signal. In the two panels, we compare using empirically estimated inverse covariance weighting, where Ĉ is derived from the data (left), and projecting out the zeroth eigenmode only (right). In the former case, signal loss arises from the coupling of the eigenmodes of Ĉ to the data. There is negligible signal loss when all eigenmodes besides the foreground one are no longer correlated with the data.

(7)

Rather than a traditional boxcar average in time, a time-domain filter can be designed to up-weight temporal modes consistent with the sidereal motion on the sky while down-weighting modes that are noise-like.

Because fringe-rate filtering is analogous to averaging in time, it comes at the cost of reducing the total number of independent samples in the data. With fewer independent modes, it becomes more difficult for the empirical covariance to estimate the true covariance matrix of the fringe-rate filtered data. We can quantify this effect by evaluating a convergence metric ε(Ĉ) for the empirical covariance, which we define as

$\epsilon(\hat{\mathbf{C}}) \equiv \sum_{ij} (\hat{C}_{ij} - C_{ij})^{2} \Big/ \sum_{ij} C_{ij}^{2}$, (12)

where C is the true covariance matrix. To compute this metric, we draw different numbers of realizations (different draws of Gaussian noise) of our toy model EoR measurement, x_EoR, and take their ensemble average. We then compare this to the "true" covariance, which in our simulation is set to be the empirical covariance after a large number (500) of realizations. As shown in Figure 4, we perform this computation for a range of total independent ensemble realizations (horizontal axis) and number of independent samples in the data following time-averaging, or "fringe-rate filtering" (different colors). With more independent time samples (i.e., more realizations) in the data, one converges to the true fringe-rate filtered covariance more quickly.
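A sketch of this convergence test (our illustration with assumed toy values; for simplicity the "true" covariance here is the identity rather than the 500-realization average used in the text):

```python
import numpy as np

n_f = 20
rng = np.random.default_rng(3)
C_true = np.eye(n_f)                 # white mock EoR: true covariance is I

def epsilon(C_hat, C_true):
    # eps(C_hat) = sum_ij |C_hat_ij - C_ij|^2 / sum_ij |C_ij|^2   (Eq. (12))
    return np.sum(np.abs(C_hat - C_true)**2) / np.sum(np.abs(C_true)**2)

for n_real in (10, 50, 100, 500):
    z = (rng.normal(size=(n_real, n_f)) +
         1j * rng.normal(size=(n_real, n_f))) / np.sqrt(2)
    C_hat = np.einsum('ti,tj->ij', z, z.conj()) / n_real
    print(n_real, epsilon(C_hat, C_true))   # falls off roughly as 1/n_real
```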

The situation here, with using a finite number of time samples to estimate our covariance, is analogous to a problem faced in galaxy surveys, where the nonlinear covariance of the matter power spectrum is estimated using a large—but finite—number of expensive simulations. There, the limited number of independent simulations results in inaccuracies in estimated covariance matrices (Dodelson & Schneider 2013; Taylor & Joachimi 2014), which in turn result in biases in the final parameter constraints (Hartlap et al. 2007). In our case, the empirically estimated covariances are used for estimating the power spectrum, and as we discussed in the previous section (and will argue more thoroughly in Section 3.1), couplings between these covariances and the data can lead to power spectrum estimates that are biased low—which is precisely signal loss. In future work, it will be fruitful to investigate whether advanced techniques from the galaxy survey literature for estimating accurate covariance matrices can be successfully adapted for 21 cm cosmology. These techniques include the imposition of sparsity priors (Padmanabhan et al. 2016), the fitting of theoretically motivated parametric forms (Pearson & Samushia 2016), covariance tapering (Paz & Sánchez 2015), marginalization over the true covariance (Sellentin & Heavens 2016), and shrinkage methods (Pope & Szapudi 2008; Joachimi 2017).

The overall convergence of the covariance is important, but also noteworthy is the fact that different eigenvectors converge to their true forms at different rates. This is illustrated by Figure 5, which shows the convergence of eigenvectors in an empirical estimate of a covariance matrix. For this particular toy model, we construct a covariance whose true form combines the same mock foreground from the previous toy models with an EoR component that is modeled as a diagonal matrix with eigenvalues spanning one order of magnitude (more specifically, we construct the EoR covariance as a diagonal matrix in the Fourier domain, where the signal is expected to be uncorrelated; its Fourier transform is then the true covariance of the EoR in the frequency domain, or C_EoR).

For different numbers of realizations, we draw random EoR signals that are consistent with C_EoR, add them to the mock foreground data, and compute the combined empirical covariance by averaging over the realizations. The eigenvectors of this empirical covariance are then compared to the true eigenvectors v, where we use as a convergence metric ε(v̂), defined as

$\epsilon(\hat{\mathbf{v}}) \equiv \sum_{i}^{N_{f}} |\hat{v}_{i} - v_{i}|^{2}$, (13)

where N_f is the number of frequencies in the mock data. The eigenmode convergence curves in Figure 5 are ranked by eigenvalue, such that "eigenmode 0" illustrates the convergence of the eigenvector with the largest eigenvalue, "eigenmode 1" that for the second-largest eigenvalue, and so on. We see that the zeroth eigenmode—the mode describing the foreground signal—is quickest to converge.

Figure 4. Convergence level, as defined by Equation (12), of empirically estimated covariances of mock EoR signals with different numbers of independent samples. In red, the mock EoR signal is comprised entirely of independent samples (100 of them). Subsequent colors show time-averaged signals. As the number of realizations increases, we see that the empirical covariances approach the true covariances. The more independent samples there are, the quicker an empirical covariance converges (i.e., the quicker it decouples from the data) and the less signal loss we would expect to result.

Figure 5. Convergence level, as defined by Equation (13), of empirically estimated eigenvectors for different numbers of mock data realizations. The colors range from the zeroth (highest eigenvalue) to the 19th (lowest eigenvalue) eigenmode, ordered by eigenvalue in descending order. This figure shows that the zeroth eigenmode converges the quickest, implying that eigenvectors with eigenvalues that are substantially different from the rest (the foreground-dominated mode has a much higher eigenvalue than the EoR modes) are able to converge to the true eigenvectors the quickest. On the other hand, eigenmodes 1–19 have similar eigenvalues and are slower to converge because of degeneracies between them.

Our numerical test reveals that the convergence rates of empirical eigenvectors are related to the sample variance in our empirical estimate. In general, computing an empirical covariance from a finite ensemble average means that the empirical eigenmodes have sample variances. Consider first a limiting case where all eigenvalues are equal. In such a scenario, any linear combination of eigenvectors is also an eigenvector, and thus there is no sensible way to define the convergence of eigenvectors. In our current test, aside from the zeroth mode, the eigenvalues have similar values but are not precisely equal. Hence, there is a well-defined set of eigenvectors to converge to. However, due to the sample variance of our empirical covariance estimate, there may be accidental degeneracies between modes, where some modes are mixing and swapping with others. Therefore, the steeper an eigenspectrum, the easier it is for the eigenmodes to decouple from each other and approach their true forms. A particularly drastic example of this can be seen in the behavior of mode 0 (the foreground mode), whose eigenvalue differs enough from the others that it is able to converge reasonably quickly despite substantial sample variance in our empirical covariance estimate. To break degeneracies in the remaining modes, however, requires many more realizations.

While the connection between the rate of convergence of an empirical eigenvector with the sample variance of an eigenspectrum is interesting, it is also important to note that regardless of convergence rate, any mode that is coupled to the data is susceptible to signal loss. The true eigenvectors are not correlated with the data realizations; thus, if our empirical eigenvectors are converged fully, there will not be any signal loss. However, an unconverged eigenvector estimate will retain some memory of the data realizations used in its generation, leading to signal loss.

In the toy models throughout Section 2, we exploit the fact that the strongest eigenmode (highest-eigenvalue mode) is dominated by foregrounds in order to purposely incur signal loss for that mode. Even for the case of real PAPER data (Section 3), we make the assumption that the strongest eigenmodes are likely the most contaminated by foregrounds. However, in general, foregrounds need not be restricted to the strongest eigenmodes, and as we have seen, it is really the degeneracies between modes that determine how quickly they converge, and hence how much signal loss can result.

With Figures 4 and 5 establishing the connection between convergence rates (of empirical covariances and eigenvectors) and number of realizations, we now turn back to our original toy model used in Section 2.2, which is comprised of a mock foreground and mock EoR signal. We mimic a fringe-rate filter by averaging every four time integrations of our toy model data set together, yielding 25 independent samples in time (Figure 6). We choose these numbers so that the total number of independent samples is similar to the number of frequency channels; hence, our matrices will be full rank. We use this "fringe-rate filtered" mock data for the remainder of Section 2. The power spectrum results for this model are shown in Figure 7, and as expected, there is a much larger amount of signal loss for this time-averaged data set, since we do a worse job estimating the true covariance. In addition, as a result of having fewer independent samples, we obtain an estimate with more scatter. This is evident by noticing that the green curve in Figure 7 fails to trace the shape of the uniform-weighted EoR power spectrum (red).
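The block average we use as a stand-in for the filter can be written in a few lines (shapes follow the toy model; this is our simplification, not the optimal filter of Parsons et al. 2016):

```python
import numpy as np

def average_in_time(x, n_avg=4):
    """Boxcar-average consecutive blocks of n_avg integrations in time,
    e.g. (100, 20) visibilities -> (25, 20) independent samples."""
    n_t, n_f = x.shape
    return x[: n_t - n_t % n_avg].reshape(-1, n_avg, n_f).mean(axis=1)

# With only 25 samples constraining a 20x20 covariance, C_hat is barely full
# rank; its EoR eigenmodes couple strongly to the realization, and the
# measured signal loss grows accordingly.
```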

Using our toy model, we have seen that a sensitivity-driven analysis technique like fringe-rate filtering has trade-offs of signal loss and noisier estimates when using data-estimated covariance matrices. Longer integrations increase sensitivity but reduce the number of independent samples, resulting in eigenmodes correlated with the data that can overfit the signal greatly. We note that a fringe-rate filter does have a range of benefits, many described in Parsons et al. (2016), so it can still be advantageous to use one despite the trade-offs.

2.4. Other Weighting Options

In Section 2.2, we showed one example of how altering Ĉ can make the difference between nearly zero and some signal loss. We will now use our toy model to describe several other ways to tailor Ĉ in order to minimize signal loss. We highlight four independent regularization methods in this section, chosen for their simplicity of implementation and straightforward interpretation. We illustrate the resulting power spectra for the different cases in Figure 8. These examples are not meant to be taken as suggested analysis methods but rather as illustrative cases.

As a first test, we model the covariance matrix of EoR as a proof of concept that if perfect models are known, signal loss can be avoided. We know that our simulated EoR signal should have a covariance matrix that mimics the identity matrix, with its variance encoded along the diagonal. We model C_EoR as such (i.e., the identity), instead of computing it based on x_EoR itself. Next, we add C_EoR + C_FG (where $\mathbf{C}_{\rm FG} = \langle \mathbf{x}_{\rm FG}\mathbf{x}_{\rm FG}^{\dagger}\rangle_{t}$) to obtain a final C_reg (regularized empirical covariance matrix) to use in weighting. In Figure 8 (upper left), we see that there is negligible signal loss. This is because, by modeling C_EoR, we avoid overfitting EoR fluctuations in the data that our model does not know about (but an empirically derived Ĉ_EoR would know about the fluctuations). In practice, such a weighting option is not feasible, as it is difficult to model C_EoR, and C_FG is unknown because we do not know how to separate out the foregrounds from the EoR in our data.

The upper right panel in Figure 8 uses a regularization method of setting C_reg ≡ Ĉ + γI, where γ = 5 (an arbitrary strength of I for the purpose of this toy model). By adding the identity matrix, element-wise, we are weighting the diagonal elements of the estimated covariance matrix more heavily than those off the diagonal. Since the identity component does not know anything about the data realization, it alters the covariance to be less coupled to the data, and there is no loss.

The lower left panel in Figure 8 minimizes signal loss by only using the first three eigenmodes of the estimated covariance. Recalling that our toy model foregrounds can be described entirely by the zeroth eigenmode, this method intentionally projects out the highest-valued modes only by replacing all but the three highest weights in the eigenspectrum with 1s (equal weights). Again, avoiding the overfitting of EoR-dominated modes that are coupled to the data results in negligible signal loss. While this case is illuminating for the toy model, in practice, it is not obvious which eigenmodes are foreground- or EoR-dominated (and they could be mixed as well), so determining which subset of modes to down-weight is not trivial. We experiment with this idea using PAPER data in Section 3.3.

The last regularization scheme we highlight here is setting C_reg ≡ Ĉ ∘ I (element-wise multiplication), or inverse variance weighting (i.e., keeping only the diagonal elements of Ĉ). In the lower right panel of Figure 8, we see that this method does not down-weight the foregrounds at all—this regularization altered Ĉ in a way where it is no longer coupled to any of the empirically estimated eigenmodes, including the foreground-dominated one. To understand this, we recall that our foregrounds are spread out in frequency and therefore have nonnegligible frequency–frequency correlations. Multiplying by the identity matrix element-wise results in a diagonal matrix, meaning we do not retain any correlation information. Because of this, we do a poor job suppressing the foreground. But because we decoupled the whole eigenspectrum from the data, we also avoid signal loss. Although this method did not successfully recover the EoR signal for this particular simulation, it is important to show that there are many options for estimating a covariance matrix, and some may down-weight certain eigenmodes more effectively than others based on the spectral nature of the components in a data set.
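For concreteness, here are our renderings of the four regularizations above (γ = 5 follows the toy model; the FG/EoR split in the first option is only available in simulation, as the text notes):

```python
import numpy as np

def reg_modeled(C_fg):
    # Modeled covariance: known C_FG plus C_EoR modeled as the identity.
    return C_fg + np.eye(C_fg.shape[0])

def reg_add_identity(C_hat, gamma=5.0):
    # C_reg = C_hat + gamma*I: up-weight the diagonal, decouple from the data.
    return C_hat + gamma * np.eye(C_hat.shape[0])

def reg_top_modes(C_hat, n_keep=3):
    # Keep only the n_keep strongest eigenvalues; equal weights for the rest.
    lam, V = np.linalg.eigh(C_hat)             # ascending eigenvalues
    lam_reg = np.ones_like(lam)
    lam_reg[-n_keep:] = lam[-n_keep:]
    return (V * lam_reg) @ V.conj().T

def reg_diagonal(C_hat):
    # C_reg = C_hat o I: inverse variance weighting, no frequency correlations.
    return np.diag(np.diag(C_hat))
```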

In summary, we have shown how signal loss is caused by weighting a data set by itself and, in particular, how estimated covariances can overfit EoR modes when they are coupled to data and not converged to their true forms. We have also seen that there are trade-offs between a chosen weighting method, its foreground-removal effectiveness, the number of independent samples in a data set, and the amount of resulting signal loss.

3. Signal Loss in PAPER-64

We now turn to a detailed signal loss investigation using a subset of the PAPER-64 data set from A15. In the previous section, we showed how signal loss arises when weighting data with empirically estimated covariances; in this section, we highlight how the amount of this loss was underestimated in the previous analysis. Additionally, we illustrate how we have revised our analysis pipeline in light of our growing understanding.

As a brief review, PAPER is a dedicated 21 cm experiment located in the Karoo Desert in South Africa. The PAPER-64 configuration consists of 64 dual-polarization drift-scan elements that are arranged in a grid layout. For our case study, we focus solely on Stokes I estimated data (Moore et al. 2013) from PAPER's 30 m East/West baselines (Figure 9). All data are compressed, calibrated (using self-calibration and redundant calibration), delay-filtered (to remove foregrounds inside the wedge), binned in local sidereal time (LST), and fringe-rate filtered. For detailed information about the back-end system of PAPER-64 and its observations and data reduction pipeline, we refer the reader to Parsons et al. (2010) and A15. We note that all data-processing steps are identical to those in A15 until after the LST-binning step in Figure 3 of A15.

The previously best published 21 cm upper limit result from A15 placed a 2σ upper limit on Δ²(k), defined as

$\Delta^{2}(k) = \frac{k^{3}}{2\pi^{2}}\,\hat{P}(k)$, (14)

of (22.4 mK)² in the range 0.15 < k < 0.5 h Mpc⁻¹ at z = 8.4. The need to revise this limit stems mostly from previously underestimated signal loss, which we address in this section.
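The conversion in Equation (14) is a one-liner; a sketch under assumed conventional units (k in h Mpc⁻¹, P in mK² h⁻³ Mpc³, so Δ² comes out in mK²):

```python
import numpy as np

def delta_squared(k, P_k):
    """Dimensionless power Delta^2(k) = k^3 P(k) / (2 pi^2)   (Eq. (14))."""
    return k**3 * P_k / (2.0 * np.pi**2)

# e.g., a limit of Delta^2 <= (22.4)**2 mK^2 at k = 0.3 corresponds to
# P(k) <= (22.4)**2 * 2 * np.pi**2 / 0.3**3 in mK^2 h^-3 Mpc^3.
```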

For the analysis in this paper, we use 8.1 hr of LST, namely an R.A. range of 0.5–8.6 hr (A15 uses a slightly longer R.A. range of 0–8.6 hr; we found that some early LSTs were more severely foreground-contaminated). We also use only 10 baselines, a subset of the 51 total East/West baselines used in A15, in order to illustrate our revised methods. All power spectrum results are produced for a center frequency of 151 MHz using a width of 10 MHz (20 channels), identical to the analysis in A15. In the case study in this paper, we only use one baseline type instead of the three in A15, but M. Kolopanis et al. (2018, in preparation) use the full data set presented in A15 to revise the result and place limits on the EoR at multiple redshifts (using a straightforward and not lossy approach to avoid many of the issues that will be made clear later on).

The most significant changes from A15 occur in our revised power spectrum analysis, which is explained in the rest of this paper, but we also note that the applied fringe-rate filter is slightly different. In A15, the applied filter was not equivalent to the optimal fringe-rate filter (which is designed to maximize power spectrum sensitivity). Instead, the optimal filter was degraded slightly by widening it in fringe-rate space. This was chosen in order to increase the number of independent modes and reduce signal loss associated with the QE, though as we will explain in the next section, this signal loss was still underestimated. With the development of a new, robust method for assessing signal loss, we choose to use the optimal filter in order to maximize sensitivity. This filter is computed for a fiducial 30 m baseline at 150 MHz, the center frequency in our band. The filter in both the fringe-rate domain and time domain is shown in Figure 10.

Figure 6. Our "fringe-rate filtered" (time-averaged) toy model data set. We average every four samples together, yielding 25 independent samples in time. Real parts are shown here.

Figure 7. Resulting power spectrum estimate for the "fringe-rate filtered" (time-averaged) toy model simulation—foregrounds only (blue), EoR only (red), and weighted FG + EoR data set (green). We use empirically estimated inverse covariance weighting, where Ĉ is computed from the data. There is a larger amount of signal loss than for the nonaveraged data, a consequence of weighting by eigenmodes that are more strongly coupled to the data due to there being fewer independent modes in the data.

Finally, we emphasize that the discussion that follows is solely focused on signal loss associated with empirical covariance weighting. As mentioned in Section 2, there are a number of steps in our analysis pipeline that could lead to loss, including gain calibration, delay filtering, and fringe-rate filtering, which have been investigated at various levels of detail in Parsons et al. (2014) and A15 but are clearly the subject of future work. Here we only focus on the most significant source of loss we have identified and note that M. Kolopanis et al. (2018, in preparation) and other future work will consider additional sources of signal loss and exercise increased caution in reporting results.

We present our PAPER-64 signal loss investigation in three parts. We first give an overview of our signal injection framework that is used to estimate loss. In this framework (and in A15), we inject simulated cosmological signals into our data and test the recovery of those signals (an approach also taken by Masui et al. 2013). As we will see, correlations between the injected signals and the data are significant complicating factors that were previously not taken into account. Next, we describe our methodology in practice and detail how we map our simulations into a posterior for the EoR signal. Finally, we build off of the previous section by experimenting with different regularization schemes on PAPER data in order to minimize loss. Throughout each section, we also highlight major differences from the signal loss computation used in A15.

3.1. Signal Loss Methodology

In short, our method for estimating signal loss consists of adding an EoR-like signal into visibility data and then measuring how much of this injected signal would be detectable given any attenuation of this signal by the (lossy) data analysis pipeline. To capture the full statistical likelihood of signal loss, one requires a quick way to generate many realizations of simulated 21 cm signal visibilities. Here we use the same method as in A15, where mock Gaussian noise visibilities (mock EoR signals) are filtered in time using an optimal fringe-rate filter to retain only "sky-like" modes. Since the optimal filter has a shape that matches the rate of the sidereal motion of the sky, this transforms the Gaussian noise into a measurement that PAPER could make. This signal is then added to the visibility data.¹⁷

Figure 8. Resulting power spectrum estimates for our "fringe-rate filtered" (time-averaged) toy model simulation—foregrounds only (blue), EoR only (red), and weighted FG + EoR data set (green). We show four alternate weighting options that each minimize signal loss, including modeling the covariance matrix of EoR (upper left), regularizing Ĉ by adding an identity matrix to it (upper right), using only the first three eigenmodes of Ĉ (lower left), and keeping only the diagonal elements of Ĉ (lower right). The first case (upper left) is not feasible in practice, since we do not know C_FG and C_EoR like we do in the toy model.

Figure 9. PAPER-64 antenna layout. We use only 10 of the 30 m East/West baselines for the analysis in this paper (i.e., a subset of the shortest horizontal spacings).

Mathematically, suppose that e is the mock injected EoR signal (at some amplitude level). We do not know the true EoR signal contained within our visibility data, x, so e takes on the role of the true EoR signal (for which we measure its loss). Furthermore, one can make the assumption that the true EoR signal is small within our measured data, so the data vector x itself is representative of mostly contaminants. Using this assumption, the sum of x and e, defined as r,

$\mathbf{r} = \mathbf{x} + \mathbf{e}$, (15)

can be thought of as the sum of contaminants plus EoR. The quantity r then becomes the data set for which we are measuring how much loss of e there is due to our power spectrum pipeline.

We are interested in quantifying how much variance in e is lost after weighting r and estimating the power spectrum according to the QE formalism. We investigate this by comparing two quantities we call the input power spectrum and output power spectrum, P_in and P_out, estimated using the QE as

$P_{\mathrm{in}}^{\alpha} \equiv M_{\mathrm{in}}^{\alpha}\, \mathbf{e}^{\dagger}\mathbf{I}\mathbf{Q}_{\alpha}\mathbf{I}\mathbf{e}$ (16)

and

$P_{\mathrm{out}}^{\alpha} \equiv \hat{P}_{r}^{\alpha} = M_{r}^{\alpha}\, \mathbf{r}^{\dagger}\mathbf{R}_{r}\mathbf{Q}_{\alpha}\mathbf{R}_{r}\mathbf{r}$, (17)

where, for illustrative purposes and notational simplicity, we have written these equations with scalar normalizations M, even though, for our numerical results, we choose a diagonal matrix normalization using M as in Equation (9).

The quantity P_in, defined by Equation (16), is a uniformly weighted estimator of the power spectrum of e. It can be considered the power spectrum of this particular realization of the EoR; alternatively, it can be viewed as the true power spectrum of the injected signal up to cosmic variance fluctuations. The role of P_in in our analysis is to serve as a reference for the power spectrum that would be measured if there were no signal loss or other systematics. The input power spectrum is then to be compared to P_out, which approximates the (lossy) power spectrum estimate that is output by our analysis pipeline prior to any signal loss adjustments.

Under this injection framework, we can begin to see explicitly why there can be large signal loss. Expanding Equation (17), P_out becomes

$P_{\mathrm{out}} = M_{r}(\mathbf{x}+\mathbf{e})^{\dagger}\mathbf{R}_{r}\mathbf{Q}_{\alpha}\mathbf{R}_{r}(\mathbf{x}+\mathbf{e}) = \underbrace{M_{r}\mathbf{x}^{\dagger}\mathbf{R}_{r}\mathbf{Q}_{\alpha}\mathbf{R}_{r}\mathbf{x}}_{(a)} + \underbrace{M_{r}\mathbf{e}^{\dagger}\mathbf{R}_{r}\mathbf{Q}_{\alpha}\mathbf{R}_{r}\mathbf{e}}_{(b)} + \underbrace{M_{r}\mathbf{x}^{\dagger}\mathbf{R}_{r}\mathbf{Q}_{\alpha}\mathbf{R}_{r}\mathbf{e}}_{(c)} + \underbrace{M_{r}\mathbf{e}^{\dagger}\mathbf{R}_{r}\mathbf{Q}_{\alpha}\mathbf{R}_{r}\mathbf{x}}_{(d)}$. (18)

Assuming R_r is symmetric, the two cross-terms (terms with one copy of e and one copy of x) can be summed together as

$P_{\mathrm{out}} = \underbrace{M_{r}\mathbf{x}^{\dagger}\mathbf{R}_{r}\mathbf{Q}_{\alpha}\mathbf{R}_{r}\mathbf{x}}_{(a)} + \underbrace{M_{r}\mathbf{e}^{\dagger}\mathbf{R}_{r}\mathbf{Q}_{\alpha}\mathbf{R}_{r}\mathbf{e}}_{(b)} + \underbrace{2M_{r}\mathbf{x}^{\dagger}\mathbf{R}_{r}\mathbf{Q}_{\alpha}\mathbf{R}_{r}\mathbf{e}}_{(c)}$. (19)
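A sketch of this decomposition (our illustration: an FFT stands in for Q_α, the normalization M is dropped, and shapes follow the earlier toy model):

```python
import numpy as np

def band_powers(a, b, R):
    """Per-k cross power between R-weighted copies of a and b."""
    fa = np.fft.fft(a @ R.T, axis=1)
    fb = np.fft.fft(b @ R.T, axis=1)
    return np.mean((fa.conj() * fb).real, axis=0)

def injection_terms(x, e):
    r = x + e                                          # Eq. (15)
    C_r = np.einsum('ti,tj->ij', r, r.conj()) / r.shape[0]
    R = np.linalg.inv(C_r)                             # R_r = C_r_hat^-1
    term_a = band_powers(x, x, R)                      # data-data
    term_b = band_powers(e, e, R)   # signal-signal: the only term A15 compared
    term_c = 2.0 * band_powers(x, e, R)                # cross-term (negative)
    return term_a, term_b, term_c                      # P_out = a + b + c

# P_in is band_powers(e, e, np.eye(e.shape[1])) (uniform weighting, Eq. (16));
# sweeping the injected amplitude of e and comparing P_out = a + b + c with
# P_in traces out curves like those in Figure 11.
```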

One of the key takeaways of this section is that the A15 analysis estimated signal loss by comparing only the signal-only term (second term in Equation (19)) with P_in, whereas in fact, the cross-term (third term in Equation (19)) can substantially lower P_out. In order to investigate the effect of each of these terms on signal loss, all three components are plotted in Figure 11 for two cases: empirically estimated inverse covariance weighting (R_r ≡ Ĉ_r⁻¹) and uniform weighting (R_r ≡ I). We will now go into further detail and examine the behavior of this equation in three different regimes of the injected signal: very weak (left ends of the P_in axes in Figure 11), very strong (right ends), and in between (middle portions).

Small injection. In this regime, the cross-terms (red) behave as noise averaged over a finite number of samples. Output values are Gaussian distributed around zero, spanning a range of values set by the injection level. This is because R_r is dominated by the data x, avoiding correlations with e that can lead to solely negative power (explained further below). In fact, for the uniformly weighted case, the cross-term $M_{x}\mathbf{x}^{\dagger}\mathbf{I}\mathbf{Q}_{\alpha}\mathbf{I}\mathbf{e}$ is well modeled as a symmetric distribution with zero mean and width $\sqrt{P_{e}P_{x}}$. We also note that in this regime, $\hat{P}_{r}$ (black) approaches the data-only power spectrum value (gray), as expected.

Large injection. When the injected signal is much larger than the measured power spectrum, the data-only components can be neglected, as they are many orders of magnitude smaller.

Figure 10. Top: normalized optimal power spectrum sensitivity weighting in fringe-rate space for our fiducial baseline and Stokes I polarization beam. Bottom: time-domain convolution kernel corresponding to the top panel. Real and imaginary components are illustrated in cyan and magenta, respectively, with the absolute amplitude in black. The fringe-rate filter acts as an integration in time, increasing sensitivity but reducing the number of independent samples in the data set.

17 One specific change from A15 is that we add this simulated signal—which has been fringe-rate filtered once already in order to transform it into a "sky-like" signal—into the analysis pipeline before a fringe-rate filter is applied to the data (i.e., prior to the analysis step of fringe-rate filtering). Previously, the addition was done after the fringe-rate filter analysis step. This change results in an increased estimate of signal loss, likely due to the use of the fringe-rate filter as a simulator. However, this pipeline difference, while significant, is not the dominant reason why signal loss was underestimated in A15 (the dominant

We include a description of this regime for completeness in our discussion but note that the upper limits that we compute are typically not determined by simulations in this regime (i.e., in using an empirical weighting scheme, we have assumed that the data are dominated by foregrounds rather than the cosmological signal). However, it is useful as a check of our system in a relatively simple case. As we can see from Figure 11, the cross-terms (red) are small in comparison to the signal-only term (green). Only here does the signal-only term used in A15 dominate the total power output. We again see that, in the empirical inverse covariance-weighted case, the cross-terms behave as noise (positive and negative fluctuations around zero mean). This is for the same reason as at small injections—here Ĉ_r is dominated by the signal e. The cross-correlation can again be modeled as a symmetric distribution of zero mean and width $\sqrt{P_{e}P_{x}}$.

In between. When the injected signal is of a similar amplitude to the data by itself, the situation becomes less straightforward. We see that the weighted injected power spectrum component mirrors the input power, indicating little loss (i.e., the green curve follows the dotted black line), eventually departing from unity when the injected amplitude is well above the level of the data power spectrum. However, in this regime, the cross-term (red) has nearly the same amplitude but with a negative sign. As explained below, this negativity is the result of cross-correlating inverse covariance-weighted terms. This negative component drives down the $P_\text{out}$ estimator (black). Again, we emphasize that in A15, signal loss was computed by only looking at the second term in Equation (19) (green), which incorrectly implies no loss at the data-only power spectrum level. Ignoring the effect of the negative power from the cross-terms is the main reason for underestimating power spectrum limits in A15.
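For orientation, the three colored curves in Figure 11 correspond to the terms in the expansion of the weighted power spectrum of $r = x + e$. Schematically (our shorthand, suppressing band-power indices and the exact placement of the normalization $M$; the precise form is Equation (19), which appears earlier in the document):

$$\hat{P}_r = \underbrace{M (R x)^\dagger Q (R x)}_{\text{data-only (blue)}} + \underbrace{M (R e)^\dagger Q (R e)}_{\text{signal-only (green)}} + \underbrace{2\,\mathrm{Re}\!\left[ M (R x)^\dagger Q (R e) \right]}_{\text{cross-terms (red)}},$$

where the weighting $R$ is itself built from $r$; this is why the cross-terms do not average to zero when $R = \hat{C}_r^{-1}$.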

The source of the strong negative cross-term is not immediately obvious; however, it is an explainable effect. When $R_r$ is taken to be $\hat{C}_r^{-1}$, the third term of Equation (19) is a cross-correlation between $\hat{C}_r^{-1} x$ and $\hat{C}_r^{-1} e$. As shown in Switzer et al. (2015), this cross-correlation term is nonzero and, in fact, negative in expectation. This negative cross-term power arises from a coupling between the inverse of $\hat{C}_r$ and $x$. Intuitively, we can see this by expanding the empirical covariance of $r = x + e$,

$$\hat{C}_r = \langle r r^\dagger \rangle_t = \langle x x^\dagger \rangle_t + \langle x e^\dagger \rangle_t + \langle e x^\dagger \rangle_t + \langle e e^\dagger \rangle_t, \tag{20}$$

Figure 11. Illustration of the power spectrum amplitude of five different power spectrum terms, each a function of visibility data (x), simulated injected EoR signal (e), or both (r). The figure shows how these quantities behave as the power level of the injected EoR signal increases (along the x-axis). The details of the simulation used to generate the figure are explained in Section 3.2; here we sample a larger $P_\text{in}$ range and fit smooth polynomials to our data points to make an illustrative example. We emphasize that the output power spectrum in black ($P_\text{out} = P_r$) approximates the (lossy) power spectrum estimate that is output by our analysis pipeline prior to any signal loss adjustments. Roughly speaking, it can be compared to the input signal level ($P_\text{in}$) to estimate the amount of signal loss. In the left panel, empirical inverse covariance weighting is used in power spectrum estimation, as in A15. The dotted black diagonal line indicates perfect 1:1 input-to-output mapping (no signal loss). The gray horizontal line is the power spectrum value of data alone, $P_x$ (it does not depend on injected power). The green signal–signal component is the term used in A15 to estimate signal loss. It is significantly higher than $P_r$ (black) when the cross-terms (red) are large and negative (black = green + red + blue). In the regime where cross-correlations between signal and data are not dominant (small and large $P_\text{in}$), the cross-terms have a noise-like term with width $\sqrt{P_e P_x}$. However, at power levels comparable to the data (middle region), the cross-terms can produce large, negative estimates due to couplings between x and e that affect $\hat{C}_r$. This causes the difference between the green curve (which exhibits negligible loss at the data-only power spectrum value) and the black curve (which exhibits ∼4 orders of magnitude of loss at that value).


where we can neglect the first term because x is small (i.e., the large negative cross-term power in the left panel of Figure 11 occurs when the injected amplitude surpasses the level of the data-only power spectrum). Without loss of generality, we will assume an eigenbasis of e, so that $\langle e e^\dagger \rangle_t$ is diagonal. The middle two terms, however, can have power in their off-diagonal terms due to the fact that, when averaging over a finite ensemble, $\langle x e^\dagger \rangle_t$ is not zero. As shown in Appendix C of Parsons et al. (2014), to leading order, the inversion of a diagonal-dominant matrix like $\hat{C}_r$ (from $\langle e e^\dagger \rangle_t$) with smaller off-diagonal terms results in a new diagonal-dominant matrix with negative off-diagonal terms. These off-diagonal terms depend on both x and e. Then, when $\hat{C}_r^{-1}$ is multiplied into x, the result is a vector that is similar to x but contains a residual correlation to e from the off-diagonal components of $\hat{C}_r^{-1}$. The correlation is negative because the product $\hat{C}_r^{-1} x$ effectively squares the x-dependence of the off-diagonal terms in $\hat{C}_r^{-1}$ while retaining the negative sign that arose from the inversion of a diagonal-dominant matrix.
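This coupling is straightforward to reproduce numerically. The following sketch is our own toy (not the PAPER pipeline): it draws independent Gaussian mock data x and mock signal e, builds the empirical covariance of r = x + e from a finite number of time samples, and compares the time-averaged cross-term under empirical inverse covariance weighting against uniform weighting (Q is set to the identity for simplicity, i.e., total power summed over all modes):

import numpy as np

rng = np.random.default_rng(0)
nfreq, ntimes, ntrials = 20, 50, 400  # illustrative sizes only

def complex_gaussian(shape, rng):
    # Unit-variance complex Gaussian draws.
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

cross_inv, cross_uni = [], []
for _ in range(ntrials):
    x = complex_gaussian((nfreq, ntimes), rng)        # mock "data"
    e = 3.0 * complex_gaussian((nfreq, ntimes), rng)  # mock injection above the data level
    r = x + e
    C = (r @ r.conj().T) / ntimes                     # empirical covariance, finite samples
    Cinv = np.linalg.inv(C)
    # Cross-term ~ Re[(C^-1 x)^dagger (C^-1 e)], averaged over time samples.
    cross_inv.append(np.mean(np.sum((Cinv @ x).conj() * (Cinv @ e), axis=0)).real)
    # Same cross-term with uniform weighting (R = I): no data-weight coupling.
    cross_uni.append(np.mean(np.sum(x.conj() * e, axis=0)).real)

print("empirical C^-1 weighting:", np.mean(cross_inv))  # systematically negative
print("uniform weighting:       ", np.mean(cross_uni))  # scatters around zero

Averaged over many trials, the inverse covariance-weighted cross-term comes out negative while the uniform-weighted one is consistent with zero, mirroring the red curves in Figure 11.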

In general. Another way to phrase the shortcomings of the empirical inverse covariance estimator is that it is not properly normalized. Signal loss due to couplings between the data and their weightings arises because our unnormalized QE from Equation (8) ceases to be a quadratic quantity and instead contains higher-order powers of the data. However, the normalization matrix M is derived assuming that the unnormalized estimator is quadratic in the data. The power spectrum estimate will therefore be incorrectly normalized, which manifests as signal loss. We leave a full analytic solution for M for future work, since our simulations already capture the full phenomenology of signal loss and have the added benefit of being more easily generalizable in the face of non-Gaussian systematics.
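Concretely, in our shorthand (suppressing band-power indices; the unnormalized QE itself is Equation (8), which appears earlier in the document), the quantity being normalized is

$$\hat{q} = r^\dagger R\, Q\, R\, r, \qquad R = \hat{C}_r^{-1}, \qquad \hat{C}_r = \langle r r^\dagger \rangle_t .$$

Because $\hat{C}_r$ is itself quadratic in $r$, $\hat{q}$ is no longer a quadratic form in the data, yet $M$ is computed as if $R$ were a fixed matrix, i.e., as if $\hat{q}$ were exactly quadratic; the mismatch appears as signal loss.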

3.2. Signal Loss in Practice

We now shift our attention toward computing upper limits on the EoR signal for the fringe-rate filtered PAPER-64 data set in a way that accounts for signal loss. While our methodology outlined below is independent of weighting scheme, here we demonstrate the computation using empirically estimated inverse covariance weighting ($R \equiv \hat{C}^{-1}$), the weighting scheme used in A15 that leads to substantial loss.

One issue to address is how one incorporates the randomness of $P_\text{out}$ into our signal loss corrections. A different realization of the mock EoR signal is injected with each bootstrap run, causing the output to vary in three ways: noise variation from the bootstraps, cosmic variation from generating multiple realizations of the mock EoR signal, and variation caused by whether the injected signal looks more or less "like" the data (i.e., how much coupling there is, which affects how much loss results).

For each injection level, the true $P_\text{in}$ is simply the average of our bootstrapped estimates $P_{\text{in},a}$, since $P_{\text{in},a}$ is by construction an unbiased estimator. Phrased in the context of Bayes's rule, we wish to find the posterior probability distribution $p(P_\text{in} \mid P_\text{out})$, which is the probability of $P_\text{in}$ given the uncorrected/measured power spectrum estimate $P_\text{out}$. Bayes's rule relates the posterior, which we do not know, to the likelihood, which we can forward-model. In other words,

$$p(P_\text{in} \mid P_\text{out}) \propto \mathcal{L}(P_\text{out} \mid P_\text{in})\, p(P_\text{in}), \tag{21}$$

where $\mathcal{L}$ is the likelihood function defined as the distribution of data plus signal injection ($P_\text{out}$) given the injection $P_\text{in}$. We construct this distribution by fixing $P_\text{in}$ and simulating our analysis pipeline for many realizations of the injected EoR signal consistent with this power spectrum. The resulting distribution is normalized such that the sum over $P_\text{out}$ is unity, and the whole process is then repeated for a different value of $P_\text{in}$.

The implementation details of the injection process require some more detailed explanation. In our code, we add a new realization of EoR to each independent bootstrap of data (see Section 4.1 for a description of PAPER's bootstrapping routine) with the goal of simultaneously capturing cosmic variance, noise variance, and signal loss. To limit computing time, we perform 20 realizations of each $P_\text{in}$ level. We also run 50 total EoR injection levels, yielding $P_\text{in}$ values that range from ∼10$^5$ to ∼10$^{11}$ mK$^2$ ($h^{-1}$ Mpc)$^3$, resulting in a total of 1000 data points on our $P_\text{in}$ versus $P_\text{out}$ grid.

Going forward, we treat every k-value separately in order to determine an upper limit on the EoR signal per k. We bin our simulation outputs along the $P_\text{in}$ axis (one bin per injection level), and, since they are well approximated by a Gaussian distribution in our numerical results, we smooth the distribution of $P_\text{out}$ values by fitting Gaussians for each bin based on its mean and variance (and normalize them). Stitching all of them together results in a two-dimensional transfer function: the likelihood function in Bayes's rule, namely $\mathcal{L}(P_\text{out} \mid P_\text{in})$. We then have a choice for our prior, $p(P_\text{in})$, and we choose to invoke a Jeffreys prior (Jaynes 1968) because it is a truly uninformative prior.
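As a toy illustration of this construction (a sketch under stated assumptions, not the PAPER code: the pipeline is replaced by a made-up lossy transfer curve with bootstrap-like scatter, and all numbers are placeholders):

import numpy as np

rng = np.random.default_rng(1)
P_X = 1e7  # assumed data-only power level, mK^2 (h^-1 Mpc)^3

def toy_pipeline(p_in, rng):
    # Stand-in for one bootstrapped pipeline run at injection level p_in.
    # The loss curve is invented so that injections comparable to the data
    # power are heavily suppressed, mimicking the left panel of Figure 12.
    loss = p_in / (p_in + 1e4 * P_X)
    mean = P_X + loss * p_in                            # lossy P_out on average
    return mean * (1.0 + 0.3 * rng.standard_normal())   # bootstrap-like scatter

p_in_levels = np.logspace(5, 11, 50)  # 50 injection levels, as in the text
outs = np.array([[toy_pipeline(p, rng) for _ in range(20)]  # 20 realizations each
                 for p in p_in_levels])

# One Gaussian per injection level, from the mean and variance of its P_out
# draws; stitched together, these form the likelihood L(P_out | P_in).
mu, sigma = outs.mean(axis=1), outs.std(axis=1)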

Finally, our transfer functions are shown in Figure 12 for both the weighted (left) and unweighted (right) cases. Our bootstrapped power spectrum outputs are shown as black points, and the colored heat map overlaid on top is the likelihood function modified by our prior. Although we only show figures for one k-value, we note that the shape of the transfer curve is similar for all k-values. We then invoke Bayes's interpretation and reinterpret it as the posterior $p(P_\text{in} \mid P_\text{out})$, where we recall that $P_\text{out}$ represents a (lossy) power spectrum. To do this, we make a horizontal cut at the data value $P_x$ (setting $P_\text{out} = P_x$), shown by the gray solid line, to yield a posterior distribution for the signal. We normalize this final distribution and compute the 95% confidence interval (an upper limit on EoR).
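Continuing the toy example above (mu, sigma, p_in_levels, and P_X come from the previous sketch; the 1/P form of the Jeffreys prior is our assumption for a scale parameter), the horizontal cut and the 95% upper limit might be computed as:

def gauss_pdf(x, m, s):
    # Normalized Gaussian density, used to evaluate the smoothed likelihood.
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (np.sqrt(2.0 * np.pi) * s)

# Horizontal cut at the measured data value: L(P_out = P_x | P_in) for every
# injection level, multiplied by the prior and normalized on the P_in grid.
likelihood_at_px = gauss_pdf(P_X, mu, sigma)
posterior = likelihood_at_px / p_in_levels   # Jeffreys-like 1/P prior (assumption)
w = np.gradient(p_in_levels)                 # grid cell widths for normalization
posterior /= np.sum(posterior * w)

# 95% confidence upper limit: smallest grid P_in whose cumulative posterior
# reaches 0.95 (coarse numerical CDF on the injection grid).
cdf = np.cumsum(posterior * w)
upper_limit = p_in_levels[min(np.searchsorted(cdf, 0.95), len(cdf) - 1)]
print(f"toy 95% upper limit on P_in: {upper_limit:.2e} mK^2 (h^-1 Mpc)^3")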

By-eye inspection of the transfer function in Figure 12 gives a sense of what the signal loss result should be. The power spectrum value of our data, $P_x$, is marked by the solid gray horizontal lines. From the left plot (empirically estimated inverse covariance weighting), one can eyeball that a data value of 10$^5$ mK$^2$ ($h^{-1}$ Mpc)$^3$, for example, would map approximately to an upper limit of ∼10$^9$ mK$^2$ ($h^{-1}$ Mpc)$^3$, implying a signal loss factor of ∼10$^4$.

The loss-corrected power spectrum limit for empirically estimated inverse covariance-weighted PAPER-64 data is shown in Figure 13 (solid red), which we can compare to the original lossy result (dashed red). Post–signal loss estimation, the power spectrum limits are higher than both the theoretical noise level (green) and the uniform-weighted power spectrum (which is shown three ways: black and gray points are positive and negative power spectrum values, respectively, with 2σ error bars from bootstrapping; the solid blue line is the upper limit on the EoR signal using the full signal injection framework; and the shaded gray region shows the power spectrum values with thermal noise errors). We elaborate on this point in the next section and investigate alternate weighting schemes to inverse covariance weighting, with the goal of avoiding signal loss.

Figure 12. Signal loss transfer functions showing the relationship of $P_\text{in}$ and $P_\text{out}$, as defined by Equations (16) and (17). Power spectra values (black points) are generated for 20 realizations of e per signal injection level. Since our $P_\text{out}$ values are well approximated by a Gaussian distribution, we fit Gaussians to each injection level based on the mean and variance of the simulation outputs. This entire likelihood function is then multiplied by a Jeffreys prior for $p(P_\text{in})$, with the final result shown as the colored heat maps on top of the points. Two cases are displayed: empirically estimated inverse covariance-weighted PAPER-64 data (left) and uniform-weighted data (right). The dotted black diagonal lines mark a perfect unity mapping, and the solid gray horizontal line denotes the power spectrum value of the data $P_x$ from which a posterior distribution for the signal is extracted. From these plots, it is clear that the weighted case results in ∼4 orders of magnitude of signal loss at the data-only power spectrum value, whereas the uniform-weighted case does not exhibit loss. The general shapes of these transfer functions are also shown by the black curves in Figure 11 for comparison.

Figure 13. Power spectrum of a subset of PAPER-64 data illustrating the use of empirical inverse covariance weighting. The solid red curve is the 2σ upper limit on the EoR signal estimated from our signal injection framework using empirical inverse covariance weighting. Shown for comparison is the lossy limit prior to signal loss estimation (dashed red). The theoretical 2σ thermal noise level prediction based on observational parameters is shown in green, and its calculation is detailed in Section 4.2. Additionally, the power spectrum result for the uniform-weighted case is shown in three different ways: power spectrum values (black and gray points as positive and negative values, respectively, with 2σ error bars from bootstrapping), the 2σ upper limit on the EoR signal using our full signal injection framework (solid blue lines), and the measured power spectrum values with 2σ thermal noise errors (gray shaded regions). The vertical dashed black lines signify the horizon limit for this analysis using 30 m baselines. In this example, we see that the lossy power spectrum limit is ∼4 orders of magnitude too low when using empirical inverse covariance weighting.
