a*Department of Electrical Engineering, University of Guilan, Rasht, Iran*

b*Optical Communications Research Group, Faculty of Engineering and Environment, Northumbria University, Newcastle, UK*

A R T I C L E I N F O

*Keywords:*

Visible light communications (VLC)

Carrier-less amplitude and phase (CAP) modulation

Deep learning (DL)

A B S T R A C T

In this paper, we propose a carrier-less amplitude and phase visible light communications (VLC) system with a deep learning (DL)-based post-equalizer (EQ) to significantly increase the transmission data rate. The proposed system is analyzed under various conditions, including the modulation order, the transmitted signal bandwidth, and a non-line of sight VLC channel. Results show that the highest data rate and spectral efficiency of 100 Mb/s and 4.67 b/s/Hz are achieved for the modulation order and signal bandwidth of 64 and 25 MHz, respectively.

In addition, we compare the performance and complexity of the proposed system with those of other types of EQs, including least mean square and Volterra series. The study shows that the DL-based EQ can mitigate mixed linear and nonlinear impairments, providing improved bit error rate performance compared to the other EQs for all modulation orders and transmitted signal bandwidths.

## 1. Introduction

Optical wireless communications (OWC) use the visible light (VL), infrared (IR), and ultraviolet (UV) spectral bands to meet some of the demands for wireless connectivity in fifth-generation (5G) and 6G networks. As a complementary technology to radio frequency (RF) wireless systems, OWC offers unique features such as almost unlimited bandwidth (BW), no spectrum authorization or regulation, improved environmental safety, higher security in the physical layer (PHY), higher energy efficiency, and improved sustainability [1]. OWC in the VL band, known as VL communications (VLC), uses white light emitting diodes (LEDs) and photodetectors (PDs) as the transmitter (Tx) and the receiver (Rx), respectively, which offers simultaneous illumination and data communications [2]. However, white LEDs have some limitations that may cause VLC systems to underperform in practical applications.

Blue LEDs with a phosphor coating and RGB (red, green, and blue) LEDs are the two most widely used methods of producing white light. Even though RGB LEDs offer higher data rates, $R_d$, blue LEDs with a phosphor coating are simpler to implement and cost less. However, the slow response of the phosphor limits the LED modulation bandwidth, $B_{\mathrm{LED}}$, to only a few MHz, which in turn greatly constrains the achievable transmission capacity of the system. Moreover, white LEDs are a source of nonlinearity in VLC systems, leading to signal distortion and intersymbol interference (ISI) [3,4].

A common solution to mitigate these limitations is to employ linear and nonlinear equalizers (EQs), including the adaptive least mean square (LMS) filter, which is the most widely used, the Volterra series, and deep learning (DL)-based filters. For example: (*i*) in [5], it was shown how an adaptive EQ with the LMS algorithm could suppress the ISI in indoor VLC systems; (*ii*) in [6], it was shown that a Rx with a decision feedback EQ (DFE) with a nonlinear Volterra feed-forward section could effectively mitigate the effects of the LED's nonlinearity, improving performance by up to 5 dB in terms of optical power compared with a simple DFE; and (*iii*) in [7], it was demonstrated that a higher $R_d$ of 170 Mb/s could be achieved with white LEDs by employing artificial neural network (ANN)-based equalization.

∗ Corresponding author.

*E-mail address:* bsalimi@guilan.ac.ir (G. Baghersalimi).

DL is a powerful subfield of machine learning (ML), which can learn, recognize, and predict complex patterns in high-dimensional input data [8]. DL methods have been used in many application domains, including speech recognition, computer vision, self-driving cars, natural language processing, predictive forecasting, and communication systems [9–11]. For example, the authors in [12,13] introduced several DL applications and improvements in the PHY of communication links. In [14], a deep neural network (DNN) was employed as an accurate tool for channel estimation to reduce both distortion and interference. The authors in [15] reviewed the challenges and applications of using DL in VLC, whereas in [16] DL was utilized to reduce flickering and increase the dimming level in a VLC system. In [17], the authors developed an autoencoder to mitigate the nonlinearity of the LED and enhance the system bit error rate (BER) performance.

In [18], a DNN-based signal detector, namely bi-directional long short-term memory, was reported for VLC links with 25 Mbps and 50 Mbps non-line of sight (NLOS) transmission using the second-order and first-order reflections, respectively, with a BER of $< 1 \times 10^{-4}$. In [19], an orthogonal frequency division multiplexing (OFDM)-based optical quadrature spatial modulation for multiple-input multiple-output OWC with DNN-aided detection was proposed with a successful outcome.

https://doi.org/10.1016/j.optcom.2022.128741

Received 7 April 2022; Received in revised form 18 June 2022; Accepted 5 July 2022; Available online 8 July 2022

0030-4018/© 2022 Elsevier B.V. All rights reserved.

**Fig. 1.** Schematic block diagram of the proposed CAP-VLC system with DL-based post-EQ.

Several spectrally efficient modulation schemes are widely used in intensity modulation/direct detection (IM/DD) optical systems. In this work, we have adopted the carrier-less amplitude and phase (CAP) modulation scheme for VLC. To the best of the authors' knowledge, a DL-based post-EQ has not previously been used in CAP-VLC systems. Therefore, inspired by the aforementioned research works, we investigate the effectiveness of applying DL for post-equalization in an indoor CAP-VLC system to increase $R_d$ and improve the BER performance. In our study, we successfully demonstrate an $R_d$ of 100 Mb/s, which to our knowledge is the highest $R_d$ reported so far for a single-band CAP-VLC system using a single white LED. In addition, we compare the BER performance of the proposed system with two common equalization techniques, namely the Volterra series and the adaptive LMS algorithm.

The rest of the paper is organized as follows. Section 2 gives an overview of the DL-based indoor CAP-VLC system. Section 3 presents and discusses the most important results. Finally, Section 4 concludes the paper.

## 2. System overview

Fig. 1 illustrates the schematic block diagram of the proposed CAP-VLC system with a DL-based post-EQ, which is explained in detail as follows. Note that we assume each LED follows the Lambertian radiation pattern.

*2.1. The Tx*

At the Tx, a random data bit stream $\{I_n\}$ is mapped into the quadrature amplitude modulation (QAM) format prior to being applied to the CAP modulator. CAP is a spectrally efficient and low-cost technique that is simple to implement, since it does not need the fast Fourier transform (FFT) and inverse FFT (IFFT) required in OFDM. CAP is similar to QAM, except that instead of using a local oscillator, it generates the carrier frequencies using two orthogonal digital filters whose impulse responses form a Hilbert transform pair [20–23]. The mapped complex data are then up-sampled by a factor of $n_s = \lceil 2(1+\beta) \rceil$, where $\lceil \cdot \rceil$ is the ceiling function and $\beta$ is the roll-off factor of the square root raised cosine (SRRC) filter. Next, the real and imaginary parts of the up-sampled signal are separated and sent to the digital in-phase (*I*) and quadrature (*Q*) SRRC filters, respectively, which are given as [24]:

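$$
f_I(t) = \left[ \frac{\sin[\gamma(1-\beta)] + 4\beta\left(\frac{t}{T_s}\right)\cos[\gamma\delta]}{\gamma\left[1-\left(\frac{4\beta t}{T_s}\right)^{2}\right]} \right] \cdot \cos[\gamma\delta] \tag{1}
$$

$$
f_Q(t) = \left[ \frac{\sin[\gamma(1-\beta)] + 4\beta\left(\frac{t}{T_s}\right)\cos[\gamma\delta]}{\gamma\left[1-\left(\frac{4\beta t}{T_s}\right)^{2}\right]} \right] \cdot \sin[\gamma\delta] \tag{2}
$$

where $T_s$ is the symbol duration, $\gamma = \pi t / T_s$, and $\delta = 1 + \beta$.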
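As an illustration, Eqs. (1) and (2) can be sampled to obtain the FIR taps of the *I* and *Q* filters. The following is a minimal sketch, not the authors' implementation: the function names `srrc` and `cap_filters` are hypothetical, the default values ($\beta = 0.5$, $n_s = 6$, and a 10-symbol span) are taken from Table 2, and the two singular points of the SRRC term ($t = 0$ and $|t| = T_s/4\beta$) are replaced by their analytic limits, which is an assumption about how the filter would be evaluated numerically.

```python
import numpy as np

def srrc(x, beta):
    """Bracketed SRRC term of Eqs. (1)-(2), with x = t / Ts.

    The expression is 0/0 at x = 0 and |x| = 1/(4*beta); those samples
    are filled with the analytic limits of the same expression.
    """
    x = np.asarray(x, dtype=float)
    p = np.empty_like(x)
    at_zero = np.isclose(x, 0.0)
    at_pole = np.isclose(np.abs(x), 1.0 / (4.0 * beta))
    ok = ~(at_zero | at_pole)
    g = np.pi * x[ok]                                  # gamma = pi * t / Ts
    num = np.sin(g * (1 - beta)) + 4 * beta * x[ok] * np.cos(g * (1 + beta))
    den = g * (1 - (4 * beta * x[ok]) ** 2)
    p[ok] = num / den
    p[at_zero] = 1 - beta + 4 * beta / np.pi
    p[at_pole] = (beta / np.sqrt(2)) * (
        (1 + 2 / np.pi) * np.sin(np.pi / (4 * beta))
        + (1 - 2 / np.pi) * np.cos(np.pi / (4 * beta))
    )
    return p

def cap_filters(beta=0.5, n_s=6, span_symbols=10):
    """Return sampled f_I and f_Q (Eqs. (1)-(2)) on a symmetric time grid."""
    half = span_symbols * n_s // 2
    x = np.arange(-half, half + 1) / n_s               # t / Ts
    p = srrc(x, beta)
    phase = np.pi * x * (1 + beta)                     # gamma * delta
    return p * np.cos(phase), p * np.sin(phase)

f_i, f_q = cap_filters()
print(len(f_i), float(np.dot(f_i, f_q)))
```

On a symmetric grid $f_I$ is even and $f_Q$ is odd, so their inner product vanishes up to rounding, which is the orthogonality property the CAP receiver relies on.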

The outputs of the digital finite impulse response (FIR) filters are summed to produce the desired CAP signal, which is given by [24]:

$$
s(t) = s_I(t) \otimes f_I(t) - s_Q(t) \otimes f_Q(t) \tag{3}
$$

where $s_I(t)$ and $s_Q(t)$ are the up-sampled *I* and *Q* components, respectively, and $\otimes$ denotes time-domain convolution. Finally, $s(t)$ is scaled and a direct current (DC) bias is added to make it nonnegative and keep it within the dynamic range of the LED prior to IM of the light source, as follows [25]:

$$
I_{\mathrm{in}}(t) = I_{\mathrm{DC}} + \alpha s(t), \tag{4}
$$

where $\alpha$ is the scaling factor, defined as:

$$
\alpha = \frac{MI \times I_{\max}}{(MI + 1) \times \max\{s(t)\}} \tag{5}
$$

where $I_{\max} = I_{\mathrm{DC}} + 0.5 I_{\mathrm{PP}}$ is the maximum value of $I_{\mathrm{in}}$, $I_{\mathrm{DC}}$ is the DC-bias current, $I_{\mathrm{PP}}$ is the peak-to-peak current, and $MI$ is the modulation index, which is defined as:

$$
MI = \frac{I_{\max} - I_{\mathrm{DC}}}{I_{\mathrm{DC}}} \tag{6}
$$
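The scaling in Eqs. (4)-(6) can be checked with a few lines of code. This is a sketch under stated assumptions: the function name `bias_and_scale` is hypothetical, and the bias current of 0.5 A is an illustrative value that is not given in the text (only $I_{\mathrm{PP}} = 0.8$ A appears later as the default).

```python
import numpy as np

def bias_and_scale(s, i_dc, i_pp):
    """Apply Eqs. (4)-(6): scale the CAP signal s and add the DC bias."""
    i_max = i_dc + 0.5 * i_pp                          # maximum drive current
    mi = (i_max - i_dc) / i_dc                         # Eq. (6), modulation index
    alpha = mi * i_max / ((mi + 1.0) * np.max(s))      # Eq. (5), scaling factor
    return i_dc + alpha * s, alpha, mi                 # Eq. (4)

# Illustrative (assumed) values: I_DC = 0.5 A, I_PP = 0.8 A.
s = np.array([-1.0, -0.25, 0.5, 1.0])
i_in, alpha, mi = bias_and_scale(s, i_dc=0.5, i_pp=0.8)
print(i_in, alpha, mi)
```

Substituting Eq. (6) into Eq. (5) gives $\alpha = MI \times I_{\mathrm{DC}} / \max\{s(t)\}$, so the positive peak of $s(t)$ is mapped exactly onto $I_{\max}$.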

The Hammerstein method is adopted to model LED characteristics in this study. It includes a nonlinear function and a linear component to represent the LED with the limited bandwidth, which are characterized as a third-order polynomial and a first-order low pass Butterworth filter, respectively, as given by [25]:

$$
f_{\mathrm{NL}}(x) =
\begin{cases}
0.1947, & x < 0.1 \\
0.2855x^{3} - 1.0886x^{2} + 2.0565x - 0.0003, & 0.1 \le x < 1 \\
1.2531, & x \ge 1
\end{cases}
\tag{7}
$$

$$
h_{\mathrm{LED}}(t) = e^{-2\pi f_{\mathrm{LED}} t} \tag{8}
$$

where $f_{\mathrm{LED}}$ is the 3-dB cut-off frequency of the LED. The LED transmit output power is defined as:

$$
P_{\mathrm{out}}(t) = f_{\mathrm{NL}}\left(I_{\mathrm{in}}(t)\right) \otimes h_{\mathrm{LED}}(t) \tag{9}
$$
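The Hammerstein model of Eqs. (7)-(9) is a static nonlinearity followed by a linear low-pass filter, which can be sketched as follows. The function names are hypothetical. Two choices here are assumptions rather than details from the text: the impulse response of Eq. (8) is truncated after about five time constants, and its taps are normalized to unit DC gain so that a constant input passes through unchanged. The sampling rate of 150 MHz is likewise an illustrative value.

```python
import numpy as np

def f_nl(x):
    """Static LED nonlinearity of Eq. (7)."""
    x = np.asarray(x, dtype=float)
    poly = 0.2855 * x**3 - 1.0886 * x**2 + 2.0565 * x - 0.0003
    return np.where(x < 0.1, 0.1947, np.where(x >= 1.0, 1.2531, poly))

def led_output(i_in, f_led, fs):
    """Eq. (9): nonlinearity followed by the low-pass response of Eq. (8).

    Assumptions: h_LED is sampled at rate fs, truncated after roughly
    five time constants, and normalized to unit DC gain.
    """
    n_taps = int(np.ceil(5 * fs / (2 * np.pi * f_led)))
    t = np.arange(n_taps) / fs
    h = np.exp(-2 * np.pi * f_led * t)
    h /= h.sum()
    return np.convolve(f_nl(i_in), h)[: len(i_in)]

# Constant drive of 0.5 through a 4.5 MHz LED (Table 2), sampled at an assumed 150 MHz.
p_out = led_output(np.full(200, 0.5), f_led=4.5e6, fs=150e6)
print(float(f_nl(0.5)), float(p_out[-1]))
```

The polynomial branch meets the two constant branches at $x = 0.1$ and $x = 1$ to within the four decimal places quoted in Eq. (7), so the model is effectively continuous.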

the Tx and the Rx, respectively. $T_s(\psi)$ and $g(\psi)$ are the gains of the optical filter and the non-imaging concentrator, respectively. For the NLOS paths, every wall is divided into $L_R$ small squares, each of which can act as both a Tx and a Rx. $h(t, s, \epsilon_n^{r})$ represents the NLOS impulse response between the main Tx and the reflecting element $\epsilon_n^{r}$, which acts as a Rx. $h(t, \epsilon_n^{S}, D)$ is the impulse response between $\epsilon_n^{S}$ (i.e., acting as a Tx) and the main Rx, $D$. It is worth mentioning that only a single reflection per wall is considered in this study, and higher-order reflections are ignored since they contribute very little to the total received optical power [24].

*2.3. The Rx*

At the Rx side, the received signal is detected by an optical Rx composed of a single PD and a trans-impedance amplifier (TIA). The regenerated electrical received signal is given by:

$$
y(t) = s(t) * h_c(t) + n(t) \tag{13}
$$

where $n(t)$ is the additive white Gaussian noise with power $P_n = N_0 B_{\mathrm{Rx}}$, $N_0$ is the noise power spectral density, and $B_{\mathrm{Rx}}$ is the bandwidth of the Rx. Note that $n(t)$ is mostly dominated by the ambient light induced shot noise. Following DC removal, a DL-based EQ is employed, which is explained in detail in Section 2.4. The equalized signal is applied to a pair of matched filters, $g_I(t) = f_I(-t)$ and $g_Q(t) = f_Q(-t)$, to separate the *I* and *Q* components of the CAP signal, respectively. Next, the separated signals are down-sampled prior to QAM demodulation.

*2.4. DL-based EQ*

As mentioned before, DL is a powerful subfield of ML that employs DNNs to solve complex problems. A DNN is a class of ANN with two or more hidden layers of nodes [27], see Fig. 2. As shown, the proposed network consists of four layers: the input layer, two hidden layers, and the output layer. The number of nodes, shown as circles, is 120 for both the input and output layers, and 300 and 400 for the 1st and 2nd hidden layers, respectively. All layers are fully connected, which means that every node in each layer is connected to all nodes in the next layer. The DL method splits the modeling dataset into training and test sets. The training process is offline and includes two steps: (*i*) forward propagation; and (*ii*) backward propagation [28]. First, in forward propagation, a subset of the training dataset, called a minibatch, passes from the input layer through the hidden layers to the output layer. The forward propagation computation in the 1st hidden layer is given by [29]:

$$
Z^{[1]} = W^{[1]} X + b^{[1]} \tag{14}
$$

$$
A^{[1]} = g^{[1]}\left(Z^{[1]}\right) \tag{15}
$$

where $[\cdot]$ denotes the layer number, $X$ and $A^{[1]}$ are the input and output of the 1st layer, respectively, and $W^{[1]}$, $b^{[1]}$, and $g^{[1]}(\cdot)$ are the weights, biases, and activation function of the 1st layer, respectively. In matrix form, $X$, $b^{[1]}$, $Z^{[1]}$, and $A^{[1]}$ are:

$$
X =
\begin{bmatrix}
x_{1}^{(1)[1]} & \cdots & x_{1}^{(128)[1]} \\
x_{2}^{(1)[1]} & \cdots & x_{2}^{(128)[1]} \\
\vdots & & \vdots \\
x_{120}^{(1)[1]} & \cdots & x_{120}^{(128)[1]}
\end{bmatrix}_{(n^{[0]} \times N) = (120 \times 128)}
\tag{17}
$$

$$
b^{[1]} =
\begin{bmatrix}
b_{1}^{(1)[1]} & \cdots & b_{1}^{(128)[1]} \\
b_{2}^{(1)[1]} & \cdots & b_{2}^{(128)[1]} \\
b_{3}^{(1)[1]} & \cdots & b_{3}^{(128)[1]} \\
\vdots & & \vdots \\
b_{300}^{(1)[1]} & \cdots & b_{300}^{(128)[1]}
\end{bmatrix}_{(n^{[1]} \times N) = (300 \times 128)}
\tag{18}
$$

$$
Z^{[1]} =
\begin{bmatrix}
z_{1}^{(1)[1]} & \cdots & z_{1}^{(128)[1]} \\
z_{2}^{(1)[1]} & \cdots & z_{2}^{(128)[1]} \\
z_{3}^{(1)[1]} & \cdots & z_{3}^{(128)[1]} \\
\vdots & & \vdots \\
z_{300}^{(1)[1]} & \cdots & z_{300}^{(128)[1]}
\end{bmatrix}_{(n^{[1]} \times N) = (300 \times 128)}
\tag{19}
$$

$$
A^{[1]} =
\begin{bmatrix}
a_{1}^{(1)[1]} & \cdots & a_{1}^{(128)[1]} \\
a_{2}^{(1)[1]} & \cdots & a_{2}^{(128)[1]} \\
a_{3}^{(1)[1]} & \cdots & a_{3}^{(128)[1]} \\
\vdots & & \vdots \\
a_{300}^{(1)[1]} & \cdots & a_{300}^{(128)[1]}
\end{bmatrix}_{(n^{[1]} \times N) = (300 \times 128)}
\tag{20}
$$

where $n^{[\ell]}$ is the number of nodes in the $\ell$th layer. The dimensions of $W^{[\ell]}$, $b^{[\ell]}$, $Z^{[\ell]}$, and $A^{[\ell]}$ are $(n^{[\ell]} \times n^{[\ell-1]})$, $(n^{[\ell]} \times N)$, $(n^{[\ell]} \times N)$, and $(n^{[\ell]} \times N)$, respectively, where $N$ is the minibatch size, which is set to 128 in this work. In general, forward propagation for all layers is given as:

$$
Z^{[\ell]} = W^{[\ell]} A^{[\ell-1]} + b^{[\ell]} \tag{21}
$$

$$
A^{[\ell]} = g^{[\ell]}\left(Z^{[\ell]}\right) \tag{22}
$$

where $A^{[\ell-1]}$ is the input of the $\ell$th layer.

**Fig. 2.** The structure of the proposed DL-based post-EQ.

**Fig. 3.** The BER performance against $(E_b/N_0)_{\mathrm{test}}$ with the DL-based EQ for a range of $(E_b/N_0)_{\mathrm{train}}$ for 128-QAM and a signal BW of 5 MHz.

The rectified linear unit (ReLU) and Sigmoid activation functions are used for the hidden and output layers, respectively, which are given as [28]:

$$
\mathrm{ReLU}(x) =
\begin{cases}
x, & x > 0 \\
0, & x \le 0,
\end{cases}
\tag{23}
$$

$$
\mathrm{Sigmoid}(x) = \frac{1}{1 + \exp(-x)} \tag{24}
$$
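The forward pass of Eqs. (21)-(24) for the 120-300-400-120 network can be sketched in a few lines. This is an illustration rather than the authors' code: the helper names and the small random initialization are assumptions, batch normalization is omitted for brevity, and each bias is stored as a column vector that is broadcast across the minibatch, instead of the explicit $(n^{[\ell]} \times N)$ matrix used in the text.

```python
import numpy as np

def relu(z):
    """Eq. (23)."""
    return np.maximum(z, 0.0)

def sigmoid(z):
    """Eq. (24)."""
    return 1.0 / (1.0 + np.exp(-z))

def init_params(sizes, rng):
    """One (W, b) pair per layer; W has shape (n[l], n[l-1]), b has shape (n[l], 1)."""
    return [
        (0.01 * rng.standard_normal((n_out, n_in)), np.zeros((n_out, 1)))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(X, params):
    """Eqs. (21)-(22): ReLU in the hidden layers, Sigmoid at the output."""
    A = X
    for layer, (W, b) in enumerate(params, start=1):
        Z = W @ A + b                       # b is broadcast over the N columns
        A = sigmoid(Z) if layer == len(params) else relu(Z)
    return A

rng = np.random.default_rng(0)
params = init_params([120, 300, 400, 120], rng)    # layer sizes from Table 1
X = rng.standard_normal((120, 128))                # one minibatch, N = 128
A_out = forward(X, params)
print(A_out.shape)
```

Each column of `X` is one training example, so the output has the same $(120 \times 128)$ shape as the input minibatch.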

There are two types of learning process for ML algorithms: supervised and unsupervised. In the former, which is used here, the network is trained on input/output pairs known as labeled data [27]. The labeled data $(x, y)$ are shown as ports **A** and **B** in Fig. 1. Port **A**, i.e., the input to the DNN, is the received signal with the LED- and channel-induced impairments, and port **B** is the desired CAP signal, which is used as the prediction target.

Accordingly, a loss function $L$ is defined to quantify the error, i.e., the difference between the outputs of the DNN and the correct answers. $L$ can be expressed using various metrics [30]; here we use the mean square error (MSE), which is defined as:

$$
\mathrm{MSE}(a, y) = \frac{1}{N} \sum_{i=1}^{N} \left(a^{i} - y^{i}\right)^{2}, \tag{25}
$$

where $a$ is the estimated real output of the DNN, and $y$ is the desired labeled value.

In the backward propagation phase, the main aim of DL is to find the weights and biases $(w, b)$ that minimize $L$ using an optimization algorithm. Stochastic gradient descent (SGD) [31] is the most popular optimization algorithm in DL and is also adopted in this study. First, $w$ and $b$ are initialized to random values, and the gradients of the differentiable $L$ with respect to $w$ and $b$ are calculated using the chain rule. Here, the gradients of the cross-entropy function are calculated as a useful example. Note that we use the following notation:

$$
\frac{dL}{dx} = dx, \tag{26}
$$

$$
da^{[2]} = \frac{dL}{da^{[2]}} = \frac{-y}{a^{[2]}} + \frac{1-y}{1-a^{[2]}}, \tag{27}
$$

$$
\begin{aligned}
dz^{[2]} = \frac{dL}{dz^{[2]}} &= \frac{dL}{da^{[2]}} \times \frac{da^{[2]}}{dz^{[2]}} = da^{[2]} \times g'\left(z^{[2]}\right) \\
&= \left(\frac{-y}{a^{[2]}} + \frac{1-y}{1-a^{[2]}}\right) \times \left(a^{[2]}\left(1-a^{[2]}\right)\right) = a^{[2]} - y,
\end{aligned}
\tag{28}
$$

$$
\begin{aligned}
dw^{[2]} = \frac{dL}{dw^{[2]}} &= \frac{dL}{da^{[2]}} \times \frac{da^{[2]}}{dw^{[2]}} \\
&= \left(\frac{-y}{a^{[2]}} + \frac{1-y}{1-a^{[2]}}\right) \times \left(a^{[2]}\left(1-a^{[2]}\right) \times a^{[1]T}\right) \\
&= \left(a^{[2]} - y\right) \times a^{[1]T} = dz^{[2]} \times a^{[1]T},
\end{aligned}
\tag{29}
$$

$$
db^{[2]} = \frac{dL}{db^{[2]}} = \frac{dL}{da^{[2]}} \times \frac{da^{[2]}}{db^{[2]}} = dz^{[2]} = a^{[2]} - y, \tag{30}
$$
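The simplification in Eq. (28), $dz^{[2]} = a^{[2]} - y$, can be verified numerically with a central finite difference. The sketch below assumes the binary cross-entropy $L = -[y \ln a + (1-y)\ln(1-a)]$ with a Sigmoid output, which is the loss whose derivative is Eq. (27); the function names and the test point are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(a, y):
    """Binary cross-entropy whose derivative with respect to a is Eq. (27)."""
    return -(y * math.log(a) + (1.0 - y) * math.log(1.0 - a))

def dz_analytic(z, y):
    """Closed form of Eq. (28): dz = a - y."""
    return sigmoid(z) - y

def dz_numeric(z, y, h=1e-6):
    """Central finite difference of L(sigmoid(z), y) with respect to z."""
    upper = cross_entropy(sigmoid(z + h), y)
    lower = cross_entropy(sigmoid(z - h), y)
    return (upper - lower) / (2.0 * h)

z, y = 0.3, 1.0    # illustrative test point
print(dz_analytic(z, y), dz_numeric(z, y))
```

The two values agree to within the finite-difference error, which confirms that the factors $a^{[2]}(1-a^{[2]})$ in Eq. (28) cancel the denominators of Eq. (27).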

**Fig. 4.** Flowchart of training and test process.

**Fig. 5.** BER performance of the proposed system for 64- and 128-QAM for a range of
transmit signal BW.

In general, backward propagation for $N$ training examples is given as:

$$
dZ^{[\ell]} = dA^{[\ell]} \times g'^{[\ell]}\left(Z^{[\ell]}\right), \tag{31}
$$

$$
dW^{[\ell]} = \frac{1}{N} \times dZ^{[\ell]} \times A^{[\ell-1]T}, \tag{32}
$$

$$
db^{[\ell]} = \frac{1}{N} \times dZ^{[\ell]}, \tag{33}
$$

$$
dA^{[\ell-1]} = W^{[\ell]T} \times dZ^{[\ell]}, \tag{34}
$$

where the dimensions of the matrices $dW^{[\ell]}$, $dZ^{[\ell]}$, $db^{[\ell]}$, and $dA^{[\ell-1]}$ are $(n^{[\ell]} \times n^{[\ell-1]})$, $(n^{[\ell]} \times N)$, $(n^{[\ell]} \times N)$, and $(n^{[\ell-1]} \times N)$, respectively. The weights and biases in each layer are then updated as:

$$
w := w - \alpha \, dw, \tag{35}
$$

$$
b := b - \alpha \, db, \tag{36}
$$

where $\alpha$ is the learning rate. The backward propagation algorithm is repeated $m/N$ times to complete one epoch, where $m$ is the number of training samples for each node in the input layer. An epoch is completed when the whole dataset has been applied to the DL algorithm. Although the required number of epochs depends on the convergence of the algorithm, it is set to 100 in this study.
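Eqs. (31)-(36) can be illustrated with one gradient-descent step on a deliberately small network. This is a sketch under stated assumptions: the 4-5-3 layer sizes, the random data, and all function names are hypothetical; the loss is the squared error summed over outputs and averaged over the minibatch, so that $dA^{[L]} = 2(A^{[L]} - Y)$; the product in Eq. (31) is taken elementwise; and because the bias is stored as a column vector, Eq. (33) becomes an average over the columns of $dZ^{[\ell]}$.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, params):
    """Return the activations A[0..L] and the pre-activations Z[1..L]."""
    A, Zs = [X], []
    for layer, (W, b) in enumerate(params, start=1):
        Z = W @ A[-1] + b
        Zs.append(Z)
        A.append(sigmoid(Z) if layer == len(params) else relu(Z))
    return A, Zs

def loss(A_out, Y):
    """Squared error summed over outputs, averaged over the minibatch."""
    return np.sum((A_out - Y) ** 2) / Y.shape[1]

def backward(A, Zs, Y, params):
    """Eqs. (31)-(34), applied from the output layer back to the first layer."""
    N, L = Y.shape[1], len(params)
    grads = [None] * L
    dA = 2.0 * (A[-1] - Y)
    for layer in range(L, 0, -1):
        Z = Zs[layer - 1]
        if layer == L:
            g_prime = sigmoid(Z) * (1.0 - sigmoid(Z))
        else:
            g_prime = (Z > 0).astype(float)
        dZ = dA * g_prime                               # Eq. (31), elementwise
        dW = dZ @ A[layer - 1].T / N                    # Eq. (32)
        db = dZ.sum(axis=1, keepdims=True) / N          # Eq. (33), column bias
        dA = params[layer - 1][0].T @ dZ                # Eq. (34)
        grads[layer - 1] = (dW, db)
    return grads

def sgd_step(params, grads, alpha):
    """Eqs. (35)-(36)."""
    return [(W - alpha * dW, b - alpha * db)
            for (W, b), (dW, db) in zip(params, grads)]

rng = np.random.default_rng(0)
sizes = [4, 5, 3]                                       # toy sizes, not those of Table 1
params = [(0.5 * rng.standard_normal((o, i)), np.zeros((o, 1)))
          for i, o in zip(sizes[:-1], sizes[1:])]
X = rng.standard_normal((4, 8))
Y = rng.uniform(0.1, 0.9, (3, 8))
A, Zs = forward(X, params)
grads = backward(A, Zs, Y, params)
new_params = sgd_step(params, grads, alpha=0.01)
print(loss(A[-1], Y), loss(forward(X, new_params)[0][-1], Y))
```

A single small step along the negative gradient lowers the loss on the same minibatch, which is the behavior the update rule is designed to produce.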

As shown in Fig. 2, batch normalization is applied prior to the activation function, which speeds up the optimization process by allowing a higher $\alpha$. Note that (*i*) according to [32], batch normalization noticeably reduces the need for regularization techniques; and (*ii*) the adaptive moment estimation (Adam) optimizer is employed owing to its faster convergence. More detailed information can be found in [33]. The pseudocode of the learning algorithm is given in Algorithm 1. To avoid overfitting and make the model generalize better, the dataset is divided into training and validation sets [34]. Using the validation set, the hyperparameters can be tuned to minimize the error of the model fitted on the training dataset. In the test phase, which is done online, new and never-before-seen received CAP signals are used as the input dataset of the DNN EQ. The trained network attempts to recover them and minimize the error based on the gained knowledge. All the DNN parameters are summarized in Table 1.

**Table 1**
DNN parameters.

| Symbol | Parameter | Value |
|---|---|---|
| $n^{[0]}$ | Number of nodes for input layer | 120 |
| – | Number of hidden layers | 2 |
| $n^{[1]}$ | Number of nodes for 1st hidden layer | 300 |
| $n^{[2]}$ | Number of nodes for 2nd hidden layer | 400 |
| $n^{[3]}$ | Number of nodes for output layer | 120 |
| $g(\cdot)$ | Activation function for hidden layers | ReLU |
| $g(\cdot)$ | Activation function for output layer | Sigmoid |
| – | Number of batch normalization layers | 2 |
| – | Optimizer | SGD with Adam |
| – | $(E_b/N_0)_{\mathrm{train}}$ | 5, 10, 15, 20 dB |
| – | Default (optimum) $(E_b/N_0)_{\mathrm{train}}$ | 10 dB |
| $L$ | Loss function | MSE |
| $\alpha$ | Learning rate | 0.01 |
| $N$ | Size of minibatch | 128 |
| $m$ | Number of training samples for each node | $10^{6}$ |
| – | Size of training dataset | $120 \times 10^{6}$ |
| – | Size of test dataset | 30000 |
| – | Number of epochs | 100 |

**Table 2**
System parameters.

| Symbol | Parameter | Value |
|---|---|---|
| – | Room size | 5 × 5 × 3 m³ |
| $\phi_{1/2}$ | Semi-power half angle | 70° |
| $\psi_c$ | PD field of view | 60° |
| $A$ | PD detector area | 1 cm² |
| $R_{\mathrm{PD}}$ | Responsivity of PD | 0.54 A/W |
| $T_s(\psi)$ | Optical filter gain | 1 |
| $B_m$ | LED modulation bandwidth | 4.5 MHz |
| $\beta$ | Roll-off factor of *I*/*Q* filters | 0.5 |
| $n_s$ | Up-sampling factor | 6 |
| $L_f$ | *I*/*Q* filter length | 10 sym. |
| $B_t$ | Total bandwidth | 5–25 MHz |
| – | Modulation type | QAM |
| $M$ | Modulation order | 64 & 128 |
| $G_{\mathrm{TIA}}$ | TIA gain | 50–60 dB |
| – | Tx position | Center of ceiling |
| – | Rx position | LOS scenario: Center of floor; NLOS scenario: (1.15, 1.15, 0.85) m |

## 3. Result and discussion

In this section, we investigate the performance of the proposed CAP-VLC system with a DL-based post-EQ by means of computer simulations. MATLAB and Python are used to simulate the VLC subsystem and the DL-based EQ, respectively. All key system parameters are summarized in Table 2. To train the network, we used a range of energy per bit to noise ratios, $(E_b/N_0)_{\mathrm{train}}$. Owing to the computation time, we considered only $(E_b/N_0)_{\mathrm{train}}$ values of 5, 10, 15, and 20 dB.

Fig. 3 depicts the BER performance against $(E_b/N_0)_{\mathrm{test}}$ for a range of $(E_b/N_0)_{\mathrm{train}}$ values, 128-QAM, and a signal bandwidth of 5 MHz. The best (optimum) BER performance is achieved for $(E_b/N_0)_{\mathrm{train}}$ of 10 dB, with $(E_b/N_0)_{\mathrm{test}}$ gains of 0.2, 0.8, and 5.7 dB compared with $(E_b/N_0)_{\mathrm{train}}$ values of 15, 20, and 5 dB, respectively, at the 7% forward error correction (FEC) BER limit of $3.8 \times 10^{-3}$. Considering the trade-off between the simulation time and the BER estimation, the DNN coefficients corresponding to the optimum $(E_b/N_0)_{\mathrm{train}}$ of 10 dB are adopted for the rest of the evaluation during the test phase. Fig. 4 illustrates the flowchart of the training and test processes.

Fig. 5 compares the BER performance of the proposed system for 64- and 128-QAM for a range of transmit signal BWs. Referring to the 7% FEC BER limit, 128-QAM with a signal BW of 5 MHz offers the best performance, with $(E_b/N_0)_{\mathrm{test}}$ gains of 1, 1.8, 6, and 6 dB compared with 10, 15, and 20 MHz 128-QAM, and 25 MHz 64-QAM, respectively. An $R_d$ of 93.4 Mb/s and a spectral efficiency of 4.67 b/s/Hz are attained for 20 MHz 128-QAM with respect to the 7% FEC BER limit. To achieve a higher $R_d$, the proposed system is also examined for 64-QAM and a signal BW of 25 MHz. Results demonstrate that $R_d$ is improved by up to ∼6 Mb/s, reaching 100 Mb/s with respect to the 7% FEC BER limit.
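The spectral efficiency figures follow directly from dividing the data rate by the signal bandwidth. The short sketch below reproduces the quoted 4.67 b/s/Hz; the function name is hypothetical, and the value for the 64-QAM, 25 MHz case is derived here from the quoted 100 Mb/s rather than stated in the text.

```python
def spectral_efficiency(rate_mbps, bandwidth_mhz):
    """Spectral efficiency in b/s/Hz for a rate in Mb/s and a bandwidth in MHz."""
    return rate_mbps / bandwidth_mhz

se_128qam_20mhz = spectral_efficiency(93.4, 20.0)    # quoted as 4.67 b/s/Hz
se_64qam_25mhz = spectral_efficiency(100.0, 25.0)    # derived, not quoted in the text
rate_gain_mbps = 100.0 - 93.4                        # the "up to ~6 Mb/s" improvement

print(round(se_128qam_20mhz, 2), se_64qam_25mhz, round(rate_gain_mbps, 1))
```

Under these numbers, the 25 MHz configuration gives the higher data rate while the 20 MHz configuration gives the higher spectral efficiency.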

A small value of the peak-to-peak injection current $I_{\mathrm{pp}}$ leads to a lower $E_b/N_0$ at the Rx, which degrades the BER performance. On the other hand, large values of $I_{\mathrm{pp}}$ may exceed the LED linear range, thus resulting in nonlinear distortions. Fig. 6 depicts the BER performance against $I_{\mathrm{pp}}$ for $(E_b/N_0)_{\mathrm{train}} = 10$ dB, $M = 64$, and a BW of 25 MHz. As can be seen, the lowest BER of ∼$2.26 \times 10^{-3}$ is achieved for $0.7 \le I_{\mathrm{pp}} \le 0.8$ A. Therefore, $I_{\mathrm{pp}}$ of 0.8 A is considered as the default value in this study.

Fig. 7(a) depicts the BER performance of the proposed system with 64-QAM and a BW of 25 MHz for three different post-EQ schemes. It is clearly seen that, at the 7% FEC BER limit, the DL-based EQ offers $(E_b/N_0)_{\mathrm{test}}$ gains of 2 and 8 dB compared with the Volterra and LMS EQs, respectively, which can be used to trade off transmission range against $R_d$. Note that reducing the bandwidth results in improved BER performance, as shown in Fig. 7(b), with the DL-based EQ again outperforming LMS and Volterra. For instance, at the BER of $3.8 \times 10^{-3}$, the $(E_b/N_0)_{\mathrm{test}}$ penalties are ∼6 and 10 dB for Volterra and LMS, respectively, compared with the DL method.

Next, we evaluated the BER performance of LOS and LOS + NLOS 64-QAM VLC links with a signal BW of 25 MHz. As can be seen in Fig. 8, the overall BER performance of the pure LOS and LOS + NLOS links is almost the same.

Finally, the complexity of the DL-based EQ is compared with those of the LMS and the 2nd-order Volterra series EQs, as summarized in Table 3 [35]. As expected, the computational complexity of the DL-based EQ is approximately $10^{4}$ times larger than that of the other two EQs. However, the proposed system uses only two hidden layers, which is a simple DNN, while more complex networks can be implemented using field programmable gate array (FPGA) boards. As an example, in [36] a DNN with two hidden layers was implemented using an FPGA (Xilinx Zynq UltraScale) with a complexity of about 182007, thus demonstrating the potential use of FPGAs in DL-based EQs in wireless communications. Note that $M_L$ and $M_{NL}$ are the linear and nonlinear memory depths, respectively, and $M_1$, $H_1$, $H_2$, and $H_3$ are the numbers of nodes in the input, first hidden, second hidden, and output layers, respectively.
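The LMS and Volterra entries of Table 3 can be reproduced from their complexity expressions, and the order-of-magnitude comparison can then be made against the tabulated DL value. In this sketch the function names are hypothetical, and the DL-based figure of 156400 is taken directly from Table 3 rather than recomputed.

```python
def lms_complexity(m_l):
    """Complexity of the LMS EQ as listed in Table 3: M_L."""
    return m_l

def volterra2_complexity(m_l, m_nl):
    """Complexity of the 2nd-order Volterra EQ: M_L + M_NL (M_NL + 1) / 2."""
    return m_l + m_nl * (m_nl + 1) // 2

DL_COMPLEXITY_TABULATED = 156400          # value of complexity listed in Table 3

lms = lms_complexity(6)
volterra = volterra2_complexity(6, 6)
ratio_vs_lms = DL_COMPLEXITY_TABULATED / lms
ratio_vs_volterra = DL_COMPLEXITY_TABULATED / volterra

print(lms, volterra, round(ratio_vs_lms), round(ratio_vs_volterra))
```

Both ratios fall between $10^{3}$ and $10^{5}$, which matches the approximate factor of $10^{4}$ quoted in the text.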

## 4. Conclusion

In this paper, we introduced a deep learning-based post-EQ for a “single” CAP-VLC system, considering both the LED's bandwidth limitation and its nonlinearity effects. The results showed that the highest data rate of about 100 Mb/s is achieved for 64-QAM, a signal bandwidth of 25 MHz, and an LED modulation bandwidth of 4.5 MHz (i.e., a standard white LED). Finally, we compared the performance of the proposed scheme with two common EQs, LMS and Volterra series, demonstrating improved BER results.

**Fig. 6.** The BER against $I_{\mathrm{pp}}$ for the 64-QAM CAP-VLC system with the DL-based EQ.

**Fig. 7.** The BER performance against $(E_b/N_0)_{\mathrm{test}}$ for three different EQ schemes for the signal bandwidth of: (a) 25 MHz, and (b) 20 MHz.

**Table 3**
Comparison of complexity for three different EQs.

| Type of EQs | $M_L$ | $M_{NL}$ | $M_1$ | $H_1$ | $H_2$ | $H_3$ | Complexity | Value of complexity |
|---|---|---|---|---|---|---|---|---|
| DL-based | – | – | 120 | 300 | 400 | 120 | $M_1 \times H_1 + H_1 \times H_2 + H_3$ | 156400 |
| LMS | 6 | – | – | – | – | – | $M_L$ | 6 |
| Volterra series | 6 | 6 | – | – | – | – | $M_L + M_{NL}(M_{NL} + 1)/2$ | 27 |

**Algorithm 1.** Pseudo-code of DNN training algorithm

**Input:** Number of layers $L$, number of neurons for all layers, activation functions, loss function, number of epochs $E$, number of transmitted batches in one epoch $N$, number of training examples in each minibatch $m$, learning rate $\alpha = 0.01$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\varepsilon = 10^{-8}$

**Output:** A trained DNN

**1:** **Initialize** weights and biases for all layers randomly

**2:** **for** epoch = 1 to $E$ **do**

**3:** **for** $t = 1$ to $N$ **do**

**4:** **for** $\ell = 2$ to $L$ **do**

**5:** **do** batch normalization as follows:

**6:** $\mu_{\beta} \leftarrow \frac{1}{m} \sum_{i=1}^{m} x_i$

**7:** $\sigma_{\beta}^{2} \leftarrow \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_{\beta})^{2}$

**8:** $\hat{x}_i \leftarrow \dfrac{x_i - \mu_{\beta}}{\sqrt{\sigma_{\beta}^{2} + \epsilon}}$

**9:** $s_i \leftarrow \gamma \hat{x}_i + \beta$

**10:** **do** forward propagation

**11:** **end for**

**12:** **calculate** loss function

**13:** **for** $\ell = 2$ to $L$ **do**

**14:** **do** backward propagation

**15:** **initialize** $V_{dw}$, $V_{db}$, $S_{dw}$, $S_{db}$ to zero

**16:** $V_{dw} = \beta_1 V_{dw} + (1 - \beta_1)\, dw$

**17:** $V_{db} = \beta_1 V_{db} + (1 - \beta_1)\, db$

**18:** $S_{dw} = \beta_2 S_{dw} + (1 - \beta_2)\, dw^{2}$

**19:** $S_{db} = \beta_2 S_{db} + (1 - \beta_2)\, db^{2}$

**20:** $V_{dw}^{\mathrm{Corrected}} = \dfrac{V_{dw}}{1 - \beta_1^{t}}$

**21:** $V_{db}^{\mathrm{Corrected}} = \dfrac{V_{db}}{1 - \beta_1^{t}}$

**22:** $S_{dw}^{\mathrm{Corrected}} = \dfrac{S_{dw}}{1 - \beta_2^{t}}$

**23:** $S_{db}^{\mathrm{Corrected}} = \dfrac{S_{db}}{1 - \beta_2^{t}}$

**24:** **update** all weights and biases as follows:

**25:** $w = w - \alpha \dfrac{V_{dw}^{\mathrm{Corrected}}}{\sqrt{S_{dw}^{\mathrm{Corrected}}} + \varepsilon}$

**26:** $b = b - \alpha \dfrac{V_{db}^{\mathrm{Corrected}}}{\sqrt{S_{db}^{\mathrm{Corrected}}} + \varepsilon}$

**27:** **end for**

**28:** **end for**

**29:** **end for**
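The two numerical kernels of Algorithm 1, batch normalization (steps 6-9) and the Adam update (steps 16-26), can be sketched as follows. This is an illustration rather than the authors' code: the function and class names are hypothetical, the minibatch is assumed to lie along the columns, and the moment estimates are initialized once when the optimizer object is created, which is how Adam is usually implemented.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-8):
    """Steps 6-9: normalize each row of x over the minibatch (columns)."""
    mu = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)          # (1/m) * sum((x - mu)^2)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

class Adam:
    """Steps 16-26 for one parameter array."""

    def __init__(self, shape, alpha=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        self.alpha, self.beta1, self.beta2, self.eps = alpha, beta1, beta2, eps
        self.v = np.zeros(shape)                # first-moment estimate
        self.s = np.zeros(shape)                # second-moment estimate
        self.t = 0                              # number of updates so far

    def step(self, w, dw):
        self.t += 1
        self.v = self.beta1 * self.v + (1 - self.beta1) * dw
        self.s = self.beta2 * self.s + (1 - self.beta2) * dw ** 2
        v_hat = self.v / (1 - self.beta1 ** self.t)     # bias correction
        s_hat = self.s / (1 - self.beta2 ** self.t)
        return w - self.alpha * v_hat / (np.sqrt(s_hat) + self.eps)

rng = np.random.default_rng(0)
x = 5.0 * rng.standard_normal((3, 128)) + 2.0
x_norm = batch_norm(x)

opt = Adam(shape=(2,))
w_new = opt.step(np.zeros(2), np.array([2.0, -3.0]))
print(x_norm.mean(axis=1), w_new)
```

After the bias correction, the first Adam step has magnitude close to $\alpha$ in every coordinate regardless of the gradient scale, which is one reason the optimizer is insensitive to how the loss is scaled.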

**Fig. 8.** The BER performance against $(E_b/N_0)_{\mathrm{test}}$ for LOS and LOS + NLOS 64-QAM VLC systems.

## Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

**Data availability**

No data was used for the research described in the article.

**Acknowledgments**

The authors would like to thank anonymous reviewers for their helpful comments.

## Funding

We have not received any funding support.

**References**

[1] Harald Haas, Jaafar Elmirghani, Ian White, Optical wireless communication, Phil. Trans. R. Soc. (2020) http://dx.doi.org/10.1098/rsta.2020.0051.

[2] Galefang Allycan Mapunda, Reuben Ramogomana, Leatile Marata, Bokamoso Basutli, Amjad Saeed Khan, Joseph Monamati Chuma, Indoor visible light communication: A tutorial and survey, Hindawi, Wirel. Commun. Mob. Comput. (2020) http://dx.doi.org/10.1155/2020/8881305.

[3] Carlos Medina, Mayteé Zambrano, Kiara Navarro, LED based visible light communication: technology, applications and challenges – a survey, Int. J. Adv. Eng. Technol. 8 (4) (2015) 482–495, http://dx.doi.org/10.7323/ijaet/v8iss4.

[4] A. Burton, Z. Ghassemlooy, S. Zvanovec, P.A. Haigh, H.L. Minh, X. Li, Investigation into using compensation for the nonlinear effects of the output of LEDs in visible light communication systems, in: 2nd West Asian Colloquium on Optical Wireless Communications, WACOWC, Tehran, Iran, 2019, pp. 80–84.

[5] Toshihiko Komine, Jun Hwan Lee, Shinichiro Haruyama, Masao Nakagawa, Adaptive equalization system for visible light wireless communication utilizing multiple white LED lighting equipment, IEEE Trans. Wirel. Commun. 8 (6) (2009) 2892–2900, http://dx.doi.org/10.1109/TWC.2009.060258.

[6] Grzegorz Stepniak, Jerzy Siuzdak, Piotr Zwierko, Compensation of a VLC phosphorescent white LED nonlinearity by means of Volterra DFE, IEEE Photonics Technol. Lett. 25 (16) (2013) 1597–1600, http://dx.doi.org/10.1109/LPT.2013.2272511.

[7] P.A. Haigh, Z. Ghassemlooy, S. Rajbhandari, I. Papakonstantinou, W. Popoola, Visible light communications: 170 Mb/s using an artificial neural network equalizer in a low bandwidth white light configuration, J. Lightwave Technol. 32 (9) (2014) 1807–1813, http://dx.doi.org/10.1109/JLT.2014.2314635.

[8] Shaveta Dargan, Munish Kumar, Maruthi Rohit Ayyagari, Gulshan Kumar, A survey of deep learning and its applications: a new paradigm to machine learning, Arch. Comput. Methods Eng. 27 (2020) 1071–1092, http://dx.doi.org/10.1007/s11831-019-09344-w.

[9] Shi Dong, Ping Wang, Khushnood Abbas, A survey on deep learning and its applications, Comp. Sci. Rev. 40 (2021) http://dx.doi.org/10.1016/j.cosrev.2021.100379.

[10] Feifan Liao, Shengyun Wei, Shun Zou, Deep learning methods in communication systems: a review, in: 2nd International Conference on Electronic Engineering and Informatics, IOP Publishing, 2020, http://dx.doi.org/10.1088/1742-6596/1617/1/012024.

[11] Hoon Lee, Sang Hyun Lee, Tony Q.S. Quek, Inkyu Lee, Deep learning framework for wireless systems: applications to optical wireless communications, IEEE Commun. Mag. 57 (3) (2019) 35–41, http://dx.doi.org/10.1109/MCOM.2019.1800584.

[12] Tim O'Shea, Jakob Hoydis, An introduction to deep learning for the physical layer, IEEE Trans. Cogn. Commun. Netw. 3 (4) (2017) 563–575, http://dx.doi.org/10.1109/TCCN.2017.2758370.

[13] Zhijin Qin, Hao Ye, Geoffrey Ye Li, Biing-Hwang Juang, Deep learning in physical layer communications, IEEE Wirel. Commun. 26 (2) (2019) 93–99, http://dx.doi.org/10.1109/MWC.2019.1800601.

[14] Hao Ye, Geoffrey Ye Li, Biing-Hwang Juang, Power of deep learning for channel estimation and signal detection in OFDM systems, IEEE Wirel. Commun. Lett. 7 (1) (2018) 114–117, http://dx.doi.org/10.1109/LWC.2017.2757490.

[15] Nan Chi, Junlian Jia, Fangchen Hu, Yiheng Zhao, Peng Zou, Challenges and prospects of machine learning in visible light communication, J. Commun. Inf. Netw. 5 (3) (2020) 302–309, http://dx.doi.org/10.23919/JCIN.2020.9200893.

[16] Mehmet Gorkem Ulkar, Tuncer Baykas, Ali E. Pusane, VLCnet: deep learning based end-to-end visible light communication system, J. Lightwave Technol. 38 (21) (2020) 5937–5948, http://dx.doi.org/10.1109/JLT.2020.3006827.

[17] Pu Miao, Chenhao Qi, Yi Jin, Chong Lin, A model-driven deep learning method for LED nonlinearity mitigation in OFDM-based optical communications, IEEE Access 7 (2019) 71436–71446, http://dx.doi.org/10.1109/ACCESS.2019.2919983.

[18] Yonghe Zhu, Chen Gong, Jianghua Luo, Meiyu Jin, Xianqing Jin, Zhengyuan Xu, Indoor non-line of sight visible light communication with a bi-LSTM neural network, in: IEEE International Conference on Communications Workshops (ICC Workshops), 2020, http://dx.doi.org/10.1109/ICCWorkshops49005.2020.9145317.

[19] C. Chen, L. Zeng, X. Zhong, S. Fu, M. Liu, P. Du, Deep learning-aided OFDM-based generalized optical quadrature spatial modulation, IEEE Photonics J. 14 (1) (2022) 1–6, http://dx.doi.org/10.1109/JPHOT.2021.3129541.

[20] Mahdi Nassiri, Saeed Ashouri, Gholamreza Baghersalimi, Zabih Ghassemlooy, A comparative performance assessment between pre-equalization and post equalization techniques in m-CAP indoor VLC systems with bit loading, in: 1st West Asian Colloquium on Optical Wireless Communications, WACOWC2018.

[21] Zhaocheng Wang, Tianqi Mao, Qi Wang, Optical OFDM for visible light communications, in: IEEE, 13th International Wireless Communications and Mobile Computing Conference, IWCMC, 2017, http://dx.doi.org/10.1109/IWCMC.2017.7986454.

[22] Mahdi Nassiri, Atiyeh Pouralizadeh, Gholamreza Baghersalimi, Zabih Ghassemlooy, A hybrid variable m-CAP based indoor visible light communications and fingerprint positioning system, in: 3rd West Asian Colloquium on Optical Wireless Communications, WACOWC, 2020, http://dx.doi.org/10.1109/WASOWC49739.2020.9410163.

[23] N. Bamiedakis, R.V. Penty, I.H. White, Carrier-less amplitude and phase modulation in wireless visible light communication systems, Phil. Trans. R. Soc. (2019) http://dx.doi.org/10.1098/rsta.2019.0181.

[24] Z. Ghassemlooy, W. Popoola, S. Rajbhandari, Optical Wireless Communications: System and Channel Modelling with Matlab®, CRC Press, 2019.

[25] M. Nassiri, G. Baghersalimi, Z. Ghassemlooy, Optical OFDM based on fractional Fourier transform for an indoor VLC system, Appl. Opt. 60 (9) (2021) 2664–2671, http://dx.doi.org/10.1364/AO.416565.

[26] M. Nassiri, S. Ashouri, G. Baghersalimi, Z. Ghassemlooy, Performance evaluation and comparison of spatial modulation and non-DC MIMO m-CAP techniques in indoor VLC system, IET Optoelectron. (2019) 281–287, http://dx.doi.org/10.1049/iet-opt.2018.5076.

[27] Md Zahangir Alom, Tarek M. Taha, Chris Yakopcic, Stefan Westberg, Paheding Sidike, Mst Shamima Nasrin, Mahmudul Hasan, Brian C. Van Essen, Abdul A.S. Awwal, A state-of-the-art survey on deep learning theory and architectures, Electronics 8 (3) (2019) http://dx.doi.org/10.3390/electronics8030292.

[28] Priti G. Pachpande, Monette H. Khadr, Ahmed F. Hussein, Hany Elgala, Visible light communication using deep learning techniques, in: IEEE 39th Sarnoff Symposium, 2018, http://dx.doi.org/10.1109/SARNOF.2018.8720493.

[29] Ian Goodfellow, Yoshua Bengio, Aaron Courville, Deep Learning, MIT Press, 2016.

[30] Qi Wang, Yue Ma, Kun Zhao, Yingjie Tian, A comprehensive survey of loss function in machine learning, J. Ann. Data Sci. (2020) http://dx.doi.org/10.1007/s40745-020-00253-5.