An FPGA Platform Proposal for Real-Time Acoustic Event Detection: Optimum Platform Implementation for Audio Recognition with Time Restrictions

Marcos Hervás 1,*,‡ and Rosa Ma Alsina-Pagès 1,*,‡ 1 GTM—Grup de Recerca en Tecnologies Mèdia, La Salle—Universitat Ramon Llull. C/Quatre Camins, 30, 08022 Barcelona, Spain * Correspondence: mhervas@salleurl.edu (M.H.); ralsina@salleurl.edu (R.M.A.-P.); Tel.: +34-932-902-445 (M.H.); +34-932-902-455 (R.M.A.-P.) † Presented at the 3rd International Electronic Conference on Sensors and Applications, 15–30 November 2016; Available online: https://sciforum.net/conference/ecsa-3. ‡ These authors contributed equally to this work.


Introduction
Monitoring all kinds of human activity has never been as common as it is today, when sensors are routinely spread across cities, factories and even our homes. Many surveillance systems are now in operation, taking advantage of the data that current technology allows us to obtain and process [1].
The Grup de Recerca en Tecnologies Mèdia (GTM) has conducted several research projects on acoustic signal processing and event recognition for different applications. For indoor applications, an ambient assisted living approach was presented in [2] to help diagnose the early stages of dementia. For outdoor applications, especially those oriented to smart-city environments, we have detected types of vehicles [3] and classified events in soundscapes [4], including specifically for surveillance applications [5].
GTM is currently involved in two projects related to audio event detection. DYNAMAP is a project funded by the European Commission (LIFE ENV/IT/001254) whose aim is to develop a low-cost sensor network for real-time noise mapping in cities [6]. Within this project, GTM is developing an Anomalous Event Detection Algorithm so that events other than traffic noise are excluded from the computation of the city noise maps [7]. The second project, HomeSound (2014-SGR-0590), consists of programming a low-cost GPU platform [8] to detect fifteen common in-home sounds (e.g., water, walking, glass breaking, dog barking). The GPU platform computes the feature extraction and the machine learning methods that classify the environmental sounds in real time, and it sends the results over Ethernet to the cloud to be registered, or even activates an alarm. The real-time implementation of these projects led us to study which hardware platform is best, in terms of efficiency and cost, for implementing these algorithms.
This paper is structured as follows. Section II reviews several characteristics of the existing hardware platforms, while Section III details the hardware proposal. Finally, the conclusions of this first approach to a real-time, low-cost Field Programmable Gate Array (FPGA) proposal are presented.

Hardware Platforms Comparison
In this section, a brief comparison of several hardware platforms is performed. The leading microcontroller manufacturers are: (i) Renesas Technology; (ii) Freescale Semiconductor; (iii) ST Microelectronics; (iv) Microchip Technology; (v) NXP Semiconductors; (vi) Texas Instruments and (vii) Infineon Technologies [9]. They provide general-purpose 32-bit microcontrollers (MCUs) for Internet of Things (IoT) and metering in low-cost, low-power embedded applications.
These MCU families are based on ARM Cortex-M or proprietary architectures and are able to work at up to 240 MHz. Cortex-M0 and M0+ target low-cost, low-area designs, whereas Cortex-M4 and Cortex-M7 are designed for higher performance and include floating-point and DSP capabilities [10].
However, these microcontrollers are not recommended for intensive real-time applications, even though Cortex-M4 and Cortex-M7 include floating-point and DSP capabilities, because of the high computational cost of the signal processing algorithms. In a typical environmental audio event recognition case, the sampling frequency may be 48 kHz, and an overlap of 50% between frames of at least 30 ms is desirable. The execution time of the required algorithms, such as windowing, FFT, 48 FIR filters and DCT, turns out to be around 15 ms. Moreover, the system has to manage the TCP/IP stack and the acquisition process. This time estimate has been derived from Table 1, which shows the execution cost, in cycle count and time, of both a FIR filter of 32 samples and FFTs of different sizes, using the STM32F10x Digital Signal Processing (DSP) library from STMicroelectronics for the ARM Cortex-M architecture.
For this reason, a higher-performance device should be used, such as an application processor, a true DSP, or an application processor with a GPU coprocessor. Currently, there are several low-cost open-source hardware platforms based on ARM Cortex-A, such as BeagleBoard, Raspberry Pi, CubieBoard, PhidgetSBC and UDOO [11], where Cortex-A is the application processor architecture provided by ARM. These platforms cost from $35 to $150.
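The frame timing behind this estimate can be checked with a few lines of arithmetic. The following Python sketch is only illustrative: the function names are hypothetical, and the cycle count and 72 MHz core clock used in the last line are assumed example values, not figures taken from Table 1.

```python
# Real-time budget for frame-based audio processing (illustrative sketch).

SAMPLE_RATE_HZ = 48_000   # sampling frequency stated in the text
FRAME_MS = 30             # frame length stated in the text
OVERLAP = 0.5             # 50% overlap between consecutive frames


def frame_budget(sample_rate_hz, frame_ms, overlap):
    """Return (samples per frame, hop in samples, hop period in ms)."""
    frame_samples = sample_rate_hz * frame_ms // 1000
    hop_samples = int(frame_samples * (1 - overlap))
    hop_ms = 1000 * hop_samples / sample_rate_hz
    return frame_samples, hop_samples, hop_ms


def cycles_to_ms(cycles, clock_hz):
    """Convert a cycle count into execution time in milliseconds."""
    return 1000 * cycles / clock_hz


frame_samples, hop_samples, hop_ms = frame_budget(SAMPLE_RATE_HZ, FRAME_MS, OVERLAP)
print(frame_samples, hop_samples, hop_ms)      # 1440 720 15.0

# Assumed example: 720,000 cycles on a 72 MHz core (not a value from Table 1).
print(cycles_to_ms(720_000, 72_000_000))       # 10.0
```

With 50% overlap a new frame is ready every 15 ms, so a processing chain that itself takes around 15 ms leaves essentially no headroom for the TCP/IP stack and the acquisition process.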

Hardware Proposal and Basic Algorithm Implementation
In this paper we propose a low-cost alternative platform based on programmable logic, able to exploit algorithm parallelization for real-time applications.
The main differences between an MCU and an FPGA lie in their architectures and their programming paradigms. Whereas a program in an MCU is executed as a sequential series of instructions, an FPGA contains an array of discrete logic resources that can be fully configured to implement any algorithm that fits in the device. An FPGA implementation of an algorithm can therefore benefit from the ability to parallelize any part of it. Initially, FPGAs were very expensive devices, generally used for prototyping, but currently their price is similar to that of an application microprocessor. The power consumption of an FPGA is generally higher than that of an MCU or DSP, because every part of the MCU or DSP has been designed and optimized to execute a deterministic function. However, FPGAs can be fully programmed to perform any task the user is able to describe. The platform presented is the low-cost Basys-3 developed by Digilent Inc. [13], which comprises a Xilinx Artix-7 FPGA (see Figure 1). The logic resources available are shown in Table 2.

Algorithm Description
The implementation of the firmware in an FPGA follows two criteria: (i) performance optimization, which aims to increase the maximum usable frequency of the design, and (ii) area optimization, which is based on reducing the number of logic resources required.
The algorithm presented in this paper is a proof of concept to evaluate the use of this kind of platform in real-time digital signal processing for acoustic event recognition problems. The block diagram of the feature extraction algorithm is shown in Figure 2. The implementations presented are: (i) windowing; (ii) FFT; (iii) 48 GTCC [5] filter banks and (iv) square root. They are all parts of the feature extraction, and they all achieve a good trade-off between area and speed optimization. We assume audio frames of 30 ms, which corresponds to 1440 samples at 48 ksps. For this reason, the length of the programmed FFT is 2048 samples, the next power of two.
A Hamming window has been implemented with a series of 2048 registers, where the most recent samples are stored. When a predefined number of new data has been introduced, the windowing process starts. An area optimization has been carried out by storing the coefficients in one Block RAM, used as a Read-Only Memory (ROM), and multiplying the value at each memory address by the datum stored in the flip-flop with the same index. The output of every multiplication, which is performed at 100 MHz, corresponds to the data input of the FFT. Figure 3 shows a block diagram of the implementation carried out in VHDL.
The FFT has been implemented with a Xilinx IP core. The algorithm selected is radix-2 with a transform length of 2048; it requires 26,659 transform cycles, and the resources used are summarized in Table 3. The number of transform cycles is much lower than that of the ARM implementation presented in Table 1, while handling a high number of input samples. This algorithm was selected to minimize the logic resources needed.
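The windowing and FFT timing described above can be modelled in software as a rough cross-check. The following Python sketch is not the authors' VHDL: the 16-bit coefficient width, the rescaling by a right shift and all function names are assumptions made for illustration. Only the frame length, the 26,659 transform cycles and the 100 MHz clock come from the text.

```python
import math

FRAME_SAMPLES = 1440        # 30 ms at 48 ksps (from the text)
FFT_CYCLES = 26_659         # transform cycles reported for the FFT core
CLOCK_HZ = 100_000_000      # 100 MHz clock (from the text)
COEFF_BITS = 16             # assumed ROM word width (hypothetical)


def next_pow2(n):
    """Smallest power of two that is greater than or equal to n."""
    return 1 << (n - 1).bit_length()


def hamming_rom(length, bits):
    """Hamming coefficients quantized to unsigned integers, as a ROM could store them."""
    scale = (1 << bits) - 1
    return [round(scale * (0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))))
            for i in range(length)]


def window_frame(samples, rom, bits):
    """Multiply each sample by the coefficient with the same index, then rescale."""
    return [(s * c) >> bits for s, c in zip(samples, rom)]


def fft_time_us(cycles, clock_hz):
    """Duration of the transform in microseconds for a given clock."""
    return 1e6 * cycles / clock_hz


fft_len = next_pow2(FRAME_SAMPLES)
rom = hamming_rom(fft_len, COEFF_BITS)
print(fft_len, len(rom))                               # 2048 2048
print(f"{fft_time_us(FFT_CYCLES, CLOCK_HZ):.2f} us")   # 266.59 us
```

Under these assumptions the transform alone takes about 0.27 ms, a small fraction of the 15 ms between frames at 50% overlap. The figure covers only the transform cycles and excludes windowing, data loading and the later stages.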
Finally, Figure 4 shows the computation of the modulus of the FFT output with the multiplication, ADD and SQRT blocks, together with the parallelization of the 48 filter banks developed in series.
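As a software illustration of this last stage, the modulus of each FFT bin can be obtained by multiplying, adding and taking a square root, and each filter of the bank can then be modelled as a weighted sum of the resulting magnitudes. The Python sketch below makes several assumptions: the function names are hypothetical, the two toy filters with 0/1 weights stand in for the 48 GTCC filters, whose actual coefficients are not given in the text, and the integer square root is only one possible choice.

```python
import math


def magnitude(re, im):
    """Integer modulus of one FFT bin: multiply, add, then square root."""
    return math.isqrt(re * re + im * im)


def filter_bank(mags, bank):
    """Weighted sum of the magnitude spectrum for each filter in the bank."""
    return [sum(w * m for w, m in zip(weights, mags)) for weights in bank]


# Toy spectrum of four bins given as (real, imaginary) pairs.
bins = [(3, 4), (6, 8), (0, 0), (5, 12)]
mags = [magnitude(re, im) for re, im in bins]
print(mags)                      # [5, 10, 0, 13]

# Two hypothetical filters; the real design uses 48 GTCC filters.
toy_bank = [[1, 1, 0, 0],
            [0, 0, 1, 1]]
print(filter_bank(mags, toy_bank))   # [15, 13]
```

Each filter output depends only on the shared magnitude spectrum, not on the other filters, which is what makes this stage a natural candidate for parallel hardware.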

Conclusions
We conclude that the Basys-3 FPGA platform offers a good trade-off between cost and features for implementing the audio detection algorithm. It satisfies the real-time performance restrictions under typical conditions for this application. Following this proof of concept of the feature extraction, we plan to develop several machine-learning algorithms in VHDL to run on the FPGA, and to evaluate the cost of the entire algorithm on the proposed platform. The results presented in this work encourage us to use this programmable logic platform, which is able to exploit parallelization for real-time algorithms. Given the quantity of free resources available, these results also encourage us to implement an embedded microcontroller, MicroBlaze, in the FPGA to control the system remotely through Ethernet and to compute the non-intensive parts of the algorithm more easily.

Author Contributions: Marcos Hervás performed the simulations and wrote part of the paper. Rosa Ma Alsina-Pagès works on the DYNAMAP project, conceived the tests and wrote the other part of the paper.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: