WiMi Researches 3D Detection Algorithm Based on Multi-Channel Convolutional Neural Network

With the rapid development of computer vision and artificial intelligence, many intelligent detection technologies have emerged, such as image-based object detection and scene recognition. In recent years, significant progress has been made in the research and application of convolutional neural networks for visual recognition. One example is the use of convolutional neural networks in research on three-dimensional object detection algorithms.

The R&D team of WiMi Hologram Cloud Inc. (NASDAQ: WIMI) is reportedly developing a 3D object detection algorithm based on a multi-channel convolutional neural network. The network takes RGB images, depth images, and bird's-eye view (BEV) images as inputs and regresses the category, 3D size, and spatial position of objects. By combining the RGB, depth, and BEV inputs, the multi-channel neural network system performs 3D target detection.

BEV images provide a view perpendicular to the camera's viewpoint and can represent the spatial distribution of objects. Projecting the point cloud produces a BEV image, which is used as a neural network input to improve the accuracy of 3D target detection. The convolutional neural network processes the input point cloud data directly, which addresses the encoding and feature extraction problems of unordered point clouds and yields end-to-end regression of the 3D bounding box. 3D proposals are extracted directly from monocular images alone, and 3D bounding boxes are estimated from them. Laser point clouds are combined with visual information, with the point clouds projected into bird's-eye view (BEV) images. The data is fed into the convolutional neural network, and the information is fused to estimate the 3D bounding box. Fusing multiple sources of information helps detect targets in 3D space more accurately.
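As a rough illustration of the point cloud projection described above, the following Python sketch bins 3D points into a ground-plane grid and records the maximum height and the point count in each cell. It is not WiMi's implementation: the function name point_cloud_to_bev, the axis convention (x forward, y left, z up), the detection ranges, and the 0.5 m cell size are all assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple

# Assumed point layout: (x, y, z) in metres, with x forward, y left, z up.
Point = Tuple[float, float, float]


def point_cloud_to_bev(
    points: Sequence[Point],
    x_range: Tuple[float, float] = (0.0, 40.0),    # assumed forward range
    y_range: Tuple[float, float] = (-20.0, 20.0),  # assumed lateral range
    cell_size: float = 0.5,                        # assumed grid resolution
) -> Tuple[List[List[float]], List[List[int]]]:
    """Project points onto the ground plane.

    Returns two grids of equal shape: the maximum height seen in each
    cell and the number of points that fell into each cell.
    """
    n_rows = int(round((x_range[1] - x_range[0]) / cell_size))
    n_cols = int(round((y_range[1] - y_range[0]) / cell_size))
    height = [[0.0] * n_cols for _ in range(n_rows)]
    density = [[0] * n_cols for _ in range(n_rows)]

    for x, y, z in points:
        # Discard points outside the region covered by the grid.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        # Map metric coordinates to integer cell indices.
        row = min(int((x - x_range[0]) / cell_size), n_rows - 1)
        col = min(int((y - y_range[0]) / cell_size), n_cols - 1)
        # Keep the highest point per cell; the first point always wins.
        if density[row][col] == 0 or z > height[row][col]:
            height[row][col] = z
        density[row][col] += 1

    return height, density
```

The two grids can be stacked as image channels and passed to a convolutional network alongside the RGB and depth inputs. Other per-cell statistics, such as mean reflectance, are common choices too, but the source does not say which ones are used here.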

The three-dimensional object detection algorithm based on a multi-channel convolutional neural network developed by WiMi can simultaneously identify the category, spatial position, and three-dimensional size of objects, significantly improving the accuracy and efficiency of object detection. The multi-channel target detection neural network system extends two-dimensional image target detection to three-dimensional target detection by expanding the input to three channels: RGB images, depth images, and BEV images. First, the RGB, depth, and BEV images are fed into the network. The convolutional neural network produces a feature map, and a spatial pyramid pooling layer generates a feature vector for each proposal region in the feature map. Classifiers and regressors then perform classification and position regression for the target, with the classifier judging which category the features extracted from a proposal belong to. Finally, multitask regression is performed through two fully connected layers to predict object categories and 3D bounding boxes.
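The pooling and prediction steps above can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions, not the network WiMi describes: the feature map is reduced to a single channel, the pyramid levels (1, 2) are arbitrary, the weights are placeholders supplied by the caller, and the box layout (x, y, z, length, width, height) is assumed. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def spatial_pyramid_pool(region: Matrix, levels: Sequence[int] = (1, 2)) -> List[float]:
    """Max-pool a proposal region into n x n bins for each pyramid level.

    The output length is sum(n * n for n in levels), regardless of the
    region's height and width, so proposals of any size yield a
    fixed-length feature vector.
    """
    h, w = len(region), len(region[0])
    pooled: List[float] = []
    for n in levels:
        for i in range(n):
            r0 = (i * h) // n
            r1 = max(r0 + 1, math.ceil((i + 1) * h / n))  # bin is never empty
            for j in range(n):
                c0 = (j * w) // n
                c1 = max(c0 + 1, math.ceil((j + 1) * w / n))
                pooled.append(
                    max(region[r][c] for r in range(r0, r1) for c in range(c0, c1))
                )
    return pooled


def linear(x: Sequence[float], weights: Matrix, bias: Sequence[float]) -> List[float]:
    """Fully connected layer: one output per weight row."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(
    feature: Sequence[float],
    cls_w: Matrix, cls_b: Sequence[float],
    box_w: Matrix, box_b: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Two task heads on one shared feature vector.

    Returns class probabilities and a 3D box; the assumed box layout is
    (x, y, z, length, width, height).
    """
    class_probs = softmax(linear(feature, cls_w, cls_b))
    box = linear(feature, box_w, box_b)
    return class_probs, box
```

With levels (1, 2), every proposal maps to a five-element vector, which is what lets the fully connected layers accept regions of different sizes. In a trained network the weights would be learned jointly for both tasks rather than passed in by hand.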

Three-dimensional object detection and recognition has long been an essential technology in computer vision and is the basis for machines to understand and interact with the world. At present, three-dimensional object detection technology has many applications in navigation, intelligent robots, uncrewed vehicles, security monitoring, and other fields.

With the progress of 3D data acquisition technology, the growth of computing power, the development of deep learning, and rising application demand, the research and application of 3D vision technology have received growing attention. The 3D object detection algorithm based on a multi-channel convolutional neural network researched by WiMi has broad application prospects in autonomous driving, intelligent robots, AR/VR, remote sensing, biomedicine, and other fields.

3D object detection algorithm technology is the integration of ground wireless communication technology, mobile communication technology for medium-, high-, and low-orbit satellites, and short-distance direct communication technology. With the 3D object detection algorithm, we are closer to a world of holographic spatial information interconnection. At this stage, the physical world is represented in high detail in the digital domain, where it is analyzed and processed, and people will be placed within it to achieve holographic spatial information interconnection. As a critical infrastructure, the 6G network provides high-capacity links with low end-to-end latency and secure computing capabilities throughout the network, enabling virtual representations of people and physical devices to exchange information with each other in the digital domain.

In the new era, WiMi Hologram Cloud combines artificial intelligence, big data, and other technologies to build a number of agents that realize organic integration and multi-way interaction between the physical world and the digital world. At the same time, WiMi is opening new research directions in the integration of the virtual and the real. AR, holographic, VR, and AI technologies are not merely display tools; they will increasingly merge with the artificial intelligence industry and a variety of interaction methods. WiMi Hologram Cloud Inc. aims to advance the transition in human-object interaction by building content tools and a common system for artificial intelligence integration, in pursuit of its vision of “the actual fusion, the extension of space and time, embodied interaction, comprehensive communication.”