WiMi Developed Deep Learning-based Multi-modal Video Recommendation System

Sep 07, 2023

BEIJING, Sept. 7, 2023 /PRNewswire/ — WiMi Hologram Cloud Inc. (NASDAQ: WIMI) (“WiMi” or the “Company”), a leading global Hologram Augmented Reality (“AR”) Technology provider, today announced that it developed a deep learning-based multi-modal video recommendation system. This emerging technology uses advanced algorithms and multi-modal data analysis to provide users with personalized video recommendation services, enabling a whole new world of movie watching for users.

The core of WiMi’s recommendation system is a deep learning algorithm that extracts rich hidden features from video data and generates recommendations based on the user’s personal preferences. Feature extraction is the key step of the whole system. Currently, the technology adopts a convolutional neural network (CNN) as the main algorithm for feature extraction. A CNN is a deep neural network model with excellent image processing and feature extraction capabilities. In the multi-modal video recommendation system, the CNN is used to dig out the hidden features of users and videos from video footage datasets. The algorithm contains three main parts: a convolutional layer, a pooling layer, and a fully connected layer.

The convolutional layer is the core of the CNN: it recognizes and extracts various features from the input data. Through multiple convolutional operations, it captures contextual features from video footage data, including the type of video, title, and cover. The extraction of these features allows the system to better understand the video content and user preferences.
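As an illustration of the convolutional operation described above, the following is a minimal sketch in plain Python. It is not WiMi's implementation: the function name `conv1d`, the input values, and the kernel are all hypothetical and chosen only to show how a kernel slides over a sequence and produces one feature value per window.

```python
def conv1d(signal, kernel):
    """Valid 1-D convolution as used in CNNs (cross-correlation, no kernel flip)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# Toy input: one made-up feature value per title token (assumption, not real data).
title_signal = [1.0, 2.0, 3.0, 4.0]
# Hypothetical kernel that responds to change across a 3-token window.
edge_kernel = [1.0, 0.0, -1.0]

feature_map = conv1d(title_signal, edge_kernel)
print(feature_map)
```

In a trained network the kernel values are learned rather than set by hand, and many kernels run in parallel to produce many feature maps.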

The pooling layer plays the role of compression and screening in the feature extraction process. It is able to select representative local features and compress the data into a more compact representation. Through the operation of the pooling layer, the system is able to process large-scale video data more efficiently and understand the user’s interests better.
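The compression performed by a pooling layer can be sketched as follows. This example assumes max pooling, which is a common choice but is not confirmed by the description above; the function name `max_pool1d` and the sample values are hypothetical.

```python
def max_pool1d(values, size, stride=None):
    """Keep the largest value in each window; stride defaults to the window size."""
    stride = stride or size
    return [
        max(values[i:i + size])
        for i in range(0, len(values) - size + 1, stride)
    ]

# Made-up feature map, as might come out of a convolutional layer.
feature_map = [0.1, 0.9, 0.3, 0.2, 0.7, 0.4]

pooled = max_pool1d(feature_map, size=2)
print(pooled)  # half as many values, strongest response per window kept
```

Six values are reduced to three, which is the sense in which pooling produces a more compact representation.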

The fully connected layer is the final layer of the CNN. Through the operation of the fully connected layer, the system combines the user’s personalized information with the features of the video to calculate the user’s potential interest in and preference for the video.

To implement this algorithm, WiMi slightly changed the CNN structure. The modified model consists of four key components: an input layer, a convolutional layer, a pooling layer, and an output layer.

In a video recommendation system, the input layer plays the role of converting the raw data into a digital matrix. This matrix represents the data required for the next convolutional operation. Then, the contextual features of the input data are extracted from the video footage dataset through three convolutional layers. These convolutional layers are designed to have different dimensions to better capture the diversity of the video content.
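One possible reading of "three convolutional layers with different dimensions" is three filters of different widths applied to the same input matrix. The sketch below follows that reading as an assumption; the release does not say whether the layers are parallel or stacked, and the widths (2, 3, 4), the matrix size, the ReLU activation, and the random weights are all hypothetical.

```python
import random

def conv_over_tokens(matrix, kernel):
    """Slide a (width x dim) kernel down a (tokens x dim) matrix, one value per window."""
    width = len(kernel)
    out = []
    for start in range(len(matrix) - width + 1):
        total = 0.0
        for r in range(width):
            for c in range(len(kernel[r])):
                total += matrix[start + r][c] * kernel[r][c]
        out.append(max(0.0, total))  # ReLU activation (assumed)
    return out

random.seed(0)
DIM = 4  # hypothetical number of values per token
# Input layer output: a 6-token title represented as a 6 x DIM numeric matrix.
title_matrix = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(6)]
# Three filters of different widths; weights would be learned in practice.
kernels = {
    w: [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(w)]
    for w in (2, 3, 4)
}

feature_maps = {w: conv_over_tokens(title_matrix, k) for w, k in kernels.items()}
for width, fmap in feature_maps.items():
    print(width, len(fmap))
```

Wider filters see more tokens at once and therefore yield shorter feature maps, which is one way different dimensions can capture different kinds of context.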

Next comes the pooling layer, whose task is to compress and filter the features extracted from the convolutional layer. By selecting the most representative local features, the pooling layer is able to reduce the dimensionality of the data and retain the most important information. This has the advantage of reducing the computational complexity of the system while improving the understanding of the user’s interests.

Finally, there is the output layer, which generates the final recommendation results. The potential user preferences for the videos are calculated through the fully connected layer. Based on the results, the system can generate the top few recommended videos for the user to choose from.
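Selecting the top few videos from a set of predicted preference scores can be sketched with the Python standard library. The video IDs and scores below are made up for illustration, and `top_k` is a hypothetical helper name.

```python
import heapq

def top_k(scores, k):
    """Return the k (video_id, score) pairs with the highest score, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

# Hypothetical predicted preference scores for one user.
predicted = {"v1": 0.12, "v2": 0.87, "v3": 0.45, "v4": 0.91}

recommended = top_k(predicted, k=2)
print(recommended)
```

`heapq.nlargest` avoids sorting the full catalogue when only a handful of results are needed.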

In practical applications, four key parameters of the video (video ID, type, title, and cover) and four key parameters of the user (user ID, gender, age, and occupation) are generally selected as input data. These parameters provide basic information about the user and the video, generating an initial matrix for the subsequent feature extraction process. By continuously optimizing and training the model, the system is able to understand the user’s preferences more accurately and recommend the most appropriate video content for them.
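How user parameters might be turned into numeric input can be sketched as follows. One-hot encoding and age bucketing are assumptions made for this example; the vocabularies, bucket boundaries, and function names are hypothetical and are not taken from WiMi's system.

```python
# Hypothetical vocabularies; a real system would derive these from its data.
GENDERS = ["female", "male", "other"]
OCCUPATIONS = ["student", "engineer", "teacher", "other"]
AGE_BUCKETS = [18, 25, 35, 50]  # upper bounds; the last bucket is open-ended

def one_hot(value, vocabulary):
    """1.0 at the position of value in vocabulary, 0.0 elsewhere."""
    return [1.0 if value == item else 0.0 for item in vocabulary]

def age_bucket(age):
    """Index of the first bucket whose upper bound exceeds age."""
    for index, upper in enumerate(AGE_BUCKETS):
        if age < upper:
            return index
    return len(AGE_BUCKETS)

def encode_user(user):
    """Concatenate gender, age-bucket, and occupation encodings into one vector."""
    age_vec = [0.0] * (len(AGE_BUCKETS) + 1)
    age_vec[age_bucket(user["age"])] = 1.0
    # user_id is left out here; it would normally index a learned embedding.
    return one_hot(user["gender"], GENDERS) + age_vec + one_hot(user["occupation"], OCCUPATIONS)

user_vector = encode_user(
    {"user_id": 42, "gender": "female", "age": 29, "occupation": "engineer"}
)
print(user_vector)
```

Video parameters such as type could be encoded the same way, while titles and covers need text and image representations.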

The algorithmic architecture of WiMi’s deep learning-based multi-modal video recommendation system offers a number of advantages to users. First, with the feature extraction capability of the CNN, the system is able to capture the hidden features of the video and the user, thus providing more accurate personalized recommendations. Second, the operation of the pooling layer reduces the dimensionality of the data and improves the computational efficiency of the system. Most importantly, through continuous training and optimization, the system is able to learn and adapt to the user’s changing interests to provide better recommendation results. Deep learning-based multi-modal video recommendation systems are leading personalized recommendation technology into a new era. With the growth of data volume and the continuous progress of algorithms, the technology can better meet the needs of users.

The steps of WiMi’s deep learning-based multi-modal video recommendation system are as follows:

Data collection and pre-processing: The system first collects a large amount of video data and user information. The video data includes information such as video ID, type, title, and cover, and the user information includes user ID, gender, age, and occupation. These data are pre-processed and cleaned for subsequent feature extraction and analysis.

Feature extraction: A CNN is utilized for feature extraction. Through the operation of multiple convolutional and pooling layers, the system is able to extract rich contextual features from the video data. These features include content features of the video (e.g., scenes and actors) and user interest features (e.g., preferred types and preferred durations).

Feature fusion: Video features and user features are fused to create a connection between videos and users. This step can be realized by the operation of the fully connected layer, where the features are multiplied by the weight matrix and the bias vector is added to obtain a combined feature representation of the video and the user.

Recommendation generation: Based on the user’s comprehensive feature representation, the system uses recommendation algorithms to generate personalized video recommendation results. These results are calculated based on factors such as the user’s viewing history, interest preferences, and similarities with other users. The system can generate a series of recommended videos and sort them according to the user’s level of interest in order to provide the most relevant and attractive recommended content.

Feedback and iteration: User feedback is crucial for system improvement and optimization. The system can collect users’ watching behavior, ratings, and feedback information, which can be used to further optimize the recommendation algorithm and model. Through continuous iteration and training, the system can gradually improve the accuracy and personalization of recommendations.
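The fusion and feedback steps above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not WiMi's model: it assumes a single fully connected output unit, a sigmoid activation, a log-loss objective, and one gradient-descent update per feedback event. All names, vectors, and the learning rate are hypothetical.

```python
import math

def dense(x, weights, bias):
    """Fully connected layer: y = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(user_vec, video_vec, weights, bias):
    """Fuse user and video features, then map to an interest score in (0, 1)."""
    fused = dense(user_vec + video_vec, weights, bias)  # concatenation, then W x + b
    return sigmoid(fused[0])

def feedback_step(user_vec, video_vec, weights, bias, watched, lr=0.5):
    """One gradient-descent step on log loss for a single feedback event."""
    x = user_vec + video_vec
    error = predict(user_vec, video_vec, weights, bias) - watched
    weights[0] = [w - lr * error * v for w, v in zip(weights[0], x)]
    bias[0] -= lr * error

# Made-up feature vectors and zero-initialised parameters (assumptions).
user_vec = [1.0, 0.0]
video_vec = [0.5, 1.0]
weights = [[0.0, 0.0, 0.0, 0.0]]
bias = [0.0]

before = predict(user_vec, video_vec, weights, bias)
for _ in range(20):  # the user keeps watching this kind of video
    feedback_step(user_vec, video_vec, weights, bias, watched=1.0)
after = predict(user_vec, video_vec, weights, bias)
print(before, after)
```

Repeated positive feedback raises the predicted interest for this user and video pair, which is the basic mechanism behind iterative improvement. A production system would train on batches of events from many users rather than one pair.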

The algorithms of WiMi’s deep learning-based multi-modal video recommendation system not only provide personalized video recommendation services, but also offer users richer and more diverse viewing options. With the powerful feature extraction capability of the deep learning algorithm and the accuracy of the recommendation system, users can more easily discover video content that matches their interests and enjoy a better viewing experience.

With the continuous development of artificial intelligence and deep learning, the deep learning-based multi-modal video recommendation system will continue to be optimized to achieve more accurate, diverse, and personalized recommendation results by improving the model, introducing reinforcement learning, fusing multi-modal data, and considering social factors. At the same time, through the application of explainable recommendation and interpretable modeling, users’ understanding of and trust in the recommendation results will increase, which will further enhance the user experience and help solve the problem of information overload.

About WIMI Hologram Cloud

WIMI Hologram Cloud, Inc. (NASDAQ:WIMI) is a holographic cloud comprehensive technical solution provider that focuses on professional areas including holographic AR automotive HUD software, 3D holographic pulse LiDAR, head-mounted light field holographic equipment, holographic semiconductor, holographic cloud software, holographic car navigation and others. Its services and holographic AR technologies include holographic AR automotive application, 3D holographic pulse LiDAR technology, holographic vision semiconductor technology, holographic software development, holographic AR advertising technology, holographic AR entertainment technology, holographic ARSDK payment, interactive holographic communication and other holographic AR technologies.

Safe Harbor Statements

This press release contains “forward-looking statements” within the Private Securities Litigation Reform Act of 1995. These forward-looking statements can be identified by terminology such as “will,” “expects,” “anticipates,” “future,” “intends,” “plans,” “believes,” “estimates,” and similar statements. Statements that are not historical facts, including statements about the Company’s beliefs and expectations, are forward-looking statements. Among other things, the business outlook and quotations from management in this press release and the Company’s strategic and operational plans contain forward-looking statements. The Company may also make written or oral forward-looking statements in its periodic reports to the US Securities and Exchange Commission (“SEC”) on Forms 20-F and 6-K, in its annual report to shareholders, in press releases, and other written materials, and in oral statements made by its officers, directors or employees to third parties. Forward-looking statements involve inherent risks and uncertainties. Several factors could cause actual results to differ materially from those contained in any forward-looking statement, including but not limited to the following: the Company’s goals and strategies; the Company’s future business development, financial condition, and results of operations; the expected growth of the AR holographic industry; and the Company’s expectations regarding demand for and market acceptance of its products and services.

Further information regarding these and other risks is included in the Company’s annual report on Form 20-F and the current report on Form 6-K and other documents filed with the SEC. All information provided in this press release is as of the date of this press release. The Company does not undertake any obligation to update any forward-looking statement except as required under applicable laws.

View original content: https://www.prnewswire.com/news-releases/wimi-developed-deep-learning-based-multi-modal-video-recommendation-system-301920342.html