
Combining Social Relations and Interaction Data in Recommender System With Graph Convolution Collaborative Filtering




Abstract:

A recommender system is an important subject in the field of data mining, where item rating information from users is exploited and processed to make suitable recommendations to other users. Recommender systems create convenience for e-commerce users and stimulate the consumption of items suited to each user. Beyond e-commerce, recommender systems also suggest books to read, movies to watch, courses to take, or websites to visit. Similarity between users is an important factor for recommendation; it can be calculated from past user ratings of items by collaborative filtering, matrix factorization, or singular value decomposition. With the development of graph data mining techniques, the relationships between users and items can be represented by matrices, from which collaborative filtering can be performed on larger databases, more accurately and with faster calculation. All these data can be represented as graphs and mined by today's highly developed graph neural network models. On the other hand, users' social friendship data also influence consumption habits, because recommendations from friends are weighed more carefully than other information sources. However, combining the influence of a user's friends with the similarity between users who share shopping habits is challenging, because the information is noisy and affects each particular data set in different ways. In this study, we present an input data processing method that removes outliers, namely single reviews and users with little interaction with the items; the proposed model then combines the social relationship data with the similarity in users' rating histories to improve the accuracy and recall of the recommender system. We perform a comparative assessment of the influence of each data set and calculation method on the final recommendation. We also propose and implement a model and compare it with baseline models.
Graphical abstract: the overall proposed model with the interaction graph, user correlation graph, and social graph as input, and convolution layers to learn the feature embeddings and the propagation process.
Published in: IEEE Access ( Volume: 11)
Page(s): 139759 - 139770
Date of Publication: 07 December 2023
Electronic ISSN: 2169-3536

SECTION I.

Introduction

E-commerce has developed strongly and gives users a good experience, not only with complete information about products but also with conveniences such as browsing items by catalog or comparing products. Items selected according to certain criteria are recommended to customers with relevant characteristics, such as age, gender, interests, or place of residence. Storing customer and product information is a challenging task because the numbers of customers and products are very large, increase rapidly, and come from diverse sources. With a huge database of users and items, access speed limits the effectiveness of e-commerce. The collaborative filtering (CF) algorithm calculates the similarity between users [1], [2] without retrieving their personal information. It evaluates the similarity between customers based on the large number of items they have interacted with, for example by purchasing, rating, viewing, or commenting. The system then recommends to a target customer the items in the interaction lists of other customers who have high similarity with them. In these models, the correlation between users and items is recorded in an interaction matrix, which can be implicit or explicit. If the data are implicit, each matrix element stores whether the user was interested in or purchased the item. If the data are explicit, the element value represents the user's rating of the item. The similarity between two users is the distance between the two vectors representing them, each of which contains information about that user's interactions with all items.

The explosion in the number of users and items leads to a huge matrix size, which takes up a large amount of memory for storage as well as computing resources for matrix operations. The matrix factorization (MF) technique can decompose the interaction matrix into two or more matrices of smaller size without losing the original useful information. The singular value decomposition (SVD) algorithm generalizes the eigendecomposition of a square matrix with an orthonormal eigenbasis to a matrix of any size, yielding smaller feature matrices with minimum loss. However, most of the elements in these matrices are zero, because most users are interested in only a small number of items and have no interaction with the rest of the item list. This makes the matrices sparse, and they need to be compressed both to reduce the required memory and to facilitate computation. Through matrix transformations, the information is collected into embedding matrices that can be efficiently processed using the TensorFlow and Keras libraries [3]. As the final step of the computation, the embedding matrices can be converted back to the original interaction matrix size for the front-end layers to make recommendations to the user.
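To make the factorization step concrete, the sketch below decomposes a toy sparse interaction matrix with truncated SVD; the matrix sizes, density, and rank d are illustrative assumptions, not values from the paper.

```python
# A minimal sketch of low-rank factorization of a sparse implicit-feedback
# matrix; sizes, density, and rank are toy choices for illustration.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
n_users, n_items, d = 100, 80, 16

# Sparse binary interaction matrix: ~5% of entries observed.
R = csr_matrix((rng.random((n_users, n_items)) < 0.05).astype(np.float64))

# Truncated SVD keeps only the d largest singular values.
U, s, Vt = svds(R, k=d)
user_features = U * np.sqrt(s)        # n_users x d
item_features = Vt.T * np.sqrt(s)     # n_items x d

# The compact factors reconstruct predicted scores at the original size.
scores = user_features @ item_features.T
print(scores.shape)                   # (100, 80)
```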

Besides exploiting user and item interaction information, users' social relationships are also essential because they reflect the user's influence in real life. A shopping recommendation from a friend has a greater impact on a user's decision than other advertising sources. Today's social networking platforms, such as Foursquare, Facebook, and Gowalla, also provide databases of users and their friendship relations. Mining these data enriches the information and supports not only the recommendation process but also specific user groups [4], [5].

Both the interactions between users and items and the social relationships between users can be represented as graphs and mined using graph neural networks (GNN) [6]. In a graph, nodes that are neighbors within some number of hops are related. Implicit information can be discovered by traversing the graph and looking for nearby users who share the most items of common interest. Such high-order connectivities were exploited in the neural graph collaborative filtering (NGCF) model [7]. Each matrix is also transformed into embedding matrices that are significantly reduced in dimension compared to the original matrix but still contain all the original information. The embedding matrices are decomposed to capture the influence signals and then combined into a more informative embedding matrix [8]. The learning process can be repeated many times to reach the target accuracy; in these iterations, the propagated embedding matrices capture the high-order connectivity in the user-item interaction graph.

In this publication, we propose a GCN model that can analyze and synthesize three sources of information: the product-user interaction matrix, the weighted influence matrix between users, and the matrix of social friendships derived from social network platforms. We also implement the model, conduct experiments on well-known data sets, and compare the results with baseline models. The remainder of this article is organized as follows. Section II summarizes the research that forms the foundation of the topic, together with the latest techniques. Section III presents the proposed model, showing how we preprocess the data sets, the organization of the neural layers, and the signal aggregator and decomposer. Section IV presents the overall results and compares them with previous state-of-the-art models. Section V gives our conclusions and some ideas for future research.

SECTION II.

Literature Review

A. Collaborative Filtering

Early recommender systems exploited the features of items as well as the preferences of users to find suitable items and give reasonable recommendations [9]. For example, a book can be categorized as comics, detective, historical, archaeology, fine arts, and so on. For users, information such as age, gender, address, or education can also be used to enrich the input data [10]. However, with a very large number of items and frequent additions, it is difficult to determine item attributes, so systems need alternative filtering methods. Collaborative filtering algorithms were therefore used to find the similarity between users or items without attribute information [11].

Collaborative filtering can be implemented with memory-based models [12] or model-based models [13]. In a memory-based model, the history of every user's ratings of items is recorded in a rating matrix. There are many ways to define a rating scale, which can be an integer value from 1 to 5 or an implicit rating. In the space made up of the user set, the item set, and the rating values, each user is represented by a feature vector e_{u}, and likewise each item is represented by a feature vector e_{i}. The algorithms compute the distance between each pair of users (or pair of items, for item-based algorithms) to find the neighbors and form the recommendation results. There are many ways to measure this distance, such as the cosine function, Jaccard similarity, or mean squared differences [8].
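As a concrete illustration of the distance measures named above, this sketch computes cosine and Jaccard similarities between two binary interaction vectors; the vectors are toy examples, not data from the paper.

```python
# Hedged sketch of memory-based user similarity; each vector is a binary
# row of the rating matrix (1 where the user interacted with the item).
import numpy as np

def cosine_sim(e_u: np.ndarray, e_v: np.ndarray) -> float:
    denom = np.linalg.norm(e_u) * np.linalg.norm(e_v)
    return float(e_u @ e_v / denom) if denom else 0.0

def jaccard_sim(e_u: np.ndarray, e_v: np.ndarray) -> float:
    inter = np.sum((e_u > 0) & (e_v > 0))
    union = np.sum((e_u > 0) | (e_v > 0))
    return float(inter / union) if union else 0.0

u = np.array([1, 0, 1, 1, 0], dtype=float)
v = np.array([1, 1, 1, 0, 0], dtype=float)
print(cosine_sim(u, v))   # ~0.667: two common items, norms sqrt(3) each
print(jaccard_sim(u, v))  # 0.5: 2 common items out of 4 in the union
```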

In model-based collaborative filtering systems, the algorithms search for patterns in the training data to develop a model for future prediction [14]. The matrix containing users' ratings of items can be reduced using MF techniques. To preserve the accuracy of the distances between users and items, the rating matrix can be factorized into a user feature matrix and an item feature matrix of smaller size [15]. The more accurately user similarity is calculated, the better the recommendation results.

Graph convolutional matrix completion (GCMC) built a graph-based auto-encoder framework for matrix completion [16]. This model moved from matrices to graphs by formulating the main problem as a prediction task on a bipartite graph. Graph construction also benefits from additional sources of information, such as friendships in social networks, which are likewise represented by graphs. The encoders traverse the graph and record signals from the vertices representing users and items, producing latent features of these nodes. The collaborative filtering problem between rows (or columns) thus becomes one of measuring predicted edge weights between two objects, which represent users or items.

B. Graph Convolution Networks

A graph is a data structure that uses a set of vertices V to represent objects and a set of edges E to represent relationships, depicting the links between entities, people, data, and attributes. An edge in a graph may be directed or undirected and may specify the relationship's weight or just the relationship itself. Neural networks, with their hidden layers, have had remarkable success in extracting input data and identifying its properties [17], [18]. In [19], neural networks were applied to graph input data, and favorable results were obtained when vertex features were propagated and aggregated into one another during learning [20]. Graph neural network (GNN) methods, which inherit from neural network algorithms, employ several propagation layers as well as various aggregation and update techniques. By means of pooling or attention techniques, the characteristics of the neighboring vertices as well as those of the target vertex are better adjusted [21].

The graph convolutional network (GCN) is an iterative method for collecting information [22]. For example, when an item interests many of the target user's friends, it should be recommended with higher priority than other items. GraphSAGE [23] embeds each vertex of the graph inductively and learns the topological structure of the graph as well as the effect of vertices on their neighbors. This method not only focuses on feature-rich graphs but also makes use of the structural features of all vertices.

NGCF [7] generates hop-by-hop propagation layers over the input graph under the assumption that the influence of users varies with the distance between the graph's vertices. The k-th propagation layer, also known as k-order propagation, receives attention signals from a user's neighbors as well as messages from items to that user. The input parameter k can be viewed as the number of propagation layers, and k = 3 was concluded to be the best value. The loss function is based on the difference between the actual and predicted rating values on the test set. Depending on the learning rate, the size of the embedding matrices, and the quantity of training data, minimizing the loss takes a number of iterations. In a later refinement, LightGCN [24] deleted the feature weight transformation matrices and the non-linear activation function from the NGCF propagation process to remove their adverse impacts.

C. Social Recommender System

Social recommender systems have been developed since the emergence of online social platforms. In addition to user behaviors such as liking, sharing, and commenting, data from social networks are used to provide personalized recommendations that take into account the influence of a user's friends [5], [25], [26]. The wealth of information on social networks influences users and aligns their interests with those of their friends [27], [28]. ContextMF [29] recorded social relational information in the recommendation data within a matrix factorization framework with regularization terms. This model improved the accuracy of the recommendation results compared with performing MF on the user-item relation matrix alone. The TrustSVD model [25], developed from SVD++ [30], incorporates the influence of friends in social relations as additional implicit feedback for an observed user. GCN-based recommendation models have also been extended with user social relation information; for example, GraphRec [31] views a user in two aspects: in the social relation matrix and in the matrix of interactions with items. The influence between users is aggregated and conditioned by the social aggregation module during learning.

SocialLGN [32] designed deep models to capture the most distinctive characteristics of the social network and of users into mixed embeddings, instead of learning directly from the interaction data of users and items alone. The model still uses GCN to learn the preferences of users, whose influence spreads through the social network. User influence also propagates through multiple layers, a technique used in NGCF [7] and LightGCN [24].

Technically, combining social relational data with user influence in shopping is a challenge because both change with time and with the nature of each kind of item. In the socially-aware self-supervised tri-training (SEPT) model [33], three graph encoders were created to capture signals from the social relationships between users, the relationships between users interested in similar items, and the shared information, augmenting the data between social information and shopping interaction information. The self-supervised learning process in this model improved the effectiveness of the recommendations.

SECTION III.

Our Proposed Model

In this section, we present the proposed model, which exploits the rating matrix and the social relationships between users to make recommendations for any user. The model comprises a data processing module, a GCN module, and a prediction layer with a loss calculator. An overview of our model is shown in Figure 2.

FIGURE 1. Collaborative filtering algorithms calculate the correlation between users.

FIGURE 2. Overview of our proposed model.

A. Data Pre-Processing

The data sets are recorded continuously and redundantly, and they include noisy data from users and items with very little interaction. Eliminating these signals prevents recommendations of items that are no longer relevant. Furthermore, the reduced data sets, smaller in size but without loss of valuable information, decrease the required memory as well as the computation time of the models. In our experiments, we apply a 10-core setting, meaning that users with fewer than 10 interactions are dropped at this processing step.

With the selected set of items, we also keep only the users with the highest interaction coverage, measured by the Jaccard similarity between the set of items a user interacts with and the set of all filtered items in the database. The ratio of the number of users to the number of items is chosen to match the ratio of the original data set, as in the data sets surveyed in [7], [24]; we summarize this ratio in Table 1. Users with a low Jaccard measure are removed as outliers, and the social friendship matrix is rebuilt accordingly.

TABLE 1. Ratio between the number of users and the number of items in the surveyed data sets.

After pre-processing the data set, we obtain three matrices as input to the model: matrix A represents the interactions between users and items, matrix C contains the degree of similarity between users, and matrix S represents the social friendships of users. The matrix S is optional; it can be skipped if a data set does not provide real social relationships, or when the noise in the social relation data outweighs its actual effect among users.

Algorithm 1 Filter Out Items With Fewer Than 10 Interactions

Input: \mathcal{U} \times \mathcal{I} {interactions between users and items in the original data set}

Output: R \subseteq \mathcal{U} \times \mathcal{I}

1: for all items i \in \mathcal{I} do
2:   if item i has at least 10 interactions then {10-core setting}
3:     I \gets I \cup \{i\}
4:   end if
5: end for
6: p \gets cardinality of the set I
7: q \gets p \div selected\_ratio
8: for all users u \in \mathcal{U} do
9:   set_{u} \gets the list of items interacted with by user u
10:  sim_{u} \gets the Jaccard similarity between I and set_{u}
11: end for
12: U \gets the q users with the highest sim_{u}
13: return R = U \times I
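A runnable Python sketch of Algorithm 1 follows; `interactions` (a list of (user, item) pairs), `selected_ratio`, and the function name are illustrative assumptions, while the 10-core threshold and the Jaccard-based user selection mirror the pseudocode above.

```python
# Sketch of the pre-processing in Algorithm 1 under the stated assumptions.
from collections import Counter

def ten_core_filter(interactions, selected_ratio=2.0, core=10):
    # Steps 1-5: keep items with at least `core` interactions (10-core setting).
    item_counts = Counter(item for _, item in interactions)
    items = {i for i, c in item_counts.items() if c >= core}

    # Steps 6-7: number of users to keep, from the chosen user/item ratio.
    q = int(len(items) / selected_ratio)

    # Steps 8-11: per-user item sets restricted to the kept items,
    # then Jaccard similarity against the full kept-item set.
    user_sets = {}
    for u, i in interactions:
        if i in items:
            user_sets.setdefault(u, set()).add(i)
    sims = {u: len(s) / len(items | s) for u, s in user_sets.items()}

    # Steps 12-13: the q most-covering users, and the filtered pairs.
    users = set(sorted(sims, key=sims.get, reverse=True)[:q])
    return [(u, i) for u, i in interactions if u in users and i in items]
```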

1) The Interaction Matrix A

The user-item interaction data set can be provided in implicit or explicit form. Implicit data record interactions such as item views, purchases, mouse clicks, or likes as a binary value of 0 or 1, while explicit data specify a user's rating of an item within a certain range, for example an integer between 0 and 5. In this article, we consider the interactions between users and items to be implicitly recorded and expressed by a bipartite graph G=\{V, E\}, where the vertex set V=\{U, I\} is the union of the user set U and the item set I, and E contains an edge e_{i,j}=(u_{i}, i_{j}) if user u_{i} interacted with item i_{j}. Thus the matrix R \subseteq U \times I, which represents the graph G, is defined by (1).\begin{align*} R_{i,j}=\begin{cases} 1 & \text{if user $u_{i}$ refers to item $i_{j}$}\\ 0 & \text{otherwise} \end{cases} \tag{1}\end{align*}


Interactive filtering by GCN captures the users' characteristics in the user embedding, while the features of the items are learned in the item embedding. So that these two embedding matrices can be updated at the same time to facilitate propagation, we arrange the input as the block matrix shown in (2). If the matrix R represents the users' interactions with the items, the transpose R^{\top} represents the items the users interacted with; 0 is a zero matrix of suitable size.\begin{align*} A = \begin{bmatrix}0 & R \\ R^{\top} & 0 \end{bmatrix} \tag{2}\end{align*}
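The block structure of (2) can be assembled without densifying the zero blocks; the sketch below uses toy matrix sizes.

```python
# Sketch of the block matrix A in (2); sizes are illustrative only.
import scipy.sparse as sp

R = sp.random(6, 4, density=0.3, format="csr", random_state=0)  # 6 users, 4 items
A = sp.bmat([[None, R], [R.T, None]], format="csr")             # None blocks are zeros
print(A.shape)  # (10, 10): square of size n_users + n_items
```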


For convenience in calculating the embeddings of the next layer in our graph convolution operations [7], the symmetrically normalized matrix \widetilde{A} is calculated once by (3) and accessed every time the embeddings are convolved.\begin{equation*} \widetilde{A} = D^{-\frac{1}{2}}AD^{-\frac{1}{2}} \tag{3}\end{equation*}

where D is the diagonal degree matrix, in which each entry D_{i,i} equals the number of nonzero entries in the i-th row of A and D_{i,j} = 0 for i \neq j. Finally, \widetilde{A} is used as one of the input sources of our GCN model.
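The normalization in (3) only needs a degree vector and two diagonal scalings; in this sketch the toy R is an assumption, and zero degrees are guarded so isolated nodes do not divide by zero.

```python
# Sketch of the symmetric normalization in (3) for a toy binary A.
import numpy as np
import scipy.sparse as sp

R = sp.random(6, 4, density=0.3, format="csr", random_state=0)
R.data[:] = 1.0                                  # binarize: implicit feedback
A = sp.bmat([[None, R], [R.T, None]], format="csr")

deg = np.asarray(A.sum(axis=1)).ravel()          # D_{i,i}: row degrees of A
d_inv_sqrt = np.zeros_like(deg)
nz = deg > 0
d_inv_sqrt[nz] = deg[nz] ** -0.5                 # guard isolated nodes
A_tilde = sp.diags(d_inv_sqrt) @ A @ sp.diags(d_inv_sqrt)  # computed once
```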

2) The User Correlation Matrix C

The similarity between each pair of customers in the database depends on how many common items they interact with: the greater the number of items of common interest, the higher the influence between them. For this reason, in [34] a weighted user influence matrix W_{I} was provided as an additional source of information. This matrix can be calculated as W_{I} = R \times R^{\top}, and the value W_{I_{i,j}} shows how many common items users u_{i} and u_{j} have interacted with. Equivalently, W_{I} can be computed by (5).

However, the matrix W_{I} only counts the intersection of the two item sets I_{i} and I_{j}; it does not record the influence of the pair of users \{i,j\} on the whole interaction data. The higher the total number of items both customers have interacted with, the more similarity should be counted, since they clearly interact with items more often than other customers. We propose the matrix W_{U}, of the same size as W_{I}, initialized by (4) alongside (5), where I_{i} and I_{j} are the sets of items interacted with by users u_{i} and u_{j}, respectively.\begin{align*} W_{U}&= |I_{i}| + |I_{j}| - |I_{i} \cap I_{j}| = |I_{i} \cup I_{j}| \tag{4}\\ W_{I}&=|I_{i} \cap I_{j}| \tag{5}\end{align*}


Therefore, the element-wise product W_{U} \odot {W_{I}}^{-1} returns a normalized matrix whose elements take values from 0.0 to 1.0, based on the Jaccard similarity between each pair of users.
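The sketch below computes W_I, W_U, and the resulting pairwise Jaccard matrix for a toy binary R; the elementwise ratio is taken as intersection counts over union sizes, the reading that yields values between 0.0 and 1.0.

```python
# Sketch of the user correlation weights from (4)-(5) on a toy binary R.
import numpy as np

rng = np.random.default_rng(0)
R = (rng.random((5, 8)) < 0.4).astype(float)     # 5 users, 8 items (toy)

W_I = R @ R.T                                    # |I_i ∩ I_j|: common items
sizes = R.sum(axis=1)                            # |I_i| per user
W_U = sizes[:, None] + sizes[None, :] - W_I      # |I_i ∪ I_j|, as in (4)

with np.errstate(divide="ignore", invalid="ignore"):
    jaccard = np.where(W_U > 0, W_I / W_U, 0.0)  # pairwise Jaccard in [0, 1]
```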

Furthermore, GCN-based models collect collaborative signals from the high-order connectivity graph. The matrix W_{U} \odot {W_{I}}^{-1} inadvertently adds weight to associations already extracted from the high-order connectivity graph, i.e., it makes the model focus more on the interaction points extracted from that graph. This leads to a model that lacks extensibility and is prone to overfitting. To solve this problem, we use a function that converts the elements of W_{U} \odot {W_{I}}^{-1} into the representative values shown in Table 2.

TABLE 2. The function f that classifies the elements of the W_{U} \odot {W_{I}}^{-1} result matrix into denotation values.

The conversion of the resulting matrix W_{U} \odot {W_{I}}^{-1} also clusters all users into multiple groups according to their influence level, which is represented by the values of the matrix C.

Summarizing the above ideas, the matrix C, which represents the degree of similarity between the users' behaviors, is computed as in (6), where f is the value allocation conversion function shown in Table 2, W_{I} is the weighted user influence matrix, and W_{U} is the matrix representing the union of the item lists of each pair of users.\begin{equation*} C = f(W_{U} \odot {W_{I}}^{-1}) \tag{6}\end{equation*}
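The conversion function f can be sketched as a simple bucketing step; the actual cut points and denotation values come from Table 2 and are not reproduced here, so the thresholds below are placeholders only.

```python
# Hypothetical bucketing function f in the spirit of (6); all cut points
# and level values are placeholders, not the values from Table 2.
import numpy as np

def f(x: np.ndarray) -> np.ndarray:
    cuts = np.array([0.0, 0.25, 0.5, 0.75])    # placeholder thresholds
    levels = np.array([0.0, 0.3, 0.6, 1.0])    # placeholder denotation values
    return levels[np.digitize(x, cuts) - 1]

jaccard = np.array([[1.0, 0.2], [0.2, 1.0]])   # toy normalized similarities
C = f(jaccard)
print(C)  # [[1. 0.] [0. 1.]]: users clustered into discrete influence levels
```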


3) The Social Friendship Matrix S

Because social relation data are not always provided in a data set, they are an optional part of our recommender system. In the experimental section, we present comparative results with and without social relations. Some data sets and publications treat the relationship between user i and user j as a directed edge in a graph if user i follows user j (and j might not know about i). In this article, we treat the relationship of any pair of users as symmetric; in other words, the relation edge between them in the graph is undirected. The social relation matrix S \subseteq U \times U denotes the friendships among users in real social life: if users i and j are friends, S_{ij} = 1; otherwise S_{ij} = 0.

In recent studies, the matrix S is used to enhance the user-item relationships in the input data. The SEPT model creates a sharing view A_{s} [33], calculated as in (7). A_{s} represents the relationship weight between users not only through their interest in common items but also according to whether they have a real relationship.\begin{equation*} A_{s} = (RR^{\top}) \odot S \tag{7}\end{equation*}
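A small sketch of the sharing view in (7) with toy binary R and S; entries of A_s stay nonzero only where users are actual friends.

```python
# Sketch of SEPT's sharing view (7): common-interest counts masked by
# real friendship; R and S are toy binary matrices.
import numpy as np

rng = np.random.default_rng(1)
R = (rng.random((5, 8)) < 0.4).astype(float)   # user-item interactions
S = (rng.random((5, 5)) < 0.3).astype(float)
S = np.triu(S, 1)
S = S + S.T                                    # symmetric friendships, zero diagonal

A_s = (R @ R.T) * S                            # elementwise mask by friendship
```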


We keep the matrix S separate from the interaction data A and the user correlation values C in the input, so that we can evaluate the influence of friendship relations on the overall recommendation accuracy. We leave the integration of social signals into the other matrices for future research.

B. Graph Convolution Network Layers

1) Transformation Weight Matrices and Activation Function

The NGCF model is representative of the most advanced models that use GCN for mining and recommendation. NGCF is designed with feature transformation matrices as well as a non-linear activation function, like other standard GCNs [35]. The non-linear activation incorporates the embeddings in the output and smooths the values so as not to introduce bias into the overall result. The LeakyReLU function is commonly used in GCN models [7], [34].

However, developing the NGCF model on data with little description, where both users and items are stored with only a valid ID, does not benefit the feature learning process. On the contrary, considerable computing resources are spent on storing and operating on these matrices. The use of weighted transformation matrices and activation functions was discussed in [24], where the LightGCN model eliminates all the weight transformations W_{i} along with the activation function and achieves better overall results than NGCF. We again test this approach on the SocialLGN model [32] with variants eliminating the transformation matrices, the activation function, or both:

  • SocialLGN-f removes all feature transformation matrices W_{1}, W_{2}, and W_{3}.

  • SocialLGN-n eliminates the non-linear activation function \sigma(.).

  • SocialLGN-fn eliminates both the feature transformation matrices W_{i} and the non-linear activation function \sigma(.).

In all experiments, we apply the same hyperparameters as in the NGCF and LightGCN publications, including the learning rate, regularization, and dropout rate, on the Gowalla and Yelp2018 data sets. The results are shown in Figure 4 for the Gowalla data set and Figure 5 for the Yelp2018 data set.

FIGURE 3. The form of matrix M helps the embeddings update concurrently.

FIGURE 4. Comparison of SocialLGN variants on the Gowalla data set.

FIGURE 5. Comparison of SocialLGN variants on the Yelp2018 data set.

When removing either the feature transformation matrices W_{1}, W_{2}, W_{3} or the non-linear activation function \sigma(.), the model performance is unstable and worse than the original SocialLGN, but removing both improves the model significantly. Through this experiment, we confirmed our inference about the negative impact of the graph fusion operation on the user embeddings. To replace the graph fusion operator, in SocialLGN-fn we simply sum the social user embeddings and the interaction user embeddings to generate the user embeddings in each propagation.

From the above, we determine that eliminating both the feature transformation matrices and the non-linear activation function is appropriate for the data sets considered in this article.

2) Embedding Layers

Following the model in [36], which used two latent vectors for users and items, we define the user embeddings e_{u} \in \mathbb{R}^{d} and item embeddings e_{i} \in \mathbb{R}^{d}, where d is the embedding vector size. In the first round of propagation, the embedding layer e^{(0)}=e_{u}^{(0)} \parallel e_{i}^{(0)} is initialized with the normal weight initialization method. Following the architecture of LGC [24], the initialized embeddings e_{u}^{(0)} and e_{i}^{(0)} are the only trainable parameters of our model. In matrix form, we denote by E^{(0)}\in {\mathbb{R}^{(n+m) \times d}} the set of all embeddings during propagation, i.e., E^{(0)} contains the set of n user embeddings and m item embeddings.\begin{equation*} E^{(0)}=E_{U}^{(0)} \parallel E_{I}^{(0)}=[e_{u_{1}}^{(0)},\dots,e_{u_{n}}^{(0)},e_{i_{1}}^{(0)},\dots,e_{i_{m}}^{(0)}] \tag{8}\end{equation*}
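The initialization of (8) amounts to two Gaussian matrices stacked along the row axis; the standard deviation of 0.1 and the toy sizes below are assumptions.

```python
# Sketch of initializing the only trainable parameters E^(0), as in (8).
import numpy as np

n_users, n_items, d = 100, 80, 64
rng = np.random.default_rng(0)
E_user0 = rng.normal(0.0, 0.1, size=(n_users, d))   # e_u^(0) rows
E_item0 = rng.normal(0.0, 0.1, size=(n_items, d))   # e_i^(0) rows
E0 = np.concatenate([E_user0, E_item0], axis=0)     # (n + m) x d
```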


3) Propagation Process

To capture the signals from the three input matrices, at each propagation layer k we propagate the k-th user and item embeddings e_{u}^{(k)} and e_{i}^{(k)} through the LGC architecture [24] and obtain four embeddings carrying three kinds of signals: e_{ua}^{(k+1)}, e_{c}^{(k+1)}, e_{s}^{(k+1)}, and e_{i}^{(k+1)}. Here e_{ua}^{(k+1)} and e_{i}^{(k+1)} hold the user-item interaction signals from matrix A, e_{s}^{(k+1)} contains the social signals between users from matrix S, and e_{c}^{(k+1)} captures the correlation signals between users from matrix C.\begin{align*} e_{ua}^{(k+1)} &=\sum_{i\in {\mathcal{N}}_{u}^{A}}\frac{1}{\sqrt{|{\mathcal{N}}_{u}^{A}||{\mathcal{N}}_{i}^{A}|}}e_{i}^{(k)} \\ e_{s}^{(k+1)} &=\sum_{s\in {\mathcal{N}}_{u}^{S}}\frac{1}{\sqrt{|{\mathcal{N}}_{u}^{S}||{\mathcal{N}}_{s}^{S}|}}e_{u}^{(k)} \\ e_{c}^{(k+1)} &=\sum_{c\in {\mathcal{N}}_{u}^{C}}\frac{1}{\sqrt{|{\mathcal{N}}_{u}^{C}||{\mathcal{N}}_{c}^{C}|}}e_{u}^{(k)} \\ e_{i}^{(k+1)} &=\sum_{u\in {\mathcal{N}}_{i}^{A}}\frac{1}{\sqrt{|{\mathcal{N}}_{i}^{A}||{\mathcal{N}}_{u}^{A}|}}e_{u}^{(k)} \tag{9}\end{align*}

where |{\mathcal{N}}_{q}^{X}| denotes the number of neighboring users (or items) of item (or user) q in the matrix X, with X \in \{A, S, C\}. Equation (9) is based on the symmetric normalization term used in most GCN models [7], [24], [34] and calculated by (3).

The (k+1)-th user embeddings are then aggregated according to (10); specifically, we combine the three embeddings using a weighted sum.\begin{align*} e_{u}^{(k+1)} = \mathrm{AGGREGATION}_{k}\left(e_{ua}^{(k+1)}, e_{c}^{(k+1)}, e_{s}^{(k+1)}\right) \tag{10}\end{align*}


After the propagation process, we have K+1 embeddings, corresponding to the propagation through K layers plus the initial embedding. The final embeddings of the users (and items) are obtained by (11).\begin{equation*} e_{u} = \frac{1}{K+1}\sum_{k=0}^{K}e_{u}^{(k)};\quad e_{i} = \frac{1}{K+1}\sum_{k=0}^{K}e_{i}^{(k)} \tag{11}\end{equation*}


Equation (11) takes the mean value of the embeddings across all layers; it could be replaced with other functions such as the maximum, median, or a weighted average. To keep the model simple while still ensuring good overall performance, we use the mean.

4) Prediction and Optimization

Based on the final embeddings e_{u} and e_{i} calculated by (11), we compute the prediction score of item i for user u by (12).\begin{equation*} \widehat{y}_{ui}=e_{u}^{\top} e_{i} \tag{12}\end{equation*}


We build the loss function with Bayesian personalized ranking (BPR) so that the model learns the parameters \Phi, which include only the user and item embeddings. BPR is well suited to implicit feedback data sets [37]; it assumes that observed interactions \Omega^{+}_{ui} reflect higher preference than unobserved interactions \Omega^{-}_{uj}. To optimize the prediction model, we use mini-batch Adam [38] and minimize the BPR loss in (13).\begin{equation*} Loss_{BPR} = \sum_{\Omega^{+}_{ui}}\sum_{\Omega^{-}_{uj}} -\ln \sigma(\widehat{y}_{ui} - \widehat{y}_{uj}) + \lambda \parallel \Phi \parallel^{2}_{2} \tag{13}\end{equation*}
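A minimal NumPy sketch of the BPR objective in (13) for one mini-batch of (user, positive item, negative item) embeddings follows; gradients and the Adam update are left to the training framework, and the batch layout is an assumption.

```python
# Sketch of the BPR loss (13) with the dot-product scores of (12).
import numpy as np

def bpr_loss(e_u, e_i_pos, e_i_neg, params, lam=1e-4):
    y_pos = np.sum(e_u * e_i_pos, axis=1)   # y_hat_ui from (12)
    y_neg = np.sum(e_u * e_i_neg, axis=1)   # y_hat_uj from (12)
    # -ln(sigmoid(x)) computed stably as logaddexp(0, -x).
    rank_term = np.sum(np.logaddexp(0.0, -(y_pos - y_neg)))
    reg = lam * sum(np.sum(p ** 2) for p in params)   # L2 on the embeddings
    return rank_term + reg
```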


5) Matrix Form

To facilitate the implementation of the model, we express the propagation process in matrix form in (14).\begin{align*} E^{(k+1)} &= E_{U}^{(k+1)} \parallel E_{I}^{(k+1)} \\ &= (E_{U_{A}}^{(k+1)} + E_{S}^{(k+1)} + E_{C}^{(k+1)}) \parallel E_{I}^{(k+1)} \\ &= (\widetilde{R}E_{I}^{(k)} + \widetilde{S}E_{U}^{(k)} + \widetilde{C}E_{U}^{(k)}) \parallel \widetilde{R}^{\top}E_{U}^{(k)} \tag{14}\end{align*}

where \widetilde{C} and \widetilde{S} are the symmetrically normalized matrices of C and S, as in (15).\begin{align*} \widetilde{S} &= D_{S}^{-\frac{1}{2}}SD_{S}^{-\frac{1}{2}} \\ \widetilde{C} &= D_{C}^{-\frac{1}{2}}CD_{C}^{-\frac{1}{2}} \tag{15}\end{align*}

\widetilde{R} and \widetilde{R}^{\top} appear as components of the input \widetilde{A} in (3).
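Putting (14), (15), and (11) together, the propagation loop can be sketched as below; dense toy arrays stand in for the real sparse inputs, which are assumed to be the precomputed normalized matrices.

```python
# Sketch of the matrix-form propagation (14) followed by layer averaging (11).
import numpy as np

def propagate(R_t, S_t, C_t, E_u, E_i, K=3):
    users_sum, items_sum = E_u.copy(), E_i.copy()     # layer-0 embeddings
    for _ in range(K):
        E_u_next = R_t @ E_i + S_t @ E_u + C_t @ E_u  # user side of (14)
        E_i_next = R_t.T @ E_u                        # item side of (14)
        E_u, E_i = E_u_next, E_i_next
        users_sum += E_u
        items_sum += E_i
    return users_sum / (K + 1), items_sum / (K + 1)   # mean of K+1 layers

rng = np.random.default_rng(0)
n, m, d = 5, 4, 8                                     # toy sizes
e_u, e_i = propagate(rng.random((n, m)), rng.random((n, n)),
                     rng.random((n, n)), rng.normal(size=(n, d)),
                     rng.normal(size=(m, d)))
```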

SECTION IV.

Results

A. Experimental Setting

We use the 10-core setting on all data sets to ensure that each user has at least ten interactions. For each data set, we select the 20% of interactions with the latest timestamps as the testing set, and the remaining 80% of interactions for training. A summary of all data sets is shown in Table 3. We fix the embedding size to 64 for all models, and the embedding parameters are initialized with the normal weight method. The optimization process uses the Adam algorithm [38].

TABLE 3. Statistics of the experimental data sets.

1) Data Sets

  • Gowalla: first published in [39], Gowalla is a web-based application that provides location-based social networking. Users can check in and share public places. Gowalla also collects users' friendship networks and provides these data as an undirected graph.

  • LibraryThing: data from the book review website LibraryThing [40], an online service that helps people organize their books easily and communicate with each other.

  • Ciao: an online shopping site that records users' ratings of items with timestamps. Customers can rate an item with a score from 1 to 5. Users can also add others to their friend lists and build a social network [41].

  • Epinions: a well-known consumer review site where users can rate items and add social friends to their trust lists [42]. The Ciao and Epinions data sets have been widely selected as benchmark data sets for social recommendation.

2) Evaluation Criteria

An appropriate evaluation measure should be selected for each algorithm [43]. A common evaluation method divides a data set into a test set and a training set. The algorithm is applied to the training set to make predictions, and these results are evaluated on the test set. The difference between the learned result and the actual data value shows the accuracy of the proposed model. This difference can be expressed as the mean absolute error (MAE) or the root mean square error (RMSE). Besides accuracy and recall, scalability, learning time, memory consumption, and interpretability are also important criteria in evaluating a recommender system.

For implicit data, the interactions between users and items are recorded as binary values rather than as specific rating scores, so the algorithms use classification accuracy measures. The commonly used metrics are precision and recall [44], [45]. Precision is the proportion of correct predictions on the test set, and recall is the sensitivity of the algorithm, i.e., the proportion of relevant interactions that are retrieved from the test set.\begin{align*} Precision &= \frac{TP}{TP+FP} \\ Recall &= \frac{TP}{TP+FN} \tag{16}\end{align*}

where TP (true positives) is the number of correctly predicted interactions between users and items, FP (false positives) is the number of predicted interactions that do not exist in the test set, and FN (false negatives) is the number of interactions in the test set that were not predicted.

Furthermore, the discounted cumulative gain (DCG) score [35] assumes that judges have assigned relevance labels to each result; it accumulates across the result vector a gain function G applied to the label of each result, scaled by a discount function D of the result's rank, and is normalized by dividing by the DCG of an ideal result vector. This yields the normalized discounted cumulative gain (NDCG) in (17).\begin{equation*} NDCG_{u} = \frac{DCG_{u}}{DCG_{max}} \tag{17}\end{equation*}

In this article, we use precision and recall (16) and NDCG (17), evaluated at the top 20 items (NDCG@20).
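For reference, a hedged sketch of Precision@k, Recall@k, and NDCG@k for a single user's ranked list; `ranked` and `relevant` are hypothetical inputs, and the binary-gain, log2-discount form of DCG is one common convention.

```python
# Sketch of the top-k metrics in (16)-(17) for one user.
import numpy as np

def metrics_at_k(ranked, relevant, k=20):
    hits = np.array([1.0 if item in relevant else 0.0 for item in ranked[:k]])
    precision = hits.sum() / k
    recall = hits.sum() / max(len(relevant), 1)
    discounts = 1.0 / np.log2(np.arange(2, k + 2))   # D(rank) = 1/log2(rank+1)
    dcg = float(hits @ discounts[:hits.size])
    idcg = float(discounts[:min(len(relevant), k)].sum())  # ideal ordering
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return precision, recall, ndcg

print(metrics_at_k([3, 7, 1], {1, 2}, k=20))  # (0.05, 0.5, ~0.31)
```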

3) Base Line Models

To evaluate our proposed model, we compare it with the following state-of-the-art models as baselines.

  • MF [15] is the matrix factorization model demonstrated in the Netflix Prize competition. The model outperforms classic nearest-neighbor techniques for generating product recommendations and allows the inclusion of additional information such as implicit feedback, temporal effects, and confidence levels. Even though this model is dated, we include it in the comparison to show the evolution of recommendation models.

  • SocialMF [46] incorporates trust information and its propagation into an MF model. The signals from direct friends are embedded in the final matrix.

  • TrustSVD [25] is a trust-based MF technique built on the SVD++ algorithm. It handles the explicit and implicit influence of rated items, incorporating reviews from trusted friends, to predict items for an active user.

  • GCMC [16] treats a future recommendation as a prediction of the link between a user and an item in the graph. Interaction data are represented by a bipartite graph whose labeled edges denote the observed ratings. The model then uses a graph auto-encoder framework based on message passing on the interaction graph.

  • NGCF [7] is one of the most cutting-edge GCN models. It performs propagation operations on the embeddings over several iterations. High-order connectivity in the interaction graph is contained in the stacked embeddings at the output. The latent vectors contain the collaborative signal, which strengthens the model.

  • WiGCN [34] is based on the NGCF model and adds a weighted matrix as an additional input. That matrix contains the influence of users on each other, leading to richer propagation and increased recommendation performance.

  • LightGCN [24] eliminates the weight matrices and the activation function in order to concentrate on the neighborhood aggregation component of CF. The user and item embeddings for the interaction graph are learned by linear propagation, and the weighted sum of all learned embeddings becomes the final embedding.

  • SocialLGN [32] extends LightGCN with a social graph: user embeddings are propagated through both the interaction graph and the social graph and then combined by a graph fusion operation.

  • SEPT [33] augments the user data views with the users' social information. The framework then builds three graph encoders upon the augmented views and iteratively improves each encoder with self-supervision signals from the other two.

B. Overall Result Comparison

A summary of all results from all models on the processed data is displayed in Table 4, with three values for each data set: precision, recall, and NDCG measured at 20 items. Reading from the top row down, the evolution of recommender system models can be clearly identified: the group of GCN-based methods achieves better results than the group of classical methods such as MF or SVD. Within the GCN group, models that do not use the transformation matrices, such as LightGCN, give better results than the remaining models. When compared within the group of models that do not consider social relationships, i.e., MF, GCMC, NGCF, LightGCN, and WiGCN, our model with interaction embedding (Our w/ interaction) still gives the highest results.

TABLE 4. Overall performance comparisons.

In each pair of models, such as MF and SocialMF, LightGCN and SocialLGN, or LightGCN and SEPT, the model that uses social relationships almost always gives better results. This confirms that mining actual friendship data is as important as mining the correlation between users through items of common interest. Our proposed model also efficiently accounts for social relationship data by embedding it in the propagation process.

C. Detailed Model Analysis

1) Effect of Social Relation on Overall Result

We compare pairs of architecturally similar models in Table 5. In each comparison row, one model exploits the user-item relationship data while the other adds user-user relationship data. Within each pair, the signal propagation and reception formulas are highly similar, using (or not using) the weighted transformation matrices and non-linear activation functions. The parameter settings in the experiments are the same.

  1. From MF to SocialMF: in the traditional MF model, the characteristics of each user are obtained from the decomposition of the user-item relation matrix. The SocialMF model additionally extracts the characteristics of a user's direct friends to enrich that user's information. However, SocialMF does not explore the indirect relationships in the user community, so it misses many valuable signals.

  2. From LightGCN to SocialLGN: this is the only case where the results are worse across all measures. The combination of signals from mining user similarities with the users' social relationships is not harmonious, so the friend information not only reduces the accuracy of the recommendation results but also scatters the CF results.

  3. From LightGCN to SEPT: although both SocialLGN and SEPT extend LightGCN with friendship information, the SEPT model has clearly optimized the embedding of the friendship matrix between users into the model and brings a big improvement.

  4. We also tested the influence of friendship data on the overall results by removing the S matrix from the model input and comparing the results between the model without the S matrix (our model with only interaction data) and the full model (our model, all). We calculated the influence weights between users and clustered those weights into relationship levels as shown in Table 2. In this way, our model avoids the weakness seen in SocialLGN while still capturing valuable friend signals, as in the SocialMF model, and exploring indirect relationships, like other GCN models.

Although friendship data have been shown to influence the final recommendation results, it should be noted that relationships in the real world change over time and can introduce bias into recommendations. For example, friends who share one type of hobby will find it difficult to give good advice about another.
TABLE 5. Improvement with social friendship data on the Gowalla data set.

2) Time Performance Analysis

The time cost of our model lies in the aggregation over the interaction and social relation graphs, and the total time complexity is acceptable in practice. We kept the experimental environment as consistent as possible across runs; the average time per epoch for each data set is almost identical across the LGC-based models, and the statistics are given in Table 6.

TABLE 6. The number of epochs each model trained and the average time (in seconds) per epoch.

An outstanding feature of our proposed model is its convergence speed. Not only does it achieve higher recommendation accuracy than the other models, but its training time is also significantly shorter than that of LGC-based models such as LightGCN, SocialLGN, and SEPT. Specifically, on the Gowalla data set, our model with only interaction data takes 490 epochs and our full model only 440 epochs, while SEPT and LightGCN take 1090 and 1140 epochs, respectively. The remaining two data sets, Ciao and Epinions, show a similar trend, which demonstrates the very good performance of the training process. We conclude that the user correlation matrix helps the model focus on important data, thereby accelerating learning.

3) Positive Impact of the User Correlation Matrix

Our model converges faster than the baseline models, as shown in Figures 6 and 7. In our previous work on the WiGCN model, adding a weight matrix as input generated strong attention signals for the nodes in the graph during propagation. The user correlation weight matrix based on the set of shared items only needs to be calculated once, using basic matrix operations, and is then reused for all embedding layers. Therefore, in this article, the user correlation matrix, with carefully preprocessed U \times I data, yields a very high convergence speed for the propagation process.

FIGURE 6. Training curves of LGC-based models, evaluated by training loss and testing recall@20 on the Gowalla data set.

FIGURE 7. Training curves of LGC-based models, evaluated by training loss and testing recall@20 on the LibraryThing data set.

D. Explaining the Influence of Interaction Data and Social Network Friendship Data

One of our main research questions concerns the influence of interaction data and social friendship data when considered separately and when aggregated. We implemented the model in a modular form that can enable or disable the interaction embedding and the social friend embedding, to make our model explainable. In Tables 7 and 8, we compare four models: LightGCN (which can be considered a baseline without the user correlation matrix or social network friendship data); our model with only interaction data; our model with only social network friendship data; and our full model.

TABLE 7. Influence of interaction data and social network friendship data on the Gowalla data set.

TABLE 8. Influence of interaction data and social network friendship data on the LibraryThing data set.

We conclude that for data sets related to geographical locations on social networking platforms, the social friendship data create the main impact, contributing largely to the increase in precision and recall in our model. For platforms where the friendship network is just a secondary function, such as the LibraryThing data set in Table 8, where the social network serves mainly for private messaging, the interaction data play the crucial role. Finally, combining both sources of data achieves the best results.

SECTION V.

Conclusion and Future Works

A. Conclusion

In this work, we have presented an effective model that includes data filtering to remove noisy interactions between users and items, an evaluation of the influence of the weight matrices and activation function on the results, a method of consolidating data when the similarity of users comes from multiple information sources, and a comparison with effective recommender system models presented recently.

Designing the input with multiple embeddings lets the model quickly capture interaction signals and significantly increases its efficiency. Our model inherits the most effective features of previously introduced algorithms: the high-order connectivity learning structure, removal of the weight matrices, and elimination of the non-linear activation function. We also evaluated the influence of social friendships on the recommendation process and pointed out some of the difficulties of using these data.

B. Future Work

During this research, we identified several issues that deserve further exploration. One is group recommendation, which produces results not for a single user but for a whole group of friends. Such a model is useful in contexts where a group of friends go to the movies together, visit the same attraction, or take a course together. Another open problem is that items have been treated as individuals; they are not clustered or grouped into catalogs, which may omit a lot of information from the item set in the final recommendation result. In fact, items are closely related when they belong to the same commodity group, when they are in geographically close locations, or when they are movies of the same genre.
