Introduction
With the dawn of SM data across the globe, the rate of growth of data-intensive problems has also increased. The wide availability and exponential growth of digital data have made it challenging, or even impossible, to visualize, explore, manage, and analyze with contemporary software tools and technologies. The abundant increase in data volume, the diversity of data variety, and the incoming/outgoing data velocity (a concept known as the 3Vs) are the most prominent reasons why and how SM data escalated. For instance, more than 1K Petabytes of data per day is processed over the Internet, according to a report by the National Security Agency. Between 2006 and 2011, digitized data grew ninefold, and it was projected to reach 35 trillion gigabytes by 2020 [1]. This enormous intensification of digital data unlocks significant research prospects for sectors such as education, health, industry, business, public administration, and scientific research. In addition, the emergence of SM also points to a sensational paradigm shift in current scientific research toward data-driven knowledge discovery.
Even though SM plays a key role in connecting people around the globe [2], it also offers a vast variety of knowledge-extraction tasks, as mentioned earlier. Pulling information out of data and reaping knowledge from that information is far from a trivial problem. Machine learning techniques, accompanied by advances in computing power, have played an important role in leveraging the hidden information in these data. However, their enormousness and diversity call for solutions that can better expose the obscured information and knowledge in the data. As an active sub-area of machine learning, DL is believed to be a powerful tool for SMA problems. Together with other SM applications, web-based applications are growing day by day as recent hotspots [3]. Significantly, these include social computing such as online communities, reputation systems, question-answering systems, prediction systems, recommender systems, and Heterogeneous Information Network Analysis (HINA) [4]. Moreover, graph theory better illustrates the semantic structure of SM data, representing users as nodes and the relationships among them as links.
SM data are growing enormously every day, which calls for refined pattern and feature extraction for better knowledge discovery. Most conventional learning methods use shallow-structured learning architectures. DL, by contrast, comprises supervised and unsupervised machine learning techniques that automatically learn hierarchical representations for classification. Recently, DL has attracted significant attention from the research community, inspired by biological observations of human brain processing. It has also performed prominently in numerous research areas such as digital image processing, speech-to-text, and collaborative filtering. Likewise, DL has been applied successfully in engineering and manufacturing, facilitated by huge volumes of digital data. Certainly, well-known companies such as Google, Apple, and Facebook deal with heaps of data on a daily basis, and these companies are eagerly pursuing DL-oriented projects. For instance, using DL, Apple's Siri, the virtual assistant on the iPhone, offers widespread services including sports news, answers to users' questions, the latest weather updates, and reminders, while Google applies DL to the multitudes of chaotic data behind Google Translate.
In contrast, SMA is one of the most prominent, hot, and recent areas of study. Connecting DL with SMA can reveal evocative insights. Previously, a number of reviews [5]–[7] showed that DL is viable and efficient for solving substantial big data problems. However, most of their focus was on DL applications such as image classification and speech recognition; none of them discussed the most prominent and developed SM platforms in particular. DL methods and applications [8] spread across application domains including business, education, economics, health informatics, etc. In this review article, we cover noteworthy SM application domains such as user behavior analysis, business analytics, sentiment analysis, and anomaly detection, where DL has played a striking role in leveraging rich knowledge. In terms of contribution, this study:
Provides a contemporary summary of existing DL methods that offers a roadmap for extracting useful insights for SMA.
Provides a classification scheme that identifies important features for studying the semantics of a particular problem, which may help design a better future vision across diverse SMA application domains.
Investigates the pros and cons of existing techniques.
Highlights the prominent application domains for applying DL.
Uncovers the noteworthy research challenges and future directions.
Following the introductory section, Section 2 describes relevant concepts and terminologies about SM and DL used in the study. Section 3 illustrates the taxonomy of methods, where each taxonomic group is further sub-categorized based on the different techniques in the selected application domains. Section 4 describes the DL perspectives for benchmark datasets. Section 5 discusses the performance evaluation measures for DL models. Section 6 highlights how SM is challenging the research community and deliberates intuitive research challenges and future directions, whereas Section 7 concludes the study.
Basics of Social Media and Deep Learning
Before diving into the article details, we begin with an overview of the basic concepts, terminologies, data types, and architectures concerning SM and DL.
A. Social Media
Here we explain some commonly used terminologies and concepts pertinent to SM.
1) Terminologies
a: Social Media Analytics
SMA refers to the web-based technologies used to transform the communications carried over virtual networks and communities into interactive discussion. Interactive Web 2.0 Internet-based applications are built on user-generated content, such as comments or text posts, videos, and data generated through all online communications. Generally, users create service-specific SM profiles that are governed by some organization. SM facilitates the growth of online social networks by linking a user's profile with other people or groups, usually sharing similar interests. Accordingly, SMA comprises the approaches for assembling data from different SM platforms, for instance, Facebook, Twitter, etc., and then evaluating and analyzing the data to make business decisions. Significantly, these data are updating, expanding, and evolving unceasingly [9] – perhaps a good pathway to comprehend real-time customer experiences, intents, and sentiments.
b: Social Network Analysis
The social network is a subcategory of SM in which two users are connected by a common interest. Individuals and groups are the nodes, while the edges show connections between the nodes. Social Network Analysis (SNA) is the mapping and measuring of associations and flows amongst individuals, groups, organizations, URLs, and other interlinked information entities using networks and graph theory [10]. It also helps to analyze human relationships both graphically and mathematically.
Comparatively, SMA refers to the Business Intelligence (BI) tools – reporting, searching, visualizing, text mining, and so on – applied to information sourced from SM platforms, for instance, Facebook and Twitter. It helps answer questions such as how much traffic you are driving and from where, and how influential your messaging is. SNA, however, is explicitly focused on identifying the relationships, connections, interactions, and influence amongst information engineering entities such as individuals, groups, and organizations. It helps answer questions such as how closely an individual is linked to a network and how information flows within a network. In particular, we investigate the former in this study.
c: Big Data
Datasets that are so voluminous, diverse, rapid, and complex that conventional data-processing and computational systems are unable to deal with them are termed Big Data. Storing, processing, and analyzing such data entails plausible tools and techniques that can well encompass the underlying data. Moreover, SM platforms, for instance, Facebook, Twitter, and YouTube, are causing swift intensification of the data. Big Data from SM can be used to extract trends, patterns, and associations particularly pertinent to human behavior, entity interactions, and complex integrations. Certainly, Big Data is interwoven with the explicit 4Vs concept, namely Volume, Variety, Velocity, and Veracity [5]–[7].
d: Dynamic Network
Users linked together by friendship links make a network of users on Facebook. This network varies over time through several online activities, for instance, adding or removing friends, liking or disliking products, joining or leaving groups, and so on. In a dynamic network, nodes and edges vary over time. Statistical methods, computational investigation, or computerized simulations are often essential to discover how these networks grow, shrink, fine-tune, or deal with peripheral interferences.
e: Signed Social Network
People hold both kinds of sentiments toward each other – positive and negative. With the advancement of SM platforms, individuals frequently prefer to express their sentiments using these platforms. Sentiments such as friends or foes, agreements or disagreements, likes or dislikes, trust or distrust, group joining or departing, and so on can be bi-categorized into positive connections such as friends, agreements, or likes and negative connections such as foes, disagreements, or dislikes. Such interactions give rise to Signed Social Networks (SSNs) [11], [12]. Certainly, SM signed networks are noisy and sparse, usually with a massive number of users and multitudes of relationships. SM illustrations of SSNs comprise friends/foes in Slashdot1 [13] and trust/distrust in Epinions2 [14].
2) Concepts
a: Information Object
Real-world networks can be well represented as graphs. A graph comprises vertices and relationships among vertices. For instance, two people (vertices) are related in a network if they are connected via a friendship link in the Facebook graph. Real-world entities or nodes in these graphs are denoted as information objects [15].
b: Domain Adaptation
In business, users buy and sell different products. Some buying and selling depends purely upon users' reviews, ratings, etc. However, users' domains of interest vary across multiple types. Maintaining a separate sentiment classification system for each domain would not be serviceable in terms of heavy cost, higher resource requirements, and longer processing time. In order to learn a single system from a set of different domains, an alternative approach is used which depends upon labeled or unlabeled data. Subsequently, the learned system is applied to any labeled or unlabeled target domain. Being a multi-source system, it is unviable to extract patterns that are not shared and meaningful across domains. The problem of learning systems on diverse domain distributions is termed domain adaptation [16].
c: Sentiment Analysis
The process of finding subjective information hidden in users' content is known as sentiment analysis. Particularly, the content is in the form of text sourced from multiple SM platforms. Thus, it uses Natural Language Processing (NLP), computational linguistics, and other text analysis techniques to categorize users' attitudes, and data mining techniques to extract and gather data for analysis. The purpose of this analysis is to determine the subjective knowledge in a piece of text or a corpus such as news articles, reviews, comments, blog posts, SM news feeds, tweets, and status updates [17].
d: Link Prediction
Link prediction is a data-analysis method used to estimate relationships between nodes in SM networks. The nodes could be people, organizations, or transactions. SM platforms are highly dynamic, where nodes and edges constantly evolve. Predicting the new interactions that are yet expected to occur in an SM network is denoted as the link prediction problem [18].
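To make the idea concrete, the following minimal sketch scores unconnected node pairs with the classical common-neighbors heuristic using the networkx library; the toy graph and node names are purely illustrative and are not drawn from any dataset discussed in this survey.

```python
import networkx as nx

# Toy friendship graph; nodes and edges are purely illustrative.
G = nx.Graph()
G.add_edges_from([("ana", "bob"), ("ana", "cara"),
                  ("bob", "cara"), ("bob", "dan"), ("cara", "dan")])

# Score every non-adjacent pair by the number of common neighbors:
# the more friends two users share, the more likely a future link.
candidates = [(u, v) for u in G for v in G
              if u < v and not G.has_edge(u, v)]
scores = {(u, v): len(list(nx.common_neighbors(G, u, v)))
          for u, v in candidates}

for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, score)
```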
3) Types of Social Media
a: Microblogging
A blog is an informally styled discussion, often presented in written, image, audio, or audiovisual form on the World Wide Web. A microblog is a specific form of blog which allows smaller content to be shared online compared with a regular blog. It lets a user post small pieces of content, perhaps short sentences, images, short videos, or links to web pages. To keep a blog updated, bloggers usually use a number of services such as instant messaging, emailing, etc. These services are termed microposts, whereas the practice of using them to keep the blog up-to-date is called microblogging [19]. SM websites such as Weibo, Twitter, and Facebook also provide microblogging services like status updates and tweets [20]. Some microblogging services allow users to regulate, through their privacy settings, who can read or comment on their microblogs. Microblogging applications spread across various domains such as e-commerce, online marketing, product searching, and advertising sales.
b: Friendships
Two people connected with each other using any online medium construct a social platform termed a friendship network. People spend tons of their energy staying bridged with others on SM platforms. The members of a friendship network may or may not focus on a particular topic, putting more attention on remaining linked to their friends. Licamele and Getoor [21] considered these networks as friendship-event networks. For instance, in an academic collaboration network, the researchers are actors/friends, collaborations are friendships, and conferences are events. Noteworthy online friendship platforms include, but are not limited to, Facebook,3 MySpace,4 Badoo,5 and Bebo.6
c: Professional
A professional network, or more particularly a professional network service, is a category of social network service that is concentrated merely on communications and associations of a corporate/professional nature [22]. Instead of covering private, non-corporate, or unprofessional relationships, the focus is to either find a job or move a step ahead in a professional career [20], [22]. Some substantial professional platforms include LinkedIn,7 Xing,8 and Data.com Connect.9
d: Photos
Photo sharing platforms allow users to share photos online publicly or privately. Sharing refers to viewing the images but not necessarily downloading them [20]. Objectively, these sites allow users to back up images, make the images searchable, share images, and even control who can see their shared images. These platforms can also serve multiple purposes such as image resourcing and repositories, visual literacy, and research. Some substantial photo sharing platforms include Flickr,10 Instagram,11 and Pinterest.12
e: Videos
Video sharing platforms help users to share their videos online publicly or privately. Some video sharing websites allow users to upload only short videos, whereas others allow lengthy video content as well. Objectively, these sites allow users to save, share, comment on, and control viewership of their shared content. Online SM platforms are nowadays one of the most extensively used media to share video content. These platforms also offer live streaming features to connect exclusively with other users. Moreover, they allow users to link their existing SM accounts, such as Facebook, with the video-sharing websites to instantly share videos [23]. Some substantial video sharing platforms include YouTube,13 Vimeo,14 and Veoh.15
f: Question/Answer Forums
Question/Answer (Q/A) forums tend to answer the questions asked by different users across multiple domains. Users are able to see the questions asked by other users if this helps them, and the same holds for posted answers, usually provided by domain experts. Q/A forums are often run by large, professional corporations and are inclined to be applied as a community that lets users in alike domains discuss questions provided with expert answers. In social Q/A services, any user can ask a question and also post an answer. Shah et al. [24] divided these services into three categories – digital reference services, where users can ask for help from librarians; expert services, where organizations offer a Q/A service and experts are supposed to answer the questions; and social or community Q/A, where everybody can ask and answer a question. Sharing knowledge over online communities encourages self-presentation, peer recognition, and social learning among users [25]. Some substantial Q/A forums include StackExchange,16 Quora,17 and Answers.18
B. Deep Learning
Here we explain commonly used concepts and terminologies pertinent to DL.
1) Terminologies
a: Neural Network
A Neural Network (NN) or Artificial Neural Network (ANN) is a paradigm for processing information inspired by biological neurons (as in the human nervous system). It attempts to find fundamental relationships in datasets using a series of processors that imitate the way the human brain works. The significance of this paradigm is that it comprises numerous highly interrelated processing elements (artificial neurons) operating together to solve particular problems. A neural network usually encompasses numerous tier-wise arranged processors operating in parallel. Raw input is injected into the first tier, whereas the output from each preceding tier is injected as input to the succeeding tier. Eventually, the last tier generates the final output [26]. In contrast to DL, training in shallow neural networks typically happens simultaneously for all layers.
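As a minimal illustration of this tier-wise processing, the following NumPy sketch passes an input vector through two fully connected tiers; the layer sizes and random weights are arbitrary choices for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    # One tier: weighted sum followed by a non-linear activation (here tanh).
    return np.tanh(x @ w + b)

x = rng.normal(size=4)            # raw input injected into the first tier
w1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
w2, b2 = rng.normal(size=(8, 2)), np.zeros(2)

hidden = layer(x, w1, b1)         # output of the first tier ...
output = layer(hidden, w2, b2)    # ... becomes input to the final tier
print(output)
```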
b: Convolutional Neural Network
A Convolutional Neural Network (CNN) comprises numerous layers, with a few feature-representation layers and other layers as in a typical neural network. Frequently, it starts off with two alternating layer types, namely convolutional and sub-sampling layers. The former accomplish convolution operations, and the latter reduce the size of the earlier layers [1]. Owing to having fewer parameters, training CNNs is rather easier than training more fully connected networks. Moreover, with a fixed-size input, CNNs generate fixed-size outputs, which diminishes the cost of preprocessing.
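A minimal convolution-plus-sub-sampling stack of the kind described above can be sketched in PyTorch as follows; the layer sizes and the assumed 28×28 single-channel input are illustrative and do not correspond to any model surveyed here.

```python
import torch
from torch import nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),  # convolutional layer
            nn.ReLU(),
            nn.MaxPool2d(2),                            # sub-sampling layer
        )
        self.classifier = nn.Linear(8 * 14 * 14, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(start_dim=1))

model = TinyCNN()
dummy = torch.randn(1, 1, 28, 28)   # one fixed-size 28x28 grayscale input
print(model(dummy).shape)           # torch.Size([1, 10])
```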
c: Recurrent Neural Network
A Recurrent Neural Network (RNN) is a class of ANN in which links between components construct a directed graph. This lets an RNN exhibit progressive dynamic behavior. Distinct from feedforward NNs, the internal memory of RNNs can be used to process arbitrary input sequences. This makes them appropriate for tasks such as connected handwriting recognition and even speech-to-text conversion. Moreover, in RNNs, every node is connected to another node with a directed connection, whereas nodes can be input, output, or hidden nodes that adjust the data between input and output nodes [7].
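The recurrence itself reduces to a single update rule in which the hidden state (the internal memory) at step t is computed from the current input and the previous hidden state. A bare NumPy sketch with arbitrary dimensions follows.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 5, 3

W_xh = rng.normal(size=(input_dim, hidden_dim))   # input-to-hidden weights
W_hh = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden (memory) weights
b_h = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    # Vanilla RNN cell: the previous hidden state feeds back into the update.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

sequence = rng.normal(size=(4, input_dim))  # a random input sequence of length 4
h = np.zeros(hidden_dim)
for x_t in sequence:
    h = rnn_step(x_t, h)
print(h)
```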
d: Auto-Encoder
An Auto-Encoder (AE), also known as an auto-associator or Diabolo network, is an ANN used for unsupervised learning. An auto-encoder aims to learn an encoding (usually a representation) for a dataset. The learned representation can then be used for dimensionality reduction. Recently, the auto-encoder concept has become extensively used for learning generative models. Constructively, the simplest form of an auto-encoder is a feedforward, non-recurrent network with the same number of nodes in the input and output layers. Accordingly, an auto-encoder is an unsupervised learning model with two parts, an encoder and a decoder [27].
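A minimal encoder/decoder pair of this kind can be sketched in PyTorch as below; the 784-dimensional input (e.g., a flattened 28×28 image) and the 32-dimensional code are arbitrary illustrative choices.

```python
import torch
from torch import nn

class TinyAutoEncoder(nn.Module):
    def __init__(self, n_features=784, code_size=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, code_size), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(code_size, n_features), nn.Sigmoid())

    def forward(self, x):
        code = self.encoder(x)       # low-dimensional representation
        return self.decoder(code)    # reconstruction of the input

model = TinyAutoEncoder()
x = torch.rand(16, 784)                      # a batch of 16 illustrative inputs
loss = nn.functional.mse_loss(model(x), x)   # reconstruction error to minimize
print(loss.item())
```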
e: Restricted Boltzmann Machines
A Restricted Boltzmann Machine (RBM) is a neural network with two layers. It comprises a single layer of hidden units and a layer of visible units, with no connections within a layer. Additionally, the connections between hidden and visible units are symmetrical and undirected. There is a bias for the hidden as well as the visible units in the network, and binary values are used for both the hidden and visible units. The applications of RBMs include classification, dimensionality reduction, feature learning, collaborative filtering, and topic modeling [28].
f: Deep Belief Network
A Deep Belief Network (DBN) is a class of deep NN in machine learning, or rather a generative graphical model, composed of multiple hidden layers with latent variables. Connections exist between the layers but not between units within a layer. When unsupervised training is performed, a DBN learns to reconstruct its inputs probabilistically; the layers then act as feature extractors. After this learning, a DBN can be further trained with supervision to perform classification. DBNs can be viewed as a composition of simple unsupervised networks, for instance, RBMs or auto-encoders, where the hidden layer of each sub-network serves as the visible layer for the next [29].
2) Concepts
a: Deep Learning
DL deals with a collection of machine learning methods that train several levels of data representations in deep architectures. The learning can be supervised, semi-supervised, or unsupervised [30]. It uses a multiple-layered cascade of nonlinear processing units (neurons) for feature transformation and extraction. The output from each preceding layer is used as input for the succeeding layer, as shown in FIGURE 1.
b: Activation Function
The activation function of a node (neuron) describes its output given an input or a set of input values. These functions are an extremely important feature of Deep Neural Networks (DNNs) [32]. Fundamentally, they decide whether the neuron should be activated or not, and whether the incoming information for the neuron is relevant or ignorable. The activation function is a non-linear transformation applied over the input signals, and the result of this transformation is then passed to the next layer of neurons as input. Mathematically, the weighted input Y, to which the activation function is applied, can be depicted as:\begin{equation*} Y=b+\sum \limits _{i=1}^{n} {x_{i}w_{i}}\tag{1}\end{equation*}
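Concretely, Eq. (1) produces the neuron's weighted input Y, and the activation function then transforms Y. The short NumPy sketch below pairs Eq. (1) with two common activations (sigmoid and ReLU); the input values and weights are arbitrary.

```python
import numpy as np

def weighted_input(x, w, b):
    # Eq. (1): Y = b + sum_i x_i * w_i
    return b + np.dot(x, w)

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def relu(y):
    return np.maximum(0.0, y)

x = np.array([0.5, -1.2, 3.0])   # incoming signals
w = np.array([0.4, 0.1, -0.6])   # connection weights
b = 0.2                          # bias

y = weighted_input(x, w, b)
print(y, sigmoid(y), relu(y))    # non-linear outputs passed to the next layer
```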
c: Network Embedding
Network embedding is a technique for learning low-dimensional vertex representations in networks [33]. It aims to project the data into a low-dimensional hidden space. Each vertex in this space is denoted by a low-dimensional vector, which facilitates direct computational processing of the network. For embedding a number of networks, it is essential to preserve the network structure both locally and globally [34].
d: Word Embedding
Applying DL to NLP problems aims to obtain well-distributed word representations, in particular mappings from vocabulary words to real-valued vector representations. These vector representations are known as word embeddings [35]. Conceptually, a word embedding is a mathematical embedding from a space with one dimension per word into a continuous and substantially lower-dimensional vector space. Used as input, these embeddings can boost the performance of NLP tasks, for instance, sentiment analysis and syntactic parsing.
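For illustration, the sketch below trains CBOW-style word embeddings with the gensim library (version 4 or later is assumed) on a toy corpus; the corpus and the embedding dimensionality are placeholders rather than the data used in the studies cited later.

```python
from gensim.models import Word2Vec

# Toy corpus; each sentence is a pre-tokenized list of words.
corpus = [
    ["the", "movie", "was", "great"],
    ["the", "film", "was", "great"],
    ["the", "movie", "was", "terrible"],
]

# sg=0 selects the CBOW architecture; vector_size is the embedding dimension.
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0, epochs=200)

vec = model.wv["movie"]                        # real-valued vector for "movie"
print(vec.shape)                               # (50,)
print(model.wv.most_similar("movie", topn=2))  # nearest words in the embedding space
```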
Deep Learning in Social Media
Connectionism is the philosophy of DL: an individual biological neuron, or an individual feature in a deep model, is not intelligent, but a large collection of them connected and acting together can exhibit intelligent behavior. In this section, we discuss the problem domains of SM where DL has been used as a key technique in problem solutions. Next, we discuss the pros and cons of existing methods. Finally, we walk through the taxonomic details of these methods and provide a comprehensive summary of the DL methods for SMA. FIGURE 2 shows the taxonomy of the DL methods in SMA.
A. User Behavior Analysis
Present-day society is a combination of various entities, and human beings are one of them. Intuitively, the behavior of human beings can be categorized broadly into individual behavior and group behavior. Both have their own causes and consequences. However, human beings, as users in society, behave differently in different social situations. Social behavior is the outcome of certain atmospheric changes, environmental events, or social influences. To obtain knowledge of the well-being of a society, along with knowing the social changes, it is equally important to become familiar with the social behavior of individuals.
Moreover, it is worth determining the impact of social influences on users' behavior. As defined earlier, SM is a prominent source of connecting people in society and primarily counts on user-generated content. Accordingly, DL offers captivating techniques to analyze users' behavior, learning correlations between their past and current characteristics based on SM. Here we go through some categorized tasks performed in SM to analyze users' behavior using DL.
1) Prediction Using DL
A number of studies exist which use DL to predict human behavior in social networks. Aramo-Immonen et al. [36] learned communities using data from Twitter, which is considered one of the prominent tools for information dissemination. DL can handle multi-dimensional data quite effectively. With this aim, Zhang et al. [37] proposed a Tensor Auto-Encoder (TAE), a deep computational model, to learn features from heterogeneous YouTube data. Given a reference basis of vectors, a tensor is used to represent the linear relation between vectors. Arrays are one of the ways to represent tensors in computer memory; the number of dimensions of the array gives the degree (rank) of the tensor. For instance, a 2D array can be used to represent a linear map between vectors, hence it corresponds to a tensor of rank two.
SM data comprise a wealth of valued information usable for making noteworthy predictions. However, learning from heterogeneous sources is still a non-trivial task in SM. Jia et al. [38], fusing social networks, proposed a novel deep model, Fuses sociAl netwoRks uSing dEep lEarnING (FARSEEING), which integrates useful information from heterogeneous social networks. In particular, this is an information fusion task which employs DL to learn from the complexity of multiple data sources. In this model, the authors used different inner layers to learn complex representations from multiple social networks. First, users are associated by relating their multiple social forum accounts. Second, the given users are characterized using extracted multi-faceted features such as linguistic, demographic, and behavioral features. Since the activities of users in SM are unbalanced, some data are missing. This missing data is inferred before the extracted features are fed into FARSEEING, using Non-negative Matrix Factorization (NMF). NMF is a set of algorithms in multivariate analysis in which a matrix M is factorized into two matrices X and H, with no negative element in any of the three matrices.
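For the factorization step alone, a brief sketch with scikit-learn's NMF illustrates how a sparse, non-negative matrix can be completed from low-rank factors; the matrix below is synthetic and is not the data used by FARSEEING.

```python
import numpy as np
from sklearn.decomposition import NMF

# Synthetic non-negative user-by-feature matrix with zeros standing in
# for missing activity (illustrative only).
M = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0],
              [0.0, 1.0, 5.0, 4.0]])

model = NMF(n_components=2, init="random", random_state=0, max_iter=1000)
X = model.fit_transform(M)      # user factors (all entries >= 0)
H = model.components_           # feature factors (all entries >= 0)

M_hat = X @ H                   # low-rank reconstruction used to infer missing values
print(np.round(M_hat, 2))
```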
The low-level features are mapped into high-level features using deep layers; the high-level features are then fused together for learning the task. The users' confidence level and consistency across multiple social forums are measured to gain a comprehensive understanding of a user's interests, behavior, and personality traits. The data sources used as ground truth are Quora, About.me, and LinkedIn.
Social networks are a source of creating online relationships among users. Labeling these relationships as negative or positive turns these networks into SSNs. Liu et al. [39], [40] proposed Deep Belief Network (DBN)-based techniques to predict links in SSNs. Prediction tasks such as co-authorship, friendship, trust, distrust, and further associations are considered. It is obvious that social networks have an escalated usage rate for communication in a number of crisis situations these days. Lazreg et al. [41] presented a strong analysis of SM posts, such as text and images, to predict the crisis situations communicated by users. Such SM posts are typically informal, brief, and heterogeneous (a mixture of languages, acronyms, and misspellings) in nature. Without loss of generality, identifying the context of a post is often necessary to infer its underlying meaning. Moreover, posts on other ordinary events are also part of the data, which introduces supplementary training noise. DL better understands these complex representations to learn the crisis situations. In addition, Liu and Zhu [42] used microblog data to predict users' behavior by proposing an unsupervised construction of a Linguistic Representation Feature Vector (LRFV). This method can describe users' semantic information comprehensively and more objectively.
2) Classification Using DL
SM is one of the most prominent modes of interaction amongst people, in which they produce, exchange, and share ideas and information in networks and communities. In general, SM data is noisy, diverse, of low quality, large in quantity, and heterogeneous in nature. Users with diverse backgrounds practice SM platforms to record routine activities. This causes the SM data to be subjective. It also gives these data a wide collection of attributes such as the resources used, the appearance of entities in a specific context, information diffusion, link analysis, and so on. For instance, SM tasks such as image annotation and classification are non-trivial because of this diversity characteristic.
However, SM is heterogeneous, with multi-modal user-generated content, which motivates a joint representation for the data. For instance, a flower image could be associated with a number of textual tags, which makes latent feature learning for image classification quite complicated. A joint representation could better convey the information associated with the content. Yuan et al. [15] proposed a DL-based approach to classify SM data, particularly images, using latent feature learning. The authors used the Flickr dataset and classified images as linked or not linked with a tag, which articulates this as an image classification as well as a link analysis problem. Dealing with such an enormous feature space is a non-trivial task, and DL can be a worthy tool to handle image data. The reasons include the unsupervised pre-training of diverse social data characteristics, fine-tuning of features, the layer-wise learning structure, and the explanation of more abstract and robust semantics.
Yuan et al. [15] presented a Relational Generative Deep Belief Nets (RGDBN) model and investigated links between information objects which are generated by interactions of latent features. Initially, low-level representations are learned in RGDBN; then, using a deep architecture with more layers, higher-level representations are used to better learn the links between images and associated textual tags. The authors believe that integrating the collective effect of latent feature learning into the deep model can better represent the diverse and heterogeneous data space. With a view to learning useful network representations, Wang et al. [33] presented a Structural Deep Network Embedding (SDNE) model to efficiently capture the highly non-linear structure of complex networks. In particular, it is a semi-supervised deep model with several layers of non-linear functions. The multiple deep layers in SDNE allow this model to capture the tremendously non-linear heterogeneous network structure. The objective of network embeddings is to learn complex representations for heterogeneous networks.
Using social networks, people share multi-typed, diverse data. However, it is unlikely that users share their personal data, for instance, gender, birth year, demographics, etc. User behavior prediction entails classifying users on the basis of their age groups, which can reveal valuable insights about user behaviors among different age groups. Guimarães et al. [43] analyzed 7000 sentences from social networks. They used a Deep Convolutional Neural Network (DCNN) to classify social network post features such as hashtags, retweets, characters in a tweet, number of followers, number of tweets, etc. After extensive experimentation with different machine learning algorithms, for instance, Random Forest, Decision Trees, Support Vector Machine (SVM), etc., they found that the DCNN outperformed its counterparts in terms of large-scale data classification. Guimarães et al. [43] also proposed an enhanced Sentiment Metric (eSM) which can classify users by age even when they restrict their personal information.
3) Clustering Using DL
In SM data, community detection is a realistic solution for determining the intrinsic grouping of information objects (defined under Social Media Concepts). In grouping information objects, diverse attributes may make varying contributions, and the attribute values of interest influence the grouping task. For instance, the degree of qualification is a worthy attribute when attempting to group users with their corresponding institutions.
In social networks, Zin et al. [44] integrated clustering with ranking and proposed a novel deep model, Deep Learning Cluster Rank (DeepLCRank). This method better illustrates the ranked clusters in social networks. For each item of a cluster, a rank is assigned on the basis of the learned features of information objects in the network. The varied information objects in social networks form very complex representations, which DeepLCRank can handle quite effectively.
4) Ranking Using DL
Individuals tend to use social media as a means to solve their problems by posting queries. Such forums are known as Community Question Answer (CQA) forums. They help users obtain satisfactory information. Nevertheless, it is not very likely that users can get the desired content in a fraction of the time, because many answers exist for the same question at CQA. This necessitates ranking the answers provided by experts at CQA.
Chen et al. [45] proposed an approach to predict users' personalized satisfaction using a multiple-instance DL framework. The authors presented a novel model, the Multiple Instance Deep Learning (MIDL) framework, to predict personalized user satisfaction. In CQA, a single question can have a number of answers. The known aspect is that one of the answers is assigned a satisfied tag; the unknown aspect is which answer exactly carries the satisfied tag. This situation motivates multiple-instance learning over the answers. Each answer to a question at CQA is considered an instance in a bag, where each resolved question acquires one satisfactory answer on the Stack Exchange dataset. In terms of the historical behavior of users, a common user space is defined and initialized for the representation of each individual user. Subsequently, after feature extraction, all features are fed into deep recurrent neural networks to rank them as positive or negative.
5) Recommendation Using DL
SM data is a promising source for the incessant recommendation of pertinent content to multi-domain users. The influence of recommendation can be amplified if items from different domains are jointly learned and recommended. Elkahky et al. [46] proposed a Multi-View Deep Neural Network (MV-DNN), which maps items and users into a shared semantic space and recommends the items with maximized similarity. For instance, people who visit espncricinfo.com would most likely want to see news about cricket and to play cricket-related games on PC or Xbox. The authors used several data sources, such as Microsoft product logs including the Bing search log, the Windows Store download history log, and movie view logs from Xbox, to make interest-oriented recommendations.
A DNN is used to map the high-dimensional feature space of users and items from different domains into a lower-dimensional one. Social network users often belong to different domains and always look for items of their interest. MV-DNN has the ability to recommend items based on categorical features such as movie genre, application category, country or region the item belongs to, and so on.
Collaborative Filtering (CF) is also a well-known style of recommending appropriate content to users. CF-based methods typically use users' ratings to recommend pertinent items to them. However, the sparsity of ratings causes noteworthy degradation in recommendation performance. Wang et al. [47] proposed a Collaborative Deep Learning (CDL) model which combines the learning of deep representations for the content information (items) with collaborative filtering of users' ratings. The authors used diverse data domains such as CiteULike, Netflix, and IMDB to recommend items to users.
In addition, users' trust also plays an important role in finding trustworthy recommendations. Deng et al. [48] proposed a Deep Learning based Matrix Factorization (DLMF) model to synthesize the interests of users and their trust links. DLMF performed better in terms of recommendation accuracy with unusual data and cold-start users as well. Using Epinions data, the authors employed an autoencoder to learn the users' and items' initial feature vectors in the first phase, whereas the final latent feature vectors are learned in the second phase. This method can also work for trusted-community detection. TABLE 1 shows the summary of DL methods for user behavior modeling.
B. Business Analysis
With the dawn of SM, content such as social networks, blogs, review forums, ratings, and recommendations is swiftly thriving. Filtering this content automatically is critical for businesses tending to sell their products and recognize new market prospects. However, the large scale [49], [50] of social data makes it tough to classify users' sentiments automatically. For instance, reviews from two different domains contain different vocabulary, which induces different data distributions for diverse domains. Consequently, domain adaptation can play a transitional role in learning intermediate representations.
1) Classification Using DL
Using SM platforms, people manage customer relationships [51] and make hotel decisions for outings/eating [52]. Actually, SM platforms such as Facebook and Twitter are now an extensive source of input, highly valuable for marketing research corporations, public opinion associations, and other text mining units. This drives such entities to spend more on SM to gain more business [51], [53].
Glorot et al. [16] presented a DL approach for domain adaptation of sentiment classifiers. DL techniques learn the intermediate concepts between the source and target data. The domain adaptation shift enables DL to learn meaningful intermediate concepts such as product price or quality, customer services, customer reviews about products, and so on. The features are learned level-wise, based on the features revealed at the preceding level. Additionally, Amazon is a widely used platform for business, where DL leveraged better learning representations across all domains.
The authors in [16] used the Amazon dataset with reviews from domains including books, kitchen, electronics, and DVDs. For the feature extraction phase, the Stacked Denoising Autoencoder (SDA) is compared with the Multi-Layer Perceptron (MLP). Two variations, SDA-1 with one layer and SDA-3 with three layers, are used in the comparison.
The MLP performance illustrated that non-linearity helps to extract information but is not adequate to accumulate all the necessary information from the data. It is more suitable to use an unsupervised phase which can incorporate data from diverse domains. Obviously, on this wide-ranging problem, a single layer does not suffice to reach optimal performance. Stacking three layers together returns the best representation of the data. It is worth noticing that the representation learned by the SDA transfers well across the different review domains.
Ding et al. [54] presented a CNN-based model to classify users based on product needs expressed on an SM platform. A product consumed by a certain user is more likely to get endorsed subsequently. Given a range of products, it is therefore significant to classify the products that are more likely to be consumed by the consumers. The proposed CNN-based product consumption intention model can better classify the words of intention from the text compared with an SVM paired with word embeddings or bag-of-words.
2) Recommendation Using DL
On account of the upsurge in SM practice, people tend to buy attire and dresses online these days. Lin et al. [55] proposed a hierarchical deep CNN framework to recommend better and more efficient clothing options for online customers. The authors used a large-scale image dataset sourced from Yahoo online shopping. The costume images exhibit high variability in poses and appearances with significantly noisy backgrounds. For this reason, a clothing-specific tree is generated with categories such as Men, Women, and so on, whereas sub-categories include top, dress, coat, outfit, and so on. The intuition of the solution is to match the clothing images liked by customers with the images in the dataset. The deep CNN is used to automatically learn discerning feature representations capable of detecting heterogeneous types of clothing images. Additionally, the DL-based hierarchical search offered an immediate retrieval response compared with a conventional CNN with manually constructed features.
Deep CNN models are becoming pervasive for learning feature representations. Kiapour et al. [56] proposed a DL-based model to match the precise shopping location with a user's query. Large-scale clothing shop images are used from Tamaraberg19 and ModCloth.20 The problem of matching a user's clothing query with an available and feasible shopping location is framed as computing the cosine similarity between the clothing features from the query and the features of the online shop images. To recommend customers better shopping locations, the shop retrievals are ranked based on the computed similarity. Accordingly, Chen et al. [57] also presented a DL-based approach to describe people on the basis of fine-grained clothing features.
C. Sentiment Analysis
Sentiment analysis, also stated as opinion mining, involves predicting the attitude of users who are generating massive textual content on multi-typed SM platforms, for instance, Facebook, Twitter, etc. Significantly, the prevailing intent of analyzing users' sentiments is to classify whether their inclination towards a specific product or topic is positive, negative, or neutral, or even to classify it into some other category. It is commonly applied to product survey answers, customer evaluations, and user opinions, and in domains such as education, business, e-commerce, and healthcare. In this section, concerning sentiment analysis, we explore techniques comprising prediction, classification, and ranking of sentiments.
1) Prediction Using DL
Predicting opinions from SM data is a prevalently active task. The English language has commonly been used for the opinion mining task. However, Li et al. [58] predicted sentiment labels for a Chinese sentiment corpus. The authors collected 2270 movie reviews.21 Subsequently, the movie reviews were filtered based on specific criteria, such as reviews with rude language, special symbols, more than one sentence, typos, very short or long sentences, or multiple languages. Eventually, they constructed a dataset composed of Chinese sentiments and named it the Chinese Sentiment Treebank.
The sentiments are classified into 5 classes, specifically, very positive, positive, neutral, negative, and very negative. The authors proposed a recursive DL model, namely the Recursive Neural Deep Model (RNDM), to predict the labels for these classified sentiments. This model is compared with three baseline models: Naïve Bayes (NB), Maximum Entropy (ME), and Support Vector Machine (SVM). The RNDM outperformed all of the baselines, as it better predicted sentiment labels for sentences with a contrastive conjunction structure, like "X but Y" in English.
Recently, in order to resolve a wide range of NLP and text mining tasks, DL techniques have developed rapidly and drawn persuasive consideration to training deep and complex models on abundant data. Since text is created by humans, it already comprises morphological aspects such as grammatical rules, syntactic knowledge, for instance, Part-Of-Speech (POS) tagging, and also semantic knowledge such as the relationships between words and entities, synonyms, and antonyms. DL leverages such knowledge to generate fine word embeddings [59].
Bian et al. [35] conducted an extensive study to show the injection of knowledge into DL-based word embeddings. Meanwhile, Stojanovski et al. [60] proposed a DL architecture for Twitter sentiment analysis using pre-trained word embeddings sourced from GloVe embeddings [61]. Bian et al. [35] used the root, affix, and syllable to infer the semantic meaning of text. In order to enhance the word representations, semantic and syntactic knowledge is used as additional input. The Continuous Bag-Of-Words (CBOW) model is used as the baseline method, and Morfessor, Longman, WordNet, and Freebase are the datasets used to evaluate the quality of word embeddings learned with and without fused knowledge. In the same regard, this study also explores three tasks, namely word similarity, analogical reasoning, and sentence completion. For comparison, the DL framework directly generated the embeddings for each root/affix and syllable by aggregating the morphological elements.
Significantly, because of the layered nature of DL-based models, they are well-equipped to incorporate both semantic and syntactic feature learning. However, the authors concluded that syntactic knowledge delivers significant input information but may be inappropriate for regularized objectives. Semantic knowledge, on the other hand, can improve the performance of the sentence completion and word similarity tasks. In addition, for the analogical reasoning task, applying the semantic knowledge as additional input is quite influential.
Accordingly, leveraging users' expressions and sentiments within Twitter's 140-character restriction is rather a non-trivial task. In order to clean unnecessary information from the tweets dataset used for experimentation, Stojanovski et al. [60] removed all URLs and HTML entries from the tweets. The pre-trained word embeddings are used to construct lookup tables, where each word is linked with its matching feature representation. In terms of DL, the authors fused two NN models: a CNN used for feature extraction from tweets, and a Gated Recurrent Neural Network (GRNN) that handles sequential data where inputs depend upon the previous outputs. The noteworthy properties and reasons for using a Gated Recurrent Unit (GRU) are that it uses fewer parameters, needs less data to generalize, and also enables rapid learning. Architecturally, a GRU contains gating units to control the information flow inside the underlying units. Consequently, the fusion of the CNN with the GRNN outperformed existing individual NN models.
2) Classification Using DL
In recent years, DL has arisen as an effective means for addressing problems pertinent to sentiment classification. Without human effort, except for the labeling phase, a neural network innately learns a valuable representation automatically. Nevertheless, the success of DL counts heavily on the availability of extensive training data. With the thriving of e-commerce and Web 2.0, people consume SM increasingly and post comments about their buying experiences on review or merchant websites. This opinionated content is a valuable resource for merchants for product improvement and service quality, and for prospective customers to make appropriate decisions.
Poria et al. [62] proposed a systematic approach to extract short text features. It is founded on the inner-layer activation values of a deep CNN. The authors used the CNN for textual data feature extraction. However, utterances are translated from Spanish into English using Google Translate. The CNN used is composed of 7 layers and is trained using a typical backpropagation procedure, usually convenient for improving the accuracy of the model.
The dataset is composed of 498 short video fragments in which a person utters one sentence. For sentiment polarity, the items are manually tagged as positive, negative, or neutral. After discarding the neutral items, 447 items in total are processed. The combined feature vectors of the visual, audio, and textual modalities are used to train a classifier grounded on the multiple kernel learning (MKL) algorithm, a well-known processor of heterogeneous data.
Actually, the authors combined the results of feature- and decision-level fusion. In feature-level fusion, the features are fed into a supervised classifier, namely SVM, after extraction, while in decision-level fusion, the extracted features are fed into separate classifiers and the decisions are then combined. The significance of the CNN feature extractor is that it is automatic and does not rely on handcrafted features. In particular, it adapts well to the distinctiveness of the specific dataset in a supervised manner.
Being a subtask of sentiment analysis, aspect extraction involves identifying the targets of opinions in opinionated text, particularly detecting whether the opinion holder is endorsing or complaining about particular features of a product or service. Poria et al. [63] proposed a novel method for aspect-oriented opinion mining using a 7-layered deep CNN. Traditional methods for feature extraction from text, such as Conditional Random Fields (CRF), have limitations: they require several features to perform well, and the linguistic patterns (LP) need to be crafted manually, which depends on the grammatical accuracy of the sentences. Owing to its automatic feature extraction nature, a CNN framework can effectively overcome such limitations of feature extraction.
The aspect-term features are based on the neighboring words. For aspect extraction, a 5-word window is used around every word in a sentence, i.e., ±2 words, and the window features are centered on the middle word. Next, the feature vector is fed to a CNN. Similar to [35], the CBOW model is used for creating word embeddings. Google and Amazon embeddings are used as datasets, particularly in the electronics (e.g., cell phones, laptops) and food-chain (e.g., fast-food, restaurant) domains, on which the five LP rules defined in [63] are applied. The performance of CBOW over food-chains was found to be better than over the electronics domain, since the electronics domain encompassed fewer aspect-oriented terms.
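A brief sketch of the windowing step alone (plain Python, not the authors' implementation) extracts the ±2-word context around each word so that every middle word is represented by its neighbors.

```python
def window_features(tokens, radius=2, pad="<PAD>"):
    # For each position, return the (2*radius + 1)-word window centred on it;
    # sentence boundaries are padded so every word gets a full window.
    padded = [pad] * radius + tokens + [pad] * radius
    return [padded[i:i + 2 * radius + 1] for i in range(len(tokens))]

sentence = ["the", "battery", "life", "is", "amazing"]
for word, window in zip(sentence, window_features(sentence)):
    print(word, "->", window)
```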
Guan et al. [64] also presented a novel framework, Weakly-supervised Deep Embedding (WDE), to classify customer reviews. The focus is on the semantic orientation of each sentence. In this framework, first, a high-level representation is learned which captures the overall distribution of sentences in terms of sentiments using rating information. Next, a classification layer is added on top of the embedding layer for supervised fine-tuning using labeled sentences. Using review ratings for sentiment classification [64] is an initial effort in the sentiment analysis community.
A large amount of unlabeled data is trained using RBMs/auto-encoders. Since ratings are noisy labels that could mislead the classifier, the following simple 5-star rating-scale rule is adopted:\begin{equation*} l\left ({s }\right)=\begin{cases} pos, & \text{if } s~\text{is in a 4- or 5-star review} \\ neg, & \text{if } s~\text{is in a 1- or 2-star review} \end{cases}\end{equation*}
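This rating rule translates directly into a small weak-labelling function; the sketch below is illustrative rather than the WDE authors' code, and sentences from 3-star reviews simply receive no label.

```python
def weak_label(stars):
    # 5-star-scale rule from the equation above: 4-5 stars -> pos, 1-2 stars -> neg.
    if stars >= 4:
        return "pos"
    if stars <= 2:
        return "neg"
    return None  # 3-star reviews are treated as unlabeled

reviews = [("Fast delivery, great phone.", 5),
           ("Broke after a week.", 1),
           ("It is okay, nothing special.", 3)]
labeled = [(text, weak_label(stars)) for text, stars in reviews]
print(labeled)
```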
Araque et al. [65] proposed a Classifier Ensemble Model (CEM) to depict the significance of multi-sourced information. Significantly, this composition carries more information than its base components. The model aims to enhance the overall performance of sentiment classification by integrating surface (traditional ML classifier) and deep (DL-based) features, which could not be attained by using the classifiers separately. Overall, seven public datasets are used from the movie review and microblogging domains.
On SM platforms such as Twitter, users occasionally post nonsensical content. It can be categorized as hate or abusive speech, targeting people such as politicians and celebrities, or products. Detecting such hateful intent of users belonging to a certain group towards users of another group is thus important. Moreover, it is equally significant to recommend appropriate content to users. Badjatiya et al. [66] used several DL architectures, namely CNNs, FastText, and Long Short-Term Memory networks (LSTMs), to classify tweets as hate speech or not using the labels sexist, racist, or neither. Functionally, CNNs are used for hate intent detection; FastText can quickly represent a document in the form of word vectors to tune the word representations using pre-trained word embeddings sourced from GloVe embeddings, as in [60]; and LSTMs are used to track long-standing dependencies in the tweets. Demonstrating the applicability of DL to detecting hateful intent in tweets is the major contribution of [66].
Pitsilis et al. [67] also presented a deep architecture to classify short texts from tweets as hate speech. Nonetheless, this approach does not rely solely on pre-trained word embeddings. The intuition of classifying short text is based on the historical tendency of users to post hateful content in the form of offensive messages. Usually, users prefer to use short terms or words to express their slang intents. Accordingly, tweets of length 30 are used in training the proposed deep model, the reason being that word-frequency vectors are preferred over pre-trained word embeddings. Alali et al. [68] proposed a Multi-Layered CNN (MLCNN) to classify tweets into five scales: highly positive, positive, neutral, negative, and highly negative. After empirical evaluation of the proposed model, the authors found the 3-layered CNN to be the best among all other combinations of layers.
3) Ranking Using DL
CQA forums handle several questions from users and answers from experts every day. For instance, if a user posts her question on a CQA forum, the purpose is to seek the best answer to her question. However, a number of experts are inclined to post a quality answer to the same question. This establishes a link between question-answer text pairs. Likewise, in several information retrieval tasks, the links between queries and documents exist as short text pairs. These impose the requirement to rank the answer-question pairs or query-document pairs of text. In addition, feature engineering is a prominent aspect of learning these ranks.
Above all, it has lately been shown that CNNs are quite effective at resourcefully learning and embedding input sentences into a low-dimensional vector space. This preserves the important semantic and syntactic characteristics of the input sentences and leads to state-of-the-art results in many text processing tasks. Severyn and Moschitti [69] proposed a convolutional neural network-based (ConvNets) approach to learning the ranks of short text pairs. This model aims to learn decent intermediate representations of documents and queries, which are later used for calculating their semantic matching. The network comprises a wide convolutional layer followed by a non-linearity and simple max pooling, which is used to reduce the dimensionality. Raw words are used as input to the network and are first mapped into real-valued feature vectors, which the successive network layers then process. Furthermore, the objective of the convolutional layer is to extract significant patterns, specifically discriminative word sequences found within the common training input sentence instances. Purposefully, the authors used two well-known TREC retrieval benchmarks: TREC microblog retrieval and answer sentence selection.
The CNN supported the learning of richer intermediate representations, which therefore improved the learning of high-quality sentence models. This architecture comprises intermediate representations of the questions/answers, which together establish a much richer representation. ConvNets also do not require manual feature engineering, extensive preprocessing, or peripheral resources, which might be costly or simply unavailable. The same architecture model has applications in other domains as well.
Tan et al. [70] proposed a biLSTM-based method to select a suitable answer to a question from a pool of candidate answers on an SM platform, using the TREC QA answer selection dataset. The authors used distributed deep representations to match questions in open-domain question answering with appropriate answers while accounting for their semantic structure. The intuition is that the correct answer should have a high cosine similarity with the question, so candidates are ranked by similarity in descending order and the top one is chosen. The biLSTM model handles the case where a question relates to multiple words, terms, and ideas appearing in the answers. A notable feature of the proposed model is that it does not depend on linguistic feature engineering or tools, so it can be applied to any domain. TABLE 3 shows the summary of DL methods for sentiment analysis.
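The selection step can be illustrated with a minimal sketch following the same intuition: a shared bidirectional LSTM encodes the question and each candidate answer, and the answer with the highest cosine similarity is returned. The layer sizes and the mean-over-time pooling are assumptions for illustration, not the exact design of [70].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiLSTMEncoder(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, token_ids):                  # (batch, seq_len)
        outputs, _ = self.bilstm(self.embedding(token_ids))
        return outputs.mean(dim=1)                 # (batch, 2 * hidden_dim)

encoder = BiLSTMEncoder()
question = torch.randint(1, 20000, (1, 15))        # one question
answers = torch.randint(1, 20000, (5, 40))         # a pool of five candidate answers

q_vec = encoder(question)
a_vecs = encoder(answers)
similarity = F.cosine_similarity(q_vec, a_vecs)    # one score per candidate answer
best_answer = similarity.argmax().item()           # index of the highest-ranked answer
```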
D. Anomaly Detection
Anomaly detection is the field of detecting abnormalities in data. When investigating real-life datasets, a common requirement is to detect instances that stand out from the rest of the data [71]. Such data instances are known as abnormalities, outliers, or anomalies.
Usually, anomalies are caused by errors in the underlying data, but occasionally they are produced by a previously unknown underlying process; indeed, Hawkins [72] defined an outlier as “an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism”. Several efforts have been made to detect anomalies using DL, and we discuss here some significant DL-based anomaly detection techniques in SM.
1) Classification Using DL
Supervised anomaly detection can play a pivotal role in improving security systems. It can also help security organizations and law enforcement agencies act proactively to detect destructive and predatory actions and conversations in cyberspace. Ebrahimi et al. [73] proposed a CNN-based classifier to efficiently detect such actions in large-volume chat logs, using the public PAN-2012 dataset for experimentation. The CNN is used as a binary classifier for the conversational text classification task. In a CNN, the convolutional layer operates on several input regions, while the pooling layer sub-samples the higher levels of abstraction produced by each convolutional layer. For this text classification task, max pooling is used, as it outperforms average pooling [74]. The authors found that two convolutional layers are less effective than a single convolutional layer in text classification, since the model tends to overfit with a greater number of convolutional layers; for image classification tasks, however, several convolutional layers may indeed help.
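The design choices discussed above can be illustrated with a hedged sketch of a single-convolutional-layer binary classifier in which the pooling strategy is configurable; all sizes are assumptions, and this is not the authors' exact network.

```python
import torch
import torch.nn as nn

class ConversationCNN(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=100, num_filters=128,
                 kernel_size=3, pooling="max"):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size)
        # Max pooling keeps the strongest filter response anywhere in the text;
        # average pooling dilutes it, which is why max pooling tends to win for text.
        self.pool = nn.AdaptiveMaxPool1d(1) if pooling == "max" else nn.AdaptiveAvgPool1d(1)
        self.out = nn.Linear(num_filters, 2)    # legitimate vs. predatory conversation

    def forward(self, token_ids):
        x = self.embedding(token_ids).transpose(1, 2)
        x = torch.relu(self.conv(x))
        return self.out(self.pool(x).squeeze(-1))

logits = ConversationCNN(pooling="max")(torch.randint(1, 30000, (4, 200)))
```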
Ribeiro et al. [75] proposed a deep Convolutional Auto-Encoder (CAE) for anomaly detection in video data, using publicly available videos including those from SM platforms. The proposed CAE model does not require labeled data, since all training instances belong to the non-anomalous group. Sliding windows are used to sub-sample video frames from clips, from which both motion and appearance features are extracted, and the frame reconstruction error serves as the anomaly score. The Area Under the Curve (AUC) is used to measure the performance of the proposed model. The authors found that adding high-level information to the raw data can improve the performance of the CAE, and supplementing such information is particularly valuable when the types of anomalies to be detected are known beforehand.
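A minimal convolutional auto-encoder sketch of this idea is shown below: the model would be trained only on non-anomalous frames, and at test time the per-frame reconstruction error serves as the anomaly score. The frame size (1 x 64 x 64) and layer widths are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConvAutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, frames):
        return self.decoder(self.encoder(frames))

model = ConvAutoEncoder()
frames = torch.rand(8, 1, 64, 64)                    # a window of video frames
reconstruction = model(frames)
# Mean squared reconstruction error per frame, used directly as the anomaly score.
anomaly_score = ((frames - reconstruction) ** 2).mean(dim=(1, 2, 3))
```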
2) Clustering Using DL
Detecting anomalous events, particularly in videos, remains of utmost significance. Since scenes are numerous, detecting abnormal scenes in video data can be cast as a clustering problem, and feature learning for video surveillance is non-trivial. Xu et al. [76] proposed a DNN-based Appearance and Motion DeepNet (ADMN) to learn feature representations automatically and efficiently. A double fusion framework is presented, combining the benefits of both early and late fusion strategies. In the early fusion strategy, stacked denoising auto-encoders are used to learn motion and appearance features separately; a one-class SVM then predicts anomaly scores, which are finally aggregated to detect abnormal events. The proposed ADMN model does not rely on prior knowledge for feature representation learning and is more powerful than hand-crafted video feature representations. Its downsides include a higher computational cost and the handling of co-occurring patterns in videos.
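The scoring stage of such a pipeline can be sketched as follows: features learned by the stacked denoising auto-encoders (replaced here by random placeholders) are scored by a one-class SVM fitted on normal data, and the most negative scores are flagged as abnormal. The dimensions, kernel, and threshold are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal_features = rng.normal(size=(500, 64))     # stand-in for learned appearance+motion features
test_features = rng.normal(size=(50, 64))

ocsvm = OneClassSVM(kernel="rbf", nu=0.1).fit(normal_features)
scores = ocsvm.decision_function(test_features)   # lower score = more anomalous
anomalies = np.where(scores < 0)[0]               # indices of suspected abnormal events
```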
Anomalies are nonetheless often the result of multiple co-occurring events or multiple factors causing instances to be declared anomalous, and one drawback of [76] is its lack of consideration of co-occurring patterns in videos. Hayat and Daud [77] detected co-evolutionary anomalies in heterogeneous bibliographic information networks, where a co-evolutionary anomaly (target object) is associated with a number of linked attributes (attribute objects). The influence of each attribute object on the target objects is calculated, which helps identify the cause of the anomalies in the underlying data.
Likewise, Feng et al. [78] detected anomalies in crowded scenes. The authors proposed a deep Gaussian Mixture Model (GMM) built from different combinations of stacked layers. Motion and appearance features are extracted using Principal Component Analysis (PCA) and then clustered; clusters with few members and/or far away from the regular groups are flagged as anomalies. The deep GMM is found to be considerably more valuable than hand-crafted feature learning; however, short- and long-term temporal motion features still leave room for exploration. TABLE 4 shows the summary of DL methods for anomaly detection.
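A simplified sketch of this clustering idea is shown below: PCA-reduced features are modelled by a Gaussian mixture, and samples with low likelihood under the mixture are treated as anomalies. The synthetic data, dimensions, and percentile threshold are illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 100))            # raw motion/appearance descriptors

reduced = PCA(n_components=10).fit_transform(features)
gmm = GaussianMixture(n_components=5, random_state=0).fit(reduced)

log_likelihood = gmm.score_samples(reduced)         # per-sample log-likelihood
threshold = np.percentile(log_likelihood, 2)        # bottom 2% flagged as anomalies
anomalies = np.where(log_likelihood < threshold)[0]
```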
Datasets and Benchmarks
In this section, we go through significant SM datasets that are commonly used as benchmarks in SM tasks.
A. Facebook
In the era of social connectedness, people prefer to connect with their friends online using SM platforms. Facebook is one such platform, which people use to express and share their content privately. However, users' SM usage may vary due to the wide-ranging services offered by these platforms, and a number of platforms make SM datasets publicly downloadable. In particular, the features of a Facebook dataset comprise demographic features such as gender, education, and relationship status; user topics such as dispositions and professions; and user posting behavior such as slang posts, photo posts, video posts, shares, likes, and so on.
An abundant amount of multimodal data is posted on Facebook every day, most of it unstructured. To extract meaningful insights from such unstructured data, DL can help represent it in a form that can answer questions such as: how often does a certain company's product appear in pictures shown to a particular user? Correspondingly, the profusion of multimodal data can be harnessed by applying powerful DL models to several noteworthy SMA tasks, including textual analysis, image analysis, targeted advertising, and so on.
B. Twitter
People tend to express their opinions about certain topics. Twitter is an SM platform that people use to publicly express and share their personal viewpoints and interpretations about a certain event, personality, organization, or happening. The features of tweets include positive emotions, negative emotions, relationship expressions, and health status.
DL has striking implications for NLP. In terms of SM insights, text is still the dominating factor among multi-typed data. Users turn to Twitter to express their personal viewpoints about social, political, educational, personal, or professional events, and most Twitter data is posted in textual form. DL can help in a number of NLP tasks such as crisis situation identification, user behavior analysis, quantifying users' enthusiasm about a certain event, and so on.
C. LinkedIn
Companies seek individuals who are professional in their field of interest and likely to excel in their future careers. LinkedIn is a professional SM platform where people build their profiles to get hired or to hire appropriate candidates. Fundamentally, it is an employment-oriented and business-focused online social service for creating a network of professionals. Access to this dataset is usually not fully public unless requested or subscribed to by a registered entity.
DL can play an important role in analyzing the textual, image, or video data provided by the LinkedIn platform. People usually use this platform for job search; however, applying DL to scan through candidates' resumes and filter out the eligible ones can be of significant support to human resource managers.
D. Flickr
Photo sharing is one of the most used online services these days. Flickr is an image and short-video sharing platform where people share their experiences and emotions in the form of photos and short videos. The platform is available in a number of languages including English, German, French, Korean, and Spanish. Account creation is not mandatory to access the content on such SM platforms, but uploading requires an account.
DL can be applied to image and video data analytics. Owing to their recurrent nature, RNN-based models can be used to analyze users' likes and dislikes, recommend the most anticipated content, and surround users with advertisements relevant to their personal interests.
E. YouTube
Videos and moving objects are the most compelling type of content for fascinating users. YouTube is one of the most prominent SM platforms for sharing content in the form of videos. It allows consumers to upload (with an account), share, rate, and view video posts, and to manage their account privacy settings. The user-generated content uploaded to YouTube ranges from educational material, artistic work, trailers, and documentaries to official live video streaming.
Analyzing dynamic and live streams is a non-trivial problem; however, deep auto-encoders can significantly leverage dynamic video analysis. The underlying nature of the data calls for more powerful frame representations than simple ML approaches can provide, and DL can be used to extract meaningful insights from dynamic video streaming platforms such as YouTube, Metacafe, and Vimeo.
F. Stack Exchange
Individuals face a number of problems every day for which they seek quick and reliable solutions. Stack Exchange is a type of SM platform where people can ask questions across varied fields and topics and learn solutions through community question-answer systems. Usually a pool of experts, and sometimes ordinary users, answer the questions asked by various users. Reputation can be earned by posting highly upvoted, quality answers, and the answer with the maximum number of upvotes is chosen as the best answer. Stack Overflow, Super User, and English Language and Usage are prominent sites used for learning programming, information technology, and English linguistics, respectively.
CQA is a platform where users seek the most relevant, best, and to-the-point answers to their questions, and, most importantly, want to find them rapidly. This constitutes a multimodal search altogether, and deep hidden layers can be used to construct a model that satisfies users' multitude of needs.
Performance Evaluation
There are a number of ways to evaluate the performance of DL methods. Recall, precision, and F1-score are noteworthy performance measures for prediction, ranking, and classification-based tasks [38], [45]. The F1-score balances recall and precision as:\begin{equation*} F_{1}=\frac {2\cdot precision\cdot recall}{precision+recall}\tag{2}\end{equation*}
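As a quick sanity check of equation (2), the closed form can be compared against scikit-learn's implementation; the toy labels below are chosen purely for illustration.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

precision = precision_score(y_true, y_pred)    # 1.0
recall = recall_score(y_true, y_pred)          # 0.75
f1 = 2 * precision * recall / (precision + recall)
assert abs(f1 - f1_score(y_true, y_pred)) < 1e-9
```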
Root Mean Squared Error (RMSE) is often used to measure the difference between observed and predicted values and thereby evaluate model performance [42], [48]. Larger RMSE values signify greater performance loss; note, however, that RMSE is highly sensitive to outliers. It can be expressed as:\begin{equation*} {RMSE}_{fo}=\sqrt {\frac {1}{N}\sum \limits _{i=1}^{N}{(z_{f_{i}}-z_{o_{i}})}^{2}}\tag{3}\end{equation*}
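Equation (3) is equally direct to compute; the sketch below uses a few illustrative forecast and observed values, where z_f denotes predictions and z_o the observations.

```python
import numpy as np

z_f = np.array([2.5, 0.0, 2.1, 7.8])    # predicted values
z_o = np.array([3.0, -0.5, 2.0, 8.0])   # observed values
rmse = np.sqrt(np.mean((z_f - z_o) ** 2))
```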
Usually, the performance of a proposed method is also evaluated by empirically comparing its results with those of existing methods [76]. Beyond this, three core metrics play a significant role in measuring the performance of a DL model: accuracy, model size, and the rate of learning.
A. Accuracy
Predominantly, the core objective of DL-based models is to make precise predictions. The diverse nature of SM data demands either communicating the complete information with long delays, owing to the excessive and complex processing, or working with data samples in order to make predictions; the latter, however, can lead to imprecise diagnostics and predictions.
Consider, for instance, a DL-based model that predicts voter fraud. A binary classifier processes individual votes and classifies them as legitimate or fraudulent. Obviously, the cost of misclassifying a fraudulent vote as legitimate is more severe than the parallel error of marking a legitimate vote as potentially fraudulent. If, after processing a million votes, the model fails to capture 100 fraudulent ones, accuracy will still be high even though the end result could greatly affect the outcome.
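To make the arithmetic concrete, the toy calculation below uses the figures from the example above: missing 100 fraudulent votes out of a million still yields near-perfect accuracy, which is why accuracy alone can be misleading for rare-event prediction.

```python
total_votes = 1_000_000
missed_fraudulent = 100            # fraudulent votes misclassified as legitimate
accuracy = (total_votes - missed_fraudulent) / total_votes   # 0.9999
```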
B. Model Size
To improve a model's prediction accuracy, the size of the neural network matters just as much, and network sizes are growing exponentially. Therefore, to process abundant data, deep models need to be efficient, scalable, and robust, so that accuracy is not compromised when the model needs to be enlarged.
C. Learning Rate
The rate of learning matters a great deal for most DL-based methods, since DL is used to process plentiful data with improved accuracy and rapid processing. Moreover, due to the dynamic nature of DL models, they involve training, deployment, and sometimes re-training, which becomes necessary when new data arrive. It is therefore quite important how fast a deep model can learn representations in step with the speed of new data arrival.
Challenges and Future Directions
The preceding sections explored different domains of SM where DL methods are applied, described the datasets used as benchmarks for specific research purposes, and discussed some performance evaluation measures used in the literature. However, the characteristics associated with SM pose challenges in adapting DL methods so that they can solve these problems. In this section, we present some topics where DL needs further investigation for SMA, including dealing with high-dimensional data, learning from streaming data, scalability of models, and distributed computing.
A. Trust-Aware Social Recommendation
SM platforms encompass a number of social relations that require trusted recommendations. People always want to connect through relationships they can trust and to receive information that is reliable, and trust metrics have a significant role to play in recommender systems [81]. Moreover, disseminating trusted information can help achieve dependable recommendations. Deng et al. [48] made a DL-based effort to recommend trust-aware relations to SM users; however, that study is limited to Epinions and Flixster data only.
Notably, the trust within a pool of relations may vary over time, and the content rated by users is also significantly time-sensitive. More importantly, ratings that have become obsolete can make the data noisy and unreliable for social recommendations. Efforts are still needed to propose reliable and scalable DL-based techniques for recommending trusted information and trust-aware relations on SM platforms.
B. Refining DL and Avoiding Dimensionality Reduction
Machine learning models use a wide range of parameters to predict, learn, recommend, classify, cluster, or group different data items; these random variables or parameters are often referred to as dimensions. Most machine learning algorithms perform dimensionality reduction before processing a dataset. DL, however, provides room to avoid dimensionality reduction by undertaking more robust and scalable learning.
Elkahky et al. [46] did use multiple data sources to train and learn DL methods; however, they suggested making DL learning more scalable so that reducing dimensions could be avoided, since using the unabridged set of features could yield more durable results when learning a model.
On the other hand, it is sometimes indispensable to reduce dimensions [82], which can enrich the network structure; an improved network structure can help in processing the nodes of a DL-based network with greater ease and less complexity.
C. Cut Cost and Put Productivity
With the exponential growth in SM, the availability of online reviews and recommendations has increased significantly, which has made sentiment classification an interesting topic in academic and industrial research. However, the existence of an even bigger pool of multi-domain online reviews has made it difficult to gather annotated training data. Glorot et al. [16] proposed a deep model, SDA, based on stacked denoising autoencoders in which the output of one layer is used as the input of the next stacked layer, thereby improving representation learning.
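A minimal sketch of this layer-stacking idea is shown below: each denoising auto-encoder corrupts its input, reconstructs it, and passes its hidden code to the next layer; the final codes would feed a sentiment classifier. The layer sizes, masking-noise corruption rate, and greedy layer-wise training are illustrative assumptions rather than the exact setup of [16].

```python
import torch
import torch.nn as nn

class DenoisingAutoEncoder(nn.Module):
    def __init__(self, in_dim, hidden_dim, corruption=0.3):
        super().__init__()
        self.corruption = corruption
        self.encode = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Sigmoid())
        self.decode = nn.Linear(hidden_dim, in_dim)

    def forward(self, x):
        noisy = x * (torch.rand_like(x) > self.corruption).float()  # masking noise
        code = self.encode(noisy)
        return code, self.decode(code)            # hidden code + reconstruction

# Stack two layers: the code of the first layer is the input of the second.
reviews = torch.rand(32, 5000)                     # e.g. bag-of-words review vectors
layer1 = DenoisingAutoEncoder(5000, 1000)
layer2 = DenoisingAutoEncoder(1000, 200)
code1, recon1 = layer1(reviews)
code2, recon2 = layer2(code1.detach())             # greedy layer-wise training
```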
SDA can use data from different domains, which cuts the computation required to transfer across numerous domains. However, the number of domains adapted remains limited, while there exist many social networks whose data could be adapted as a single source. In addition, training a DL model on a large dataset requires more powerful resources, which increases the cost. This leaves scope for building more cost-effective and efficient DL methods that incorporate yet more social network domains [76].
D. Beyond English as Text
Sentiment analysis is the research area that investigates people's opinions, individual evaluations, attitudes, sentiments, appraisals, and emotions towards objects such as products, services, individuals, organizations, events, issues, topics, and their attributes. With the rise of Web 2.0 and the enormous growth in SM content, sentiment analysis has developed into a prevalent and challenging research problem. Users can now post their opinions and views on a wide range of social websites such as Twitter and Facebook, and the information in users' reviews and opinions is of great significance and needs to be explored.
Today, SM users can post their reviews and opinions in a wide variety of natural languages such as English, Chinese, Turkish, and Spanish. Li et al. [58] made an effort to explore Chinese-language sentiment using recursive DL. The challenges here are two-fold: one is to create data banks for languages other than English, as [58] did in building the Chinese Sentiment Treebank, and the other is to use and build suitable DL methods to explore SM natural languages beyond English [67]. In addition, there is still plenty of room to apply DL methods across a wide range of SM websites.
E. Aspect Extraction
As a subtask of sentiment analysis, aspect extraction consists of identifying the targets of opinions in opinionated text, specifically distinguishing the particular aspects of a product that the opinion holder is either endorsing or complaining about. This problem requires a set of linguistic patterns (LPs) to classify words in sentences as aspect or non-aspect words.
Accordingly, Poria et al. [63] presented a 7-layered deep CNN model for efficient aspect extraction using Google and Amazon word embeddings. Significantly, LPs play a prominent role in aspect extraction from SM data, which calls on DL for more robust crafting of LPs for efficient and precise aspect extraction.
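The token-level nature of this task can be sketched as follows: stacked convolutions over word embeddings preserve the sequence length, and a final layer labels every token as aspect or non-aspect. The depth and widths here are illustrative, not the 7-layer configuration of [63].

```python
import torch
import torch.nn as nn

class AspectTaggerCNN(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=300, num_filters=100, num_tags=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.convs = nn.Sequential(
            nn.Conv1d(embed_dim, num_filters, 3, padding=1), nn.ReLU(),
            nn.Conv1d(num_filters, num_filters, 3, padding=1), nn.ReLU(),
        )
        self.tagger = nn.Conv1d(num_filters, num_tags, 1)    # per-token aspect / non-aspect

    def forward(self, token_ids):                             # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)
        return self.tagger(self.convs(x)).transpose(1, 2)     # (batch, seq_len, num_tags)

tag_logits = AspectTaggerCNN()(torch.randint(1, 20000, (2, 25)))
```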
Generally, recommendation systems recommend items on the basis of, for instance, the customer's purchase history. However, designing deep models that can learn representations for recommendation on the basis of multiple latent aspects [83] or personalized aspects [84] could be of great worth.
F. Heterogeneous Anomaly Detection
Social networks hold a wealth of useful information, and the same user may have accounts on a number of social networking platforms [69]. Integrating user information from heterogeneous SM data sources is therefore always intriguing: it can give a comprehensive understanding of the interests and behaviors of users, which are dynamic, heterogeneous, and multi-contextual. A data instance that is anomalous in one context may not be anomalous in another, and the same instance can be anomalous from multiple points of view, which is significant to detect. Employing DL approaches for joint representation learning can better capture the complexity of networks for heterogeneous anomaly detection [85]; however, maintaining consistency among heterogeneous social networks is challenging [38].
G. Fusing Anomaly Detection and Social Influence
The escalated usage of social media platforms has caused an unprecedented intensification of social data and provides an extraordinary opportunity to study users' social behaviors. However, within the pool of social media users, few are influential while the majority are ordinary; the same is true of anomalies in social networks. A number of earlier studies address conventional anomaly detection problems using activity-based, graph-based, community-based, distance-based, and statistics-based methodologies. Influence analysis and anomaly detection have mostly been investigated independently, with domain-specific applications. Rather than studying both problems separately, considering them jointly may capture richer semantics from social networks and enable more diverse and efficient applications.
Likewise, fusing two or more problem solutions can give rise to a new valuable solution, diminish the cost of the preceding individual solutions, and enhance the learning rate of the DL model. Importantly, DL has the ability to learn more complex and heterogeneous representations of networks, and network-diffusion-based embedding methods [86] can address a number of limitations of DL-based methods, including heterogeneity, scalability, and multimodality.
H. Topic-Sentiment Mixture Analysis
With the rise of SM data such as product reviews, business forums, and so on, analyzing sentiments along with the topics of the text is equally significant for swiftly and efficiently summarizing textual data [87]. The social and public availability of Web 2.0 has paved the way for an unprecedented intensification of SM data, a significant part of which is textual. On SM platforms such as those described in Section II, users are free to express their opinions, and compared with conventional documents, SM documents call for topic-oriented analysis alongside sentiment analysis.
Moreover, exploiting the power of DL-based models for NLP tasks, distributed representation learning for words can be used to effectively identify topics and predict sentiments for document summarization and sentiment analysis tasks, respectively [88]. Existing works have focused on topic identification and sentiment analysis individually; however, studying them together and developing a unified model would indeed be effective, and it could also support summarization of textual data as well as CQA platforms.
Conclusions
SMA has attracted widespread attention recently. While going through the literature, we found several articles studying diverse aspects of SM problems; however, none comprehensively reveals the prospects of DL from the perspective of SM analytics. In this article, we address this gap and cover the pertinent models and algorithms comprehensively, presenting the state-of-the-art research accomplishments of DL in SM analytics as well as the current research challenges and future directions in this domain.
In conclusion, SM platforms present a number of noteworthy challenges to DL. We have provided a detailed depiction of the varied SM domains. DL-based methods have significant power to learn valuable data representations across multi-domain SM tasks such as user behavior analysis, business analysis, sentiment analysis, anomaly detection, and many more. However, aspects including the powerful resources required to deal with heaps of data, improving productivity while lowering computational costs, and learning efficient data representations from heterogeneous social data sources still need efficient and reliable DL-based techniques. These challenges need to be addressed in a principled way that gives the scientific community an edge. We believe that these challenges will bring plentiful research prospects to the DL community and will deliver key developments in various real-life fields such as education, business, e-commerce, and medicine.