Introduction
Literature is an important means for people to spread culture and exchange ideas, shaping images and reflecting social life through language [1]. Long literary texts carry and record much information about human society throughout the history of Chinese culture, which gives them rich text mining value. Through them, many descriptions of the political, economic, and cultural content of human life are preserved for posterity and can be studied. Digital technology has increased the reserves of electronic long-text resources, which are user-friendly, have clearly defined copyright, and provide more opportunities for scientific research and general applications [2].
The exponential increase in the amount of electronic text information has made obtaining target information from massive texts quickly and effectively an urgent need [3]. Text mining techniques can be used on demand to acquire knowledge and potentially useful value for interested users [4] and to extract previously unknown, understandable, and potentially useful patterns and knowledge from large long-text corpora [5]. They can address the management, utilization, and application of many unorganized electronic long texts. Analyzing the network of persons in literary works [6] helps publishers and readers find the information they need in long texts, supports publishing decisions, mines long-text knowledge bases, and effectively improves the accuracy of text retrieval and plot understanding.
A novel is a kind of literary narrative with large capacity, complete layout, plot development, and various themes. By describing the story of a character and shaping the character’s image, the character and destiny of the character can be represented in detail, intricate contradictions can be expressed, and the social environment in which the character is located can also be described. Based on these literary contents, combined with the field of social network analysis, the value of characters and their relationships within novels can be discovered.
A typical research area in social network analysis is community division, also known as community discovery [7]. Through user behavior analysis and social network sentiment analysis [8], closely connected node groups in social networks can be determined. Modern Chinese literature is a well-suited dataset: it reflects real life and reveals the contradictory aspects of modern Chinese society [9]. This type of literature has rich emotions and obvious community differentiation. This article uses “polarity” to describe this sharp opposition in both emotion and community.
Community detection is an important tool for analyzing hidden information such as functional modules and topological structure in complex networks [10]. Traditional social network analysis of literary works relies on co-occurrences to determine the relationships between characters in a novel and to construct and analyze the character network [1], which usually discards the rich emotions, plots, and characterization in the literary works. To mine the rich internal textual information, the intrinsic linguistic connection with the literary language of the plot must be explored first.
Although novels are fictional, authors draw fully on their understanding of life and their social experience of real society in the creative process. Therefore, creative works are a reflection of society, and analysis based on them is reasonable; it can indirectly reflect the social value of the author’s cognitive perspective.
Previous studies [11]–[14] mostly focused on selecting influential nodes in social network data independently, and they still need to identify other classes of nodes for pivot selection. Reference [15] simply utilized node centrality to generate a structure-based network and did not consider context. Reference [16] provided an overview of the characters and the relationships between them from narrative texts but presented no further analysis of the network structure. In [17], the authors conducted community detection and character categorization to identify the protagonists and antagonists; this dimension gives limited, frequency-based information about the original literature. Overall, few previous works have completed graph structure extraction [18] and community division from extracted polar nodes. Extracting knowledge from Chinese literature is a complex and difficult topic for the following reasons:
First, to improve the attractiveness and readability of the works in the creative process, the author usually deliberately conceals, covers, reverses, reveals, and darkens the relationship between the antagonist and the protagonist, which makes it difficult to divide the characters into subgroups [19].
Second, the natural language of novels has complex semantics and many twists and turns. Although many existing techniques can be used for reference, fuller theoretical and experimental verification is needed for named entity recognition [20] and emotional polarity confirmation.
Third, although the amount of digital text is sufficient, the proportion of the labeled text is low, manual labeling confirmation lacks a uniformly standard definition, and the cost is high, which brings difficulties to our work.
In addition, the characters in novels usually have dynamic personalities, which are hidden in various aspects of expression. Even for human readers, it is difficult to classify and identify different types of characters, and it usually takes several careful readings to grasp the subtle differences among those characters.
Finally, a limited number of datasets are available for applying machine learning algorithms [21], [22], and producing annotations on a large scale is quite time-consuming [23]. It is laborious to label the prior knowledge manually in scenarios where prior knowledge is hard to obtain [24].
Our work combines theories from natural language processing, knowledge graphs, and social networks. We systematically propose an integrated framework, conduct experiments on typical datasets, present the outcomes, and address this open topic by jointly solving problems from different fields. We establish a systematic method and start from relatively polar literature, which consists of characters with steady and clear personalities. It is a novel attempt at community division with character graph building and polarity information extraction. This paper uses natural language processing based on the concept of literary sentiment polarity to select a representative polar literary dataset, extract characters and sentiment indication information in literary space, and mine the node polarity and link polarity contained in the text. After modeling the network of polar literary characters, we divide polar communities in the network. This study is a comprehensive application in the field of text classification and information extraction that integrates natural language processing and social network theory, combined with the extraction and visualization of graph data and knowledge structure.
The results show that the use of long-sentence windows can ensure the accuracy of literary interactions in the window during information extraction, and the information contained in the window is not very sparse, which better ensures the network modeling accuracy. As there is no related benchmark for this work, we select a basic community detection method with a general co-occurrence parameter as the baseline. The accuracy of the community division results of the integrated network polarity method is significantly improved compared with the results of the method based on the co-occurrence network. It can automatically match the positive and negative communities and includes the complete link structure of the network, which clearly reflects the literature from the perspective of polarity, as this work intends.
Our main contributions in this paper are as follows: First, a systematic model is defined to process long natural language text. It provides general standard operation steps for a machine to understand complex Chinese literature and extract graph data. Second, we propose four methods, including an evaluation standard for the parameter, to build a character graph and divide communities. They combine emotional information with social network computing to form a character network with emotional knowledge. Third, experiments conducted on seven benchmark Chinese novel datasets report the accuracy and reliability of the methods, together with a visualized graph data structure that makes them more applicable, and demonstrate that the method based on emotional polarity shows a significant improvement over baseline performance. Instead of spending manpower to analyze graph knowledge in texts as before, the automatic methods provide more opportunities for application in data mining. With texts in the literature industry, the methods are designed to benefit writers, readers, and publishers. With other types of compatible texts, the methods can benefit politicians, governments, the media industry, and knowledge digitalization.
Graph Data Extraction and Character Graph Building
A. Polarity Data and the Polar Literary Character Network
Literary emotional polarity (LEP) refers to the obvious literary emotional tendencies of the characters and of the relationships between them in literary works, under a specific classification criterion based on community relation factions, such as the antagonist, when the authors develop the plots and describe the characters’ behaviors.
The discrimination of literary polarity provides a sense of community division within the literature. As characters in a literary work can be determined by emotional tendencies, character relationships can also be determined by emotional tendencies and values. Literary works of polarity are referred to as polar literature (PL).
The content of PL shows polarity. Therefore, the nodes, relationships, and communities extracted according to the content also show polarity. Together, they form a network model containing polarity information, from which we can build the knowledge graph [25]. We define and explain these concepts below.
Definition 1 (Polarity of Vertices, PoV):
In PL, all the characters described by the author are called literary character sets
When the author’s description and the reader’s understanding reach a consensus that emotions are positive,
In this paper, it is necessary to remove
Definition 2 (Polarity of Edges, PoE):
In PL, the relationship between all the characters described by the author is called a link set
If
Definition 3 (Literary Character Network G(V,E)):
It refers to the simple undirected positive-weighted network composed of the literary collections
Definition 4 (Polarized Literature Character Network G'(V',E')):
It refers to the undirected network composed of PV set and PE set in PL. In this paper,
Definition 5 (Characters Community, CC):
CC is also called the literary vertices sub-network and refers to the sub-network corresponding to the characters that are closely connected in
The phenomenon of communities in the network diagram is called the community structure, and the community structure is a common feature of the network. The general literary character network comprises two types of communities: communities that do not intersect with each other are called nonoverlapping (disjoint) communities, and communities containing intersections are called overlapping communities [27]. Given a network, the process of determining the community structure is called community detection. The detected communities are denoted by
Definition 6 (Polarity of Community, PoC):
In
A polar literary character network
The goal of community division (CD) is to extract the information inherent in the literature by identifying PV, PE, and classifying the vertices into the corresponding PC. A key issue of literary polarity is the classification criteria used to determine the polarity. The construction of a literary character network is based on the reading experience and plot development. In the novels from the new era, literature has developed in a pluralistic trend, and various plot modes have been blended with each other, thereby comprehensively and profoundly showing the complex and multifaceted spiritual world of human beings [28]. There are usually two different frameworks for character classification in novels: one framework is the nature of the character itself, and the other framework is the role of the character in the work [29].
The criterion of actionability is not suitable for the classification standard of literary polarity. The judgment of social morality may be different for different readers, so the polarity should be judged by the author. From the author’s specific perspective, readers and authors can reach a consensus and obtain reliable emotional orientation results, which is a good polarity division standard, and a literary work applicable to this standard is a polarized literary work. This paper’s work is different from analyzing general social network communities; the two communities have obvious polar distinctions.
B. Text Space and the Text Space Entropy
Definition 7 (Morpheme c):
In morphology, morphemes are the smallest grammatical units with a combination of phonetics and semantics.
Morphemes can be divided into free morphemes, bound morphemes, and semi-free morphemes, according to whether they can form words alone [30]. According to the different information planes used in this text, we define three types of text spaces: a natural morpheme space, a character-indicating text space, and an emotion-indicating text space [31]. According to the extraction order of text information, we first introduce the natural morpheme sequence:
Definition 8 (Natural Morpheme Sequence, NMS):
Definition 9 (Character-Indicating Sequence, CIS):
Definition 10 (Emotion-Indicating Sequence, EIS):
Definition 11 (Literacy Field Window, LFW):
Under different segmentation modes (chapter identifiers, paragraph delimiters, and periods and other punctuation), the sequence is segmented to represent the literary unit with the largest distance of morpheme interactions.
The segmentation mode refers to the methods of semiotics. This paper does not consider the influence of inner punctuation on semantics [32]. The size of the LFW is called granularity, denoted as
Definition 12 (Polarity of Window, PoW):
PoW is the tendency of the LEP expressed by the entire text in the LFW. When \begin{equation*} p(LFW)=p(S_{j} \subseteq S)=\sum {p(c_{i} \in S_{j})}.\tag{1}\end{equation*}
Definition 13 (Literacy Filter, LF):
Under an LFW of a certain granularity \begin{equation*} T=F(S,Q)=LFW_{Q} (S)=\{S_{j} \subseteq S\}_{Q}.\tag{2}\end{equation*}
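The literacy filter in Eq. (2) can be illustrated with a minimal sketch, assuming the text S is a plain string, paragraphs are separated by blank lines, and the delimiter sets below stand in for the segmentation identifiers. The function name `literacy_filter`, the granularity labels, and the exact punctuation sets are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical delimiter sets per granularity Q; the exact symbols are an assumption.
DELIMITERS = {
    "paragraph": r"\n\s*\n",
    "long_sentence": r"[。！？.!?]+",
    "short_sentence": r"[。！？.!?，,；;：:]+",
}

def literacy_filter(text, granularity):
    """Split the text S into non-empty windows S_j under granularity Q (Eq. 2)."""
    pattern = DELIMITERS[granularity]
    windows = (w.strip() for w in re.split(pattern, text))
    return [w for w in windows if w]

sample = "A met B, and they argued. C left!\n\nD stayed."
print(literacy_filter(sample, "long_sentence"))
```

Coarser granularities yield fewer and longer windows, which is the trade-off the window selection experiment later examines.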
Under a set of
Definition 14 (Text Space Entropy, TSE):
The information content of the text space
The text space entropy is used to evaluate the information content of the text space when extracting subspaces. Information theory states that the amount of information is related to changes in the uncertainty, and the changes in uncertainty are related to the number and probability of possible outcomes [34], \begin{equation*} H(T)=E[I(T)]=E[-\ln (P(X))].\tag{3}\end{equation*}
Among them, \begin{equation*} H(T)=\sum \limits _{i} {P(x_{i})I(x_{i})=-} \sum \limits _{i} {P(x_{i})\ln P(x_{i})}.\tag{4}\end{equation*}
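As an illustration of Eq. (4), the following sketch computes the entropy of an empirical distribution with the natural logarithm. It assumes that the random variable X is observed once per window (for example, the number of character-indicating morphemes found in that window); this reading of X and the name `text_space_entropy` are assumptions made for the example.

```python
import math
from collections import Counter

def text_space_entropy(outcomes):
    """H(T) = -sum_i P(x_i) ln P(x_i), with P estimated from observed frequencies."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())

# Hypothetical per-window observations of X for one granularity.
print(text_space_entropy([0, 0, 1, 1]))
```

A filter whose windows almost all produce the same outcome gives an entropy near zero, while a filter that spreads windows over several outcomes gives a larger value.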
Using chapter identifiers, paragraph separators, periods and other punctuation as segmentation identifiers, the corresponding information content is also different when the smallest unit of the natural morpheme space is divided. When
In the following section, we divide the community based on the natural morpheme space
Building Character Graph and Dividing Communities Using Emotional Data of Character Network
A. Community Division Based on a Co-Occurrence Network Structure
In
Representing the entire network by its co-occurrence structure and then applying a community discovery algorithm to that structure is the method called community division based on co-occurrences (CDC).
We divide our methods into four parts to demonstrate, both qualitatively and quantitatively, that CDEP performs much better than CDC, which is GN community discovery based on co-occurrence frequency. CDC is a general but sketchy method for transforming natural text into a knowledge graph.
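The co-occurrence weights that CDC starts from can be sketched as follows, assuming each window has already been reduced to the list of character names it mentions. The function name `cooccurrence_edges` is hypothetical, and the GN community discovery step that would run on the resulting weighted graph is not shown.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(windows):
    """Count, for every unordered character pair, the windows that mention both."""
    weights = Counter()
    for names in windows:
        # A character mentioned several times in one window counts once.
        for pair in combinations(sorted(set(names)), 2):
            weights[pair] += 1
    return weights

windows = [["A", "B"], ["B", "A", "C"], ["C"]]
print(cooccurrence_edges(windows))
```

Windows with fewer than two distinct characters contribute no edge, which is why very short windows add little to the network.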
The community division of the co-occurrence network structure is completed based on the general literary character network
In this way, a simple community division result is obtained, that is, two community subnets. However, since
B. Community Classification Based on the Node Polarity
Community division based on node polarity is completed based on
Classifying all the positive nodes directly into the positive community
This method not only considers the co-occurrence network structure but also fuses the co-occurrence context semantics [38] of character nodes into the network through the PoW, completes the division of polar communities, and improves the utilization of the NMS. Therefore, the contribution to the accuracy of the model is greatly improved. However, the nodes are directly classified into the community, and the result does not include the graph structure of the network.
C. Community Classification Based on the Link Polarity
Community division based on the link polarity is also completed based on
Algorithm 3 CDEP
The set of polar links indicates the polarity interaction between nodes in the network of polar literary characters. Although it contains emotional polarity information, the two communities in the final division result still need to have their positive and negative polarities matched to complete the evaluation of the algorithm model. CDEP uses the PoW to measure the distance between nodes so that the communities in the network can be identified, and the community polarities can be assigned by simple matching.
The magnitude of the link polarity value indicates the interaction strength of the attraction or repulsion of the node. The link polarity value is also a kind of co-occurrence number. What has been improved is that it is the number of co-occurrences after emotion indication space filtering from
D. Community Classification Based on the Polarity of the Entire Network
The integrated network polarity community division, called polar community division based on the network’s polarity (CDNP), takes into account both the link polarity and the node polarity.
In addition, when coupling
Algorithm 4 CDNP
The appropriate polarity threshold
Algorithm 1a GN Community Discovery
Algorithm 1b CDC
Algorithm 2 CDVP
The comprehensive polarity interaction between the nodes in
Experiments and Analysis
A. Data Acquisition and Character Dictionary
In our experiments, seven novels with characters that have obvious positive and negative polarities in the social network of the text were selected. The text and text expansion materials were used to determine the node dictionary and to label the node polarities. Depending on the length of the text and the complexity of the plot, the number of nodes is between 6 and 22.
Text-based statistical results can help retrieve most of the information about the person’s name in the text [39] and avoid omissions when constructing the entries, but a small amount of information may be duplicated or even wrong. The text expansion material helps us match the main role of the character and other roles into the same character entry [40].
In literary works, the main character’s tendency to depreciate is obvious, and the polarity is easy to distinguish; secondary characters are mostly used to assist the protagonist and the main plot. During the extraction of
B. Selection of the Granularity of the Literary Interaction Window
The CDC, CDVP, CDEP, and CDNP algorithms all need to complete the extraction of
We use the text space entropy to evaluate the information content of windows obtained by dividing the text space
Figure 1. The information content of the text space through different filters in RenMinDeMingYi. The ordinates on the two sides show the frequency and the percentage of X with different windows.
As shown in Figure 1, most short-sentence windows contain no character-indicating morpheme and only serve plot development and logical connection. For the research in this paper, such windows lack information and are invalid because they are too short. A small number of windows contain one character-indicating morpheme; the vertex polarity may then be calculated, but the link polarity cannot, so the proportion of active windows is small. In summary, the information content contributed by the short-sentence granularity is not sufficient, and it is not suitable for selection in this work.
The chapter window regards each chapter as the smallest unit for analyzing the status of the characters and the literary polarities. After the literary space extractor of the chapter granularity is implemented, the amount of contributed information is sufficient, but at the same time, the accuracy of the polarity indication in the literary interaction window and the window error are considered. If the length of the interaction window is too large, the data are noisy and inappropriate. Table 2 further illustrates the calculated indexes when different windows are selected for 7 novels: the text length
The information content measured by the text space entropy and the polar action accuracy rate measured by the number of character-indicating morphemes and emotion-indicating morphemes contained in the window cannot be optimized simultaneously. In terms of the text space entropy and polarity accuracy, long-sentence windows and paragraph windows are suitable. Taken together, the percentage of invalid windows in paragraphs is low, and there are more valid windows in long-sentence windows, which can ensure that the accuracy of literary interactions in the window and the information content in the window are not very sparse. This can better ensure network construction and the accuracy of the model. Long sentence granularity is eventually obtained with the particle extractor to get
C. Character-Indicating Morpheme Recognition
In the data acquisition and processing phase, we constructed the node dictionary, and the extraction of nodes is based on this dictionary. The word segmentation results of natural morphemes are compared with the node dictionary, the matching morphemes are kept, and these morphemes are arranged in their original order to form a new sequence, which is the CIS. Before word segmentation, a long-sentence-granularity literary text space extractor generates a natural morpheme space from the natural morpheme sequence; we then perform word segmentation on each natural morpheme window in that space and match the result against the node dictionary. Finally, a high-confidence character-indicating text space is obtained.
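The dictionary matching step can be sketched as below, assuming word segmentation has already produced a list of morphemes for one window and that the node dictionary maps every alias of a character to a single canonical entry. The function name and the aliases in the usage example are hypothetical.

```python
def character_indicating_sequence(morphemes, node_dictionary):
    """Keep morphemes found in the node dictionary, in their original order,
    replacing each alias with its canonical character entry."""
    return [node_dictionary[m] for m in morphemes if m in node_dictionary]

# Hypothetical aliases mapped to character entries.
node_dictionary = {
    "Hou": "Hou L.P.",
    "Prosecutor Hou": "Hou L.P.",
    "Gao": "Gao Y.L.",
}
tokens = ["Hou", "met", "Gao", "before", "Prosecutor Hou", "left"]
print(character_indicating_sequence(tokens, node_dictionary))
```

Mapping aliases to one entry keeps the different names of a character from becoming separate nodes in the network.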
D. Emotion-Indicating Morpheme Recognition
The analysis and recognition of polar morphemes belong to the field of sentiment analysis, and databases of Chinese sentiment dictionaries are relatively scarce. To improve the accuracy of polar morpheme recognition, we synthesize several Chinese polarity sentiment dictionaries and build a special polarity sentiment thesaurus. The NTU Sentiment Dictionary [41] and the Hownet Sentiment Dictionary [42] are Chinese emotional polarity analysis keyword sets with a total of 6110 positive words and 11152 negative words.
The natural morphemes that match the thesaurus are retained and arranged in their original order to form a new sequence, which is the EIS. Correspondingly, the NMS can obtain the EMS.
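A minimal sketch of extracting the EIS from one window and summing it into a window polarity in the spirit of Eq. (1). It assumes each positive lexicon entry contributes +1 and each negative entry contributes -1; this scoring, the function names, and the toy English lexicon are assumptions for illustration only.

```python
def emotion_indicating_sequence(morphemes, positive, negative):
    """Keep the morphemes found in either polarity lexicon, in original order."""
    return [m for m in morphemes if m in positive or m in negative]

def window_polarity(morphemes, positive, negative):
    """Sum the per-morpheme polarities (+1, -1, or 0) over the window."""
    return sum((m in positive) - (m in negative) for m in morphemes)

positive = {"brave", "honest"}
negative = {"greedy"}
tokens = ["the", "brave", "man", "fought", "greedy", "honest"]
print(emotion_indicating_sequence(tokens, positive, negative))
print(window_polarity(tokens, positive, negative))
```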
E. Community Discovery and Community Matching
Regardless of the emotion-indicating morphemes, CDC builds a simple literary character network. Figure 2 shows the visual result of the CDC method, which serves as the baseline for performance comparison. Without emotional knowledge being extracted, the result of CDC is 50.00% (avg. weight) and 44.86% (link weight); see Table 7. Li D.K. is classified as the biggest positive node, in pink, surrounded by other good characters such as Sha R.J. and Wang D.L. Hou L.P. has remarkable co-occurrences with the negative characters shown as green nodes, so he is assigned to the negative community although he is a good character, because no polar information is available at this step. Gao Y.L. is the head of the negative community; other bad characters such as Qi T.W. and Ding Y.Z. are distributed around him.
Figure 2. The roles of the co-occurrence network in RenMinDeMingYi and the community division performed by CDC. The names of characters are shown as the original processing result in Chinese. In the graph, pink nodes are assigned to the positive community, while green nodes are assigned to the negative community. The larger a node is, the more frequently the character appears in the literature. The thicker an edge is, the stronger the connection (whether good or bad) between the two nodes it joins.
The CDVP algorithm considers the emotion-indicating morphemes. In this paper, the ratio of the number of negative words to the number of positive words in all the matching polarity indication windows of the node is selected as the polarity quantization value of the node. Tables 6 and 7 list the node polarity quantization values and the node polarity statistics in all the novels, including the number of positive nodes, the number of negative nodes, the maximum and minimum node polarity quantization values, and the threshold values.
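The node polarity quantization described above can be sketched as follows, assuming each window is given as a pair of its character list and its morpheme list. Nodes with no positive word in any of their windows have an undefined ratio and are skipped here, which is an assumption; the function name is hypothetical, and the thresholding of these values into positive and negative nodes is not shown.

```python
from collections import Counter

def node_polarity(windows, positive, negative):
    """Ratio of negative to positive word counts over all windows of each node."""
    pos_counts, neg_counts = Counter(), Counter()
    for characters, morphemes in windows:
        p = sum(m in positive for m in morphemes)
        n = sum(m in negative for m in morphemes)
        for c in set(characters):
            pos_counts[c] += p
            neg_counts[c] += n
    # Assumption: nodes without any positive word are left out (ratio undefined).
    return {c: neg_counts[c] / pos_counts[c] for c in pos_counts if pos_counts[c] > 0}

windows = [
    (["A"], ["brave", "greedy"]),
    (["A", "B"], ["honest"]),
    (["B"], ["greedy", "greedy"]),
]
print(node_polarity(windows, {"brave", "honest"}, {"greedy"}))
```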
The CDEP algorithm matches the link in the character-indicating window and the polarity word in the corresponding emotion-indicating window. Figure 3 shows a typical timing diagram of the link polarity for RenMinDeMingYi. The cumulative result is used as the weight of the link in the polar literary character network. In Algorithm 1a, the weight of the network link only accepts positive values. Considering that the negative link indicates the node’s repellent relationship, weakening the repellent relationship to a link disconnection has no effect on the community discovery results.
Figure 3. The interaction data between characters along the timeline in RenMinDeMingYi. The red lines show the positive interactions between two nodes, indicating how friendly they are to each other, while the green lines represent negative, hostile interactions. The gray lines are the superposition of all their interactions. Over the entire timeline, the accumulated value is calculated as the PoE in the graph.
Finally, the scattered nodes left isolated by community discovery as a result of the disconnection are classified into the negative community. The average weight method does not consider the importance of the character in the text and comprehensively evaluates the overall effect of the model.
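The accumulation of link polarity and the removal of non-positive links can be sketched as below. The sketch assumes each window contributes its summed polarity (+1 per positive word, -1 per negative word) to every character pair it contains; the function names and this scoring are assumptions, and the community discovery step that follows is not shown.

```python
from collections import defaultdict
from itertools import combinations

def link_polarity(windows, positive, negative):
    """Accumulate the window polarity onto every character pair in the window."""
    weights = defaultdict(int)
    for characters, morphemes in windows:
        polarity = sum((m in positive) - (m in negative) for m in morphemes)
        for pair in combinations(sorted(set(characters)), 2):
            weights[pair] += polarity
    return dict(weights)

def positive_links(weights):
    """Drop links whose accumulated weight is not positive (treated as disconnected)."""
    return {pair: w for pair, w in weights.items() if w > 0}

windows = [
    (["A", "B"], ["brave"]),
    (["A", "B"], ["honest", "greedy"]),
    (["A", "C"], ["greedy"]),
]
weights = link_polarity(windows, {"brave", "honest"}, {"greedy"})
print(weights)
print(positive_links(weights))
```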
F. Evaluation of the Methods
The node weight method takes the frequency of a character’s appearance in the entire literary space as a weight to evaluate the importance of the node and uses the weight to assign different discrimination scores to the nodes.
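The two evaluation schemes can be written as one weighted accuracy, sketched below under the assumption that each node carries a predicted and a true community label. With no weights every node counts equally (the average weight method); with appearance frequencies as weights, frequent characters count more. The function name and the sample numbers are hypothetical.

```python
def weighted_accuracy(predicted, truth, weights=None):
    """Share of total node weight carried by correctly classified nodes."""
    nodes = list(truth)
    if weights is None:
        weights = {n: 1.0 for n in nodes}  # average weight: all nodes equal
    total = sum(weights[n] for n in nodes)
    correct = sum(weights[n] for n in nodes if predicted.get(n) == truth[n])
    return correct / total

truth = {"A": "+", "B": "+", "C": "-", "D": "-"}
predicted = {"A": "+", "B": "-", "C": "-", "D": "-"}
frequency = {"A": 10, "B": 2, "C": 5, "D": 3}  # hypothetical appearance counts
print(weighted_accuracy(predicted, truth))
print(weighted_accuracy(predicted, truth, frequency))
```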
Table 8 shows the accuracy of the results of the seven novels when the four methods are used to divide the polar community, and the average weight method and the link weight method are used to evaluate the overall method. The accuracy of CDC is calculated from the common community detection method and general co-occurrence calculation. It is used as a benchmark for comparing the accuracy of the methods proposed in this paper. This method can reflect the character network in the literature to some extent, but it is not suitable for polar community division. The CDVP algorithm first determines the PoV and then builds the network structure from the positive and negative communities. The CDEP algorithm divides the community by using the common community discovery algorithm to weaken the negative link weight. The CDNP algorithm takes both the node polarity and the link polarity into account. The statistical test (
The performance of the CDEP algorithm is slightly lower than that of CDNP for the following reasons. First, when a polarity indicator is assigned to links, it points to all possible character pairs in the window. The indication relationship is therefore not clear, although the influence of this effect is reduced as much as possible through the choice of the window granularity; the links are combined with the corresponding characters without regard to whether the linking relationship truly exists or is reliable. Second, the sentiment polarity keyword set may not be well adapted to the link polarity context. Most existing sentiment polarity keywords are attached to a single subject, that is, a node. Although link-level indicators should be attached to several subjects, few words express a dual-agent or multiagent relationship. With all these words pointing to the link together, there is considerable noise in the link polarity calculation.
The CDVP calculation does not face the two problems above and achieves better test performance. In the CDNP algorithm, the influence weight of the node polarity is set to a higher value, which can include the network structure information while the polar community is better divided so that the positive and negative polar nodes can be connected to form a complete network structure. For instance, in the graph of RenMinDeMingYi in Figure 4, Zhao L.C. and Gao Y.L. are regarded as bad characters, shown as square nodes, because they both have polarity values smaller than the threshold. They are closely connected to each other because their interactions always indicate positive emotions towards each other. Ouyang J. and Ding Y.Z. are also bad characters but have a closer connection, while Lu Y.K. and Ji C.M. are connected good characters, shown as round nodes, because their polarity values are higher than the threshold. As Hou L.P. and Gao Y.L. frequently occur next to each other and the EISs indicate negative polarity, a few edges are confusing, and Hou L.P. ends up as a false negative node in the network. In the graph of ZhiQuWeiHuShan (see Figure 4), Zuo S.D., Ma X.S., and Xu D.M.B. are clustered in the negative community, while Shao J.B., Li Y.Q., and Yang Z.R. are connected in the positive community.
Figure 4. The graph extraction and network building results of six Chinese novels, where square nodes are good characters and round nodes are bad characters. Separate nodes are set as bad ones because they have few interactions with either good or bad characters, which means they tend to be influenced by others. With CDNP, we can get the result of community division together with the graph structure of these literary works.
By analyzing the accuracy results, it can also be found that in the novels of the 1970s and 1980s, the emotional tendency was more obvious, the polarity of the natural language nodes became stronger, and the model discrimination accuracy was higher. At the same time, when constructing the novel character network, the longer the novel is, the greater the number of nodes, the more information and the higher the accuracy.
As for the efficiency of the methods, the classification of the graph is relatively simple, but a major difficulty is that the detection time grows linearly with the number of emotional words. We chose a basic, naive filter for searching emotional morphemes, which takes relatively more time but is easy to implement and treats all words equally. The searching process can be replaced by a deterministic finite automaton (DFA) or a Bayesian spam filter (BSF) to extract graph knowledge from a huge amount of literary material.
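One way to avoid scanning the text once per lexicon word is a trie, a simple deterministic automaton over word prefixes. The sketch below is an illustration of that general idea and not necessarily the DFA the authors have in mind; it assumes longest-match, non-overlapping scanning, and the function names are hypothetical.

```python
def build_trie(words):
    """Build a nested-dict trie; the key "" marks the end of a lexicon word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = word  # "" never collides with a single-character key
    return root

def find_words(text, trie):
    """Scan the text left to right, taking the longest lexicon match at each position."""
    found = []
    i = 0
    while i < len(text):
        node, match, j = trie, None, i
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if "" in node:
                match = node[""]
        if match:
            found.append(match)
            i += len(match)  # skip past the matched word
        else:
            i += 1
    return found

trie = build_trie(["good", "goodness", "bad"])
print(find_words("goodness is not bad, good", trie))
```

The work per text position is bounded by the length of the longest lexicon word rather than by the number of words in the lexicon.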
Conclusion
Chinese literary works are precious cultural treasures and important datasets for natural language processing. However, the data labeling and cleaning phase is complex and time-consuming, and there are too few sufficiently labeled datasets to apply machine learning methods for analysis.
This paper identifies the characters and emotional morphemes from natural text data and divides the literary interaction window. Based on the concept of literary sentiment polarity, the graph model of polar literary characters is built, and the polar communities are divided based on the extraction of polar nodes and polar links. The literary space entropy clearly defines the process of extracting information from the natural text in long texts, and the literary interaction window defines the range of nodes and polar morpheme indicators.
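The pipeline summarized above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the CDNP algorithm itself: the text is assumed to be already tokenized, the interaction window is simplified to fixed-size, non-overlapping spans, link polarity is taken as the sum of the lexicon polarities inside a window, and node polarity values are assumed to be computed upstream. The function names link_polarity and divide_communities are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def link_polarity(tokens: List[str], characters: Set[str],
                  lexicon: Dict[str, int], window: int) -> Dict[Edge, int]:
    """Accumulate a polarity score for each pair of characters that
    co-occur in the same window (assumption: fixed-size, non-overlapping)."""
    edges: Dict[Edge, int] = defaultdict(int)
    for start in range(0, len(tokens), window):
        span = tokens[start:start + window]
        present = sorted({t for t in span if t in characters})
        score = sum(lexicon.get(t, 0) for t in span)
        for pair in combinations(present, 2):  # pairs are in sorted order
            edges[pair] += score
    return dict(edges)


def divide_communities(node_polarity: Dict[str, int], edges: Dict[Edge, int],
                       threshold: int) -> Tuple[Set[str], Set[str]]:
    """Split nodes by a polarity threshold.

    Isolated nodes (no edge at all) are placed in the negative community,
    following the convention described for separate nodes.
    """
    connected = {name for pair in edges for name in pair}
    positive = {n for n, p in node_polarity.items()
                if n in connected and p > threshold}
    negative = set(node_polarity) - positive
    return positive, negative
```

For example, with the tokens ["Yang", "trusts", "Shao", "Zuo", "threatens", "Yang"], a window of 3, and a lexicon that scores "trusts" as +1 and "threatens" as -1, the link between Shao and Yang accumulates +1 and the link between Yang and Zuo accumulates -1.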
The graphs built with CDC serve as the baseline for comparing performance. Most previous work on network analysis is based on co-occurrence only, so the analysis is limited to interaction frequency and ignores how emotional interactions affect relationships. As a result, those community division methods describe only part of the original literature. In our methods, we attach emotional polarity information to nodes and edges, so that characters and their relationships carry abundant knowledge about how they act and interact with each other. This multidimensional information contributes significantly to graph building, social network computing, and social network analysis tasks such as community division. The experimental results show that the accuracy of the CDNP algorithm is much higher than that of the baseline. The key work of this paper focuses on the network structure, using a dictionary to achieve an effective division of opposing characters. Although the method based on link polarity has some limitations, it contributes valuable information about the network structure, which can be combined with the method based on node polarity. The CDNP method combines the advantages of both: it automatically matches the positive and negative communities and retains the complete link structure of the network. It clearly reflects the connotation of literary works from the perspective of polarity.
This systematic model provides general, standard operating steps for a machine to understand complex Chinese literature and extract graph data. The experiments conducted on seven benchmark Chinese novel datasets demonstrate that the method based on emotional polarity significantly improves on the baseline performance. It is a novel and beneficial effort to analyze polar characters in literature. This work can serve as a model for further work on long-text literature understanding and as a meaningful reference for future research on natural language processing.
As an interdisciplinary problem, this work required considerable effort to connect semantics, network theory, and quantitative emotion recognition. Extracting knowledge from literature is a complicated and difficult topic. For Chinese literature in particular, rhetorical sentences, polysemy, the lack of labeled data, and the dynamic personalities of characters make it even more laborious to convert natural language into semantic data and compute the relationships among entities. We overcame these difficulties by simplifying the problem and modeling it as a standard operational algorithm. In practice, the method lets readers quickly find the positive and negative nodes in a literary work, so that they can understand the characters and plot with a small amount of manual assistance, and it can assist decision making for writing and publishing [43]. It is a novel attempt at having computers understand natural-language literary works. These methods can also serve as a reference when integrating short texts and analyzing polar networks of public opinion. Although extracting emotional information, recognizing the character graph, and dividing a community into specific sub-communities are always difficult problems, the methods in this paper obtain satisfactory results on them.
There is a saying in Chinese, In the book, there is
For further research, time can be considered as a variable in character graph analysis [44], which would capture the character-spaces as the narrative unfolds [45]. Well-defined temporal centrality measures could categorize the actors in different time spans. More stylistic analysis tailored to literature in different languages would enhance the flexibility of our methods [46], [47]. The methods could also be used with online datasets to prioritize responses and better manage numerous posts [48]. For the basic supporting steps of natural language processing, a customized entity recognition method would make the pipeline more efficient [49], and multi-polar emotion calculation would represent multi-dimensional knowledge for large bodies of literature. Combined with the character recognition method [50], our work could contribute to its promising performance and provide a good benchmark to assist investigations of intelligent heritage in historical document images. Along with methods that identify and analyze other types of relationships, it could assist in extracting knowledge from films, because a film adaptation is the transfer of a novel to a visual medium [50]–[52]. We believe that in the next few years NLP will be a promising technology and will extract more interesting results from the huge volume of literary works in human history.