CN108052743B  Method and system for determining step approach centrality  Google Patents
Method and system for determining step approach centrality Download PDFInfo
 Publication number
 CN108052743B CN108052743B CN201711349361.7A CN201711349361A CN108052743B CN 108052743 B CN108052743 B CN 108052743B CN 201711349361 A CN201711349361 A CN 201711349361A CN 108052743 B CN108052743 B CN 108052743B
 Authority
 CN
 China
 Prior art keywords
 node
 nodes
 subgraph
 pointer
 current
 Prior art date
 Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
 Active
Links
 238000004364 calculation methods Methods 0.000 claims abstract description 50
 238000010276 construction Methods 0.000 claims description 27
 238000010845 search algorithm Methods 0.000 claims description 5
 241000212893 Chelon labrosus Species 0.000 claims description 3
 238000007405 data analysis Methods 0.000 description 3
 238000004458 analytical methods Methods 0.000 description 2
 238000000034 methods Methods 0.000 description 2
 230000000875 corresponding Effects 0.000 description 1
 230000000694 effects Effects 0.000 description 1
 230000004048 modification Effects 0.000 description 1
 238000006011 modification reactions Methods 0.000 description 1
Classifications

 G—PHYSICS
 G06—COMPUTING; CALCULATING; COUNTING
 G06F—ELECTRIC DIGITAL DATA PROCESSING
 G06F30/00—Computeraided design [CAD]
 G06F30/20—Design optimisation, verification or simulation
Abstract
The invention discloses a method and a system for determining the step approach centrality, wherein the method comprises the following steps: constructing a data graph of a target database; all nodes in the data graph are used as a residual node set; calculating the distances between all node pairs in the residual node set, and calculating the proximity centrality of all nodes through the distances and the weighting functions; selecting a node with the maximum approaching centrality from the residual node set, wherein the step approaching centrality of the selected node is the approaching centrality calculated in the current subgraph, deleting the selected node from the residual node set, and deleting the edge associated with the selected node from the data graph to generate a new subgraph; and judging whether the deleted residual node set is empty or not, if not, repeating the steps to continue the calculation until the residual node set is empty, and at the moment, calculating all the nodes to obtain the self step approach centrality. The step approach centrality index provided by the invention has better locality and antiinterference capability.
Description
Technical Field
The invention belongs to the field of graph data analysis in the field of computers, and particularly relates to a method and a system for determining step proximity centrality for largescale data graph analysis.
Background
At present, a wide variety of graph data and graph structures are abundantly present in our lives, such as social networks, traffic network graphs, biological networks, financial networks, scientific data network graphs, and so on. With the continuous development of society, the scale of the data graphs is rapidly expanded, and the scale of the data graphs is increased day by day, so that the difficulty of directly analyzing the structure of the whole graph is very high. Recently, researchers have proposed many methods for detecting and evaluating communities and centrality measures in largescale graphs. The nodes in the graph are sorted by detecting the key elements in the graph and according to the importance of the nodes, so that the important candidate nodes are tracked. The methods provide powerful tools for researchers in various fields to understand the composition, the function and the dynamic evolution process of a real system. However, the diversity of largescale data graph structures determines that the centrality measurement itself is a very difficult problem, and therefore, how to evaluate and detect the centrality of nodes in the graph, to provide more accurate measurement indexes and calculation algorithms, and to perform functional explanation on the detected modules and the centrality nodes is still a current very challenging task.
Various classical node centrality indexes and calculation methods have been designed at present, and the application of the classical node centrality indexes and calculation methods in largescale graph data analysis is shown. However, most data graphs have small world characteristics, and the centrality index obtained by global calculation is often interfered by a small part of nodes with larger centrality. Therefore, how to describe the characteristics of the nodes in a local scope becomes a problem to be solved urgently.
Disclosure of Invention
Aiming at the defects or improvement requirements of the prior art, the invention provides a method and a system for determining the step approach centrality, so that the technical problem that the centrality index obtained by global calculation of the traditional node centrality index and calculation method is often interfered by a small part of nodes with larger centrality is solved.
To achieve the above object, according to one aspect of the present invention, there is provided a step approach centrality determining method including:
(1) establishing a data graph G (V, E) or G (V, E, W) by taking all data units in a target database as nodes, taking the association relation among the data units as edges and taking the strength of the association among the data units as a weight, wherein an edge set E represents a set of all edges in the data graph, a point set V represents a set of all nodes in the data graph, and a weight set W represents the weight of each edge in the edge set E;
(2) initializing current subgraph number l0 and current residual node set V'_{l}V, current remainder set E'_{l}E, by V'_{l}And E'_{l}Construction of a Generation subgraph G_{l}(V′_{l},E′_{l}) Or G_{l}(V′_{l},E′_{l},W′_{l}) Wherein, W'_{l}Represents E'_{l}A weight set of the middle edges;
(3) generating subgraph G_{l}Calculating to obtain the current residual node set V'_{l}Distance d between each node pair_{l}(u,v)；
(4) According to the weighting function and the distance d between each pair of nodes obtained by calculation_{l}(u, v) generating subgraph G_{l}Calculating to obtain the current residual node set V'_{l}The approximate centrality of each node in the cluster;
(5) finding out the current residual node set V'_{l}Middleapproach most centrality node set V_{l} ^{*}；
(6) From the current set V of nodes remaining'_{l}To be contained in the node set V_{l} ^{*}Deleting the nodes in the node group to obtain a nextlevel residual node set V'_{l+1}；
(7) Judging the residual node set V 'of the next level'_{l+1}If the current state is not null, executing the step (8), otherwise, executing the step (9);
(8) from the current remaining edge set E'_{l}Will be included in the edge setDeleting the edges to obtain a remaining edge set E 'of the next level'_{l+1}Then through the remaining node set V 'of the next stage'_{l+1}And the remaining edge set E 'of the next stage'_{l+1}Construction of a novel generative subgraph G_{l+1}Generating subgraph G at said new generator_{l+1}In the method, the residual node set V 'of the next stage is obtained through calculation'_{l+1}Distance d between each node pair_{l+1}(u, v) when the subgraph number l is l +1, and returningThe step (4) is executed in which,representing the generated subgraph G_{l}Middle V_{l} ^{*}The adjacent side of (2);
(9) and obtaining the step approach centrality of all the nodes in the data graph G.
Preferably, in step (3), if the generation subgraph G_{l}As an authorized graph G_{l}(V_{l}',E'_{l},W_{l}') calculating the current residual node set V by adopting Dijkstra algorithm_{l}' distance d between each node pair in_{l}(u, v) if said generating subgraph G_{l}Is a noright graph G_{l}(V_{l}',E'_{l}) Respectively constructing the generated subgraph G by adopting a breadthfirst search algorithm BFS_{l}BFS spanning tree with each node as root node for calculating and maintaining distance d between each node pair_{l}(u,v)。
Preferably, in step (4), the distance d between each pair of nodes is calculated according to the weighting function_{l}(u, v) generating subgraph G_{l}The current residual node set V is obtained by calculation_{l}' the approximate centrality of each node v is:wherein, α (d)_{l}(u, v)) represents a weighting function, C_{c}(v) Represents the approximate centrality of the node V, and u represents the current set of nodes V_{l}' the nodes remaining after node v are removed.
Preferably, step (8) specifically comprises:
(8.1) from the current residual edge set E_{l}' in will be included in the edge setDeleting the edges to obtain a remaining edge set E 'of the next level'_{l+1}(ii) a Then passes through a residual node set V 'of the next stage'_{l+1}And the remaining edge set E 'of the next stage'_{l+1}Construction of a novel generative subgraph G_{l+1}；
(8.2) if said new generation subgraph G_{l+1}And recalculating the residual node set V 'of the next stage by adopting Dijkstra algorithm as a weighted graph'_{l+1}The distance between each pair of nodes in the set;
(8.3) if said new generation subgraph G_{l+1}And (4) maintaining the residual node set V 'of the next stage in a mode of incremental updating of the BFS spanning tree structure calculated in the step (3) as an unauthorized graph'_{l+1}The distance between each pair of nodes in the array.
Preferably, step (8.3) comprises in particular:
(8.3.1) finding out the residual node set V 'of the next stage'_{l+1}Any node v in_{0}Setting a root node of a target BFS spanning tree as a node r for the target BFS spanning tree of the root node, adding the node r into a queue to be modified, and creating a pointer a to point to a head node of the queue to be modified;
(8.3.2) inserting the child node x of the node pointed by the pointer a into the tail of the queue to be modified if the inserted child node x belongs to the node to be modifiedAdding all brother nodes of the child node x in the target BFS spanning tree structure into an anchor point queue, creating a pointer b to point to a head node s of the anchor point queue, simultaneously adding all child nodes of the child node x into a collapse linked list, and deleting the connection between the node pointed by the pointer a and the child node x;
(8.3.3) lookup on said generated subgraph G_{l}If yes, executing the step (8.3.4), and if not, executing the step (8.3.6);
(8.3.4) creating a pointer to node t at node s, and then deleting node t from the collapse linked list;
(8.3.5) judging whether the collapse linked list after the deletion operation is empty, if not, executing the step (8.3.6), and if so, executing the step (8.3.9);
(8.3.6) judging whether a child node exists in the node pointed by the pointer b, if so, inserting the child node of the node pointed by the pointer b into the tail of the anchor point queue;
(8.3.7) judging whether the node pointed by the pointer b has a subsequent node, if so, executing the step (8.3.8), and if not, executing the step (8.3.9);
(8.3.8) pointing the pointer b to the node subsequent to the node pointed to currently, and returning to execute the step (8.3.3);
(8.3.9) judging whether the node pointed by the pointer a has a child node, if so, inserting the child node of the node pointed by the pointer a into the tail of the queue to be modified;
(8.3.10) judging whether the node pointed by the pointer a has a subsequent node, if so, executing the step (8.3.11), and if not, executing the step (8.3.12);
(8.3.11) pointing the pointer a to the node succeeding the node pointed to currently, and returning to execute the step (8.3.2);
(8.3.12) completing the incremental update to the target BFS spanning tree and completing for the set of nodes remaining at the next level V'_{l+1}Middle division v_{0}Incremental updating of BFS spanning tree with other nodes outside as root nodes to complete the new generated subgraph G_{l+1}The distance between each node pair in (1) is updated.
According to another aspect of the present invention, there is provided a step proximity centrality determining system including:
the data graph building module is used for building a data graph G (V, E) or G (V, E, W) by taking all data units in a target database as nodes, taking the association relations among the data units as edges and taking the strength of the association among the data units as a weight, wherein an edge set E represents a set of all edges in the data graph, a point set V represents a set of all nodes in the data graph, and a weight set W represents the weight of each edge in the edge set E;
a generated subgraph construction module for initializing the current subgraph number l equal to 0 and the current residual node set V_{l}'V, current remainder set E'_{l}By V ═ E_{l}'and E'_{l}Construction of a Generation subgraph G_{l}(V_{l}',E'_{l}) Or G_{l}(V_{l}',E'_{l},W_{l}') wherein, W_{l}'represents E'_{l}A weight set of the middle edges;
a first node pair distance calculation module for generating subgraph G_{l}The current residual node set V is obtained by calculation_{l}' distance d between each node pair in_{l}(u,v)；
A proximity center calculation module for calculating the distance d between each pair of nodes according to the weighting function_{l}(u, v) generating subgraph G_{l}The current residual node set V is obtained by calculation_{l}' the approximate centrality of each node in the set;
a searching module for finding out the current node set V_{l}' node set having maximum nearness to center in center
A node set determination module for determining the current node set V from the current node set V_{l}' will be included in the set of nodes V_{l} ^{*}Deleting the nodes in the node group to obtain a nextlevel residual node set V'_{l+1}；
A judging module, configured to judge the remaining node set V of the next stage'_{l+1}If the node set is empty, if the node set is the residual node set V of the next level'_{l+1}If the data graph G is empty, the step approach centrality of all the nodes in the data graph G is obtained;
a newly generated subgraph construction module for set V 'of remaining nodes at the next stage'_{l+1}Not empty, from the current remaining edge set E'_{l}Will be included in the edge setDeleting the edges to obtain a remaining edge set E 'of the next level'_{l+1}Then through the remaining node set V 'of the next stage'_{l+1}And the remaining edge set E 'of the next stage'_{l+1}Construction of a novel generative subgraph G_{l+1}Generating subgraph G at said new generator_{l+1}In the method, the residual node set V 'of the next stage is obtained through calculation'_{l+1}Distance d between each node pair_{l+1}(u, v) when the subgraph number l is l +1, and returning to the approximate centrality calculation module, wherein,representing the generated subgraph G_{l}Middle V_{l} ^{*}The adjacent side of (2).
Preferably, said node pair distance calculation module, in particular for generating subgraph G_{l}As an authorized graph G_{l}(V_{l}',E'_{l},W_{l}') calculating the current set of nodes remaining V by using Dijkstra algorithm_{l}' distance d between each node pair in_{l}(u, v) generating subgraph G_{l}Is a noright graph G_{l}(V_{l}',E'_{l}) Then, the generated subgraph G is respectively constructed by adopting a breadthfirst search algorithm BFS_{l}BFS spanning tree with each node as root node for calculating and maintaining distance d between each node pair_{l}(u,v)。
Preferably, the approximate centrality calculating module is specifically configured to calculate a distance d between each pair of nodes according to the weighting function and the calculated distance d_{l}(u, v) generating subgraph G_{l}The current residual node set V is obtained by calculation_{l}' the approximate centrality of each node v is:wherein, α (d)_{l}(u, v)) represents a weighting function, C_{c}(v) Represents the approximate centrality of the node V, and u represents the current set of nodes V_{l}' the nodes remaining after node v are removed.
Preferably, the new generated subgraph construction module comprises:
a newly generated subgraph construction submodule for constructing from the current remaining edge set E'_{l}Will be included in the edge setDeleting the edges to obtain a remaining edge set E 'of the next level'_{l+1}(ii) a Then passes through a residual node set V 'of the next stage'_{l+1}And the remaining edge set E 'of the next stage'_{l+1}Construction of a novel generative subgraph G_{l+1}；
A second node pair distance calculation module for calculating the distance between the new generation subgraph G_{l+1}When the map is a weighted map, recalculating residual node set V 'of the next stage by adopting Dijkstra algorithm'_{l+1}The distance between each pair of nodes in the set;
a third nodetodistance computation module for generating subgraph G at the new generation subgraph G_{l+1}And if the node is an unweighted graph, maintaining the residual node set V 'of the next level in a mode of updating the increment of the BFS spanning tree structure calculated by the distance calculation module by the first node'_{l+1}The distance between each pair of nodes in the array.
Preferably, the third nodetodistance calculation module includes:
a first submodule for finding out a set V 'of nodes remaining with the next stage'_{l+1}Any node v in_{0}Setting a root node of a target BFS spanning tree as a node r for the target BFS spanning tree of the root node, adding the node r into a queue to be modified, and creating a pointer a to point to a head node of the queue to be modified;
a second submodule, configured to insert a subnode x of the node pointed by the pointer a into the tail of the queue to be modified, if the inserted subnode x belongs to V_{l} ^{*}Adding all brother nodes of the child node x in the target BFS spanning tree structure into an anchor point queue, creating a pointer b to point to a head node s of the anchor point queue, simultaneously adding all child nodes of the child node x into a collapse linked list, and deleting the connection between the node pointed by the pointer a and the child node x;
a third submodule for searching in the generated subgraph G_{l}Whether an edge exists between the node s pointed by the pointer b and any node t in the collapse linked list to connect the node s and the node t;
a fourth submodule, configured to create a pointer pointing to a node t at a node s when an edge exists between the node s pointed to by the pointer b and any node t in the collapsing chain table, and then delete the node t from the collapsing chain table;
the fifth submodule is used for judging whether the collapsing link table after the deleting operation is empty or not;
a sixth submodule, configured to, when there is no edge connecting node s to a node t in the collapsing chain table between the node s pointed to by the pointer b and any node t in the collapsing chain table, or when the collapsing chain table after deletion is not empty, determine whether there is a child node in the node pointed to by the pointer b, and if there is a child node in the node pointed to by the pointer b, insert the child node of the node pointed to by the pointer b into the tail of the anchor point queue;
the seventh submodule is used for judging whether a node pointed by the pointer b has a successor node or not;
the eighth submodule is used for pointing the pointer b to a successor node of the current pointed node when the successor node exists in the pointed node of the pointer b, and returning to execute the third submodule;
a ninth submodule, configured to determine whether a child node exists in a node pointed by the pointer a when the collapsing link table after the deletion operation is empty or when a successor node does not exist in the node pointed by the pointer b, and if the child node exists, insert the child node of the node pointed by the pointer a into the tail of the queue to be modified;
the tenth submodule is used for judging whether the node pointed by the pointer a has a successor node or not;
the eleventh submodule is used for pointing the pointer a to a successor node of the current pointed node when the successor node exists in the node pointed by the pointer a, and returning to execute the second submodule;
a twelfth submodule, configured to, when there is no successor node in the node pointed to by the pointer a, complete incremental update of the target BFS spanning tree, and complete set V 'of the remaining nodes at the next stage'_{l+1}Middle division v_{0}Incremental updating of BFS spanning tree with other nodes outside as root nodes to complete the new generated subgraph G_{l+1}The distance between each node pair in (1) is updated.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
(1) the method for determining the step approach centrality is mainly used for describing and analyzing local centrality characteristics of largescale data graphs. When the steptype approach centrality is calculated, the interference caused by the nodes with larger approach centrality is reduced by deleting the nodes with the largest approach centrality in the current subgraph every time, and the steptype approach centrality with better locality is obtained, so that better hierarchical analysis is facilitated for the local condition of the data graph.
(2) In the calculation of the weightless graph, the existing calculation result is effectively utilized in an incremental updating mode, the dynamic updating of the shortest path between the nodes is realized, the high cost of complete recalculation is avoided, and the calculation of the step approach centrality provided by the invention can be realized more efficiently.
Drawings
Fig. 1 is a schematic flowchart of a method for determining a step approach centrality according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating a method for incrementally updating a distance between node pairs according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention provides a method for determining the step approach centrality, which mainly comprises the following steps: firstly, establishing a data graph by taking data units of a database as nodes, taking association among the data units as edges and taking the strength of the association among the data units as weight; initializing a residual node set into all nodes in the current data graph, and selecting a proper weighting function; thirdly, calculating the distances between all node pairs in the residual node set, and calculating the approaching centrality of all nodes according to the distances; selecting a node set with the maximum approach centrality from the residual node sets, wherein the step approach centrality of the selected nodes is defined as the approach centrality calculated in the current subgraph, deleting the nodes from the residual node sets, and deleting the edges associated with the nodes from the data graph to generate a new subgraph; judging whether the residual node set is empty, if not, repeating the third to fifth steps until the residual node set is empty, and calculating all the points to obtain the step approach centrality of the point. According to the invention, the node with the maximum approaching centrality is deleted, and the approaching centrality of the node with the maximum approaching centrality calculated on the obtained subgraph is defined as the step approaching centrality of the node; the step approach centrality of all the nodes is obtained by repeating the above process, so that the centrality of all the nodes is localized, the antiinterference capability of the centrality calculated by each node is improved, and the characteristics of the nodes in a local range can be better represented.
The invention provides a step proximity center (Hierarchical Closeness center) index for largescale map data analysis, which comprises the following steps:
in graph G (V, E), the conventional approach centrality C of node V_{c}(v) Is defined as:
where α (d (u, V)) is the selected weighting function, and u is the other node in the node set V except for the node V.
In a subgraph G_{l}Degree of approaching traditional center C_{c}(v) The largest set of nodes is denoted V_{l} ^{*}Graph G_{l}Middle V_{l} ^{*}The set of associated edges is represented asDefining a series of generations in graph GSubfigure G_{l}The following were used:
defining a set V of nodes remaining simultaneously_{l}'：
Residual edge set E_{l}'：
Wherein A \ B represents the removal of a portion containing B from A.
When node u and node v exist in generation subgraph G at the same time_{l}When up, a subgraph distance d is defined_{l}(u, v) is node u and node v is in subgraph G_{l}Is measured. The step approach center degree C proposed in the present invention is defined_{H}(v) Comprises the following steps:
degree of approaching to the center of the traditionCompared with the step approach centrality index provided by the invention, the step approach centrality index has better locality and antiinterference capability.
Fig. 1 is a schematic flow chart of a method for determining a step approach center degree according to an embodiment of the present invention, where the method shown in fig. 1 includes:
(1) establishing a data graph G (V, E) or G (V, E, W) by taking all data units in a target database as nodes, taking the association relation among the data units as edges and taking the strength of the association among the data units as a weight, wherein an edge set E represents a set of all edges in the data graph, a point set V represents a set of all nodes in the data graph, and a weight set W represents the weight of each edge in the edge set E;
(2) initializing current subgraph number l as 0, and current residual node set V_{l}'V, current remainder set E'_{l}By V ═ E_{l}'and E'_{l}Construction of a Generation subgraph G_{l}(V_{l}',E'_{l}) Or G_{l}(V'_{l},E'_{l},W_{l}') wherein, W_{l}'represents E'_{l}A weight set of the middle edges;
(3) in generating subgraph G_{l}The current residual node set V is obtained by middle calculation_{l}' distance d between each node pair in_{l}(u,v)；
In an alternative embodiment, in step (3), if subgraph G is generated_{l}As an authorized graph G_{l}(V_{l}',E'_{l},W_{l}') calculating the current residual node set V by adopting Dijkstra algorithm_{l}' distance d between each node pair in_{l}(u, v) if subgraph G is generated_{l}Is a noright graph G_{l}(V_{l}',E'_{l}) Respectively constructing by using a BreadthFirstSearch algorithm (BFS) to generate a subgraph G_{l}BFS spanning tree with each node as root node for calculating and maintaining distance d between each node pair_{l}(u,v)。
(4) According to the weighting function and the distance d between each pair of nodes obtained by calculation_{l}(u, v) in generating subgraph G_{l}The current residual node set V is obtained by middle calculation_{l}' the approximate centrality of each node in the set;
in an alternative embodiment, the weighting function may be selected by:
e.g. in the harmonious centrality
α (d (u, v)) 2 in the center of exponential decay^{d(u,v)}And the like. And according to the sensitive condition of the distance change, selecting a corresponding weighting function appropriately, thereby obtaining ideal approximate centrality.
In an alternative embodimentIn the embodiment, in the step (4), the distance d between each pair of nodes is calculated according to the weighting function_{l}(u, v) in generating subgraph G_{l}The current residual node set V is obtained by middle calculation_{l}' the approximate centrality of each node v is:wherein, α (d)_{l}(u, v)) represents a weighting function, C_{c}(v) Represents the approximate centrality of the node V, and u represents the current set of nodes V_{l}' the nodes remaining after node v are removed.
(5) Finding out current residual node set V_{l}' node set V having the greatest degree of nearness in center_{l} ^{*}；
Wherein, V_{l} ^{*}The step approach centrality C of the node u included in (1)_{H}(u)＝C_{c}(u)。
(6) From the current set of nodes remaining V_{l}' will be contained in a node set V_{l} ^{*}Deleting the nodes in the node group to obtain a nextlevel residual node set V'_{l+1}；
(7) Judging the residual node set V of the next level'_{l+1}If the current state is not null, executing the step (8), otherwise, executing the step (9);
(8) from the current remaining edge set E'_{l}Will be included in the edge setDeleting the edges to obtain a remaining edge set E 'of the next level'_{l+1}Then through the remaining node set V 'of the next stage'_{l+1}And residual edge set E 'of the next stage'_{l+1}Construction of a novel generative subgraph G_{l+1}In the new generation subgraph G_{l+1}In the method, the residual node set V 'of the next stage is obtained through calculation'_{l+1}Distance d between each node pair_{l+1}(u, v), when the subgraph number l is l +1, and returning to execute the step (4), wherein,representing the generated subgraph G_{l}Middle V_{l} ^{*}The adjacent side of (2);
as shown in fig. 2, in an alternative embodiment, step (8) specifically includes:
(8.1) set E of edges left from the present_{l}' in will be included in the edge setDeleting the edges to obtain a remaining edge set E 'of the next level'_{l+1}(ii) a Then passes through a residual node set V 'of the next stage'_{l+1}And residual edge set E 'of the next stage'_{l+1}Construction of a novel generative subgraph G_{l+1}；
(8.2) if new generation subgraph G_{l+1}For the weighted graph, recalculating residual node set V 'of the next stage by Dijkstra algorithm'_{l+1}The distance between each pair of nodes in the set;
(8.3) if new generation subgraph G_{l+1}And (4) maintaining the residual node set V 'of the next level in a mode of updating the increment of the BFS spanning tree structure calculated in the step (3) as an unwarranted graph'_{l+1}The distance between each pair of nodes in the array.
In an alternative embodiment, step (8.3) specifically includes:
(8.3.1) finding a residual node set V 'of the next stage'_{l+1}Any node v in_{0}Setting a root node of the target BFS spanning tree as a node r for the target BFS spanning tree of the root node, adding the node r into a queue to be modified, and creating a pointer a to point to a head node of the queue to be modified;
(8.3.2) inserting the child node x of the node pointed by the pointer a into the tail of the queue to be modified, if the inserted child node x belongs to V_{l} ^{*}Adding all brother nodes of the child node x in the target BFS spanning tree structure into an anchor point queue, creating a head node s of which the pointer b points to the anchor point queue, simultaneously adding all child nodes of the child node x into a collapse linked list, and deleting the connection between the node pointed by the pointer a and the child node x;
(8.3.3) search in generating subgraph G_{l}Whether one node exists between the node s pointed to by the pointer b and any node t in the collapse chain table or notConnecting the node s and the node t by the edge, if the node s and the node t exist, executing the step (8.3.4), and if the node t does not exist, executing the step (8.3.6);
(8.3.4) creating a pointer to node t at node s, and then deleting node t from the collapsing list;
(8.3.5) judging whether the collapse linked list after the deletion operation is empty, if not, executing the step (8.3.6), and if so, executing the step (8.3.9);
(8.3.6) judging whether the node pointed by the pointer b has a child node, if so, inserting the child node of the node pointed by the pointer b into the tail of the anchor point queue;
(8.3.7) judging whether the node pointed by the pointer b has a subsequent node, if so, executing the step (8.3.8), and if not, executing the step (8.3.9);
(8.3.8) pointing the pointer b to the node subsequent to the node pointed to currently, and returning to execute the step (8.3.3);
(8.3.9) judging whether the node pointed by the pointer a has a child node, if so, inserting the child node pointed by the pointer a into the tail of the queue to be modified;
(8.3.10) judging whether the node pointed by the pointer a has a subsequent node, if so, executing the step (8.3.11), and if not, executing the step (8.3.12);
(8.3.11) pointing the pointer a to the node succeeding the node pointed to currently, and returning to execute the step (8.3.2);
(8.3.12) completing the incremental update to the target BFS spanning tree and completing the set of nodes remaining V 'for the next level'_{l+1}Middle division v_{0}Incremental updating of BFS spanning tree with other nodes outside as root nodes to complete new generation subgraph G_{l+1}The distance between each node pair in (1) is updated.
(9) And obtaining the step approach centrality of all the nodes in the data graph G.
The invention also provides a system for determining the step approach centrality, which comprises:
the data graph building module is used for building a data graph G (V, E) or G (V, E, W) by taking all data units in the target database as nodes, taking the association relationship among the data units as edges and taking the strength of the association among the data units as a weight, wherein an edge set E represents a set of all edges in the data graph, a point set V represents a set of all nodes in the data graph, and a weight set W represents the weight of each edge in the edge set E;
a generated subgraph construction module for initializing the current subgraph number l equal to 0 and the current residual node set V_{l}'V, current remainder set E'_{l}By V ═ E_{l}'and E'_{l}Construction of a Generation subgraph G_{l}(V_{l}',E'_{l}) Or G_{l}(V_{l}',E'_{l},W_{l}') wherein, W_{l}'represents E'_{l}A weight set of the middle edges;
a first node pair distance calculation module for generating subgraph G_{l}The current residual node set V is obtained by middle calculation_{l}' distance d between each node pair in_{l}(u,v)；
A proximity center calculation module for calculating the distance d between each pair of nodes according to the weighting function_{l}(u, v) in generating subgraph G_{l}The current residual node set V is obtained by middle calculation_{l}' the approximate centrality of each node in the set;
a searching module for finding out the current residual node set V_{l}' node set V having the greatest degree of nearness in center_{l} ^{*}；
A node set determination module for determining a node set V from a current node set V remaining_{l}' will be included in the set of nodes V_{l} ^{*}Deleting the nodes in the node group to obtain a nextlevel residual node set V'_{l+1}；
A judging module for judging the residual node set V of the next level'_{l+1}If the node set is empty, if the node set V is the residual node set V of the next stage'_{l+1}If the data graph G is empty, the step approach centrality of all the nodes in the data graph G is obtained;
a newly generated subgraph construction module for the remaining node set V 'at the next stage'_{l+1}Not empty, from the current remaining edge set E_{l}' in will be included in the edge setDeleting the edges to obtain a remaining edge set E 'of the next level'_{l+1}Then through the remaining node set V 'of the next stage'_{l+1}And residual edge set E 'of the next stage'_{l+1}Construction of a novel generative subgraph G_{l+1}In the new generation subgraph G_{l+1}In the method, the residual node set V 'of the next stage is obtained through calculation'_{l+1}Distance d between each node pair_{l+1}(u, v) when the subgraph number l is l +1, and returning to the approximate centrality calculation module, wherein,representing the generated subgraph G_{l}Middle V_{l} ^{*}The adjacent side of (2).
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (10)
1. A method for determining a step approach centrality, comprising:
(1) establishing a data graph G (V, E) or G (V, E, W) by taking all data units in a target database as nodes, taking the association relation among the data units as edges and taking the strength of the association among the data units as a weight, wherein an edge set E represents a set of all edges in the data graph, a point set V represents a set of all nodes in the data graph, and a weight set W represents the weight of each edge in the edge set E;
(2) initializing current subgraph number l as 0, and current residual node set V_{l}'V, current remainder set E'_{l}By V ═ E_{l}'and E'_{l}Construction of a Generation subgraph G_{l}(V_{l}',E'_{l}) Or G_{l}(V_{l}',E'_{l},W_{l}') wherein, W_{l}' represents E_{l}' weight set of the middle edge;
(3) generating subgraph G_{l}The current remaining node is obtained by calculationCollection V_{l}' distance d between each node pair in_{l}(u, v); v represents the current set of nodes remaining V_{l}' Each node in u represents the current set of nodes remaining V_{l}' the nodes remaining after node v is removed;
(4) according to the weighting function and the distance d between each pair of nodes obtained by calculation_{l}(u, v) generating subgraph G_{l}The current residual node set V is obtained by calculation_{l}' the approximate centrality of each node in the set;
(5) finding out the current residual node set V_{l}' node set V having the greatest degree of nearness in center_{l} ^{*}；
(6) From the current set of nodes remaining V_{l}' will be included in the set of nodes V_{l} ^{*}Deleting the nodes in the node group to obtain a nextlevel residual node set V'_{l+1}；
(7) Judging the residual node set V 'of the next level'_{l+1}If the current state is not null, executing the step (8), otherwise, executing the step (9);
(8) from the current remaining edge set E'_{l}Will be included in the edge setDeleting the edges to obtain a remaining edge set E 'of the next level'_{l+1}Then through the remaining node set V 'of the next stage'_{l+1}And the remaining edge set E 'of the next stage'_{l+1}Construction of a novel generative subgraph G_{l+1}Generating subgraph G at said new generator_{l+1}In the method, the residual node set V 'of the next stage is obtained through calculation'_{l+1}Distance d between each node pair_{l+1}(u, v), when the subgraph number l is l +1, and returning to execute the step (4), wherein,representing the generated subgraph G_{l}Middle V_{l} ^{*}The adjacent side of (2);
(9) and obtaining the step approach centrality of all the nodes in the data graph G.
2. The method of claim 1, wherein in step (3), if the generation subgraph G is_{l}As an authorized graph G_{l}(V_{l}',E'_{l},W_{l}') calculating the current residual node set V by adopting Dijkstra algorithm_{l}' distance d between each node pair in_{l}(u, v) if said generating subgraph G_{l}Is a noright graph G_{l}(V_{l}',E'_{l}) Respectively constructing the generated subgraph G by adopting a breadthfirst search algorithm BFS_{l}BFS spanning tree with each node as root node for calculating and maintaining distance d between each node pair_{l}(u,v)。
3. The method of claim 1, wherein in step (4), the distance d between each pair of nodes is calculated based on the weighting function and_{l}(u, v) generating subgraph G_{l}The current residual node set V is obtained by calculation_{l}' the approximate centrality of each node v is:wherein, α (d)_{l}(u, v)) represents a weighting function, C_{c}(v) Represents the approximate centrality of the node V, and u represents the current set of nodes V_{l}' the nodes remaining after node v are removed.
4. The method according to any one of claims 1 to 3, characterized in that step (8) comprises in particular:
(8.1) from the current remaining edge set E'_{l}Will be included in the edge setDeleting the edges to obtain a remaining edge set E 'of the next level'_{l+1}(ii) a Then passes through a residual node set V 'of the next stage'_{l+1}And the remaining edge set E 'of the next stage'_{l+1}Construction of a novel generative subgraph G_{l+1}；
(8.2) if said new generation subgraph G_{l+1}And recalculating the residual node set V 'of the next stage by adopting Dijkstra algorithm as a weighted graph'_{l+1}The distance between each pair of nodes in the set;
(8.3) if said new generation subgraph G_{l+1}And (4) maintaining the residual node set V 'of the next stage in a mode of incremental updating of the BFS spanning tree structure calculated in the step (3) as an unauthorized graph'_{l+1}The distance between each pair of nodes in the array.
5. The method according to claim 4, characterized in that step (8.3) comprises in particular:
(8.3.1) finding out the residual node set V 'of the next stage'_{l+1}Any node v in_{0}Setting a root node of a target BFS spanning tree as a node r for the target BFS spanning tree of the root node, adding the node r into a queue to be modified, and creating a pointer a to point to a head node of the queue to be modified;
(8.3.2) inserting the child node x of the node pointed by the pointer a into the tail of the queue to be modified, if the inserted child node x belongs to V_{l} ^{*}Adding all brother nodes of the child node x in the target BFS spanning tree structure into an anchor point queue, creating a pointer b to point to a head node s of the anchor point queue, simultaneously adding all child nodes of the child node x into a collapse linked list, and deleting the connection between the node pointed by the pointer a and the child node x;
(8.3.3) lookup on said generated subgraph G_{l}If yes, executing the step (8.3.4), and if not, executing the step (8.3.6);
(8.3.4) creating a pointer to node t at node s, and then deleting node t from the collapse linked list;
(8.3.5) judging whether the collapse linked list after the deletion operation is empty, if not, executing the step (8.3.6), and if so, executing the step (8.3.9);
(8.3.6) judging whether a child node exists in the node pointed by the pointer b, if so, inserting the child node of the node pointed by the pointer b into the tail of the anchor point queue;
(8.3.7) judging whether the node pointed by the pointer b has a subsequent node, if so, executing the step (8.3.8), and if not, executing the step (8.3.9);
(8.3.8) pointing the pointer b to the node subsequent to the node pointed to currently, and returning to execute the step (8.3.3);
(8.3.9) judging whether the node pointed by the pointer a has a child node, if so, inserting the child node of the node pointed by the pointer a into the tail of the queue to be modified;
(8.3.10) judging whether the node pointed by the pointer a has a subsequent node, if so, executing the step (8.3.11), and if not, executing the step (8.3.12);
(8.3.11) pointing the pointer a to the node succeeding the node pointed to currently, and returning to execute the step (8.3.2);
(8.3.12) completing the incremental update to the target BFS spanning tree and completing for the set of nodes remaining at the next level V'_{l+1}Middle division v_{0}Incremental updating of BFS spanning tree with other nodes outside as root nodes to complete the new generated subgraph G_{l+1}The distance between each node pair in (1) is updated.
6. A step proximity centrality determination system, comprising:
the data graph building module is used for building a data graph G (V, E) or G (V, E, W) by taking all data units in a target database as nodes, taking the association relations among the data units as edges and taking the strength of the association among the data units as a weight, wherein an edge set E represents a set of all edges in the data graph, a point set V represents a set of all nodes in the data graph, and a weight set W represents the weight of each edge in the edge set E;
a generated subgraph construction module for initializing the current subgraph number l equal to 0 and the current residual node set V_{l}'V, current remainder set E'_{l}By V ═ E_{l}'and E'_{l}Construction of a Generation subgraph G_{l}(V_{l}',E'_{l}) Or G_{l}(V_{l}',E'_{l},W_{l}') wherein, W_{l}'represents E'_{l}A weight set of the middle edges;
a first node pair distance calculation module for generating subgraph G_{l}The current residual node set V is obtained by calculation_{l}' distance d between each node pair in_{l}(u, v); v represents the current set of nodes remaining V_{l}' Each node in u represents the current set of nodes remaining V_{l}' the nodes remaining after node v is removed;
a proximity center calculation module for calculating the distance d between each pair of nodes according to the weighting function_{l}(u, v) generating subgraph G_{l}The current residual node set V is obtained by calculation_{l}' the approximate centrality of each node in the set;
a searching module for finding out the current node set V_{l}' node set V having the greatest degree of nearness in center_{l} ^{*}；
A node set determination module for determining the current node set V from the current node set V_{l}' will be included in the set of nodes V_{l} ^{*}Deleting the nodes in the node group to obtain a nextlevel residual node set V'_{l+1}；
A judging module, configured to judge the remaining node set V of the next stage'_{l+1}If the node set is empty, if the node set is the residual node set V of the next level'_{l+1}If the data graph G is empty, the step approach centrality of all the nodes in the data graph G is obtained;
a newly generated subgraph construction module for set V 'of remaining nodes at the next stage'_{l+1}Not empty, from the current remaining edge set E'_{l}Will be included in the edge setDeleting the edges to obtain a remaining edge set E 'of the next level'_{l+1}Then through the remaining node set V 'of the next stage'_{l+1}And the remaining edge set E 'of the next stage'_{l+1}Construction of a novel generative subgraph G_{l+1}Generating subgraph G at said new generator_{l+1}In, calculateSet of nodes remaining to the next stage V'_{l+1}Distance d between each node pair_{l+1}(u, v) when the subgraph number l is l +1, and returning to the approximate centrality calculation module, wherein,representing the generated subgraph G_{l}Middle V_{l} ^{*}The adjacent side of (2).
7. System according to claim 6, characterized in that said node pair distance calculation module, in particular for generating subgraph G, is adapted to calculate said distance between said node pairs_{l}As an authorized graph G_{l}(V_{l}',E'_{l},W_{l}') calculating the current set of nodes remaining V by using Dijkstra algorithm_{l}' distance d between each node pair in_{l}(u, v) generating subgraph G_{l}Is a noright graph G_{l}(V_{l}',E'_{l}) Then, the generated subgraph G is respectively constructed by adopting a breadthfirst search algorithm BFS_{l}BFS spanning tree with each node as root node for calculating and maintaining distance d between each node pair_{l}(u,v)。
8. The system according to claim 6, wherein the approximate centrality calculation module is specifically configured to calculate the distance d between each pair of nodes according to the weighting function_{l}(u, v) generating subgraph G_{l}The current residual node set V is obtained by calculation_{l}' the approximate centrality of each node v is:wherein, α (d)_{l}(u, v)) represents a weighting function, C_{c}(v) Represents the approximate centrality of the node V, and u represents the current set of nodes V_{l}' the nodes remaining after node v are removed.
9. The system of any of claims 6 to 8, wherein the new generation subgraph construction module comprises:
a newly generated subgraph construction submodule for constructing from the current remaining edge set E'_{l}Will be included in the edge setDeleting the edges to obtain a remaining edge set E 'of the next level'_{l+1}(ii) a Then passes through a residual node set V 'of the next stage'_{l+1}And the remaining edge set E 'of the next stage'_{l+1}Construction of a novel generative subgraph G_{l+1}；
A second node pair distance calculation module for calculating the distance between the new generation subgraph G_{l+1}When the map is a weighted map, recalculating residual node set V 'of the next stage by adopting Dijkstra algorithm'_{l+1}The distance between each pair of nodes in the set;
a third nodetodistance computation module for generating subgraph G at the new generation subgraph G_{l+1}And if the node is an unweighted graph, maintaining the residual node set V 'of the next level in a mode of updating the increment of the BFS spanning tree structure calculated by the distance calculation module by the first node'_{l+1}The distance between each pair of nodes in the array.
10. The system of claim 9, wherein the third nodetodistance calculation module comprises:
a first submodule for finding out a set V 'of nodes remaining with the next stage'_{l+1}Any node v in_{0}Setting a root node of a target BFS spanning tree as a node r for the target BFS spanning tree of the root node, adding the node r into a queue to be modified, and creating a pointer a to point to a head node of the queue to be modified;
a second submodule, configured to insert a subnode x of the node pointed by the pointer a into the tail of the queue to be modified, if the inserted subnode x belongs to V_{l} ^{*}Adding all brother nodes of the child node x in the target BFS spanning tree structure into an anchor point queue, creating a pointer b to point to a head node s of the anchor point queue, simultaneously adding all child nodes of the child node x into a collapse linked list, and deleting the pointera connection of the pointedto node to child node x;
a third submodule for searching in the generated subgraph G_{l}Whether an edge exists between the node s pointed by the pointer b and any node t in the collapse linked list to connect the node s and the node t;
a fourth submodule, configured to create a pointer pointing to a node t at a node s when an edge exists between the node s pointed to by the pointer b and any node t in the collapsing chain table, and then delete the node t from the collapsing chain table;
the fifth submodule is used for judging whether the collapsing link table after the deleting operation is empty or not;
a sixth submodule, configured to, when there is no edge connecting node s to a node t in the collapsing chain table between the node s pointed to by the pointer b and any node t in the collapsing chain table, or when the collapsing chain table after deletion is not empty, determine whether there is a child node in the node pointed to by the pointer b, and if there is a child node in the node pointed to by the pointer b, insert the child node of the node pointed to by the pointer b into the tail of the anchor point queue;
the seventh submodule is used for judging whether a node pointed by the pointer b has a successor node or not;
the eighth submodule is used for pointing the pointer b to a successor node of the current pointed node when the successor node exists in the pointed node of the pointer b, and returning to execute the third submodule;
a ninth submodule, configured to determine whether a child node exists in a node pointed by the pointer a when the collapsing link table after the deletion operation is empty or when a successor node does not exist in the node pointed by the pointer b, and if the child node exists, insert the child node of the node pointed by the pointer a into the tail of the queue to be modified;
the tenth submodule is used for judging whether the node pointed by the pointer a has a successor node or not;
the eleventh submodule is used for pointing the pointer a to a successor node of the current pointed node when the successor node exists in the node pointed by the pointer a, and returning to execute the second submodule;
a twelfth submodule for pointing at the pointer aWhen the node has no successor node, finishing the incremental updating of the target BFS spanning tree and finishing the residual node set V 'of the next level'_{l+1}Middle division v_{0}Incremental updating of BFS spanning tree with other nodes outside as root nodes to complete the new generated subgraph G_{l+1}The distance between each node pair in (1) is updated.
Priority Applications (1)
Application Number  Priority Date  Filing Date  Title 

CN201711349361.7A CN108052743B (en)  20171215  20171215  Method and system for determining step approach centrality 
Applications Claiming Priority (1)
Application Number  Priority Date  Filing Date  Title 

CN201711349361.7A CN108052743B (en)  20171215  20171215  Method and system for determining step approach centrality 
Publications (2)
Publication Number  Publication Date 

CN108052743A CN108052743A (en)  20180518 
CN108052743B true CN108052743B (en)  20210105 
Family
ID=62133256
Family Applications (1)
Application Number  Title  Priority Date  Filing Date 

CN201711349361.7A Active CN108052743B (en)  20171215  20171215  Method and system for determining step approach centrality 
Country Status (1)
Country  Link 

CN (1)  CN108052743B (en) 
Families Citing this family (1)
Publication number  Priority date  Publication date  Assignee  Title 

CN112989079B (en) *  20210422  20210803  北京电信易通信息技术股份有限公司  Novel image data retrieval method and system 
Citations (3)
Publication number  Priority date  Publication date  Assignee  Title 

CN101799814A (en) *  20091231  20100811  茂名学院  Method for gathering free classification label into reticular classification structure 
CN102196502A (en) *  20110406  20110921  东南大学  Congestion control method for wireless sensor network 
CN103970856A (en) *  20140507  20140806  中国人民大学  Shortest path estimation method on authorized and directed dynamic network 
Family Cites Families (2)
Publication number  Priority date  Publication date  Assignee  Title 

US20060112146A1 (en) *  20041122  20060525  Nec Laboratories America, Inc.  Systems and methods for data analysis and/or knowledge management 
US9256706B2 (en) *  20130903  20160209  Synopsys Taiwan Co., LTD.  Knowledgebased analog layout generator 

2017
 20171215 CN CN201711349361.7A patent/CN108052743B/en active Active
Patent Citations (3)
Publication number  Priority date  Publication date  Assignee  Title 

CN101799814A (en) *  20091231  20100811  茂名学院  Method for gathering free classification label into reticular classification structure 
CN102196502A (en) *  20110406  20110921  东南大学  Congestion control method for wireless sensor network 
CN103970856A (en) *  20140507  20140806  中国人民大学  Shortest path estimation method on authorized and directed dynamic network 
NonPatent Citations (3)
Title 

Parallel Algorithm for Core Maintenance in Dynamic Graphs;Na Wang 等;《2017 IEEE 37th International Conference on Distributed Computing Systems》;20170717;第23662371页 * 
大规模图中低复杂度分布式算法浅析;华强胜 等;《南京信息工程大学学报（自然科学版）》;20171031(第5期);第533543页 * 
社会网络分析在专利引用中的实证研究;侯筱蓉 等;《情报科学》;20130228;第31卷(第2期);第6371页 * 
Also Published As
Publication number  Publication date 

CN108052743A (en)  20180518 
Similar Documents
Publication  Publication Date  Title 

Song et al.  Efficient routing on large road networks using hierarchical communities  
US20040088307A1 (en)  Data structure for information systems  
CN102176223B (en)  Protein complex identification method based on key protein and local adaptation  
Xie et al.  Typebased exploration with multiple search queues for satisficing planning  
CN106462620A (en)  Distance queries on massive networks  
CN105183796A (en)  Distributed link prediction method based on clustering  
Li et al.  Timedependent hop labeling on road network  
CN108052743B (en)  Method and system for determining step approach centrality  
Zhu et al.  Transportation routing map abstraction approach: Algorithm and numerical analysis  
Li et al.  Fastest path query answering using timedependent hoplabeling in road network  
CN105138601B (en)  A kind of graphic mode matching method for supporting fuzzy constraint relationship  
CN109992786A (en)  A kind of semantic sensitive RDF knowledge mapping approximate enquiring method  
CN104899283A (en)  Frequent subgraph mining and optimizing method for single uncertain graph  
Li et al.  Network voronoi diagram on uncertain objects for nearest neighbor queries  
CN107121146A (en)  Optimum path planning method based on road chain depth  
Matsyi  Approaches to solving basic problems of closed routes  
CN108413980B (en)  Traffic itinerant path planning method for reducing path branches  
CN107564289B (en)  Road network preprocessing method for merging traffic nodes  
CN110851616A (en)  RDF knowledge graph storage and management method based on domain subgraphs  
CN104899885B (en)  A kind of Frequent tree mining for single uncertain figure excavates and optimization method  
Baidoo et al.  Solving the TSP using traditional computing approach  
Jackovich et al.  Comparing greedy constructive heuristic subtour elimination methods for the traveling salesman problem  
Li et al.  Frequent Subtree Mining Algorithm for Ribonucleic Acid Topological Pattern.  
Qian et al.  A Shortest Path Algorithm Under Specified Nodes Constraint  
CN109614520B (en)  Parallel acceleration method for multipattern graph matching 
Legal Events
Date  Code  Title  Description 

PB01  Publication  
PB01  Publication  
SE01  Entry into force of request for substantive examination  
SE01  Entry into force of request for substantive examination  
GR01  Patent grant  
GR01  Patent grant 