Unit Iv
Unit Iv
Science
UNIT-IV
Social-Network
A social network is an online communication platform that is used for creating
relationships with other people who share an interest or a relationship.
An edge (U1, T2) means that user 1 had placed the tag 2 on at
least one page.
Social Network Analysis
● Social network analysis is the study of social entities (people in an organization,
called actors), and their interactions and relationships. The interactions and
relationships can be represented with a network or graph, where each vertex
(or node) represents an actor and each link represents a relationship.
● Two types of Social Network Analysis
1. Centrality
2. Prestige
● Both centrality and prestige are measures of degree of prominence of an actor
in a social network.
Centrality
•In the context of an organization, a person with extensive contacts (links) or communications with
many other people in the organization is considered more important than a person with relatively
fewer contacts. The links can also be called ties.
•A central actor is one involved in many ties.
•Each node in the social network is an actor and each link indicates that the actors on the two ends
of the link communicate with each other.
•In the below network, actor i is the most central actor because he/she can communicate with most
other actors.
Types of Centrality
•There are different types of links or involvements between actors. Thus, several
types of centrality are defined on undirected and directed graphs.
•Three popular types:
1.Degree Centrality
2.Closeness Centrality
3.Betweenness Centrality
Degree Centrality
● Central actors are the most active actors that have most links or ties with other actors.
● Let the total number of actors in the network be n.
● Undirected Graph: In an undirected graph, the degree centrality of an actor i (denoted by
CD(i)) is simply the node degree (the number of edges) of the actor node, denoted by d(i),
normalized with the maximum degree, n-1.
● The value of this measure ranges between 0 and 1 as n-1 is the maximum value of d(i).
● Directed Graph: In this case, we need to distinguish in-links of actor i (links pointing to i),
and out-links (links pointing out from i). The degree centrality is defined based on only the
out-degree (the number of out-links or edges), do(i).
Closeness Centrality
● This view of centrality is based on the closeness or distance.
● The basic idea is that an actor x is central if it can easily interact with all other actors. That
is, its distance to all other actors is short.
● Use the shortest distance to compute this measure. Let the shortest distance from actor i
to actor j be d(i, j) (measured as the number of links in a shortest path).
● Undirected Graph:The closeness centrality CC(i) of actor i is defined as
● The value of this measure also ranges between 0 and 1 as n-1 is the minimum value of the
denominator, which is the sum of the shortest distances from i to all other actors.
● Directed Graph: The same equation can be used for a directed graph. The distance
computation needs to consider directions of links or edges
Betweenness Centrality
● Betweenness centrality quantifies the number of times a node acts as a bridge along the
shortest path between two other nodes.
● It was introduced as a measure for quantifying the control of a human on the communication
between other humans in a social network.
● If two non-adjacent actors j and k want to interact and actor i is on the path between j and k,
then i may have some control over their interactions.
● Betweenness measures the control of i over other pairs of actors. Thus, if i is on the paths of
many such interactions, then i is an important actor.
● Undirected Graph: Betweenness is given by the formula
where pjk be the number of shortest paths between actors j and k, and pjk(i), (j # i and k # i) is the
number of shortest paths that pass i (denoted by pjk(i), (j # i and k # i).
Betweenness Centrality
● There may be multiple shortest paths between actor j and actor k. Some pass i
and some do not. CB(i) has a minimum of 0, attained when i falls on no shortest
path. Its maximum is (n-1)(n-2)/2, which is the number of pairs of actors not
including i.
● If we are to ensure that the value range is between 0 and 1, we can normalize it
with (n-1)(n-2)/2, which is the maximum value of CB(i).
● The betweenness may be normalized by dividing through the number of pairs of
vertices not including v, which for directed graphs is (n − 1)(n − 2) and for
undirected graphs is (n − 1)(n − 2)/2.
Example
Example
Degree centrality of a node refers to the
number of edges attached to the node.
In order to know the standardized
score, you need to divide each score by
n-1 (n = the number of nodes).
Example
The closeness centrality C (i) of actor i is defined as
C
The closeness centrality of node 1 is 6/ (sum of shortest distance from node 1 to all the remaining
nodes)
Shortest distance from node 1 to 2 is 2
Shortest distance from node 1 to 3 is 1
Shortest distance from node 1 to 4 is 2
Shortest distance from node 1 to 5 is 3
Shortest distance from node 1 to 6 is 4
Shortest distance from node 1 to 7 is 4
Hence, closeness centrality of node 1 is 6/(2+1+2+3+4+4)=6/16=3/8
Example
Closeness centrality of all the other nodes
Example
● Betweenness centrality of node 3 is
● (1/1)+(1/1)+(1/1)+(1/1)+(1/1)+(1/1)+(1/1)+(1/1)+(1/1)+(0/1)+(0/1)+(0/1)+(0/1)+(0/1
)+(0/1)= 9
● Normalized score=9/ ((n-1)(n-2)/2) = 9/ ((6)(5)/2) = 9/15.
● For every pair of vertices (x,y), not including 3, find the number of shortest paths
between (x,y) that pass through node 3 and divide it by the total number of shortest
paths between (x,y).
● For instance, consider vertices (1,2). The number of shortest paths between (1,2) that
pass through 3 is 1 and the total number of shortest paths between (1,2) is also 1.
Hence 1/1 is the value.
● Repeat the same procedure for the remaining vertices (1,5), (1,6), (1,7), (2,4), (2,5),
(2,6), (2,7), (4,5), (4,6), (4,7), (5,6), (5,7) and (6,7). Find the sum of all the values which is
9. Normalize by dividing 9 with 15. This gives the betweenness of node 3.
Clustering of Social-Network Graphs
● A social network contains communities of entities that are connected by the
edges.
● These correspond to groups of friends at school or groups of researchers
interested in the same area.
● Inorder to identify the communities, clustering of graphs is to be considered.
● Assume that the nodes are close if they have an edge between them and distant
if not.
● d(x,y)=0 if there is an edge (x,y), otherwise d(x,y)=1.
Clustering of Social-Network Graphs
Betweenness:
Betweenness of an edge (a, b) is defined as the number of pairs of nodes x and y such
that the edge (a, b) lies on the shortest path between x and y.
Computes the number of shortest paths from node X to each of the other nodes that
go through each of the edges.
Step-1: Perform a breadth-first search (BFS) of the graph, starting at the node X.
Clustering of Social-Network Graphs
This calculation is done for both nodes and edges from bottom.
● First, A and C, being leaves, get credit 1. Each of these nodes have only
one parent, so their credit is given to the edges (B, A) and (B, C),
respectively.
● At level 2, G is a leaf, so it gets credit 1. B is not a leaf, so it gets credit
equal to 1 plus the credits on the DAG edges entering it from below.
Since both these edges have credit 1, the credit of B is 3.
● B has only one parent, D, so the edge (D, B) gets the entire credit of B,
which is 3.
● G has two parents, D and F . We therefore need to divide the credit of 1
that G has between the edges (D, G) and (F, G).
● After Step-2, D and F have label 1, representing the fact that there is
one shortest path from E to each of these nodes.
● Give half the credit of G to each of these edges; i.e., their credit is each
1/(1 + 1) = 0.5.
● If D and F had the labels 5 and 3, then the credit of edge (D, G) would
have been 5/8 and the credit of edge (F, G) would have been 3/8.
Clustering of Social-Network Graphs
● D gets 1 plus the credits of the edges entering it from below, which are 3 and
0.5. That is, the credit of D is 4.5. The credit of F is 1 plus the credit of the edge
(F, G) i.e. 1.5.
● Finally, the edges (E, D) and (E, F ) receive the credit of D and F , respectively,
since each of these nodes has only one parent.
● To complete the betweenness calculation, we have to repeat this calculation
for every node as the root and sum the contributions.
● Finally, we must divide by 2 to get the true betweenness, since every shortest
path will be discovered twice, once for each of its endpoints.
Using Betweenness to find Communities
● Start with the graph and all its edges; then remove edges with the highest betweenness until the graph
has broken into a suitable number of connected components.
● Betweenness scores of each edge is shown in the below graph.
● Edge (B, D) has the highest betweenness, so it is removed first.
● That leaves two communities, namely: {A, B, C} and {D, E, F, G}.
● Next to leave are (A, B) and (B, C) with a score of 5, followed by (D, E) and (D, G) with a score of 4.5.
Then, (D, F ), whose score is 4, would leave the graph.
Direct discovery of communities
In Graph clustering, it is not possible to place an individual in more than one community.
Cliques:
A Clique is a set of nodes in a graph such that every two nodes in Clique are adjacent.
A complete bipartite graph consists of s nodes on one side and t nodes on the other side,
with all st possible edges between the nodes of one side and the other present. We denote
this graph by Ks,t.
Direct discovery of communities
Finding Complete Bipartite Subgraphs
● Given a large bipartite graph G we want to find instances of Ks,t within it.
● It is possible to view the problem of finding instances of Ks,t within G as one of finding
frequent itemsets.
● Let the “items” be the nodes on one side of G, which we shall call the left side.
● The “baskets” correspond to the nodes on the other side of G (the right side).
● Let the support threshold be s, the number of nodes that the instance of Ks,t has on
the right side.
● State the problem of finding instances of Ks,t as that of finding frequent itemsets F of
size t.
● If a set of t nodes on the left side is frequent, then they all occur together in at least s
baskets.
Direct discovery of communities
● Each basket corresponds to a node that is connected to all t of the nodes in F . Thus,
the frequent itemset of size t and s of the baskets in which all those items appear
form an instance of Ks,t.
● A proper definition of a “good” cut must balance the size of the cut itself against
the difference in the sizes of the sets that the cut creates.
● Suppose we partition the nodes of a graph into two disjoint sets S and T .
● Let Cut(S, T ) be the number of edges that connect a node in S to a node in T .
● Volume of a set S of nodes, denoted Vol (S), is the number of edges with at least
one end in S.
● Then the normalized cut value for S and T is
Partitioning of Graphs
If we choose S = {H} and T = {A, B, C, D, E,
F, G},
then Cut(S, T ) = 1. Vol(S) = 1, and
Vol(T)=11.
Thus, the normalized cut for this
partition is 1/1 + 1/11 = 1.09.
1. Adjacency matrix
2. Degree matrix
3. Laplacian matrix
1. Adjacency matrix:
It has 1 in the row i and column j if there is an edge between nodes i and j and 0
otherwise.
Matrices that describe graphs
2. Degree matrix
● The entry for row and column i is the degree of the ith node.
● This graph has nonzero entries only on the diagonal.
● For instance, the entry in row 4 and column 4 is 4 because node D has edges to
four other nodes
Matrices that describe graphs
3. Laplacian matrix
● The difference between the degree matrix and the adjacency matrix is called
Laplacian matrix.
● Suppose our graph has adjacency matrix A and degree matrix D. Our third matrix,
called the Laplacian matrix, is L = D − A.
● Sum of each row and each column results in 0.
● It is a symmetric matrix.
Eigenvalues of the Laplacian matrix
Example:
Partition the graph using eigenvalues of the laplacian matrix.
The Laplacian matrix for the given graph is
Proof:
● Let L be the Laplacian matrix for a graph of n nodes, and let 1 be the column vector of
all 1’s with length n.
● Consider row i of L. Its diagonal element has the degree d of node i. Row i also will
have d occurrences of −1, and all other elements of row i are 0.
● Multiplying row i by column vector 1 has the effect of summing the row, and this sum
is clearly d + (−1)d = 0.
● Thus, we can conclude that 0 is the eigenvalue and 1 is its corresponding
eigenvector.
Eigenvalues of the Laplacian matrix
● Let G be a connected graph on vertex set N={1,2,...,n} with adjacency matrix A.
Let L=D-A be the Laplacian matrix of G. Let v2 be an eigenvector corresponding
to the second smallest eigenvalue of L. The graph partition is
C1={i Ɛ N : v2(i)<0}
C2={i Ɛ N : v2(i)>0}