Big Scholarly Data

The document discusses social network analysis and its application to big scholarly data. It begins by defining key concepts in social network analysis like nodes, ties, and networks. It then discusses big scholarly data, providing examples of data sources and volumes. The document outlines methods for analyzing big data, highlighting social network analysis. It describes centrality measures used in social network analysis like degree, closeness, and betweenness. Finally, it discusses steps to analyze big scholarly data through social network analysis and potential applications.

Uploaded by

Yogendra Singh Librarian, SRHU

Social Network Analysis and Big Scholarly Data

Yogendra Singh
University Librarian
Swami Rama Himalayan University
Email - [email protected]
After this talk
• You should have a basic knowledge of Social Network Analysis
• You should have a basic understanding of Big Scholarly Data
• You should understand how Social Network Analysis can be applied to Big Scholarly Data
Social Network
• Society is made of individuals (such as a wife and a husband). In social network parlance they are known as nodes/actors/vertices
• Individuals have relations with one another
• Information flows along these relations
• These relations are known as ties/links/edges
• A social network is a network (group) of individuals who share a certain type of relation through which particular information flows
Look at this picture
Now the big question is
• Who matters in this crowd? The answer can differ depending on the point of view
• Analysing this type of social network using graph theory is called Social Network Analysis (SNA)
• Since scholarly networks are networks of people (for example, co-authors), SNA can be applied to large scholarly datasets, or Big Scholarly Data
Big Scholarly Data (BSD)
• BSD refers to the millions of scholarly records available today as a result of tremendous changes in the scholarly communication cycle
• BSD may include
  - E-books, articles, reports, standards, patents, etc., published by major commercial and not-for-profit organizations (sciencedirect.com, tandfonline.com, doaj.org, etc.)
  - Abstracting and indexing databases: Scopus, Web of Science, EBSCO, Google Scholar
  - Academic social networks: Academia, ResearchGate, Mendeley, etc.
  - Many other types of scholarly data
Three major scholarly data providers
Sl. No. | Brand name     | Publisher           | Coverage                                                                            | No. of Records
1.      | Google Scholar | Google              | Full universe of knowledge / all formats                                            | 350+ million
2.      | Web of Science | Clarivate Analytics | Bibliographic information including citations and other details including abstract | 90+ million
3.      | Scopus         | Elsevier Science    | Bibliographic information including citations and other details including abstract | 75+ million
Big data analysis methods
• Statistical analysis
  - Suitable for smaller datasets
• Scholarly text mining
  - Can be used with big data
• Scholarly Network Analysis (or Social Network Analysis)

Scholarly Network Analysis / Social Network Analysis: Important measures include Centralities
• Average path length
• Clustering coefficient
• Centralities
Average Path Length
• Average path length: the average shortest-path distance between any two nodes in a network
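
To make the definition concrete, here is a minimal sketch in plain Python. It assumes an undirected, connected network stored as a dictionary of neighbour sets; the four-author chain `coauthors` is a made-up example, not data from the talk.

```python
from collections import deque

def bfs_distances(graph, source):
    """Hop distance from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def average_path_length(graph):
    """Mean shortest-path distance over all node pairs (connected graph assumed)."""
    n = len(graph)
    # Each unordered pair is counted twice here, so divide by n * (n - 1).
    total = sum(sum(bfs_distances(graph, s).values()) for s in graph)
    return total / (n * (n - 1))

# Hypothetical chain of co-authors: A - B - C - D
coauthors = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}
print(round(average_path_length(coauthors), 3))  # 1.667
```

The six pairs in this chain have distances 1, 2, 3, 1, 2 and 1, which sum to 10, so the average is 10/6.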
Clustering Coefficient
• Clustering coefficient: a measure of the degree to which nodes in a graph tend to cluster together
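
The slide does not say which variant of the clustering coefficient is meant. The sketch below assumes the common local definition (the fraction of a node's neighbour pairs that are themselves linked), averaged over all nodes. The graph `g` is a made-up example.

```python
from itertools import combinations

def local_clustering(graph, node):
    """Fraction of pairs of `node`'s neighbours that are directly linked."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention: fewer than two neighbours means no possible pair
    linked = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2 * linked / (k * (k - 1))

def average_clustering(graph):
    """Mean of the local clustering coefficients of all nodes."""
    return sum(local_clustering(graph, n) for n in graph) / len(graph)

# Hypothetical network: a triangle A-B-C with D attached to C
g = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(round(average_clustering(g), 3))  # 0.583
```

Here A and B each score 1, C scores 1/3 (only one of its three neighbour pairs is linked) and D scores 0, giving an average of 7/12.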
Centrality Measures
• Centrality measures: they measure how central (important) a node is in a network
Network Centrality
• Which individuals (nodes) are important (central)?
• The measurement of importance is called centrality in SNA
• Centrality may mean different things to different people and in different contexts
Why are Centrality and Centralization Important?
• Access to information and ideas
• Interaction among members of the network
• Control the flow of information, resources, and other network content
• Visibility
• Ability to act together collectively

Multiple Ways to Calculate Centrality
• Degree
• Closeness
• Betweenness
• Eigenvector
Calculating Centrality
• Degree – Proportional to the number of other nodes to which a node is linked: the number of links divided by (n-1).
• Closeness – Take the sum of geodesic distances (shortest paths) from a node to all other nodes in the graph, divide by (n-1), then invert.
• Betweenness – The extent to which a particular node lies ‘between’ other nodes in the graph: how many shortest paths (geodesics) is it on? A measure of brokerage or gatekeeping.
• Eigenvector – A weighted measure of centrality that takes into account the centrality of the other nodes to which a node is connected. That is, being connected to other central nodes increases centrality, e.g., the secretary of a powerful person. Google’s PageRank algorithm is based on a variation of this approach.
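
The four measures above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code used by Gephi or any other tool: the network is undirected and connected, betweenness is computed with Brandes' algorithm and left unnormalised, and eigenvector centrality uses simple power iteration. The graph `net` and its node names are made up.

```python
from collections import deque

def degree_centrality(graph):
    """Number of links divided by (n - 1)."""
    n = len(graph)
    return {v: len(graph[v]) / (n - 1) for v in graph}

def closeness_centrality(graph):
    """(n - 1) divided by the sum of shortest-path distances to all other nodes."""
    n = len(graph)
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in graph[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        result[source] = (n - 1) / sum(dist.values())
    return result

def betweenness_centrality(graph):
    """Unnormalised betweenness via Brandes' algorithm (undirected graph)."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        order = []                        # nodes in order of discovery
        preds = {v: [] for v in graph}    # predecessors on shortest paths
        sigma = {v: 0 for v in graph}     # number of shortest paths from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in graph[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
                if dist[nb] == dist[node] + 1:
                    sigma[nb] += sigma[node]
                    preds[nb].append(node)
        delta = {v: 0.0 for v in graph}
        for node in reversed(order):      # accumulate dependencies backwards
            for p in preds[node]:
                delta[p] += sigma[p] / sigma[node] * (1 + delta[node])
            if node != source:
                score[node] += delta[node]
    # Each unordered pair was counted from both endpoints.
    return {v: s / 2 for v, s in score.items()}

def eigenvector_centrality(graph, iterations=100):
    """Power iteration; scores are scaled so the largest equals 1."""
    score = {v: 1.0 for v in graph}
    for _ in range(iterations):
        # Each node keeps its own score plus its neighbours' scores (A + I).
        # This leaves the leading eigenvector unchanged but avoids the
        # oscillation that plain iteration shows on bipartite graphs.
        new = {v: score[v] + sum(score[nb] for nb in graph[v]) for v in graph}
        norm = max(new.values())
        score = {v: s / norm for v, s in new.items()}
    return score

# Hypothetical network: hub H linked to A, B and C; C also linked to D
net = {
    "H": {"A", "B", "C"},
    "A": {"H"},
    "B": {"H"},
    "C": {"H", "D"},
    "D": {"C"},
}
print(degree_centrality(net)["H"])       # 0.75
print(closeness_centrality(net)["H"])    # 0.8
print(betweenness_centrality(net)["H"])  # 5.0
```

In this toy network H has the highest score on every measure, while C ranks second on betweenness because every shortest path to D passes through it.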
Network Analysis Tools Applied to BSD
Software  | Access | Platform / Language              | Description
CiteSpace | Free   | Windows, macOS / Java            | Visualizing and analyzing trends and patterns in scientific literature; knowledge domain visualization; best for WoS datasets
Gephi     | Free   | Windows/Linux/macOS / Java       | Exploratory data analysis; social network analysis; link analysis
iGraph    | Free   | Windows/macOS / C/R/Python/Perl  | A collection of network analysis tools with the emphasis on efficiency, portability and ease of use
NetworkX  | Free   | Windows/macOS / Python           | Creation, manipulation, and investigation of the structures, dynamics, and functions of complex networks
Pajek     | Free   | Windows/macOS / C/R              | Analysis and visualization of large networks having some thousands or even millions of vertices
Types of Scholarly Networks That Could Be Generated by Applying SNA to BSD
• Co-Author Network
  - Personal Network
  - Organizational Network
  - Geographic Network
• Co-Word Network
• Co-Citation Network
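
As an illustration of the first type, a co-author network can be derived from the author lists of papers: every pair of authors on the same paper gets a tie, weighted by the number of papers they share. The sketch below is a minimal version under that assumption; the `papers` records and author names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def coauthor_edges(papers):
    """Count how many papers each unordered pair of authors has written together."""
    weights = Counter()
    for authors in papers:
        # sorted(set(...)) drops repeated names and gives each pair a stable order
        for a, b in combinations(sorted(set(authors)), 2):
            weights[(a, b)] += 1
    return weights

# Hypothetical records: one author list per paper
papers = [
    ["Author A", "Author B", "Author C"],
    ["Author A", "Author B"],
    ["Author C", "Author D"],
]
edges = coauthor_edges(papers)
print(edges[("Author A", "Author B")])  # 2
```

Passing lists of institutions or countries instead of author names would give the organizational and geographic variants in the same way.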
BSD Analysis Applications
• Scientific Impact Evaluation
  - Article Impact
  - Author Impact
  - Journal Impact
  - Institutional Impact
BSD Analysis Application - Academic Recommendations
• Literature Recommendations
• Expert Recommendations
• Collaboration Recommendations
• Priority Recommendations
Scholarly Data Analysis: Steps
• Data collection
  - Download the desired dataset from an appropriate source
• Data cleaning
  - The most difficult task, because the same name, institute, or department is often written in different ways, even by the same individual
• Create a graph using the data
• Use the graph for further processing
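
The cleaning step can be partly automated by mapping each name to a normalised key and grouping the spellings that share a key. The sketch below shows one simple way to do this, in the spirit of key-collision clustering; it is not OpenRefine's implementation, and the name variants are invented for illustration.

```python
import re
import unicodedata
from collections import defaultdict

def name_key(name):
    """Normalised key: accents stripped, lowercased, punctuation and extra spaces removed."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    letters_only = re.sub(r"[^a-z\s]", " ", ascii_name.lower())
    return " ".join(letters_only.split())

def group_variants(names):
    """Group the spellings that collapse to the same key."""
    groups = defaultdict(set)
    for name in names:
        groups[name_key(name)].add(name)
    return dict(groups)

# Hypothetical spellings of one author as they might appear in a download
variants = ["Sharma, R.K.", "SHARMA R K", "Sharma, R. K.", "Verma, P."]
groups = group_variants(variants)
print(sorted(groups))  # ['sharma r k', 'verma p']
```

A key like this only catches differences in case, punctuation and spacing. Variants such as a full first name against an initial still need manual review, which matches the experience reported in the talk.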

Source and Software Needed
• An appropriate data source from which to download the desired datasets
  - I used Scopus to download research data of IIT Roorkee
• A software tool to clean the data
  - I used OpenRefine, an open-source tool, to clean the data; however, quite a bit was done manually
• A software tool to create the graph
  - The online tool Table2Net was used
• A tool to process the graph and obtain the necessary measures
  - Gephi was used for this purpose
Steps in Analysing BSD through SNA
• Download a dataset
• Clean the data with cleaning software such as OpenRefine, and manually where needed
• Create a graph file with an online scientific-network creation tool such as Table2Net or Scopus2Net
• Analyse the graph file in network analysis software such as Gephi. Gephi can calculate the SNA measures covered in this talk
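
For readers who prefer to script the graph-file step, the sketch below writes a weighted edge list as CSV with `Source`, `Target` and `Weight` columns, a layout that Gephi's spreadsheet import recognises. It is an alternative to Table2Net, not a description of what that tool produces, and the `edges` dictionary is made-up data.

```python
import csv
import io

def edges_to_csv(edges):
    """Serialise {(author_a, author_b): weight} as a CSV edge list."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, weight])
    return buffer.getvalue()

# Hypothetical co-author ties and the number of shared papers
edges = {("Author A", "Author B"): 2, ("Author C", "Author D"): 1}
csv_text = edges_to_csv(edges)
print(csv_text)
```

Saving `csv_text` to a file gives an edge table that can be loaded into Gephi; since co-authorship has no direction, the graph type would be set to undirected at import.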
Conclusion
• Applying Social Network Analysis tools to Big Scholarly Data is going to be a big area of interest for scientometricians, as very large volumes of BSD are generated daily.
• These measures can be used to evaluate authors, institutions, subject areas, or countries objectively.
• Special areas of interest and possible collaboration opportunities can be easily identified.
• Because the impact of publications can be easily identified, SNA can have a great impact on policy making.
• Librarians can also use SNA to analyse in-house data such as circulation data, reference data, and even footfall data.
THANK YOU
