Unit IV Data Processing and Analysis
R.P. Gajurel 2
Organising and preparing data
❑ Editing
❑ Coding
• Methods and Techniques of Coding Data
• There are different methods and techniques of coding data, depending on the type of data and the research
objectives. Here are some of the common methods:
• 1. Numeric Coding
• Numeric coding involves assigning numerical codes to data categories. For instance, in a survey
questionnaire, the response options could be assigned numbers from 1 to 5, with 1 indicating the least
favorable response and 5 indicating the most favorable response.
• 2. Alphanumeric Coding
• Alphanumeric coding involves assigning codes that consist of letters and numbers to data categories. This
method is useful when dealing with complex data that requires more detailed coding.
• 3. Color Coding
• Color coding involves using different colors to represent different data categories. This method is useful
when working with visual data, such as graphs and charts.
• 4. Hierarchical Coding
• Hierarchical coding involves categorizing data into levels, with each level having its own codes. This method
is useful when dealing with data that has multiple variables and sub-variables.
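Numeric and hierarchical coding can be illustrated with a small codebook. This is a minimal sketch, not a standard scheme: the five response labels, the 1-to-5 mapping and the hypothetical hierarchical codes (a level-1 category code followed by a sub-category code, e.g. "2.1") are assumptions made for illustration.

```python
# Hypothetical numeric codebook for a five-point survey item:
# 1 = least favourable response, 5 = most favourable response.
LIKERT_CODES = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly agree": 5,
}

# Hypothetical hierarchical codebook: level 1 is the variable,
# level 2 is a sub-category within that variable.
HIERARCHY = {
    "1": ("Education", {"1": "School", "2": "Bachelor", "3": "Master"}),
    "2": ("Occupation", {"1": "Agriculture", "2": "Service", "3": "Business"}),
}


def numeric_code(response: str) -> int:
    """Return the numeric code of a response; a KeyError flags an uncoded response."""
    return LIKERT_CODES[response]


def decode_hierarchical(code: str) -> tuple[str, str]:
    """Split a code such as '2.1' into its level-1 and level-2 labels."""
    level1, level2 = code.split(".")
    category, subcategories = HIERARCHY[level1]
    return category, subcategories[level2]


responses = ["Agree", "Neutral", "Strongly agree", "Disagree"]
coded = [numeric_code(r) for r in responses]
print(coded)                       # [4, 3, 5, 2]
print(decode_hierarchical("2.1"))  # ('Occupation', 'Agriculture')
```

Raising an error for a label that is not in the codebook, rather than silently assigning a default, is one way to support the accuracy and consistency of coding.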
• Advantages of Coding Data
• Ease of Data Analysis: Coding data simplifies the data analysis process and makes it more efficient.
• Data Organization: It organizes data into manageable units that can be easily retrieved when needed.
• Accuracy and Consistency: A well-defined coding scheme promotes accuracy and consistency, since the same kind of response always
receives the same code; this is essential for reliable research findings.
• Comparison and Generalization: It facilitates comparison and generalization of findings across different
variables and research studies.
❑ Classification
❑ Tabulation
Types of Table
1. Univariate Table
This type of table consists of only one variable.
Year Enrollment
2019 2300
2020 2100
2021 1900
2022 1540
2. Bivariate Table
This type of table involves two different variables.
Year    Enrollment (number of students)
        BA                  MA
        Math    Economics   Math    Economics
2019    100     12          200     50
2020    120     87          234     34
2021    50      24          70      654
2022    20      300         80      78
3. Multivariate Table
• When a table involves three or more variables, it is categorized as a
multivariate table.
Year    Enrollment (number of students)
        BA                                       MA
        Math    Economics   Nepali   Sociology   Math    Economics   Nepali   Sociology
2019    100     12          100      12          200     50          12       200
2020    120     87          120      87          234     34          87       234
2021    50      24          50       24          70      654         24       70
2022    20      300         20       300         80      78          300      80
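Tables like these are normally built by counting coded records. The following is a minimal sketch using Python's `collections.Counter`; the records are hypothetical (year, level, subject) tuples invented for illustration, not the enrollment figures shown above.

```python
from collections import Counter

# Hypothetical coded records: one (year, level, subject) tuple per enrolled student.
records = [
    (2019, "BA", "Math"), (2019, "BA", "Economics"), (2019, "MA", "Math"),
    (2020, "BA", "Math"), (2020, "BA", "Math"), (2020, "MA", "Economics"),
]

# Univariate table: frequency of a single variable (year).
by_year = Counter(year for year, _, _ in records)

# Cross-classified table: count of records in each (year, level, subject) cell.
cells = Counter(records)

years = sorted(by_year)
columns = [(level, subject) for level in ("BA", "MA") for subject in ("Math", "Economics")]

print("Year", *[f"{level}-{subject}" for level, subject in columns], sep="\t")
for year in years:
    # A Counter returns 0 for a cell that has no records.
    print(year, *[cells[(year, level, subject)] for level, subject in columns], sep="\t")
```

Adding further variables only lengthens the tuple used as the cell key, which is how a univariate count extends to bivariate and multivariate tables.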
Generally accepted principles of tabulation
1. Every table should have a clear, concise and adequate title so as to make the table intelligible without reference to the text and this title
should always be placed just above the body of the table.
2. Every table should be given a distinct number to facilitate easy reference.
3. The column headings (captions) and the row headings (stubs) of the table should be clear and brief.
4. The units of measurement under each heading or sub-heading must always be indicated.
5. Explanatory footnotes, if any, concerning the table should be placed directly beneath the table, along with the reference symbols used in the
table.
6. The source or sources from which the data in the table have been obtained must be indicated just below the table.
7. Usually the columns are separated from one another by lines which make the table more readable and attractive. Lines are always drawn at
the top and bottom of the table and below the captions.
8. There should be thick lines to separate the data under one class from the data under another class and the lines separating the sub-
divisions of the classes should be comparatively thin lines.
9. The columns may be numbered to facilitate reference.
10. Those columns whose data are to be compared should be kept side by side. Similarly, percentages and/or averages must also be kept
close to the data.
11. It is generally considered better to round off figures before tabulation, as this reduces unnecessary detail in the table itself.
12. In order to emphasise the relative significance of certain categories, different kinds of type, spacing and indentations may be used.
13. It is important that all column figures be properly aligned. Decimal points and (+) or (–) signs should be in perfect alignment.
14. Abbreviations should be avoided to the extent possible and ditto marks should not be used in the table.
15. Miscellaneous and exceptional items, if any, should usually be placed in the last row of the table.
16. The table should be made as logical, clear, accurate and simple as possible. If the volume of data is very large, it should not be crowded
into a single table, as that would make the table unwieldy and inconvenient.
17. Row totals should normally be placed in the extreme right column and column totals at the bottom.
18. The arrangement of the categories in a table may be chronological, geographical, alphabetical or according to magnitude to facilitate
comparison. Above all, the table must suit the needs and requirements of an investigation.
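Principles 13 and 17 can be applied mechanically. The following minimal sketch uses the bivariate enrollment figures from the table above, places row totals in the extreme right column and column totals at the bottom, and right-aligns every figure so the digits line up; the column width of 14 characters is an arbitrary choice.

```python
# Bivariate enrollment figures from the table above, keyed by year.
table = {
    2019: [100, 12, 200, 50],
    2020: [120, 87, 234, 34],
    2021: [50, 24, 70, 654],
    2022: [20, 300, 80, 78],
}
captions = ["BA Math", "BA Economics", "MA Math", "MA Economics", "Total"]


def with_totals(rows: dict[int, list[int]]) -> tuple[dict[int, list[int]], list[int]]:
    """Append each row total as the last column and compute the column totals."""
    body = {stub: values + [sum(values)] for stub, values in rows.items()}
    # zip(*...) transposes the rows so that each column can be summed;
    # the last entry is the grand total.
    column_totals = [sum(col) for col in zip(*body.values())]
    return body, column_totals


def render(body: dict[int, list[int]], column_totals: list[int], width: int = 14) -> str:
    """Right-align every figure in fixed-width columns so the digits line up."""
    def row(stub: str, cells: list) -> str:
        return stub.ljust(6) + "".join(str(c).rjust(width) for c in cells)

    lines = [row("Year", captions)]
    lines += [row(str(stub), values) for stub, values in body.items()]
    lines.append(row("Total", column_totals))
    return "\n".join(lines)


body, column_totals = with_totals(table)
print(render(body, column_totals))
```

The captions are written in full because principle 14 advises against abbreviations.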
Problems in Data Processing
• Two problems commonly arise when processing data for analytical
purposes (Kothari, 2004):
(a) The problem concerning “Don’t know” (or DK) responses
• One such problem is the ‘Don’t Know’ response, or simply the DK
response.
• When the DK response group is small, it is of little significance.
• But when it is relatively large, it becomes a matter of major concern, and
the question arises: is the question that elicited the DK responses useless?
• The answer depends on two possibilities: the respondent may genuinely not know
the answer, or the researcher may have failed to obtain the appropriate information.
• Solution: The best way is to design better questions. Good rapport between
interviewers and respondents also helps minimise DK responses.
• Another way is to keep DK responses as a separate category in tabulation. If
the DK responses are legitimate, they can be treated as a separate reply category;
otherwise, readers should be left to make their own judgement.
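The separate-category approach can be sketched in a few lines. The counts below are hypothetical; the sketch reports percentages both with DK kept as its own reply category and with DK excluded from the base, so that readers can judge for themselves.

```python
# Hypothetical response counts for one survey question.
counts = {"Yes": 120, "No": 60, "Don't know": 20}


def percentages(freq: dict[str, int]) -> dict[str, float]:
    """Percentage share of each category, rounded to one decimal place."""
    base = sum(freq.values())
    return {category: round(100 * n / base, 1) for category, n in freq.items()}


# DK kept as a separate reply category (base = all 200 respondents).
with_dk = percentages(counts)
# DK excluded from the base (base = the 180 respondents who gave an answer).
without_dk = percentages({k: v for k, v in counts.items() if k != "Don't know"})

print(with_dk)     # {'Yes': 60.0, 'No': 30.0, "Don't know": 10.0}
print(without_dk)  # {'Yes': 66.7, 'No': 33.3}
```

The gap between 60.0 and 66.7 per cent shows why the treatment of DK responses should be stated explicitly in the table.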
• (b) Use of percentages
• Percentages are often used in data presentation because they simplify numbers, reducing all of
them to a 0 to 100 range. Percentages express the data in a standard form with a base of 100,
which facilitates relative comparisons.
• While using percentages, the following rules should be kept in view by researchers:
• 1. Two or more percentages must not be averaged unless each is weighted by the group
size from which it has been derived.
• 2. Use of too large percentages should be avoided, since a large percentage is difficult to
understand and tends to confuse, defeating the very purpose for which percentages are
used.
• 3. Percentages hide the base from which they have been computed. If this is not kept in
view, the real differences may not be correctly read.
• 4. Percentage decreases can never exceed 100 per cent and as such for calculating the
percentage of decrease, the higher figure should invariably be taken as the base.
• 5. In two-dimensional tables, percentages should generally be worked out in the direction
of the causal factor; for this purpose, the more significant of the two given factors must
be selected as the causal factor.
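Rules 1 and 4 can be checked with simple arithmetic. The following minimal sketch uses hypothetical pass rates for two groups of different sizes, and the 2019 and 2022 figures from the univariate enrollment table for the percentage decrease.

```python
def weighted_mean_percentage(percentages: list[float], group_sizes: list[int]) -> float:
    """Average percentages by weighting each one with its group size (rule 1)."""
    return sum(p * n for p, n in zip(percentages, group_sizes)) / sum(group_sizes)


def percent_decrease(old: float, new: float) -> float:
    """Decrease from old to new, taking the higher figure (old) as the base (rule 4)."""
    if new > old:
        raise ValueError("new value is higher: this is an increase, not a decrease")
    return 100 * (old - new) / old


# Hypothetical groups: 80% of 50 students and 40% of 150 students passed.
pass_rates = [80.0, 40.0]
group_sizes = [50, 150]

simple = sum(pass_rates) / len(pass_rates)                    # 60.0, misleading
weighted = weighted_mean_percentage(pass_rates, group_sizes)  # 50.0 = 100 passes out of 200

# Enrollment fell from 2300 (2019) to 1540 (2022), per the univariate table.
drop = percent_decrease(2300, 1540)        # about 33.0% of the higher figure
wrong_base = 100 * (2300 - 1540) / 1540    # about 49.4%, wrongly using the lower figure
print(f"simple: {simple}, weighted: {weighted}, decrease: {drop:.1f}%")
```

The unweighted average overstates the overall pass rate because it gives the small group as much weight as the large one.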
Types of Data Analysis
• In essence, descriptive statistics are used to report or describe the
features or characteristics of data. They summarize a particular
numerical data set, or multiple sets, and deliver quantitative insights
about that data through numerical or graphical representation.
• Descriptive statistics only reflect the data to which they are applied.
A descriptive statistic can be:
• A measure of central tendency, like mean, median, or mode: These
are used to identify an average or center point among a data set
• A measure of dispersion or variability, like variance, standard
deviation, or range: These reflect the spread of the data
points (skewness, by contrast, measures the asymmetry of the distribution)
• A measure of distribution, like the quantity or percentage of a
particular outcome: These express the frequency of that outcome
among a data set
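Python's standard `statistics` module computes these measures directly. The following minimal sketch applies them to the enrollment figures from the univariate table; for the mode and the frequency distribution it uses a hypothetical list of Likert codes, since the four enrollment figures are all distinct.

```python
import statistics
from collections import Counter

# Enrollment figures from the univariate table (2019-2022).
enrollment = [2300, 2100, 1900, 1540]

# Central tendency
mean = statistics.mean(enrollment)       # 1960
median = statistics.median(enrollment)   # 2000.0, the average of the two middle values

# Dispersion
data_range = max(enrollment) - min(enrollment)  # 760
variance = statistics.variance(enrollment)      # sample variance (divides by n - 1)
std_dev = statistics.stdev(enrollment)          # sample standard deviation

# Distribution: mode and percentage of each outcome, using hypothetical coded responses.
codes = [4, 3, 5, 2, 4, 4, 3]
mode = statistics.mode(codes)  # 4
freq = Counter(codes)
share = {code: round(100 * n / len(codes), 1) for code, n in sorted(freq.items())}

print(mean, median, data_range, round(variance, 2), round(std_dev, 2))
print(mode, share)
```

`statistics.pvariance` and `statistics.pstdev` give the population versions (dividing by n) when the data cover the whole population rather than a sample.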
• Two types of inferential analysis:
1. Estimation
• Estimation is a procedure in which we use the information contained in a sample to draw
inferences about the true value of the population parameter of interest.
• An estimator is a sample statistic used to estimate a population parameter, while
an estimate is the particular value that the estimator takes for a given sample.
• A good estimator should have the following desirable properties:
• It should be unbiased: The expected value of the estimator must be equal to the
parameter to be estimated.
• It should be consistent: as the sample size increases, the value of the estimator
should approach the value of the parameter being estimated.
• It should be efficient: among unbiased estimators, it should have the smallest variance.
• It should be sufficient: the estimator should make use of all the information about
the parameter that the sample contains.
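Unbiasedness and consistency can be checked by simulation. The following minimal sketch assumes a hypothetical normal population with mean 50 and variance 100. Averaged over many samples of size 5, the sample variance with divisor n - 1 comes out close to 100, while the divisor-n version comes out close to (n - 1)/n of it (about 80), i.e. it is biased. The second loop illustrates consistency of the sample mean.

```python
import random
import statistics

random.seed(1)  # fixed seed so the simulation is reproducible

# Hypothetical population: normal with mean 50 and standard deviation 10.
MU, SIGMA = 50, 10
TRUE_VAR = SIGMA ** 2  # 100


def average_variance_estimates(n: int, trials: int = 10_000) -> tuple[float, float]:
    """Average two variance estimators over many random samples of size n."""
    with_n_minus_1, with_n = [], []
    for _ in range(trials):
        sample = [random.gauss(MU, SIGMA) for _ in range(n)]
        with_n_minus_1.append(statistics.variance(sample))  # divides by n - 1
        with_n.append(statistics.pvariance(sample))          # divides by n
    return statistics.fmean(with_n_minus_1), statistics.fmean(with_n)


# Unbiasedness: the n - 1 version averages near 100, the n version near 80.
u, b = average_variance_estimates(n=5)
print(f"divisor n-1: {u:.1f}   divisor n: {b:.1f}   true variance: {TRUE_VAR}")

# Consistency: the sample mean's error typically shrinks as n grows.
for n in (10, 1_000, 100_000):
    sample = [random.gauss(MU, SIGMA) for _ in range(n)]
    print(n, round(abs(statistics.fmean(sample) - MU), 3))
```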
• Types of Estimation
• a. Point estimation: It uses the information in the sample to arrive at a single
number (that is called an estimate) that is intended to be close to the true value of
the parameter.
• b. Interval estimation: It uses the information in the sample to construct an
interval (i.e., two endpoints) that is intended to enclose the true value of the
parameter.
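Both types of estimation can be illustrated for a population mean. In the following minimal sketch, the sample mean is the estimator, and the number it yields for this particular sample is the point estimate. The interval uses the large-sample normal approximation x̄ ± z·s/√n; this choice is an assumption, and small samples would normally call for the t distribution instead. The sample itself is hypothetical, generated with a fixed seed.

```python
import random
import statistics
from statistics import NormalDist


def estimate_mean(sample: list[float], confidence: float = 0.95) -> tuple[float, tuple[float, float]]:
    """Point estimate of the population mean and a large-sample confidence interval."""
    n = len(sample)
    x_bar = statistics.fmean(sample)               # point estimate
    s = statistics.stdev(sample)                   # sample standard deviation
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * s / n ** 0.5
    return x_bar, (x_bar - margin, x_bar + margin)


random.seed(7)
# Hypothetical sample of 100 observations from a population with mean 50.
sample = [random.gauss(50, 10) for _ in range(100)]
point, (low, high) = estimate_mean(sample)
print(f"point estimate: {point:.2f}, 95% CI: ({low:.2f}, {high:.2f})")
```

Raising `confidence` widens the interval, trading precision for a higher chance of enclosing the true value.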