ADIGRAT UNIVERSITY Bass New
(HUMAN NUTRITION)
BIOSTATISTICS COURSE
INDIVIDUAL ASSIGNMENT
PREPARED BY:
MEBRAHTU HADUSH GESESEW
IDNO………………………
Phone- 0914857487
SUBMITTED TO:
BIRHANE F. [ASSISTANT PROFESSOR]
OCTOBER 2023
WUKRO, TIGRAY
1. Describe the role of biostatistics in public health (at least two).
Biostatistics plays a crucial role in public health by providing statistical tools and techniques to
analyze health-related data and make informed decisions. The key roles of biostatistics in public
health are:
Measures of central tendency are statistical measures that provide a representative value for a
dataset, that is, a central or typical value of its distribution. Measures of central
tendency are often called averages.
Two commonly used measures of central tendency are:
A. Mean: The arithmetic mean is the most familiar measure of central tendency. It is
obtained by summing all the values and dividing by the total number of observations.
It is the descriptive measure most people have in mind when they speak of the
“average.” The adjective arithmetic distinguishes this mean from other means that can
be computed.
General Formula for the Mean: It will be convenient if we can generalize the procedure for
obtaining the mean and also represent the procedure in a more compact notational form. Let
us begin by designating the random variable of interest by the capital letter X. In our present
illustration we let X represent the random variable age. Specific values of a random variable
will be designated by the lowercase letter x. To distinguish one value from another, we attach a
subscript to the x and let the subscript refer to the first, the second, the third value, and so on.
The Sample Mean: When we compute the mean for a sample of values, the procedure just
outlined is followed with some modifications in notation. We use x̄ (read "x-bar") to designate
the sample mean and n to indicate the number of values in the sample.
Properties of the Mean: The arithmetic mean possesses certain properties, some desirable and
some not so desirable. These properties include the following:
1. Uniqueness. For a given set of data, there is one and only one arithmetic mean.
2. Simplicity. The arithmetic mean is easily understood and easy to compute.
3. Since each and every value in a set of data enters into the computation of the mean, it is
affected by each value. Extreme values, therefore, have an influence on the mean and, in some
cases, can so distort it that it becomes undesirable as a measure of central tendency.
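The definition and the third property can be illustrated with a minimal Python sketch. The function name `arithmetic_mean` and the age values are hypothetical and chosen only for illustration; they do not come from any real dataset.

```python
def arithmetic_mean(values):
    """Sum all observations and divide by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


# Hypothetical ages (in years) of five study participants.
ages = [20, 22, 24, 26, 28]
mean_age = arithmetic_mean(ages)

# Same data, but the last value is replaced by an extreme one.
ages_with_outlier = [20, 22, 24, 26, 98]
mean_with_outlier = arithmetic_mean(ages_with_outlier)

print(mean_age)           # 24.0
print(mean_with_outlier)  # 38.0
```

A single extreme value moves the mean from 24 to 38, a value larger than four of the five observations, which is why the mean can be a poor summary when the data contain outliers.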
B. Median: The median is the middle value of an ordered dataset. It divides the data into
two equal halves, with half of the observations above and half below the median. The
number of values equal to or greater than the median is equal to the number of values
equal to or less than the median.
If the number of values is odd, the median will be the middle value when all values have
been arranged in order of magnitude. When the number of values is even, there is no
single middle value. Instead, there are two middle values. In this case the median is
taken to be the mean of these two middle values, when all values have been arranged
in the order of their magnitudes.
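The odd and even cases described above can be sketched in Python as follows. The function name `median_value` and the sample values are hypothetical, used only to illustrate the rule.

```python
def median_value(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)          # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd n: a single middle value exists
        return ordered[mid]
    # even n: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median_value([26, 20, 98, 22, 24])   # ordered: 20, 22, 24, 26, 98
even_case = median_value([26, 20, 22, 24])      # ordered: 20, 22, 24, 26

print(odd_case)   # 24
print(even_case)  # 23.0
```

In the odd case the extreme value 98 does not change the median, which stays at 24.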
Merit & demerit
B. Stratified Sampling:
1. Stratified sampling involves dividing the population into homogeneous subgroups or
strata based on certain characteristics. A random sample is then selected from each
stratum.
2. This method ensures representation from each subgroup and can provide more precise
estimates for specific subgroups of interest.
3. It is appropriate when the distribution of the characteristic to be studied is strongly
affected by a certain variable (heterogeneous population). The population is first divided
into groups (strata) according to a characteristic of interest (e.g., sex, geographic area,
prevalence of disease, etc.). A separate sample is then taken independently from each
stratum by simple random or systematic sampling.
Proportional allocation: the same sampling fraction is used for each stratum.
Non-proportional allocation: a different sampling fraction is used for each stratum, or
the strata are unequal in size and a fixed number of units is selected from each stratum.
Merit: The representativeness of the sample is improved. That is, adequate representation of
minority subgroups of interest can be ensured by stratification and by varying the sampling
fraction between strata as required.
Demerit: A sampling frame for the entire population has to be prepared separately for each
stratum.
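As a rough illustration of stratified sampling with proportional allocation, the sketch below draws a simple random sample from each stratum using one common sampling fraction. The function name `stratified_sample`, the strata, their sizes, and the 10% fraction are all assumptions made for the example.

```python
import random


def stratified_sample(strata, sampling_fraction, seed=0):
    """Draw a simple random sample from every stratum with one common fraction."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    sample = {}
    for name, units in strata.items():
        n_stratum = round(len(units) * sampling_fraction)
        sample[name] = rng.sample(units, n_stratum)  # sampling without replacement
    return sample


# Hypothetical population stratified by sex: 600 females and 400 males.
strata = {
    "female": [f"F{i}" for i in range(600)],
    "male": [f"M{i}" for i in range(400)],
}

sample = stratified_sample(strata, sampling_fraction=0.10)
sizes = {name: len(units) for name, units in sample.items()}
print(sizes)  # {'female': 60, 'male': 40}
```

Because the same fraction is applied to each stratum, the sample keeps the 60:40 composition of the population. Passing a different fraction per stratum would give non-proportional allocation.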
C. Cluster Sampling: Cluster sampling involves dividing the population into clusters or
groups and randomly selecting a few clusters for inclusion in the sample. This method is
useful when it is impractical or costly to sample individuals directly. In this sampling
scheme, selection of the required sample is done on groups of study units (clusters) instead
of each study unit individually. The sampling unit is a cluster, and the sampling frame is a list
of these clusters.
Procedure: The reference population (homogeneous) is divided into clusters. These clusters
are often geographic units (e.g., districts, villages, etc.).
1. A sample of such clusters is selected.
2. All the units in the selected clusters are studied.
3. It is preferable to select a large number of small clusters rather than a small number of large
clusters.
Merit: A list of all the individual study units in the reference population is not required. It is
sufficient to have a list of clusters.
Demerit: It is based on the assumption that the characteristic to be studied is uniformly
distributed throughout the reference population, which may not always be the case. Hence,
sampling error is usually higher than for a simple random sample of the same size.
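The cluster sampling procedure can be sketched as follows. This is a minimal illustration, assuming a hypothetical frame of ten villages with five households each; the function name `cluster_sample` is also an assumption.

```python
import random


def cluster_sample(clusters, n_clusters, seed=0):
    """Randomly select whole clusters, then study every unit inside them."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    # The sampling frame is the list of cluster names, not of individual units.
    chosen = rng.sample(sorted(clusters), n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units


# Hypothetical frame: 10 villages (clusters), each with 5 households.
villages = {
    f"village_{v}": [f"v{v}_hh{h}" for h in range(5)] for v in range(10)
}

chosen, households = cluster_sample(villages, n_clusters=3)
print(len(chosen), len(households))  # 3 15
```

Only the list of village names is needed to draw the sample, which reflects the merit stated above: no list of individual study units is required.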
Merits:
Systematic sampling is usually less time consuming and easier to perform than
simple random sampling (SRS). It provides a good approximation to SRS.
Unlike SRS, systematic sampling can be conducted without a sampling frame (useful
Demerits:
If there is any sort of cyclic pattern in the ordering of the subjects which coincides with
the sampling interval, the sample will not be representative of the population.
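The cyclic-pattern problem can be demonstrated with a small sketch. It assumes the usual systematic procedure (a random start within the first interval, then every k-th unit) and a hypothetical list of 28 daily records labelled by weekday; the function name `systematic_sample` is illustrative.

```python
import random


def systematic_sample(units, interval, seed=0):
    """Pick a random start within the first interval, then every interval-th unit."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    start = rng.randrange(interval)
    return units[start::interval]


# Hypothetical ordered list: 28 daily clinic records, labelled by weekday.
weekdays = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
records = [weekdays[i % 7] for i in range(28)]

# Interval 7 coincides with the weekly cycle: every sampled record is the same weekday.
cyclic = systematic_sample(records, interval=7)
# Interval 5 does not coincide with the cycle, so several weekdays are sampled.
mixed = systematic_sample(records, interval=5)

print(len(set(cyclic)))     # 1
print(len(set(mixed)) > 1)  # True
```

With an interval of 7 the sample contains a single weekday regardless of the random start, so it cannot represent the whole week.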
4. Explain the basic difference between correlation and regression.
The basic difference between correlation and regression is:
Correlation
1. Measures the strength and direction of the relationship between two variables.
2. It quantifies the degree to which changes in one variable are associated with changes in
another variable.
3. Correlation does not imply causation, meaning it does not determine if one variable
causes change in the other.
4. Measures the degree of relationship between two variables (X and Y), that is, the
degree to which both variables move together.
Regression
1. Is used to model the relationship between a dependent variable and one or more
independent variables.
2. It aims to predict the value of the dependent variable based on the values of the
independent variables.
3. Regression can help identify the nature and strength of the relationship and can be
used for prediction and hypothesis testing.
4. It measures how one variable changes with another; its main purpose is to estimate
the value of the dependent variable from known values of the independent variables.
1. Interchangeable factors: Regression describes how Y changes as X changes, and the
results will change if X and Y are swapped. With correlation, X and Y can be
interchanged and give the same result.
2. Single data point vs. equation: Correlation is a single statistic, whereas
regression is an entire equation fitted to all of the data points and represented by a
line.
3. Relationship vs. effect: Correlation shows the relationship between the two variables,
while regression allows us to see how one affects the other.
4. Cause and effect: Regression treats one variable as the predictor and the other as the
response: when one changes so does the other, and not always in the same direction.
Whether this reflects a true cause and effect still depends on the study design. With
correlation, the variables are only described as moving together.
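The first two differences (symmetry, and a single number versus an equation) can be checked numerically. The sketch below is a minimal illustration with hypothetical data; the function names `pearson_r` and `least_squares_line` are assumptions, and the formulas are the standard Pearson correlation and simple least-squares line.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient: a single number between -1 and 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def least_squares_line(x, y):
    """Intercept and slope of the regression of y (dependent) on x (independent)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Correlation is symmetric: swapping X and Y gives the same value.
print(round(pearson_r(x, y), 3), round(pearson_r(y, x), 3))  # 0.775 0.775

# Regression is not symmetric: swapping X and Y gives a different equation.
print([round(v, 2) for v in least_squares_line(x, y)])  # [2.2, 0.6]
print([round(v, 2) for v in least_squares_line(y, x)])  # [-1.0, 1.0]
```

Here the correlation is about 0.775 in both directions, while the fitted line is y = 2.2 + 0.6x when Y is the dependent variable and x = -1.0 + 1.0y when the roles are swapped.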
5. What are the assumptions of logistic regression? Explain and describe at least two.
These assumptions are important to ensure the validity and reliability of the logistic regression
model. Violation of these assumptions may lead to biased or misleading results.
In general, logistic regression is a data analysis technique that uses mathematics to find the
relationship between two data factors. It then uses this relationship to predict the value of one
of those factors based on the other. The prediction usually has a finite number of outcomes, like
yes or no.
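How a logistic model turns one factor into a yes/no prediction can be sketched as below. This is only an illustration of the prediction step: the coefficients are hypothetical and not fitted to any data, and the function names and the 0.5 threshold are assumptions.

```python
import math


def predict_probability(x, intercept, slope):
    """Logistic model: P(outcome = yes | x) = 1 / (1 + exp(-(intercept + slope * x)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def predict_outcome(x, intercept, slope, threshold=0.5):
    """Convert the predicted probability into one of two outcomes."""
    return "yes" if predict_probability(x, intercept, slope) >= threshold else "no"


# Hypothetical coefficients, chosen only for illustration (not estimated from data).
intercept, slope = -4.0, 0.125

p_at_32 = predict_probability(32, intercept, slope)  # linear predictor is exactly 0
print(p_at_32)                                # 0.5
print(predict_outcome(20, intercept, slope))  # no
print(predict_outcome(50, intercept, slope))  # yes
```

The predicted probability always lies between 0 and 1, which is what allows the model to return a finite set of outcomes such as yes or no.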