Scheduling Metrics
Schedule metrics involve the measurement of a particular parameter in a plan of work, such as the number of activities, constraints, or relationships in a schedule. While the subject of schedule metrics has been adequately covered in the literature, the lack of standardization in the definition of metrics and their acceptable limits remains a source of ongoing debate.
The schedule is an essential management tool; its purpose is to predict completion dates for the project's activities and to assist project teams in decision-making. As such, the schedule must convey the right information and its dates must be reliable. One of the initial steps in the schedule review process is the metrics evaluation, which provides an easy way to verify the schedule's integrity. However, a schedule that passes a metrics test proves only that it meets the stated limits of the parameters tested, and nothing more. There is no indication of whether the plan is realistic or feasible, or whether it reflects the stated execution strategy. What role, then, does the metrics evaluation play in the schedule review and approval process?
Since the introduction of the US Defense Contract Management Agency (DCMA) schedule evaluation protocol in 2005, and of its subsequent revisions, the scheduling community has taken sides on the matter of its applicability. In the years that followed, numerous papers covered the subject, with views ranging from praise and recommendation of the methodology to critiques highlighting its numerous flaws. Most reviews merely explained how to perform the analysis without actually weighing in on its appropriateness or limitations.
In a 2011 paper, DCMA 14-Point Schedule Assessment, this author performed a thorough evaluation of the protocol and highlighted a series of inconsistencies and implementation issues [1, pp. 20-21]. A decade later, not much seems to have changed: the protocol persists, and even thrives, in the industry, and other metrics tests have since entered the schedule quality analysis field.
Acumen Fuse software makes an attempt at standardization and proposes the Schedule Quality Index™ as a means to quantify schedule quality. More recently, SmartPM™ Technologies, a schedule analytics company specializing in the construction industry, introduced its own version of a schedule quality index, based on the DCMA protocol. But while these indices become increasingly popular, their adoption needs to be an informed decision.
The authors believe that the original intent of using schedule metrics was to improve the quality of the submitted schedules through adherence to predefined thresholds. The difficulty lies in selecting the appropriate metrics and thresholds for the schedule under review.
A brief overview of the most popular schedule standards and guidelines is presented, followed by an introduction to software solutions for schedule metrics evaluation.¹ The subsequent section highlights the importance of using the correct metrics and corresponding thresholds for a given schedule and pinpoints the differences that exist among some of the most popular schedule checks. The authors present some of the factors that need to be considered when selecting the metrics and establishing their thresholds.
The paper addresses the importance of using the correct evaluation protocol for each situation.
In addition, one needs to have a good understanding of the system's configuration when performing the analysis using commercial software. The paper also includes a series of
recommendations for the schedule evaluation using metrics and concludes by reminding
practitioners that this evaluation is only the initial step of any thorough schedule review.
Among the recent developments gaining traction in the scheduling community, the authors look into the issue of benchmarking the various schedule metrics. The benchmarking feature proposed by Deltek is becoming popular, and the authors highlight some of the issues pertaining to the use of the tool.
Many public standards and guidelines exist that address schedule quality, with most of them having been specifically designed to evaluate detailed control schedules. These detailed control schedules, defined as Class 2 schedules in the AACE Recommended Practice 27R-03, Schedule Classification System, can exist as independent entities or be developed as parts of an Integrated Master Schedule (IMS) [2, p. 8]. The lowest level of detail at which a Class 2 schedule is usually presented is Level 4, although Level 3 schedules are also frequently used in the EPC and EPCM environment.
¹ The examples used were obtained using Deltek's Acumen Fuse version 8.5 and Schedule Analyzer for the Enterprise, Baseline Checker version 4.
While the literature abounds in best practice recommendations and guidelines, what seems to be missing is a general consensus as to what characteristics a quality schedule should exhibit. To prove this point, the criteria included in the most popular schedule checks are listed below.
To begin, the DCMA 14-point assessment is included in the following table [3, pp. 28-32]:
| No. | Criteria | Condition being evaluated | Threshold |
| 1 | Logic | Number of incomplete tasks without predecessors and/or successors | Less than 5% |
| 2 | Leads | Number of logic links with a lead (negative lag) in predecessor relationships for incomplete tasks | 0 |
| 3 | Lags | Number of lags in predecessor logic relationships for incomplete tasks | Less than 5% |
| 4 | Relationship types | Number of relationships for incomplete tasks | Minimum 90% Finish-to-Start (FS) relationships |
| 5 | Hard Constraints | Number of incomplete tasks with hard constraints in use | Less than 5% |
| 6 | High Float | Number of incomplete tasks with total float greater than 44 working days | Less than 5% |
| 7 | Negative Float | Number of incomplete tasks with total float less than 0 working days | 0 |
| 8 | High Duration | Number of incomplete tasks with a duration greater than 44 working days | Less than 5% |
| 9 | Invalid Dates | Number of incomplete tasks with forecast start/finish dates prior to the status date or with actual start/finish dates beyond the status date | 0 |
| 10 | Resources | Number of incomplete tasks with durations greater than zero that have dollars or hours assigned | 100% |
| 11 | Missed Tasks | Number of tasks that have been completed or will finish later than planned in the baseline | Less than 5% |
| 12 | Critical Path Test | Measures the slippage of the project completion date (or other milestone) when an intentional slip is introduced in the network | Should be proportional with the intentional slip applied |
| 13 | Critical Path Length Index (CPLI) | Measures the critical path "realism" relative to the baselined finish date | Not less than 0.95 |
| 14 | Baseline Execution Index (BEI) | Cumulative number of tasks completed compared to cumulative tasks with baseline finish date on or before the status date | Not less than 0.95 |

Table 1. DCMA 14-Point Assessment for Integrated Master Schedules (IMS)
The original DCMA-14 definition, as well as numerous independent papers, explains the process involved in collecting these metrics. A detailed look into the implications behind these quality checks is available in this author's paper, DCMA 14-Point Schedule Assessment [1].
These metrics are then scored from 1 to 3, based on the percentages found. The final score, known as the Overall Project Rating, with values between 1 and 3, is determined by weighting the respective individual scores as indicated.
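As an illustration only, the short sketch below shows one way a few of the counts and percentage checks from Table 1 could be derived from a generic activity list. The Activity structure and function names are assumptions made for this example; they do not represent the DCMA tooling or any of the commercial products discussed later.

```python
# Illustrative sketch only: a minimal way to compute a few of the DCMA-14
# percentage checks from Table 1 over a generic activity list.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Activity:                      # assumed structure, not a DCMA definition
    id: str
    complete: bool = False
    predecessors: List[str] = field(default_factory=list)
    successors: List[str] = field(default_factory=list)
    total_float_days: float = 0.0
    duration_days: float = 1.0

def pct(count: int, total: int) -> float:
    return 0.0 if total == 0 else 100.0 * count / total

def dcma_counts(activities: List[Activity]) -> dict:
    # All checks below apply to incomplete tasks, as stated in Table 1.
    incomplete = [a for a in activities if not a.complete]
    n = len(incomplete)
    missing_logic = sum(1 for a in incomplete
                        if not a.predecessors or not a.successors)
    high_float = sum(1 for a in incomplete if a.total_float_days > 44)
    high_duration = sum(1 for a in incomplete if a.duration_days > 44)
    negative_float = sum(1 for a in incomplete if a.total_float_days < 0)
    return {
        "Logic (< 5%)": pct(missing_logic, n),
        "High Float (< 5%)": pct(high_float, n),
        "High Duration (< 5%)": pct(high_duration, n),
        "Negative Float (= 0)": negative_float,
    }
```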
The US Government Accountability Office (GAO) has published a best practices guide for schedule quality titled Best Practices for Project Schedules. It proposes its own 10-point checklist and associated quantitative measurements [5, pp. 183-188].
The National Defense Industrial Association (NDIA), for its part, does not define any specific thresholds, but rather recommends threshold guidelines that could trigger additional analysis [7, pp. 138-144].
Software Solutions
While all these metrics can be calculated manually in a spreadsheet, the process can prove time-consuming and is prone to errors. Many software providers have responded to the increasing interest in metrics analysis and started integrating the most referenced protocols into their products, or even creating their own checks. Among the popular tools available today, Oracle's P6 schedule check functionality, available for the EPPM platform, includes its own 14-point checks, as shown in Table 4 below:
| No. | Criteria | Condition being evaluated | Target |
| 1 | Logic | Activities missing predecessors or successors | Less than 5% |
| 2 | Negative Lags | Relationships with a lag duration of less than 0 | Less than 1% |
| 3 | Lags | Relationships with a positive lag duration | Less than 5% |

*Note: the 352h label refers to 352 hours and is equivalent to 44 days (considering a working schedule of 8 hrs/day).
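Expressed as a short, hedged sketch (the 8-hour working day is the assumption stated in the note above):

```python
# Assumed 8-hour working day, per the note above: the 44-day high duration
# threshold expressed in hours.
HOURS_PER_DAY = 8
HIGH_DURATION_HOURS = 44 * HOURS_PER_DAY   # 352 hours

def is_high_duration(remaining_duration_hours: float) -> bool:
    return remaining_duration_hours > HIGH_DURATION_HOURS
```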
Another software provider, Deltek, through its Acumen Fuse product, makes available multiple evaluation protocols: the DCMA 14-point assessment, GAO and NASA Health Check. It also includes its own schedule review criteria that produce an overall score to evaluate the schedule quality, the Schedule Index. The various metrics that compose the Schedule Index are included in Table 5 below:
| No. | Criteria | Condition being evaluated | None | Low | Medium | High |
| 1 | Missing Logic | Total number of activities that are missing a predecessor, a successor, or both | Less than 5% | Less than 10% | Less than 25% | More than 25% |
| 2 | Logic Density™ | The average number of logic links per activity | Minimum 2, maximum 4 | | | |
| 3 | Critical | Number of critical activities | 0 | Point of reference | 50% | 100% |
| 4 | Hard Constraints | Number of activities with hard or two-way constraints | 0 | Less than 5% | 5-25% | More than 25% |
| 5 | Negative Float | Number of activities with total finish float less than 0 working days | 0 | Less than 5% | 5-25% | More than 25% |
| 6 | Insufficient Detail™ | Number of activities that have a duration longer than 10% of the total duration of the project | 0 | Less than 5% | Less than 5% | More than 5% |
| 7 | Number of Lags | Total number of activities that have lags in their predecessors | Less than 5% | Less than 25% | 25-50% | More than 50% |
| 8 | Number of Leads | Total number of activities carrying negative lag | Less than 5% | Less than 25% | 25-50% | More than 50% |
| 9 | Merge Hotspot | Total number of activities with a high number of predecessor links (more than 2) | 0 | Less than 10% | Up to 25% | More than 25% |

(The four right-hand columns are the threshold bands: None, Low, Medium, and High.)

Table 5. The Metrics that Compose the Schedule Index
Using pre-defined weightings, these metrics are then combined into an overarching Schedule Index. The value of the Schedule Index can vary from 0 to 100, with a pass/fail threshold established at the 75 mark: values of 75 or higher pass. It is suggested that any value lower than 75 should trigger a schedule rejection, or at least a review and/or an update. The creator of this software, Dr. Dan Patterson, suggests that organizations looking to outperform should consider raising this threshold to 85, but provides no additional information to support this recommendation [8, p. 5].
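The exact weightings behind the Schedule Index are not published in full, so the sketch below only illustrates the general idea described above: per-metric scores combined through pre-defined weightings into a 0-100 index with a pass mark of 75. The scores and weights shown are purely illustrative.

```python
# Hedged illustration of a weighted 0-100 index with a pass mark of 75.
# This is not Deltek's actual formula; scores and weights are invented.
def overall_index(metric_scores: dict, weights: dict, pass_mark: float = 75.0):
    total_weight = sum(weights[m] for m in metric_scores)
    index = sum(metric_scores[m] * weights[m] for m in metric_scores) / total_weight
    return index, index >= pass_mark

scores  = {"Missing Logic": 90, "Hard Constraints": 100, "Number of Lags": 60}
weights = {"Missing Logic": 3,  "Hard Constraints": 2,   "Number of Lags": 1}
print(overall_index(scores, weights))   # (88.33..., True) -> passes the 75 mark
```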
With Acumen Fuse, users can create their own evaluation protocols by selecting any of the industry-standard metrics, additional metrics included in the product's libraries, or their own developed metrics. This is a very powerful feature that can meet the needs of expert users, allowing them to define a metric's calculation formula, its inclusion or exclusion filters, and its tripwire threshold scales.
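Conceptually, such a user-defined metric can be pictured as a calculation scope (inclusion filter), a failure condition, and a set of tripwire thresholds. The sketch below is only an assumed, generic representation of that idea; it is not the Acumen Fuse metric editor or its API.

```python
# Generic, assumed representation of a user-defined metric: an inclusion
# filter, a failure condition, and tripwire thresholds mapping a result
# to a rating. Not the Acumen Fuse API.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class CustomMetric:
    name: str
    include: Callable[[dict], bool]       # e.g. incomplete, non-LOE activities
    failed: Callable[[dict], bool]        # the condition being evaluated
    tripwires: List[Tuple[float, str]]    # (upper % bound, rating), ascending

    def evaluate(self, activities: List[dict]) -> Tuple[float, str]:
        scope = [a for a in activities if self.include(a)]
        failures = sum(1 for a in scope if self.failed(a))
        result = 100.0 * failures / len(scope) if scope else 0.0
        for bound, rating in self.tripwires:
            if result <= bound:
                return result, rating
        return result, self.tripwires[-1][1]
```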
SmartPM, a young schedule analytics company, proposes its own Schedule Quality
Index, based on the DCMA-14 point protocol. In the brief white paper that describes the
index, the justification given for the selection of each of the fourteen metrics is not
convincing and seems odd at times. For example, the Total Relationships to Activities
Advances in technology are continually pushing the schedule analytics boundaries. New functionalities, new products, and new players are entering the industry, trying to make the best use of the schedule data. As shown in the following section, the users' challenge is to select products adapted to their needs, rather than adopt tools and standards out of mere convenience or commercial pressure.
Schedules are developed throughout the project lifecycle, and their purpose and characteristics evolve over time. This fact is reflected in the AACE Recommended Practice 27R-03, Schedule Classification System, which makes the distinction between the schedule class, linked to the maturity of the information that supports the schedule development, and the schedule level, which only reflects the granularity of the schedule's presentation [2, p. 2]. Usually there is a correlation between the schedule class and the schedule level, which ensures that the schedule is developed in line with the amount and quality of the available information.
In her paper "Which Schedule Quality Assessment Metrics to Use? … and When?", Shoshanna Fraizinger proposes an application of schedule quality metrics by schedule class [12, p. 20]. In this case, the Class 3 schedule referenced seems to reflect the baseline construction schedule approved for execution.
Other than the class that a schedule belongs to and the level to which it was developed,
additional factors affect the suitability of the metrics to be retained in the analysis or
their imposed thresholds. The industry for which the schedule was prepared brings its
own characteristics. For example: engineering schedules contain activities that are often
planned with preferential logic; maintenance projects would have multiple constraints
and so on.
The size of the project also influences some of the metric thresholds. The higher the total installed cost, the larger the schedule tends to be in terms of number of activities, the longer the project's lifecycle, and the smaller the granularity. The notions of 'high duration' and 'high total float' do not have the same meaning for a 6-month project as for a 10-year project. Whether or not the project schedule was developed using a rolling wave methodology also needs to be considered. Just as with the specification of percentages of activities meeting a stated metric, high duration activities should be defined as a percentage of the project length.
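As a hedged sketch of that suggestion, a project-relative threshold could look as simple as the following; the 5% fraction is purely illustrative and would itself need to be justified for each project.

```python
# Illustrative only: define 'high duration' relative to project length
# instead of a fixed 44-day limit. The 5% fraction is an assumption.
def high_duration_threshold(project_length_days: float, fraction: float = 0.05) -> float:
    return project_length_days * fraction

print(high_duration_threshold(180))    # 6-month project  -> 9.0 days
print(high_duration_threshold(3650))   # 10-year project  -> 182.5 days
```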
Amongst the most popular checks included in the diagnostic protocols listed above, disagreement persists on the thresholds to apply, as highlighted below:
1. Logic. This is one of the most frequently used metrics in a schedule diagnostic protocol. Every scheduler is familiar with the mantra "every activity needs at least one predecessor and one successor, with the exception of the first and last milestones in the chain". Any CPM schedule developed to Level 3 or higher should be subjected to this check. While most protocols seem to accept a 5% threshold for the total number of activities that are missing a predecessor or successor, any result higher than two activities with missing logic should at least trigger a review. A single activity that should belong on the critical path but is not integrated in the network can lead to a schedule with a fatal flaw. Deviations from the rule can be accepted on a case-by-case basis, but relying on the 5% rule is a risky practice;
As for the proprietary metrics, Deltek's Merge Hotspot metric does not have an equivalent in the industry standards. As with the Logic Density metric above, its application to schedules of Level 3 and lower is not appropriate, but even for Class 2 and Class 1 schedules, developed to a Level 4 granularity, the ideal target of fewer than 10% of activities with more than two predecessors is difficult to defend. Conversely, it can be argued that schedules meeting this criterion are under-developed and have insufficient logic.
Among the other checks performed by the software providers, some are very helpful and could indeed reveal structural issues within the schedule model. As an example, the Open Ends metric helps identify dangling activities. However, in the Acumen Fuse application, this analysis incorrectly flags milestones as having open ends, leading start milestones linked start-to-start to their successors, or finish milestones linked finish-to-finish to their predecessors, to appear as having an open finish or an open start, respectively. For an accurate analysis, these occurrences would need to be omitted from the check.
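A hedged sketch of how such an exemption could be expressed is given below. The data model (activity types and relationship records) is assumed for the example and does not reproduce the Acumen Fuse logic.

```python
# Assumed data model, not the Acumen Fuse implementation: flag open starts
# and open finishes, but exempt the legitimate milestone cases described above.
def open_end_flags(act: dict, relationships: list) -> tuple:
    """act: {'id': ..., 'type': 'task' | 'start_milestone' | 'finish_milestone'}
    relationships: [{'pred': id, 'succ': id, 'link': 'FS'|'SS'|'FF'|'SF'}, ...]"""
    preds = [r for r in relationships if r["succ"] == act["id"]]
    succs = [r for r in relationships if r["pred"] == act["id"]]
    # The start should be driven by an FS or SS predecessor; the finish
    # should drive an FS or FF successor.
    open_start = not any(r["link"] in ("FS", "SS") for r in preds)
    open_finish = not any(r["link"] in ("FS", "FF") for r in succs)
    # Milestones carry a single date: a start milestone has no finish to leave
    # open, and a finish milestone has no start to leave open.
    if act["type"] == "start_milestone":
        open_finish = False
    if act["type"] == "finish_milestone":
        open_start = False
    return open_start, open_finish
```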
Other metrics are just informative, in that they give additional insight into the schedule structure without a predefined threshold. For instance, the minimum lag, the maximum lag, the number of activities (whether normal tasks, level of effort, or summary tasks), and the number of constraints by individual type can add another layer of information to the schedule diagnostic. It is up to schedule reviewers to look into these metrics further if particular parameters seem problematic.
The evaluation protocols to be used in any schedule investigation differ depending on whether the schedule under evaluation is a baseline schedule or an update schedule. Baseline schedule reviews should concern themselves with quality and completeness issues; the metrics used to evaluate them convey mainly information on the schedule's structural integrity. Update schedule reviews should primarily look at status and schedule changes [9] [10]. For update schedules, metrics such as missed tasks, the baseline execution index (BEI), invalid dates, and the 'Hit or Miss' ratio are a primary focus, providing additional information on the project's adherence to the baseline plan.
As baseline schedules usually do not contain actual dates [9, p. 8], their evaluation
concerns all the activities in the network. Some activities might subsequently be excluded from the analysis, based on the metrics being evaluated, so as to depict the schedule structure more adequately. A first selection can be made by activity type, filtering out level-of-effort or summary activities, for example. Other selections can be made to zoom in on specific groups of activities, using the same type of filters as when developing layouts in the scheduling software.
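In practice, such selections amount to simple filters applied before the metrics are computed; a minimal sketch, with field names assumed for the example, is shown below.

```python
# Illustrative filters (field names assumed) for scoping a metrics run.
def baseline_scope(activities):
    # Drop level-of-effort and summary records for a baseline review.
    return [a for a in activities if a["type"] not in ("LOE", "WBS_SUMMARY")]

def update_scope(activities):
    # For an update review, keep only the activities that are not completed.
    return [a for a in baseline_scope(activities) if a["status"] != "Completed"]
```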
For update schedules, although their structural soundness is expected, the metrics evaluation usually focuses on the activities that are not completed. This view is shared by most standards and guidelines. However, Deltek adopts a different approach: by default, in the calculation of the Schedule Quality Index, all metrics except the Critical and Negative Float metrics count completed activities as well. As with the other customizations that are possible within the tool, the selection of the activities to include in the analysis is left to each user's discretion.
But within any one category, either for baseline schedule review or update schedule
review, the protocols might differ in implementation and lead to different results. In the
case of the ‘in-house’ tools developed by users in simple spreadsheets, there is a risk of
errors originating from incorrect formulas or data corruption. When a software solution is
used for the evaluation, the different results obtained could be attributed to the version of the standard used by the software, or to the interpretation of the standard requirements by the developers of the respective protocol.
To prove the point, a Class 3, Level 3 schedule, prepared at funding request, was analyzed against the DCMA-14 protocol using two different software packages: Acumen Fuse and Schedule Analyzer. This protocol was selected simply because it is common to both software providers. It is important to note that the submitted schedule was not originally required to be evaluated against the DCMA-14 protocol. The results are included in Table 6 below.
| Nb. | Metric | Fuse Result | Schedule Analyzer Result |
| 1a | Logic / Missing Logic | 13 (2%) | 82 activities; Ratio = 1.51% |
| 1b | Dangling Logic | | 82 activities |
| 2 | Leads | 0 (0%) | 0 |
| 3a | Lags | 451 (30%) | 572 |
| 3b | Long Lags | | 569 |
| 4 | SS/FF Relationship Count | 479 (32%) | 479 |
| 4 | SF Relationship Count | 0 (0%) | 0 |
| 5 | Hard Constraints | 0 (0%) | 0 or 0.00% |
| 6 | High Float / Total Float > 44d | 447 (52%) | 449 or 52.27% |
| 7 | Negative Total Float | 0 (0%) | 0 |
| 8 | High Duration / Original Duration > 44d | 238 (28%) | 238 or 27.71% |
| 9a | Invalid Forecast Dates | 0 (0%) | 0 |
| 9b | Invalid Actual Dates | N/A | 0 |
| 10 | Resources | 470 (55%) | 415 |
| 11 | Missed Activities / Missed Tasks | 0 (N/A) | N/A |
| 12 | Critical Path Test | (check) | Instructions for P6 Check |
| 13 | CPLI (Critical Path Length Index) | 1.00 | 1 |
| 14 | BEI (Baseline Execution Test) | N/A | N/A |
| | Protocol Version Used | Not indicated | Based upon "OMP/IMS Training" ENGR120 presentation dated 21-11-09 |

Table 6. Example DCMA-14 Point Analysis Results from Two Metrics Software
As indicated in Table 6 above, Acumen Fuse does not specifically state the DCMA-14
point protocol version used for the evaluation, while Schedule Analyzer bases its
analysis upon ‘OMP/IMS Training’ ENGR120 presentation dated Nov 2009.
The analysis above also highlights the importance of using the correct protocols for any given schedule. The sample schedule selected, prepared at funding request, was submitted to quality checks using a customized protocol derived from the Acumen Fuse Schedule Index. The DCMA-14 protocol, while not requested for this schedule, proves to be inappropriate in this case. It should be noted that the project for which this schedule was prepared was completed successfully, ahead of its target, and was the recipient of two major Canadian awards. This only proves the point that failing to meet a threshold that is inadequate from the start has no negative consequence on the schedule integrity.
The implementation protocols also vary for update schedules. Calculating some of the metrics, such as 'Missed Activities', CPLI, and BEI, requires using baseline information. Knowing that native .xer files do not contain baseline information raises the question of which fields Acumen Fuse reads when the schedule is prepared in P6 and then imported as an .xer file. As it turns out, Acumen Fuse treats the planned start and planned finish as the baseline start and baseline finish. Expert P6 users know that this is not the case, so these metrics are simply calculated incorrectly. This author has recommended a product change on this basis.
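For reference, a hedged sketch of the BEI calculation described in Table 1, computed from explicit baseline fields rather than from P6 planned dates, is shown below; the field names are assumptions of the example.

```python
# Hedged sketch of the BEI formula from Table 1: tasks actually completed
# divided by tasks baselined to finish on or before the status date.
# Field names are assumed; the point is that genuine baseline dates, not
# P6 planned dates, feed the calculation.
from datetime import date

def bei(activities: list, status_date: date) -> float:
    baselined = [a for a in activities
                 if a.get("baseline_finish") and a["baseline_finish"] <= status_date]
    completed = [a for a in activities
                 if a.get("actual_finish") and a["actual_finish"] <= status_date]
    return len(completed) / len(baselined) if baselined else float("nan")
```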
When deciding to use alternate evaluation protocols, developed by the various software
providers, users need to understand the system configuration options. For example, the
results of any Fuse analysis are dependent on the scoring option selected prior to the
analysis [14] [15] and on the metrics customization.
By default, Acumen Fuse uses a record-based scoring method. This method requires
activities (or ‘records’) to pass all metrics selected for the analysis and computes the
final score based on a pass/fail criterion, counting the number of activities that passed
all metrics versus the number of activities that failed any of the selected metrics. In the
metrics editor, users can designate the selected metrics as ‘bad’, ‘neutral’ and ‘good’
and any occurrence of ‘bad’ metrics negatively influences the overall score.
The second evaluation option offered by Acumen Fuse, the metrics-based scoring,
allows activities to receive partial credit towards the overall score, based on the number
of metrics that pass the criteria and the weighting values attributed to those metrics.
Deltek states that, by default, the metric weighting values are set to the midpoint of the weighting scale, but this does not always seem to be the case. Open ends, minimum lag, maximum lag, leads, negative float, and more than 30 days of float are just a few of the metrics set to the absolute 'bad' values in the Acumen Fuse metrics library [14].
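The difference between the two scoring ideas can be illustrated with a small, hedged example (not Deltek's implementation): a matrix of per-activity metric results, where True means the activity passes that metric.

```python
# Hedged illustration of the two scoring ideas, not Deltek's implementation.
def record_based_score(results):
    # An activity only counts if it passes every selected metric.
    passed = sum(1 for r in results if all(r.values()))
    return 100.0 * passed / len(results)

def metric_based_score(results, weights):
    # Partial credit: each passed metric contributes its weight.
    total = sum(weights.values()) * len(results)
    earned = sum(weights[m] for r in results for m, ok in r.items() if ok)
    return 100.0 * earned / total

results = [{"Missing Logic": True,  "Leads": True},
           {"Missing Logic": False, "Leads": True}]
weights = {"Missing Logic": 2, "Leads": 1}
print(record_based_score(results))            # 50.0
print(metric_based_score(results, weights))   # approximately 66.7
```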
The selected score calculation method greatly influences the overall score results.
Using the same sample schedule as before, below are the results of the Acumen Fuse
analysis for the overall Schedule Index, using each of the available scoring methods.
| Metric | Count | Percentage |
| Missing Logic | 17 | 2% |
| Hard Constraints | 0 | 0% |
| Negative Float | 0 | 0% |
| Insufficient Detail™ | 44 | 5% |
| Number of Leads | 0 | 0% |
While all individual metric results are identical, the overall score might generate a pass or a fail result, depending on the selected scoring method.
Moreover, using the same scoring method, users can influence the overall score results by changing the weightings applied to the selected metrics. As an example, another analysis was performed under the record-based method option, modifying the weightings of the selected metrics.
While this is a very powerful feature, owing to the great flexibility it provides, these customizations are not transparent to readers of the analysis report and are arguably not known to every user of this software. When the analysis report is published, the details behind the calculations are not included, and it is up to each user to document all the customization changes made.
Schedule Benchmarking
As the concept of benchmarking schedule quality starts to make its way into the industry, the obvious question to be asked is: are the benchmarking data reliable? To answer this question, the very nature of assessing schedule quality needs to be considered. What data was collected in any previous schedule evaluation, at which moment in the project's lifecycle, from what source, using what diagnostic protocol? What controls are put in place in a benchmarking exercise to make sure that the comparison is made on similar bases? As projects are, by definition, unique endeavors, any benchmarking data would need to be normalized, requiring a significant amount of information that documents the project's context.
The information that comprises the benchmarking data lacks transparency. There is no available information as to the number of schedules that make up the data set, the schedule class, the schedule level, the project development stage, the size of the project, the scheduling party (owner, contractor, EPCM company), and so on. All of these are factors that would need to be taken into account for a benchmarking exercise and for normalization purposes. Other than the category / sub-category criteria used to group projects by industry, there are no additional criteria to zoom in on a specific set of data.
Once software users opt in to sharing their own set of data, the corresponding metrics are collected by the software company. The data set is then associated with a specific project type, based on the category / sub-category selected by the user. If no category is assigned by the user, the project is put into a pool of 'uncategorized' projects. Users who want to compare their results with those of other industries would select a different category and launch the feature. Due to a current flaw in the product, the same data set could be collected for more than one category of projects, as each use of the functionality will trigger the sampling. An enhancement request was raised by this author and it is hoped that the data collection procedure will be modified in a future version of the software to correct this flaw.
On the same subject of the data collection protocol, there is another weakness. Consider a schedule that is in the development stage. As the schedule matures, the schedulers could use Acumen Fuse to self-assess the quality of their schedule and use the results to bring it into adherence with the applicable standards, whatever they might be. In this case, each iteration will get stored, and the software does not allow for differentiation between a work-in-progress schedule and a final schedule. Moreover, as the schedule is getting ready to be issued for client review, another analysis could be performed by the functional team. Finally, as the schedule is sent to the client for approval, yet another analysis can be made by the client, using the same software. And if iterations are required to fix outstanding issues, this will only multiply the number of instances that get collected. The data collection protocol would need controls in place to ensure that no duplicate of the same data set is stored.
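One possible control, offered purely as a suggestion and not as an existing product feature, would be to fingerprint each submitted metric set so that repeated submissions of the same schedule iteration can be recognized; a minimal sketch follows.

```python
# Suggestion only, not an existing product feature: fingerprint a submitted
# metric set so duplicate submissions of the same iteration can be detected.
import hashlib
import json

def submission_fingerprint(project_id: str, data_date: str, metrics: dict) -> str:
    payload = json.dumps({"project": project_id, "data_date": data_date,
                          "metrics": metrics}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```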
From a practical point of view, the benchmarking feature allows the comparison to be made based on two different overall scores: the Fuse Schedule Index and the Fuse Logic Index. In each case, the default scoring method and weighting criteria are used for the comparison. If, during the software customization process, users decide to remove or add metrics in the analysis, or to modify the weightings or the thresholds to suit their project characteristics, then all those customizations are ignored by the benchmarking feature. A schedule could fare very well on a custom quality index, but would then be 'normalized' and brought into the mold of the standard Fuse analysis. Currently, the benchmarking feature cannot be customized to include a specific set of metrics for comparison, and this could prove confusing for the report readers.
Other software providers offer a different version of the benchmarking feature, allowing users to create reports from their own scheduling database. One example of such providers, the Schedule Analyzer Enterprise Forensic software package, contains a benchmark module that allows users to create benchmarks from their own scheduling database.
As the standards and guidelines covering schedule quality evolve, the software providers follow suit and adapt their packages. With the rapid advancement of technologies, it is expected that even more schedule analytics products will appear on the market. But users always need to be wary of how these products work and to get involved in addressing the discovered flaws. Bug identifications and enhancement requests are good ways of communicating technical issues to software providers. These processes are meant to improve the products' usability and ultimately help the scheduling community.
As seen above, a multitude of metrics can be used to analyze the structural integrity of schedules during their development and throughout most of the project's life cycle. This metrics analysis is best used as a first step in a more comprehensive schedule review process. Its results will only attest whether the schedule dates can be reasonably trusted, and nothing more. Subsequent analyses will treat scope completeness, accurate reflection of the execution strategy, the plan's feasibility, and the like. Performing these detailed analyses on a flawed schedule model risks being a waste of time.
But not all schedule statistics are equally important in measuring schedule quality.
Some metrics are more critical than others, based on the specificity of the schedule that
is being analyzed. Some thresholds are stricter than others, and trigger schedule
revisions or even schedule rejections. And some metrics are informative only, and do
not trigger any further action. Any schedule quality analysis should be concerned with
focus, appropriate protocol, and normalization.
1. Focus. At the onset of any analysis, users need to ask a series of questions to better
define the framework of the exercise. What is the objective of the analysis? Is the
analysis performed as a self-assessment of schedule quality or is it performed to
ensure compliance with a given standard? Is it a company standard or a client standard? The answers to these questions indicate the degree of flexibility in acting on the results obtained and the actions to follow the analysis. Which activities should be retained in the analysis? Does the analysis cover the entire schedule or a specific portion of it? The answers will determine the filters to be used in the analysis and the data set that will be investigated.
After adopting the appropriate protocol and normalizing the individual tests, the review analysis
can be performed in simple spreadsheets, in-house developed tools, or commercial software.
When selecting the latter, users would need to give special consideration to software
customization, transparency, and results validation.
1. Software customization. Based on the software solution used for the analysis, users
might have more or less flexibility in customizing the test by modifying the selection
of metrics to be included in the analysis or their thresholds. The proposed schedule
quality indices should be seen as guidelines only and they should prompt reflection.
The users should strive to develop schedule quality indices that take into account the
particularities of the schedule that needs to be evaluated. A good analysis should
capture all relevant metrics for that specific schedule, including everything that would otherwise compromise the use of the schedule for the purpose for which it was developed. Considering the multitude of metrics included in the protocols listed above, and the many others available in the various software packages, some metrics are more critical than others at various points in the project lifecycle.
2. Transparency. Irrespective of the tool used for the analysis, users should be
transparent in the methodology used, in the metrics selection criteria, and in the
thresholds used. All assumptions, additional filtering of the information, and tool
customizations, including the calibration of the metrics weighting, when applicable,
should be clearly described in the schedule basis document.
3. Validate the results obtained. While software continues to evolve and new products enter the market, users need to keep their guard up and always validate and question the outcomes these tools produce. Just because a software package issues a certain result, this does not necessarily make it 'right'. By asking questions and raising flags on software issues, users help improve these products and the analyses that depend on them.
Conclusion
With reference to the original question, the review of the current public schedule metrics shows widespread confusion as to which metrics to use. Guidelines such as the DCMA 14-Point Assessment, NASA's Scheduling Management Criteria, the NAVAIR 11-Point Assessment, and the NDIA guidance were introduced. Schedule metrics software implementations such as Oracle's P6 14-point checks, the Acumen Fuse Schedule Index, Schedule Analyzer for the Enterprise's Baseline Checker, and the SmartPM Schedule Quality Index were also briefly introduced.
General issues with using the correct metrics were presented. Schedule class and level have a definite bearing on metric appropriateness. The size of the project being investigated definitely influences the thresholds being used to assess schedule quality. Each schedule quality analysis should be concerned with focus, appropriate protocol, and normalization, and it was highlighted that software implementations require proper customization, transparency, and validation of the results obtained.
The bodies that issue standards unfortunately do not test and certify software implementations. This can lead to various versions of the same guideline being used by different software providers, or to different interpretations of a given metric. Many of the software products purport to measure the same thing but can deliver different results. Moreover, different results can occur using the same software but with different 'standard' settings. How the proprietary metrics are being used has a direct bearing on the evaluation of a metric's conclusions.
And while benchmarking is a sought-after undertaking, the tools available on the market do not yet provide a suitable solution. Acumen Fuse's benchmarking feature, while interesting in concept, requires additional controls to ensure reliable data collection and normalization before a widespread adoption of the tool can be envisioned. Other software tools for benchmarking various schedule statistics should also be considered and evaluated.
In conclusion, the question remains: can the metrics proposed by the various software providers be relied upon without industry validation? Innovation is welcome, but it should be promoted with transparency and supporting data. Just because a software provider includes a certain metric in its protocol or proposes a default threshold for a given metric, this does not make it an 'industry standard'. System implementation options should be thoroughly understood and documented, and users should take an active role in helping software companies fix the identified bugs.
References