Theory of monitoring: shifts from baselines in seagrass metrics.

The following material is copyright to Schultz et al., and is excerpted from:

Schultz S.T., Kruschel C., Bakran-Petricioli T., Petricioli D. 2015. Error, Power, and Blind Sentinels: The Statistics of Seagrass Monitoring. PLoS ONE 10: e0138378. doi:10.1371/journal.pone.0138378.

To pursue the COREBIO goal of using baselines as indicators of habitat change over monitoring time intervals, we derive statistical properties of standard methods for monitoring of habitat cover worldwide, and criticize them in the context of mandated seagrass monitoring programs, as exemplified by Posidonia oceanica in the Mediterranean Sea. We report the novel result that cartographic methods with non-trivial classification errors are generally incapable of reliably detecting habitat cover losses less than about 30 to 50%, and the field labor required to increase their precision can be orders of magnitude higher than that required to estimate habitat loss directly in a field campaign. We derive a universal utility threshold of classification error in habitat maps that represents the minimum habitat map accuracy above which direct methods are superior. Widespread government reliance on blind-sentinel methods for monitoring seafloor can obscure the gradual and currently ongoing losses of benthic resources until the time has long passed for meaningful management intervention. We find two classes of methods with very high statistical power for detecting small habitat cover losses: 1) fixed-plot direct methods, which are over 100 times as efficient as direct random-plot methods in a variable habitat mosaic; and 2) remote methods with very low classification error such as geospatial underwater videography, which is an emerging, low-cost, non-destructive method for documenting small changes at millimeter visual resolution. General adoption of these methods and their further development will require a fundamental cultural change in conservation and management bodies towards the recognition and promotion of requirements of minimal statistical power and precision in the development of international goals for monitoring these valuable resources and the ecological services they provide.

The following figure indicates that a random plot method for detecting a 10% loss from baseline, where the coefficient of variation in the indicator metric is approximately 1, requires over 2000 random plots.  In typical random plot methods of seagrass monitoring, the sample size is on the order of 1/50th this minimum.  The random plot method has been proposed as a portion of the national monitoring protocol of Posidonia oceanica for Croatia.
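The order of magnitude of this sample size can be checked with the textbook normal-approximation formula for comparing two independent sample means. The sketch below is not the authors' derivation: it assumes a one-sided test at alpha = 0.05, a power of 0.95, and the same number of random plots at each monitoring event, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def random_plots_needed(cv, rel_loss, alpha=0.05, power=0.95):
    """Random plots per monitoring event needed to detect a relative loss.

    Assumptions (not from the source): one-sided two-sample z-test,
    independent random plots, equal sample size at both events.
    cv       -- coefficient of variation of the indicator metric
    rel_loss -- loss to detect, as a proportion of the baseline mean
    """
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha) + z.inv_cdf(power)
    # Variance of a difference of two independent means is 2 * sd^2 / n.
    return ceil(2 * (z_total * cv / rel_loss) ** 2)


n = random_plots_needed(cv=1.0, rel_loss=0.10)
print(n)
```

Under these assumptions the result is a little over 2000 plots per event, which is consistent with the figure cited above.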

In a fixed-plot method for monitoring the location of the margin of a seagrass meadow, or shoot density along the margin, the difference in the descriptor is measured directly rather than inferred relative to the large spatial variance among random plots at each monitoring event. The standing spatial variance in cover or shoot density is thus irrelevant, and the only relevant variance is that of the difference within each plot between monitoring events.

The great advantage of the fixed-plot method lies in the fact that if the causes of habitat loss are diffuse, as is expected in an aquatic environment where a decline in water quality affects all plots about equally, then there may be essentially no variance among fixed plots in the loss they experience. In this case, the standard deviation of the difference is just the sum of the measurement errors inherent in the method, which by definition will satisfy the minimum sample size requirement if the same measurements are made in a random plot method. If the margin position is measured to within a few centimeters precision, and the fixed plots are reasonably stable and do not shift more than a few decimeters during the monitoring interval, then the standard deviation might be on the order of a few decimeters, and the minimum detectable mean margin regression for a sample size of 20 fixed plots will be roughly 22 cm. This regression is less than one hundredth the surface area of any long fringing meadow of width 22 meters or more, if the regression occurs along the long dimension of the meadow. Thus, if the causes of Posidonia loss are diffuse and equal throughout such a meadow, then a fixed-plot method can detect one-tenth the loss of a random plot method with one hundredth the sample size.
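The 22 cm value can be reproduced with a normal-approximation formula for the mean of paired differences. The sketch below assumes a standard deviation of the within-plot difference of 0.3 m (one reading of "a few decimeters"), a one-sided test at alpha = 0.05, and a power of 0.95. These settings and the function name are illustrative assumptions, not values stated in the source.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_mean_difference(sd_diff, n_plots, alpha=0.05, power=0.95):
    """Smallest mean within-plot change detectable with `n_plots` fixed plots.

    Assumptions (not from the source): one-sided z-test on the mean of
    paired differences, with known standard deviation `sd_diff`.
    """
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha) + z.inv_cdf(power)
    # Standard error of the mean difference shrinks with sqrt(n_plots).
    return z_total * sd_diff / sqrt(n_plots)


mdd = min_detectable_mean_difference(sd_diff=0.3, n_plots=20)
print(f"{mdd * 100:.0f} cm")  # prints "22 cm"
```

Only the variance of the within-plot difference enters this calculation; the standing spatial variance among plots does not appear at all, which is the point made in the text.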

For example, the figure below shows that if the maximum progression of Posidonia is 0.3 meter (which includes both linear growth of the seagrass plus chance movement of underwater markers), and the maximum allowable regression is five times this maximum progression, then for 20 fixed plots our minimum detectable mean loss is between 2 and 3 times the maximum progression, which comes to 0.6 to 0.9 meter of regression, for a power of 0.95. So for 20 fixed balisage plots that are reasonably stable, we will detect a mean regression of less than one meter at the lower margin of a Posidonia meadow very reliably, in the worst case scenario where the disturbance is not diffuse. This amounts to roughly five times the detectable regression if the loss disturbance is diffuse in the water column, but is still achieved at most at one hundredth the minimum sampling effort of the random plot method. Another way of arriving at this result is via Eq 6: if l m = 5, then the standard deviation in the difference is 3. In Posidonia oceanica, this standard deviation is equivalent to roughly 70 cm of margin regression, which constitutes one hundredth of the cover of a long fringing meadow of width 70 m (see above). Hence the standard deviation is 0.01 in units of proportional cover in such a meadow, while the standard deviation of the mean difference = 0.5 in units of proportional cover; hence the standard deviation is approximately equal to 0.02 times the standard deviation in mean difference, easily satisfying the requirement for minimum sample size. This result clearly demonstrates a 70-fold superiority of the fixed-plot method in units of δ for such a meadow. We conclude that the balisage method of Posidonia monitoring is approximately 70-fold superior to the random-plot method commonly used and currently adopted as the national monitoring protocol for Croatia for Posidonia oceanica.

The main source of statistical error in the remote or indirect methods, such as acoustic sensing of habitat, is the error in classifying the substrate. This error is inherent in all acoustic methods and in visual methods where the sensor is above water, because loss of information in the signal creates classification ambiguity.  The two classification errors are false positive and false negative classification of the presence of target habitat. The ratio of observed to true difference in the target habitat is discounted by the sum of the two classification errors, so each error has the same effect, and the estimated difference might actually be the reverse of the actual difference if the classification error is high enough. For example, assume that a seagrass bed is destroyed between monitoring events, with the cover falling from 1 to 0, from a continuous meadow to a habitat of macroalgae on gravel. Also assume an acoustic map with a false negative error of 0.2 for seagrass and a false positive error of 0.8 for algae on gravel. Then at time 0 we would estimate a cover of 0.8, and at time 1 we would again estimate a cover of 0.8, and conclude there was no change, when in fact the entire meadow was lost. Indeed in this example, we will always observe a cover of 0.8 regardless of the true starting or ending values of cover, because the classification errors are constant and sum to 1.
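The arithmetic of this example can be written out as a short sketch. It assumes the usual mixing relation, in which observed cover is the correctly classified target habitat plus the falsely classified non-target habitat. The function and variable names are hypothetical.

```python
def observed_cover(true_cover, false_neg, false_pos):
    """Cover reported by a map with the given classification errors.

    Target habitat is detected with probability (1 - false_neg);
    non-target habitat is misread as target with probability false_pos.
    """
    return true_cover * (1 - false_neg) + (1 - true_cover) * false_pos


# Meadow destroyed between monitoring events: true cover falls from 1 to 0.
before = observed_cover(1.0, false_neg=0.2, false_pos=0.8)
after = observed_cover(0.0, false_neg=0.2, false_pos=0.8)

# Ratio of observed to true difference: one minus the summed errors.
discount = 1 - 0.2 - 0.8

print(before, after, discount)
```

Both observed covers come out at 0.8 and the discount factor is zero, so under this relation no true change of any size would register on the map.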

This raises the question: under what circumstances can these maps be considered reliable monitoring tools; especially, reliable enough to satisfy the 10% criterion? The answer, as discussed presently, is that the classification error and its variance both must be close to zero. Here we consider the possibility that classification error can vary between monitoring events, and influence the uncertainty in map estimates of habitat cover. Certainly there are many reasons why the false positive or negative errors will change between monitoring events: the habitat matrix might change, thereby changing the error rates; the precise conditions under which the acoustic or photo image was taken will be different and therefore with different resulting data even though the habitats might be unchanged; geopositioning error introduces classification error along the margins of continuous ground habitats or everywhere in a patchy matrix. Therefore, an observed loss or gain in any habitat may reflect nothing more than a change in classification errors between monitoring events. For example, consider a site where the true seagrass cover declines from 0.5 to 0.45 between monitoring events (a 10% loss), the false negative error changes from 0.2 to 0.05, and the false positive error changes from 0.05 to 0.2. Then the observed seagrass cover will be 0.43 at the first monitoring event and 0.54 at the second, which represents an observed increase of 26%. This is 2.6 times the actual change and in the opposite direction, and is entirely an artifact of the change in classification error. Variation in classification error clearly represents random noise that can obscure any real change or constancy in habitat cover.
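The numbers in this example can be checked with a few lines of code. The sketch assumes that observed cover is the sum of correctly classified seagrass and non-seagrass falsely classified as seagrass. The helper name is hypothetical.

```python
def observed_cover(true_cover, false_neg, false_pos):
    """Map-reported cover for a given true cover and error rates."""
    return true_cover * (1 - false_neg) + (1 - true_cover) * false_pos


# Error rates differ between the two monitoring events.
first = observed_cover(0.50, false_neg=0.20, false_pos=0.05)
second = observed_cover(0.45, false_neg=0.05, false_pos=0.20)

true_change = (0.45 - 0.50) / 0.50          # true relative change
observed_change = (second - first) / first  # relative change seen on the maps

print(f"observed: {first:.3f} -> {second:.4f}")
print(f"true change {true_change:+.0%}, observed change {observed_change:+.0%}")
```

The true change is a 10% loss, while the maps show an apparent gain of about 26%, produced entirely by the shift in error rates.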

For arbitrary variance in classification errors, we can thus calculate the variance in the observed difference in habitat cover between two maps, the margin of error of this difference, and the minimum detectable proportion difference, with detection defined as the margin of error not overlapping zero. For the indicated standard deviation in classification error, the following figure provides this information, with the margin of error equal to the minimum detectable proportional difference for power = 0.5.

The above analysis clearly demonstrates that the difference in habitat cover between two uncorrected habitat maps is a blind sentinel method unless the standard deviation in classification error is near zero. Such uncorrected maps may, however, be able to detect losses as low as 30–50%, if the methods are careful and meticulous enough to maintain the standard deviation at 0.1 or below.

Random variation in classification error can be quantified with field sampling, and the map can be corrected in theory.  However, this requires a large field sampling effort.  The resulting standard deviation in the corrected habitat loss is plotted against the number of ground truth points per habitat (seagrass and non-seagrass) for the indicated actual classification error in the following figure. For example, if our false positive and false negative classification errors are each 0.2, and we estimate these errors by sampling 100 random ground points in each of seagrass and non-seagrass, for a total of 200 points each monitoring event, then the standard deviation of the corrected map difference in seagrass cover is 0.1, the margin of error is (1.96)(0.1) = 0.2, and the method can detect a habitat loss of only approximately 40%.
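One way to picture the correction step is as the algebraic inverse of the relation between true and observed cover. The sketch below assumes that observed cover equals true cover times (1 - false negative error) plus non-target cover times the false positive error. This may differ from the estimator used by the authors, and the function name is hypothetical. In practice the two error rates are themselves estimates from ground truth points, and their sampling error is what produces the standard deviation of the corrected difference discussed above.

```python
def corrected_cover(observed, false_neg, false_pos):
    """Invert the assumed mixing relation to recover true cover.

    Assumed relation (not quoted from the source):
        observed = true * (1 - false_neg) + (1 - true) * false_pos
    """
    discount = 1.0 - false_neg - false_pos
    if abs(discount) < 1e-12:
        # Errors summing to 1 make observed cover independent of true cover.
        raise ValueError("errors sum to 1: the map carries no cover information")
    return (observed - false_pos) / discount


# An observed cover of 0.5375 with errors 0.05 and 0.20 maps back to 0.45.
estimate = corrected_cover(0.5375, false_neg=0.05, false_pos=0.20)
print(round(estimate, 4))
```

The division by one minus the summed errors shows why correction is costly: as the errors grow, the denominator shrinks and any sampling error in the estimated error rates is amplified.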


The following figure shows how this field effort translates into a minimum detectable difference in habitat cover per monitoring event.  Very low classification error, lower than observed for most seagrass species in acoustic sensing models, is necessary for detecting losses of about 10% at typical field sampling efforts.

The following figure shows the minimum number of ground truth points per habitat necessary to detect a map difference of 0.1, for the indicated power and classification error. This figure shows that in the range of commonly reported lower limits of Posidonia classification error, e.g. from 0.2 to 0.35 (see section 3.4), from nearly 2000 to over 10000 ground truth points per habitat per monitoring event are necessary to achieve a corrected map difference that is precise enough to reliably detect a 10% loss, and any sampling effort with fewer than 500 to 3000 ground truth points is a blind sentinel method.


These results raise a final question. Because the ground truth study needed to correct the map is the direct observation of the ground at numerous georeferenced points, and therefore constitutes valuable data that can be used for an immediate estimate of habitat cover that has zero classification error, why not use these data to estimate seagrass habitat cover directly, rather than go to the extra effort to create and attempt to correct an error-laden map that provides only an indirect estimate? We answer this question by comparing the minimum ground truth sample size for estimation of a 10% map loss to that for estimation of a 10% loss directly by random point sampling. The results presented in the following figure show that, for example, if the classification errors are each equal to 0.2, then the minimum number of ground truth points necessary to detect a 10% loss after map correction is from 1.5 to 15 times the number needed to directly detect a 10% loss, depending on the starting habitat cover.

The following figure shows this relationship from a different perspective, the threshold classification error at which the field labor is equalized: the labor for estimating a 10% loss directly is equal to the labor for estimating the loss from two maps. This we call the “utility threshold” of a map habitat classification for sentinel monitoring. Values above the lines favor the direct method, and values below the lines favor the map method. This figure shows how this threshold depends on the starting habitat cover, with the maximum near the maximum binomial variance of p = 1/2.  This figure actually overestimates the efficiency of the indirect method, since it ignores the sampling effort for map training points.  Generally the utility threshold will be half that presented in this figure.

We conclude that blind sentinel methods, i.e. methods generally incapable of detecting 10% habitat losses at a statistical power > 0.5, include direct random-plot methods of monitoring subtidal, patchy seagrass cover and shoot density, and remote mapping methods with non-trivial levels of classification error, such as sidescan, multibeam, and single-beam sonar. Aerial or satellite imagery may provide methods that satisfy the utility threshold for reliably detecting 10% loss in most seagrasses, but these methods currently are useful for sentinel monitoring only at depths shallower than about four meters. Acoustic methods can detect homogeneous seagrass on homogeneous sand with near 100% accuracy, but these field conditions are rare in Posidonia and wherever other vegetation or hard structure is mixed with seagrass. Resource management and regulatory agencies should recognize that mapping and monitoring are two independent activities with conflicting goals and methods, and that remote Posidonia maps are generally not capable of dependably detecting less than 30–50% seagrass loss. We found two classes of methods powerful enough to reliably detect a 10% loss of seagrass habitat throughout the natural depth range: direct, fixed-plot methods, and remote underwater videography (RUV), the only remote method with near zero classification error. These should be considered the gold standards for seagrass sentinel monitoring across all substrates and depths. The former include the balisage method used in the Posidonia Monitoring Network, and the SeagrassNet global seagrass monitoring method. The only method we found capable of satisfying the 10% criterion throughout the depth range of seagrasses, without the need for SCUBA or direct observation, and without any habitat alteration, is RUV. Non-destructive methods that can reliably detect 10% loss in seagrasses do exist, and can be relied on to prevent further declines in all species.
For these methods to become international standards, however, management and regulatory powers must recognize that rigorous and reliable science is the cornerstone of all management success, and formalize this idea with explicit requirements for minimum precision, power, and detectable losses in the official protocols that create the sentinels that watch over these valuable resources.