Flood Frequency Analysis
Estimate design flood magnitudes directly from observed streamflow records using Flood Frequency Analysis (FFA). This guide covers annual maximum and partial-duration series, the seven supported probability distributions, parameter estimation (method of moments, L-moments, MLE, PWM), goodness-of-fit testing, confidence intervals via bootstrap, outlier detection using Grubbs-Beck, and weighted regional skew following USGS Bulletin 17C.
Introduction
Flood frequency analysis (FFA) is a cornerstone of hydrological engineering. It provides a systematic framework for estimating the magnitude of flood events associated with specific return periods (recurrence intervals). These estimates are essential for the design of hydraulic structures such as bridges, culverts, dam spillways, stormwater systems, and floodplain management infrastructure.
The fundamental approach involves fitting theoretical probability distributions to a series of observed extreme streamflow values. Once a distribution is fitted, its inverse cumulative distribution function (quantile function) can be used to estimate flood magnitudes beyond the observed record, enabling engineers to derive design flows for return periods of 50, 100, or even 10 000 years.
AMS vs PDS
Two sampling strategies are commonly used to extract extremes from a continuous streamflow record:
- Annual maximum series (AMS): the single largest peak flow from each water year is retained, giving one value per year. AMS is the standard input to flood frequency procedures worldwide (Bulletin 17C, UK Flood Estimation Handbook, Australian Rainfall and Runoff). Its strength is statistical simplicity — the values are approximately independent and identically distributed.
- Partial-duration series (PDS) / peaks-over-threshold (POT): all independent peaks above a chosen threshold are retained, regardless of year. PDS uses more of the information in the record, which can improve estimates at short return periods, but requires careful threshold selection and an independence criterion between peaks.
For return periods beyond about 10 years, AMS and PDS estimates converge. For shorter return periods (1 – 5 years), PDS tends to give slightly higher quantiles because more events per year contribute to the sample.
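The two sampling strategies can be illustrated with a minimal Python sketch (the function names are illustrative, not part of the tool):

```python
from datetime import date, timedelta

def annual_maxima(dates, flows):
    """One value per calendar year: the largest daily flow (AMS)."""
    ams = {}
    for d, q in zip(dates, flows):
        ams[d.year] = max(ams.get(d.year, float("-inf")), q)
    return ams

def peaks_over_threshold(dates, flows, threshold, min_gap_days=7):
    """All independent peaks above `threshold` (PDS/POT). Peaks closer
    together than `min_gap_days` are merged, keeping the larger value."""
    peaks = []
    for d, q in zip(dates, flows):
        if q <= threshold:
            continue
        if peaks and (d - peaks[-1][0]).days < min_gap_days:
            if q > peaks[-1][1]:
                peaks[-1] = (d, q)  # same flood event: keep the higher peak
        else:
            peaks.append((d, q))
    return peaks
```

A real POT extraction would use the water year rather than the calendar year and a hydrologically motivated independence criterion (e.g. inter-event recession), but the structure is the same.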
AEP and ARI terminology
Return period is equivalent to the reciprocal of annual exceedance probability (AEP) for large floods:
A 1:100 year flood has an AEP of 1% (0.01) and an average recurrence interval (ARI) of about 100 years. Terminology varies by country — AEP is preferred in Australian and modern American practice because it avoids the common misinterpretation that a 1:100 year flood occurs exactly once every 100 years, when in fact it has a 1% chance of being exceeded in any year (and roughly a 63% chance of being exceeded at least once in 100 years).
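The "roughly 63%" figure follows from the binomial complement, assuming independent years:

```python
def exceed_at_least_once(T, n):
    """Probability that a 1-in-T-year flood is exceeded at least once in n years."""
    aep = 1.0 / T                      # annual exceedance probability
    return 1.0 - (1.0 - aep) ** n      # complement of "never exceeded in n years"
```

For T = 100 and n = 100 this gives about 0.634, i.e. a structure designed for the 1:100 year flood has close to a two-in-three chance of seeing it exceeded over a 100-year life.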
Probability distributions
The choice of distribution is critical to the accuracy of flood frequency estimates. Different distributions make different assumptions about the shape of the flood frequency curve, particularly in the upper tail where design floods are estimated. The tool supports seven commonly used distributions.
Gumbel (Extreme Value Type I)
The Gumbel distribution is the simplest extreme value distribution, with two parameters (location μ and scale β). It assumes a fixed coefficient of skewness of approximately 1.1396, which makes it suitable for regions where flood data exhibit moderate positive skewness. Gumbel is widely used in European and many international standards.
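Because the Gumbel distribution has a closed-form quantile function, design flows follow directly once location and scale are known. A sketch (parameter values below are purely illustrative):

```python
import math

def gumbel_quantile(T, loc, scale):
    """Gumbel flow for return period T: x = loc - scale * ln(-ln(1 - 1/T))."""
    p = 1.0 - 1.0 / T                  # annual non-exceedance probability
    return loc - scale * math.log(-math.log(p))
```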
Log-Normal (2-parameter)
The Log-Normal distribution assumes that the natural logarithms of the data follow a normal distribution. It is characterised by two parameters (mean and standard deviation of the log-transformed data) and produces a positively skewed distribution in the original space. It is a good default choice for many hydrological variables when the coefficient of skewness in log-space is near zero.
Log-Pearson Type III
The Log-Pearson III (LP3) distribution is the standard distribution recommended by the United States Water Resources Council and documented in USGS Bulletin 17C. It extends the Log-Normal by adding a skewness parameter, providing greater flexibility in fitting the upper tail of flood frequency curves. LP3 fits a Pearson III distribution to the base-10 logarithms of the data.
log10(Q_T) = M + K_T · S
Where M is the mean of the log-flows, S is the standard deviation of the log-flows, and K_T is the frequency factor for return period T and skew coefficient G.
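In practice the frequency factor is evaluated numerically; a common closed-form route is the Wilson-Hilferty-based series approximation (often attributed to Kite), sketched below. This is an approximation for illustration, not necessarily the tool's internal method:

```python
from statistics import NormalDist

def lp3_frequency_factor(T, G):
    """Approximate Pearson III frequency factor for return period T, skew G."""
    z = NormalDist().inv_cdf(1.0 - 1.0 / T)   # standard normal quantile
    k = G / 6.0
    return (z + (z**2 - 1) * k + (z**3 - 6 * z) * k**2 / 3.0
            - (z**2 - 1) * k**3 + z * k**4 + k**5 / 3.0)

def lp3_quantile(T, mean_log10, std_log10, G):
    """Back-transform from log10 space: Q_T = 10 ** (mean + K_T * std)."""
    return 10 ** (mean_log10 + lp3_frequency_factor(T, G) * std_log10)
```

With zero skew the factor reduces to the standard normal quantile and LP3 collapses to the two-parameter Log-Normal.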
Generalised Extreme Value (GEV)
The GEV distribution unifies all three extreme value types (Gumbel, Fréchet, and Weibull) through a shape parameter ξ. When ξ = 0, it reduces to the Gumbel. Positive values of ξ produce heavier upper tails (Fréchet-type), while negative values produce bounded upper tails (Weibull-type). GEV is widely recommended by the World Meteorological Organization and is the default in many national guidelines, including UK FEH Volume 3.
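A sketch of the GEV quantile function in the sign convention used here (positive shape = heavy tail):

```python
import math

def gev_quantile(T, loc, scale, shape):
    """GEV flow for return period T (shape > 0: Frechet; shape < 0: Weibull)."""
    p = 1.0 - 1.0 / T          # non-exceedance probability
    y = -math.log(p)           # Gumbel reduced variate
    if abs(shape) < 1e-9:      # shape -> 0 recovers the Gumbel distribution
        return loc - scale * math.log(y)
    return loc + (scale / shape) * (y ** (-shape) - 1.0)
```

Note that sign conventions differ between references (Hosking's κ is the negative of the shape used here), so check conventions before comparing fitted values across software.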
Pearson Type III
The Pearson III distribution is a three-parameter gamma distribution that can represent a wide range of skewness values. It is fitted to the untransformed data (unlike LP3, which works in log-space). P3 is commonly used in Australian flood frequency analysis (Australian Rainfall and Runoff, 2019).
Generalised Logistic
The Generalised Logistic distribution is recommended for flood frequency analysis in the United Kingdom (Flood Estimation Handbook, Volume 3). It has three parameters and provides flexibility in fitting both the body and tails of the distribution. Its L-moment ratios differ from the GEV, making it a useful alternative for comparison on the same dataset.
Generalised Pareto
The Generalised Pareto distribution is commonly used for peaks-over-threshold (POT) analysis, but can also be applied to annual maxima. It is a two-parameter distribution (scale σ and shape ξ) with a location threshold u. It is particularly useful for modelling the tail behaviour of extreme events when a natural threshold can be identified.
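A sketch of the Generalised Pareto quantile function for exceedances above the threshold:

```python
import math

def gpd_quantile(p, threshold, scale, shape):
    """GPD value with non-exceedance probability p above the threshold."""
    if abs(shape) < 1e-9:      # shape -> 0 reduces to the exponential distribution
        return threshold - scale * math.log(1.0 - p)
    return threshold + (scale / shape) * ((1.0 - p) ** (-shape) - 1.0)
```

In a POT setting, p is the non-exceedance probability of an individual peak; converting to an annual return period additionally requires the average number of peaks per year.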
Parameter estimation
Once a distribution is selected, its parameters must be estimated from the observed data. Four estimation methods are supported, each with distinct advantages.
Method of moments (MOM)
Equates theoretical distribution moments (mean, variance, skewness) to sample moments. Simple and intuitive, but can be sensitive to outliers and may produce biased estimates for small samples. MOM is required for Bulletin 17C compliance with LP3.
L-moments (LMOM)
Based on linear combinations of order statistics. L-moments are more robust to outliers than conventional moments and provide nearly unbiased estimates even for small samples. They are strongly recommended by Hosking & Wallis (1997) and are the foundation of regional flood frequency analysis.
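Sample L-moments are computed from probability-weighted moments of the ordered data. A minimal sketch for the first three:

```python
def sample_l_moments(data):
    """Unbiased sample L-moments: mean (l1), L-scale (l2), L-skewness (t3)."""
    x = sorted(data)                    # order statistics, ascending
    n = len(x)
    # probability-weighted moments b0, b1, b2 (0-based index i has rank i + 1)
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    return l1, l2, l3 / l2
```

Distribution parameters then follow by matching these sample values to each distribution's theoretical L-moments; closed-form relations are tabulated in Hosking & Wallis (1997).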
Maximum likelihood estimation (MLE)
Finds the parameters that maximise the probability of observing the given data. MLE is asymptotically efficient (optimal for large samples) but may not converge for small samples or certain distribution-data combinations. MLE-fitted parameters are required for the AIC and BIC information criteria used in model selection.
Probability weighted moments (PWM)
Closely related to L-moments, PWM uses expectations of order statistics weighted by probability. PWM estimators are available in closed form for most distributions and share the robustness properties of L-moments.
| Method | Strength | Weakness |
|---|---|---|
| Method of moments | Simple; Bulletin 17C compliant | Outlier-sensitive; biased for small n |
| L-moments | Robust; nearly unbiased for small n | Less efficient than MLE for very long records |
| MLE | Asymptotically efficient; basis for AIC/BIC | May fail to converge; outlier-sensitive |
| PWM | Closed-form; robust | Limited distribution support historically |
Goodness-of-fit testing
Goodness-of-fit (GoF) tests measure how well a fitted distribution matches the observed data. The tool applies multiple complementary tests and ranking criteria to help identify the best-fit distribution.
Kolmogorov-Smirnov (KS) test
Measures the maximum absolute difference between the empirical cumulative distribution function (ECDF) and the fitted theoretical CDF. It gives equal weight to all parts of the distribution. Less powerful than Anderson-Darling for detecting tail discrepancies, but widely used and easy to interpret.
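The KS statistic itself is simple to compute from the ordered sample; a sketch in which the fitted CDF is passed in as a function:

```python
def ks_statistic(data, cdf):
    """Maximum absolute gap between the sample ECDF and a fitted CDF."""
    x = sorted(data)
    n = len(x)
    d = 0.0
    for i, xi in enumerate(x, start=1):
        f = cdf(xi)
        # ECDF jumps from (i-1)/n to i/n at xi; check the gap on both sides
        d = max(d, i / n - f, f - (i - 1) / n)
    return d
```

Because the distribution parameters are estimated from the same data, textbook KS critical values are optimistic; read the statistic comparatively across candidate distributions rather than as a strict hypothesis test.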
Anderson-Darling (AD) test
Measures the discrepancy between the ECDF and the theoretical CDF, with greater weight given to the tails. This makes it particularly suitable for flood frequency analysis, where accurate tail estimation is critical. Lower AD statistics indicate a better fit.
Chi-square test
Groups data into bins and compares observed vs expected frequencies. Results depend on the choice of binning, which can limit its reliability for small samples. Included for completeness and regulatory reporting.
AIC / BIC information criteria
The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) balance goodness-of-fit against model complexity by penalising additional parameters. Lower values indicate a better trade-off. These require MLE-fitted parameters.
AIC = 2k - 2 ln(L̂)    BIC = k ln(n) - 2 ln(L̂)
Where k is the number of free parameters, n is the sample size, and L̂ is the maximised likelihood.
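Both criteria are one-liners once the maximised log-likelihood is known:

```python
import math

def aic(log_lik, k):
    """Akaike Information Criterion: 2k - 2 ln(L-hat)."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L-hat)."""
    return k * math.log(n) - 2 * log_lik
```

BIC penalises extra parameters more heavily than AIC once n ≥ 8, so on short records it tends to favour two-parameter distributions such as Gumbel and Log-Normal.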
Plotting positions
Empirical exceedance probabilities are assigned to ranked observations using a plotting position formula, which determines where each observation appears on a frequency plot. The tool supports the Weibull, Cunnane, and Gringorten formulas. For an observation of rank m (m = 1 for the largest) in a record of length n:
- Weibull: p = m / (n + 1)
- Cunnane: p = (m - 0.4) / (n + 0.2)
- Gringorten: p = (m - 0.44) / (n + 0.12)
The Weibull formula is unbiased for the mean of the exceedance probability; Cunnane is closer to unbiased for the quantile itself and is recommended for visual comparison with fitted frequency curves.
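All three formulas are special cases of the general form p = (m - a) / (n + 1 - 2a), with a = 0 (Weibull), 0.40 (Cunnane), and 0.44 (Gringorten):

```python
def plotting_positions(n, formula="cunnane"):
    """Exceedance probabilities for ranks m = 1 (largest) to n."""
    a = {"weibull": 0.0, "cunnane": 0.40, "gringorten": 0.44}[formula]
    return [(m - a) / (n + 1 - 2 * a) for m in range(1, n + 1)]
```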
Confidence intervals
Quantile estimates are subject to sampling uncertainty. The tool computes confidence intervals using bootstrap resampling: the observed data are resampled with replacement many times (default: 500 iterations), the distribution is re-fitted to each resampled dataset, and quantiles are estimated. The resulting distribution of quantile estimates provides percentile-based confidence bounds.
For example, at a 95% confidence level, the lower bound is the 2.5th percentile and the upper bound is the 97.5th percentile of the bootstrapped quantile estimates. Wider confidence intervals indicate greater uncertainty, which typically increases for longer return periods and shorter data records.
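The bootstrap loop is conceptually simple. A sketch using a Gumbel method-of-moments fit for brevity (the tool applies the same loop to whichever distribution and estimator you select):

```python
import math
import random
import statistics

def fit_gumbel_mom(data):
    """Method-of-moments Gumbel fit: scale = s * sqrt(6) / pi."""
    mean, s = statistics.fmean(data), statistics.stdev(data)
    scale = s * math.sqrt(6) / math.pi
    return mean - 0.5772 * scale, scale   # (location, scale)

def bootstrap_ci(data, T, n_boot=500, level=0.95, seed=42):
    """Percentile bootstrap confidence interval for the T-year quantile."""
    rng = random.Random(seed)
    p = 1.0 - 1.0 / T
    q = []
    for _ in range(n_boot):
        sample = rng.choices(data, k=len(data))   # resample with replacement
        loc, scale = fit_gumbel_mom(sample)
        q.append(loc - scale * math.log(-math.log(p)))
    q.sort()
    alpha = (1.0 - level) / 2.0
    return q[int(alpha * n_boot)], q[int((1.0 - alpha) * n_boot) - 1]
```

Strictly, resampling the observed data is a non-parametric bootstrap; a parametric variant would instead simulate synthetic records from the fitted distribution. Both are common in FFA software.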
Outlier detection
The tool implements the Grubbs-Beck test for detecting low and high outliers, following the methodology described in USGS Bulletin 17C. Outliers are data points that depart significantly from the trend of the remaining data when plotted on log-probability paper.
High outliers may represent rare catastrophic events that genuinely belong in the record. Retaining them can inflate quantile estimates, but removing them may underestimate risk.
Low outliers — termed Potentially Influential Low Floods (PILFs) in Bulletin 17C — can distort the fitted distribution, particularly the skewness estimate. The Bulletin 17C approach identifies PILFs using the Multiple Grubbs-Beck test and applies a conditional probability adjustment in which the low flows are treated as censored data below a threshold.
Data quality and homogeneity
Before fitting any distribution, the annual maximum series should satisfy the statistical assumptions of independence, identical distribution, and stationarity. The following tests are recommended:
- Mann-Kendall trend test — a non-parametric test for a monotonic trend in the series. A significant trend indicates non-stationarity, often linked to land-use change, reservoir regulation, or climate change.
- Pettitt change-point test — a non-parametric test for an abrupt shift in the mean. Useful for detecting dam construction, land-use step changes, or changes in measurement method.
- Wald-Wolfowitz runs test — tests whether the sequence of values above and below the median is random (an independence test).
- Spearman rank correlation — tests for a monotonic trend and is complementary to Mann-Kendall.
If a significant trend or change-point is detected, the series is not stationary and standard FFA is not strictly valid. Options include shortening the record to a stationary sub-period, detrending the data, or applying a non-stationary framework in which the distribution parameters are functions of time or a climate covariate.
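These screening tests are lightweight to run. As an illustration, a minimal Mann-Kendall sketch (no correction for tied values):

```python
import math

def mann_kendall(series):
    """Mann-Kendall trend test: S statistic and normal-approximation Z.
    |Z| > 1.96 indicates a significant monotonic trend at the 5% level."""
    n = len(series)
    # S counts concordant minus discordant pairs over all i < j
    s = sum((series[j] > series[i]) - (series[j] < series[i])
            for i in range(n - 1) for j in range(i + 1, n))
    var = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var)
    elif s < 0:
        z = (s + 1) / math.sqrt(var)
    else:
        z = 0.0
    return s, z
```

A full implementation also corrects the variance for ties; the sketch above is for intuition only.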
Regional skew
Sample skewness estimates from short records are highly uncertain. Bulletin 17C recommends using a weighted average of the station (at-site) skew and a regional skew estimate to reduce this uncertainty:
G_w = (MSE_R · G_S + MSE_S · G_R) / (MSE_S + MSE_R)
Where G_S is the station skew, G_R is the regional skew, and MSE_S, MSE_R are their respective mean square errors. The tool allows you to enable regional skew weighting and specify the regional skew coefficient and its MSE. In the United States, regional skew maps are published in USGS Scientific Investigations Reports; for South Africa, a regional skew surface can be derived from the national DWS network via L-moment regionalisation.
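The weighting is a two-line computation; a sketch with illustrative skew and MSE values:

```python
def weighted_skew(g_station, mse_station, g_regional, mse_regional):
    """MSE-weighted skew: each estimate is weighted by the other's mean
    square error, so the more reliable estimate dominates the result."""
    return ((mse_regional * g_station + mse_station * g_regional)
            / (mse_station + mse_regional))
```

With equal MSEs the result is the simple average of the two skews; as the record lengthens, the station-skew MSE shrinks and the weighted value moves toward the at-site estimate.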
Worked example
Consider the following annual maximum flows (m³/s) recorded at a gauging station over 20 years:
45.2, 67.8, 123.4, 89.1, 56.3, 201.5, 78.9, 95.6, 110.3, 54.7, 142.8, 63.4, 87.2, 175.3, 92.1, 58.9, 134.6, 71.5, 105.8, 82.4
Step 1 — Data entry. Paste the values into the Data Input panel. The parser accepts comma-separated, space-separated, or one-value-per-line formats.
Step 2 — Quality control. Run the Mann-Kendall and Pettitt tests to confirm stationarity. Apply the Grubbs-Beck test to flag any potential low or high outliers and review them against historical information.
Step 3 — Configure analysis. Select the distributions to fit (e.g. all seven), choose L-Moments as the estimation method, and set the return periods of interest (typically 1:2, 1:5, 1:10, 1:20, 1:50, 1:100, 1:200).
Step 4 — Run analysis. The tool computes sample statistics (mean = 96.8 m³/s, n = 20), fits all selected distributions, and ranks them by composite goodness-of-fit.
Step 5 — Review results. Examine the frequency curves to verify that the fitted distributions align with the plotting positions (Cunnane recommended). Check the GoF table for the best-fit ranking. Review bootstrap confidence intervals to assess uncertainty — a 95% CI of [230, 380] m³/s around a point estimate of 285 m³/s at 1:100 years indicates substantial extrapolation uncertainty.
Step 6 — Extract design values. Read the quantile table for your design return period. If different distributions produce substantially different 1:100 year estimates, report the range and use engineering judgement.
Limitations
FFA is a powerful technique but rests on strong statistical assumptions that are rarely fully satisfied in practice:
- Stationarity — the process generating floods is assumed not to change over time. Climate change, land-use change, and reservoir operations all violate this.
- Independence — successive annual maxima are assumed statistically independent. Persistent multi-year wet or dry phases (ENSO, PDO) can induce autocorrelation.
- Sample size — reliable estimates at 1:100 years require at least 25 – 30 years of record; 1:1000 year extrapolations from 50-year records are speculative.
- Rating-curve uncertainty — observed peaks are derived from stage via a rating curve that is often extrapolated beyond the measured range for the largest floods.
Finding gauge data
Use the Stream Gauge Finder to locate DWS (or USGS) stations near your project site, download the annual peak flow series directly, and feed it into this tool. Complement with Daily Rainfall Data when catchment-averaged precipitation is needed as a covariate for non-stationary analysis.
References
- England, J.F., Cohn, T.A., Faber, B.A., et al. (2019). Guidelines for Determining Flood Flow Frequency — Bulletin 17C. USGS Techniques and Methods, Book 4, Chapter B5.
- Hosking, J.R.M. & Wallis, J.R. (1997). Regional Frequency Analysis: An Approach Based on L-Moments. Cambridge University Press.
- Stedinger, J.R., Vogel, R.M. & Foufoula-Georgiou, E. (1993). Frequency analysis of extreme events. Chapter 18 in Handbook of Hydrology (D.R. Maidment, ed.), McGraw-Hill.
- Institute of Hydrology. (1999). Flood Estimation Handbook, Volume 3: Statistical procedures for flood frequency estimation. Centre for Ecology & Hydrology, Wallingford, UK.
- Ball, J., Babister, M., Nathan, R., et al. (2019). Australian Rainfall and Runoff: A Guide to Flood Estimation. Commonwealth of Australia (Geoscience Australia).
- Grubbs, F.E. & Beck, G. (1972). Extension of sample sizes and percentage points for significance tests of outlying observations. Technometrics, 14(4), 847 – 854.