Keynotes
Black-box inference methods for intractable models
Authors: Amanda Lenzi
Abstract: Recent advancements in optimization libraries and graphics processing units have enabled deep neural networks to efficiently represent massive datasets of arbitrary complexity. These improvements have motivated a new area of research in statistical inference, focusing on fast and tractable methods, particularly when traditional estimation methods fall short. In this talk, I will present new developments in black-box procedures for point estimation and full posterior distribution approximation. By inputting observed data into a deep neural network trained on simulated data from the intractable model, we obtain efficient and robust point estimates and posterior distributions of model parameters. Our new methods demonstrate computational efficiency and theoretically justified uncertainty quantification, addressing gaps in current simulation-based deep neural network approaches. These benefits are illustrated through an application to spatial extremes.
Modeling the local impact of summer heat on mortality
Authors: Christel Faes
Abstract: Amidst the escalation of global warming and the increasing frequency of extreme weather events, the impact of temperature on human health, particularly on cardiovascular, respiratory and all-cause mortality, has been a critical concern. Within Flanders, the health impact is spatially highly variable, due to local heat islands. In addition, recent studies highlight the lagged effects of temperature on mortality, showing that the impact of temperature on health is not always immediate. A distributed lag non-linear model (DLNM) is used to describe the possibly non-linear and lagged relationship. These results are used to develop a heat impact tool at a small geographical level for use by health managers. Despite temperature’s substantial fluctuations throughout the day, studies often rely on a single summarizing measure of daily temperature. We propose an extension of the traditional DLNM that allows us to describe the joint delayed effect of daytime and night-time temperatures. We derived a Laplace approximation to speed up computation.
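For readers unfamiliar with the DLNM structure, a generic and simplified form of the exposure–lag–response model is sketched below; the function f, the maximum lag L and the confounder term are illustrative choices, not necessarily those used by the authors.

\[
\log \mathbb{E}[Y_t] \;=\; \alpha \;+\; \sum_{\ell=0}^{L} f\!\left(x_{t-\ell},\,\ell\right) \;+\; \mathbf{z}_t^{\top}\boldsymbol{\gamma},
\]

where Y_t is the daily mortality count, x_{t-\ell} the temperature \ell days earlier, f the bivariate exposure–lag–response surface built from a cross-basis of splines, and z_t other confounders. The extension described in the abstract considers a joint delayed effect of daytime and night-time temperatures in place of the single daily summary x.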
Combining school-catchment area models with geostatistical models for analysing school survey data from low-resource settings: Inferential benefits and limitations
Authors: Emanuele Giorgi
Abstract: School-based sampling has been used to inform targeted responses for malaria and neglected tropical diseases. Standard geostatistical methods for mapping disease prevalence use the school location to model spatial correlation, which is questionable since exposure to the disease is more likely to occur in the residential location. In this paper, we propose to overcome the limitations of standard geostatistical methods by introducing a modelling framework that accounts for the uncertainty in the location of the residence of the students. Using cost distance and cost allocation models to define spatial accessibility, and in the absence of any information on the students’ mode of travel to school, we consider three school catchment area models that assume walking only; walking and bicycling; and walking and motorized transport. We illustrate the use of this approach using two case studies of malaria in Kenya and compare it with the standard approach that uses the school locations to build geostatistical models. We argue that the proposed modelling framework presents several inferential benefits, such as the ability to combine data from multiple surveys, some of which may also record the residence location, and to deal with ecological bias when estimating the effects of malaria risk factors. However, our results show that invalid assumptions on the modes of travel to school can worsen the predictive performance of geostatistical models. Future research in this area should focus on collecting information on the modes of transportation to school, which can then be used to better parametrize the catchment area models.
Spatial statistics and spatial modelling - current and future perspectives and challenges
Authors: Janine Illian
Abstract: Over the last few decades statistics as a discipline has changed drastically. More and more, and more detailed, data can be collected with rapidly developing technology, and these data are being analysed and interpreted to inform decisions, while the importance of statistics and data analytics to society is being acknowledged more widely. At the same time, increasingly complex analysis tools are being developed, and are now available for use by non-statisticians through free software packages. This talk will discuss some of the implications, challenges and perspectives resulting from these developments, in particular with the field of spatial statistical modelling in mind.
Specifically, we will discuss the need for providing methodology that is relevant and accessible to applied users, yet generic enough to be of use across several disciplines outside statistics. Further, we will consider the need for taking the practical considerations of model fitting and interpretation into account, beyond providing methodology and associated software, asking if we are sufficiently aware of the obstacles users will come across when applying spatial statistical methodology. Similarly, as developers of complex statistical methodology we have a responsibility to support the adequate use of the methodology by scientists, who are quantitatively trained, yet non-specialist users of statistical methodology. This talk will discuss the challenge of making the methodology truly accessible.
Monitoring Earth’s atmospheric carbon dioxide from space
Authors: Noel Cressie
Abstract: Carbon dioxide (CO2) is a leading greenhouse gas (GHG) and a principal driver of climate change. Measuring, mapping, and monitoring it globally and regionally is key to understanding its dynamically varying distribution. Earth’s temperature is rising to critical levels, largely a consequence of CO2 concentrations rising a decade or more before. Currently, atmospheric CO2 concentration is around 420 ppm, a level not seen since the middle Pliocene (approximately 3.6 million years ago). The last decade of accumulated CO2 has locked in future global heating, and what was chronic has become critical. There is now a realization by almost all sectors of society that future climate at global and regional scales will be hotter and more extreme, caused by GHGs, particularly CO2. Global averages or sparse in situ observations of CO2 do not give the spatial information needed for mitigation. However, since 2014 a NASA satellite (Orbiting Carbon Observatory-2 or OCO-2) has been sensing spatio-temporal CO2 concentrations over much of the globe. This article uses exploratory spatio-temporal data analysis to analyse the more-than-seven years of data from the NASA satellite. Various temporal and spatial filters are used to create critical summary statistics to monitor atmospheric CO2. This work is joint with Dr Yi Cao, University of Sydney.
Towards Integrated Spatial Health Surveillance
Authors: Peter J Diggle, Marta Blangiardo and Guangquan Li
Abstract: Statistical methods for near-real-time spatial analysis of disease incidence/prevalence data have been available for at least 20 years but, to the authors’ knowledge, have not yet been incorporated into routine public health surveillance. One possible reason is that even in wealthy countries, the pro-active collection of comprehensive, spatiotemporally resolved data on incident cases of a particular disease can be prohibitively expensive. The ongoing COVID-19 epidemic has led to growing interest, in the UK and elsewhere, in the use of affordable alternatives to traditional epidemiological study designs and disease metrics, including mobile symptom-reporting apps (Fry et al., 2021) and biochemical analysis of wastewater samples (Ciannella et al., 2023; Li et al., 2024).
A second problem that became obvious during the COVID-19 epidemic is that even answering a superficially simple question like “how many new cases have we had this week?” can yield multiple answers depending on different case-definitions. Analogous problems arise in low-income country settings, where surveys of so-called Neglected Tropical Diseases (Feasey et al., 2010) have typically been carried out for a single disease, with little or no consideration of the efficiency gains that could be obtained from combining surveys of multiple diseases with overlapping spatial distributions. The statistical challenge in these cases is to exploit the potential efficiency gains from integrating multiple data-sources whilst identifying and adjusting for their potential biases. In this talk, I will describe several approaches that have been used to address this challenge, with a particular focus on the potential for combining high-volume, low-cost proxies for disease incidence/prevalence with relatively small-scale gold-standard randomised incidence/prevalence surveys (Elliott et al., 2021).
References
Ciannella, S., González-Fernández, C. and Gomez-Pastora, J. (2023). Recent progress on wastewater-based epidemiology for COVID-19 surveillance: A systematic review of analytical procedures and epidemiological modeling. The Science of the Total Environment, 878, 162953.
Elliott, P., Haw, D., Wang, H., Eales, O., Walters, C.E., Ainslie, K.E.C., Atchison, C., Fronterre, C., Diggle, P.J., Page, A.J., Trotter, A.J., Rosolek, S.J., Ashby, D., Donnelly, C.A., Barclay, W., Taylor, G., Cooke, G., Ward, H., Darzi, A. and Riley, S. (2021). Exponential growth, high prevalence of SARS-CoV-2, and vaccine effectiveness associated with the Delta variant. Science, 374, eabl9551.
Feasey, N., Wansbrough-Jones, M., Mabey, D.C.W. and Solomon, A.W. (2010). Neglected tropical diseases. British Medical Bulletin, 93, 179-200.
Fry, R.J., Hollinghurst, J., Stagg, H.R., Thompson, D.A., Fronterre, C., Orton, C., Lyons, R.A., Ford, D.V., Sheikh, A. and Diggle, P.J. (2021). Real-time spatial health surveillance: mapping the UK COVID-19 epidemic. International Journal of Medical Informatics, 149, Article 104400.
Li, G., Diggle, P. and Blangiardo, M. (2024). Integrating wastewater and randomised prevalence survey data for national COVID surveillance. Scientific Reports, 14, Article 5124.
Invited
Approaches for continuous domain modeling of areal counts
Authors: Elias Krainski
Abstract: Areal counts are the kind of data that usually result from randomly located events grouped by small areal units. There have been few attempts to use continuous domain models for this kind of aggregated data. Some simplify the modelling problem to allow faster computations, whereas others use full Bayesian analysis with a data augmentation strategy and specialized MCMC methods. Yet this problem can also be addressed by iterating a first-order Taylor expansion towards another analytical approximation, thereby avoiding MCMC methods. This work details these approaches, pointing out the differences and possible implications for practical applications.
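A minimal sketch of the aggregation problem being described (the notation here is illustrative, not necessarily the author’s):

\[
Y_i \mid \lambda(\cdot) \;\sim\; \mathrm{Poisson}\!\left(\int_{A_i} \lambda(s)\,ds\right),
\qquad
\log \lambda(s) \;=\; \mathbf{x}(s)^{\top}\boldsymbol{\beta} + u(s),
\]

where the A_i are the areal units and u(s) is a continuously indexed Gaussian field. The intractable integral over each unit is what the data-augmentation MCMC strategies and the iterated first-order Taylor (linearization) approximations handle in different ways.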
Bayesian Kriging Approaches for Spatial Functional Data
Authors: Heesang Lee, Dagun Oh, Sunhwa Choi, and Jaewoo Park
Abstract: Functional kriging approaches have been developed to predict curves at unobserved spatial locations. However, most existing approaches are based on variogram fittings rather than on hierarchical statistical models. Therefore, it is challenging to analyze the relationships between functional variables, and uncertainty quantification of the model is not trivial. In this manuscript, we propose a Bayesian framework for spatial function-on-function regression. However, inference for the proposed model poses computational and inferential challenges because the model needs to account for within- and between-curve dependencies. Furthermore, high-dimensional and spatially correlated parameters can lead to slow mixing of Markov chain Monte Carlo algorithms. To address these issues, we first utilize a basis transformation approach to simplify the covariance and apply projection methods for dimension reduction. We also develop a simultaneous band score for the proposed model to detect the significant region in the regression function. We apply the methods to simulated and real datasets, including data on particulate matter in Japan and mobility data in South Korea. The proposed method is computationally efficient and provides accurate estimates and predictions.
Spatio-temporal Gaussian processes on metric graphs
Authors: Jonas Wallin
Abstract: Recent advancements in data modeling on networks and graphs have sparked considerable interest across statistics, machine learning, and signal processing. Common applications in statistics include modeling traffic accidents on road networks and environmental factors such as temperature or pollutants on river networks. In these scenarios, models are applied to both the edges and vertices of graphs, with edges representing roads or river segments. A fundamental aspect of these models is the measurement of distances along the network, rather than using traditional Euclidean distances. Our recent research has developed Gaussian processes tailored to such network structures, employing Whittle–Matérn fields on edges and joining them with Kirchhoff boundary conditions to form a cohesive graph. In this presentation, I will discuss the extension of these models to include temporal dimensions by incorporating spatial advection-driven Whittle–Matérn fields. This model assigns a specific direction to each edge of the graph, which governs the directional flow of the process.
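As background, Whittle–Matérn fields of the kind referred to here are commonly defined as solutions of a fractional SPDE; a schematic form on a metric graph Γ is (the parametrization below is illustrative):

\[
(\kappa^2 - \Delta_{\Gamma})^{\alpha/2}\,\tau\,u(s) \;=\; \mathcal{W}(s), \qquad s \in \Gamma,
\]

where \Delta_{\Gamma} acts along the edges, \mathcal{W} is Gaussian white noise on the graph, and Kirchhoff conditions at the vertices (continuity of u, with directional derivatives summing to zero) tie the edge processes together. The spatio-temporal extension discussed in the talk augments this construction with advection along the directed edges.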
Two-sample tests for point processes on linear networks
Authors: Maribel Borrajo Garcia
Abstract: Data sets representing the spatial locations of a series of observations appear in a wide variety of scenarios, for example, trees in a forest, earthquakes in a region or traffic accidents on road networks. The former are examples of spatial point processes which lie in a two- or three-dimensional Euclidean space, whereas the latter is an example of point patterns which are constrained to a one-dimensional subset of the Euclidean plane. These types of patterns are said to lie on a linear network.
Population comparison is a widely studied problem in Statistics, consisting of determining whether two (or more) samples are generated by the same stochastic process. This type of problem also arises when dealing with point processes, for example, the distribution of two species of flora in a forest, outbreaks of natural versus human-caused forest fires, or car-car and car-motorcycle collisions on a road network.
In this talk, we focus on the two-sample problem for point processes on linear networks, proposing two specific testing methods based on Kolmogorov–Smirnov and Cramér–von Mises type test statistics. A thorough simulation study is conducted to assess the finite sample performance of our proposals, which are also applied to traffic collisions in Rio de Janeiro (Brazil).
References
González-Pérez, I., Borrajo, M. I. and González-Manteiga, W. (Under review). Nonparametric testing of first-order structure in point processes on linear networks.
Bayesian Models of Environmental and Climate Change Impact on Dengue Dynamics in Brazil
Authors: Monica Pirani, Patricia Marques Moralejo Bermudi, Man Ho Suen, Camila Lorenz, Marta Blangiardo, Francisco Chiaravalloti Neto
Abstract: In this talk we delve into spatiotemporal dynamics of dengue incidence in Brazil, utilizing hierarchically formulated Bayesian models to understand and forecast the influence of environmental and climate changes.
Our investigation begins with evaluating the impact of the large-scale climatic phenomenon of El Niño Southern Oscillation (ENSO) along with weather fluctuations on Aedes aegypti mosquito populations, the primary vectors of dengue. Using municipality data from 2008 to 2018 across São Paulo State, aggregated on seasonal periods, we employed a spatiotemporal model to analyse the effects of ENSO, local temperature, and rainfall on mosquito infestation levels. Results indicate that El Niño events significantly increase the Ae. aegypti larval index, with positive containers expected to rise by 1.30 units during moderate to strong El Niño phases.
Expanding our scope, we conducted a joint analysis of dengue and Zika infections, both members of the Flaviviridae family and sharing the same vector, from 2015 to 2019. By integrating satellite-derived vegetation data, climate re-analysis, and socioeconomic information, we developed a multi-likelihood Bayesian model to assess the joint impact of environmental and climatic factors on disease incidence. This spatiotemporal analysis revealed significant geographic and socioeconomic disparities in disease risk, with distinct temporal patterns across different regions.
Finally, we present the results from a study that forecasts microregion-level disease cases to 2060, considering different gas emission scenarios. Our Bayesian models predict an increase in dengue incidence in regions previously less affected, particularly in the southern areas, due to climate shifts creating more favourable conditions for mosquito proliferation. Conversely, we found that some regions may experience stable or decreased cases as temperatures exceed thresholds viable for dengue transmission. These projections highlight the profound influence of climate change on dengue spread, emphasizing the urgency for adaptive public health responses.
Challenges and Opportunities in Self-Exciting Spatio-Temporal modelling
Authors: Nicholas Clark
Abstract: In criminology, the pervasive theory of repeat victimization posits that locations previously affected by crime or violence are more susceptible to future incidents. Statistical modeling sometimes tackles this phenomenon through the application of Hawkes processes or self-exciting spatio-temporal models. However, this talk will shed light on the limitations inherent in standard self-exciting models, emphasizing that improperly structured spatio-temporal frameworks can yield inaccurate insights into the root causes of criminal activities or the spread of violence. We will further address issues in parameter estimation and model identifiability and show how machine learning methods may aid in disentangling self-exciting processes from other common spatio-temporal models.
Bayesian modelling for the integration of spatially misaligned health and environmental data
Authors: Paula Moraga
Abstract: Spatially misaligned data are increasingly common, primarily due to advancements in data collection and management. In this talk, I will present a flexible and fast Bayesian modelling framework for the combination of data available at different spatial resolutions and from various sources. Inference is performed using INLA and SPDE, which provide a fast approach to fitting latent Gaussian models. The approach is flexible and can be applied in preferential sampling and spatio-temporal settings. The Bayesian modelling approach is demonstrated in a range of health and environmental settings. Specifically, a spatial model is developed to combine point and areal malaria prevalence data, to integrate air pollution data from different sources, and to detect disease clusters. The approach presented provides a useful tool in a wide range of situations where information at different spatial scales needs to be combined, and provides valuable insights for decision-making in health and environmental fields.
A Bayesian Change-Point Analysis of Vector Autoregressive Processes
Authors: Stefano Peluso
Abstract: Vector Autoregressive Processes (VAR) model contemporaneous and lagged dependencies in multivariate time series. Data heterogeneities call for abrupt change-points in model parameters, but regimes are often unknown and need to be estimated. We propose a Bayesian method for inferring change-points in VAR models. We show how the posterior distribution of the change-point location asymptotically concentrates on the true change-point when a conjugate prior is assigned to the regime parameters. We extend this result to non-conjugate priors and discuss the intricacies that arise when multiple change-points are present. Simulation studies confirm the ability to recover the regimes, and an MCMC posterior sampler based on a latent process characterization of the change-point process is applied to US macroeconomic data and to data on cardiac illnesses in the Swiss Canton of Ticino.
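For concreteness, a single-change-point VAR of order p can be written as follows (a generic form, not necessarily the exact parametrization used in the talk):

\[
\mathbf{y}_t \;=\; \mathbf{c}_k + \sum_{j=1}^{p} A_{k,j}\,\mathbf{y}_{t-j} + \boldsymbol{\varepsilon}_t,
\qquad
\boldsymbol{\varepsilon}_t \sim \mathcal{N}(\mathbf{0}, \Sigma_k),
\qquad
k = \begin{cases} 1, & t \le \tau,\\ 2, & t > \tau, \end{cases}
\]

with change-point location \tau and regime-specific intercepts, coefficient matrices and error covariances. The posterior distribution of \tau is the object whose asymptotic concentration on the true change-point is studied.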
Gaussian process models for pollution in rivers
Authors: Theresa Smith
Abstract: The impact of human activity on the quality of surface waters, including rivers, has recently garnered considerable attention in the media. Statistical models to characterise the spatio-temporal distribution of biological and chemical indicators in a river network must accommodate several features not seen in typical spatial modelling applications including censoring of measured concentrations and complicated representations of distance. In this talk, I will present a Bayesian approach that addresses these two challenges within the framework of the Spatial Stream Network model of Ver Hoef et al. (2006). This work is in collaboration with Stellenbosch University.
Contributed
A Bayesian Multisource Fusion Model for Spatiotemporal PM2.5 and NO2 Concentrations: For Exposure Health Assessment in an Urban Setting
Authors: Abi Riley, Marta Blangiardo, James Kirkbride, Fred Piel, Monica Pirani
Abstract: Epidemiological studies on the health effects of air pollution benefit from accurate and individualised exposure estimation, requiring good-quality modelled air pollution concentrations. This study develops a Bayesian hierarchical spatiotemporal model for monthly PM2.5 and NO2 concentrations in Greater London for 2010-2019, at a 1 km spatial resolution. Computationally, it relies on the Integrated Nested Laplace Approximation (INLA) framework, implementing a hierarchical Bayesian spatiotemporal model coupled with a Stochastic Partial Differential Equation (SPDE) process term. This method assimilates multiple sources of air pollution data, including ground-monitored data, numerical and chemical transport model outputs, and proxy satellite-derived measurements, such as aerosol optical depth. We additionally use other fixed-effect covariates, such as temperature, humidity, and population density. Building on the baseline model, which consists of a separable space-time formulation, we consider additional modelling choices. We first consider changes to the temporal terms, including additional additive time terms within the spatiotemporal SPDE, to better capture the residual temporal fluctuations of the data. We then extend the flexibility of the model by assuming spatially varying coefficients, allowing covariate effects to vary across space. We found that allowing the effects of humidity and temperature to vary spatially is beneficial for the Greater London area, potentially by accounting for the urban heat island effect. These model outputs will be used to study the effects of air pollution exposure on children’s mental health. Using a two-stage Bayesian approach, we will propagate uncertainty from the air pollution model (with estimates at children’s residence and school) into the mental health model.
Advanced spatio-temporal modelling for malaria incidence forecasting and outbreak detection in Mozambique
Authors: Alejandro Rozo Posada, Emanuele Giorgi, Christel Faes, James Colborn, Thomas Neyens
Abstract: Mozambique bears a significant burden of malaria, with the disease ranking as the country’s fifth leading cause of death in 2021 while accounting for 3.8% of global malaria-related mortality. In the last twenty years, there have been global efforts to end malaria faster. The Mozambique National Malaria Control program has put efforts into surveillance, monitoring, and evaluation activities in which health specialists and policymakers make use of spatio-temporal malaria incidence predictions for decision and policy-making. However, many current approaches to predicting malaria incidence assume that the geographical correlation is similar throughout the study region. This is suboptimal for malaria in Mozambique since infection trends are strongly related to environmental and climatic conditions. These conditions vary considerably throughout Mozambique, suggesting that a modelling approach should allow region-specific variation in spatio-temporal processes.
We propose a novel multivariate time series model tailored to accommodate the spatio-temporal dynamics of malaria in Mozambique. The novelty lies in using district-specific Gaussian processes to model temporal trends in residual information, which are connected to each other by imposing a spatial correlation on the parameters of the Gaussian processes. Estimation is performed using a Bayesian hierarchical approach with Markov Chain Monte Carlo methods, including the Metropolis-Hastings algorithm, the Gibbs Sampler and the Metropolis-Adjusted Langevin Algorithm. Using monthly district-specific data from 2017 to 2021 from Mozambique, we illustrate the potential uses of the proposed modelling approach and its estimation and computational challenges.
Generative multi-fidelity modeling and downscaling via spatial autoregressive Gaussian processes
Authors: Alejandro Calle-Saldarriaga, Paul Wiemann, Matthias Katzfuss
Abstract: Computer models are often run at different fidelities or resolutions due to trade-offs between computational cost and accuracy. For example, global circulation models (GCMs) can simulate climate on a global scale, but they are too expensive to be run at a fine spatial resolution. Hence, regional climate models (RCMs) forced by GCM output are used to simulate fine-scale climate behavior in regions of interest. We propose a highly scalable generative approach for learning high-fidelity or high-resolution spatial distributions conditional on low-fidelity fields from training data consisting of both high- and low-fidelity output. Our method learns the relevant high-dimensional conditional distribution from a small number of training samples via spatial autoregressive Gaussian processes with suitably chosen regularization-inducing priors. We demonstrate our method on simulated examples and for emulating the RCM distribution corresponding to GCM forcing using past data, which is then applied to future GCM forecasts.
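The autoregressive coupling of fidelities can be sketched in a Kennedy–O’Hagan-style form; this is an illustrative simplification of the construction described in the abstract, not the authors’ exact specification:

\[
y_H(s) \;=\; \rho(s)\,y_L(s) + \delta(s), \qquad \delta(\cdot) \sim \mathcal{GP}\!\left(0,\; C_\delta\right),
\]

where y_L and y_H denote the low- and high-fidelity fields, \rho(s) is a (possibly spatially varying) scaling, and \delta(s) is a Gaussian-process discrepancy independent of y_L. Conditioning on y_L then yields a generative distribution for high-resolution fields given coarse GCM-type forcing.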
Navigating Challenges in Spatio-Temporal Modelling of Antarctic Krill Abundance: Addressing Zero-Inflated Data and Misaligned Covariates
Authors: André Victor Ribeiro Amaral, Sophie Fielding, Emma Cavan, Adam M. Sykulski
Abstract: Antarctic krill are among the most abundant species on our planet and serve as a vital food source for many marine predators in the Southern Ocean. In this work, we utilize statistical spatio-temporal methods to aggregate data from various sources and resolutions, aiming to accurately model krill abundance. Our focus lies in fitting the model to a novel dataset comprising acoustic in situ data of krill swarms. To achieve this, we integrate climate covariates obtained from satellite imagery with information gathered by floating buoys (also known as drifters), such as water surface temperature and velocities (eastward and northward). Additionally, we incorporate sparsely collected krill abundance data obtained from net fishing efforts (KRILLBASE) into our modelling scheme. The KRILLBASE data is meant to be used for model validation and extrapolation when predicting krill abundance in areas where we lack acoustic data. However, these datasets present significant modelling challenges, including spatio-temporal misalignment, incomplete covariate information, and inflated zeros in the observed krill abundance. To address these challenges, we employ a hurdle model, which utilizes two likelihoods to jointly model the occurrence of zeros and the number of krill per area, all while accounting for incomplete covariates and misaligned observations. Therefore, our work presents a comprehensive framework for analysing and predicting krill abundance in the Southern Ocean, leveraging information from various sources and formats. This is crucial due to the impact of krill fishing, as understanding their distribution is essential for informed management decisions and fishing regulations aimed at protecting the species.
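A hurdle model of the kind described splits the likelihood into an occurrence part and a positive-abundance part; schematically (the distributional choices below are illustrative, not the authors’ exact specification):

\[
\Pr\{Y(s,t) = 0\} \;=\; 1 - p(s,t),
\qquad
Y(s,t) \mid Y(s,t) > 0 \;\sim\; F^{+}\!\left(\cdot \mid \mu(s,t)\right),
\]

where F^{+} is a zero-truncated count or positive continuous distribution, and \operatorname{logit} p(s,t) and the conditional mean \mu(s,t) each receive their own linear predictor with spatio-temporal random effects, so that excess zeros and abundance given presence are handled by separate likelihoods.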
Spatio-temporal data fusion of threshold exceedances
Authors: Daniela Castro-Camilo, M. Daniela Cuba, Craig J. Wilkie, Marian Scott
Abstract: Air pollution poses a significant risk to public health. Heavy and extremely heavy episodes of high particulate matter (PM) pollution are linked to increased hospitalisations due to the exacerbation of cardiovascular and respiratory conditions. Mitigating the effects of PM pollution is a priority for national and regional authorities; however, air quality management requires PM concentration data at high spatial and temporal resolutions. While high-quality monitoring networks are available in the UK, they can have low spatial and temporal coverage. Alternative data are available from remote-sensing or reanalysis sources but are less reliable than in-situ monitoring stations. Data fusion has been proposed to combine desirable properties of different data sources, but applications to air pollution have largely focused on mean concentrations, effectively smoothing the data and underestimating episodes of extreme pollution and the risk they pose. We propose a bespoke modelling approach within a Bayesian hierarchical structure that enables the fusion of threshold exceedance data measured over different spatio-temporal supports. Our model treats data at each location as observations of smooth functions over space and time, which allows us to predict at any space-time location. We demonstrate the reliability of our approach through a simulation study and apply the model to produce a spatio-temporal interpolation of in-situ PM2.5 data in the UK.
Estimating velocities of infectious disease spread through spatio-temporal log-Gaussian point process models
Authors: Fernando Rodriguez, Paula Moraga, Jorge Mateu.
Abstract: Understanding the spread of infectious diseases such as COVID-19 is crucial for informed decision-making and resource allocation. A critical component of disease behavior is the velocity with which the disease spreads, defined as the rate of change at each location and time. In this paper, we propose a spatio-temporal modeling approach to determine the velocities of infectious disease spread. Our approach assumes that the locations and times of infected people can be considered as a spatio-temporal point pattern that arises as a realization of a spatio-temporal log-Gaussian Cox process. The intensity of this process is estimated using fast Bayesian inference by employing the integrated nested Laplace approximation (INLA) and the Stochastic Partial Differential Equations (SPDE) approaches. Velocities are then computed using finite differences that approximate the derivatives of the intensity function. Finally, the directions and magnitudes of the velocities can be mapped at specific times to better examine disease spread across the region. We demonstrate our method by analyzing the spread of COVID-19 in Cali, Colombia, during 2020-2021.
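To illustrate the finite-difference step on a gridded intensity estimate, the sketch below computes space and time derivatives with numpy and forms a velocity magnitude as the ratio of the temporal derivative to the spatial gradient norm. This is one simple convention, not necessarily the definition used by the authors, and the grid, intensity values and function names are hypothetical.

```python
import numpy as np

def velocity_from_intensity(lam, dx, dy, dt):
    """Finite-difference velocities from a gridded intensity lam[t, y, x].

    Returns a velocity magnitude and a direction (angle of the spatial
    gradient) at every grid cell; a small constant guards against division
    by zero where the spatial gradient vanishes.
    """
    dlam_dt, dlam_dy, dlam_dx = np.gradient(lam, dt, dy, dx)
    grad_norm = np.sqrt(dlam_dx**2 + dlam_dy**2)
    speed = np.abs(dlam_dt) / np.maximum(grad_norm, 1e-12)
    direction = np.arctan2(dlam_dy, dlam_dx)
    return speed, direction

# Toy example: an intensity bump drifting east over time (hypothetical data).
x = np.linspace(0, 10, 50)
y = np.linspace(0, 10, 50)
t = np.arange(5.0)
T, Y, X = np.meshgrid(t, y, x, indexing="ij")
lam = np.exp(-((X - 2 - 0.5 * T) ** 2 + (Y - 5) ** 2))
speed, direction = velocity_from_intensity(lam, dx=x[1] - x[0], dy=y[1] - y[0], dt=1.0)
print(speed.shape)  # (5, 50, 50)
```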
Exploring Random Fractal Models for Analyzing Second-Order Properties of Point Processes on Linear Networks
Authors: Francisco J. Rodríguez-Cortés, Juan F. Díaz-Sepúlveda, Ramón Giraldo
Abstract: The statistical modeling of second-order characteristics in point processes often commences with testing the hypothesis of spatial randomness. We address the challenge of assessing complete randomness within the geometric framework of point processes on linear networks, where the conventional properties of a point process undergo alterations and data visualization becomes less intuitive. In the case of planar scenarios, traditional goodness-of-fit tests rely on quadrat counts and distance-based methods. As an alternative approach, we propose a novel statistical test of randomness based on the fractal dimension, which is calculated using the box-counting method. This provides a robust inferential perspective, offering a departure from the more commonly descriptive application of this method. Additionally, it enables the discrimination between clustered and inhibitory behaviors of point patterns. We assess the performance of our methodology through a simulation study and the analysis of a real dataset. The results bolster the efficacy of our approach, presenting it as a viable alternative to the computationally more demanding classical distance-based strategies.
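For intuition, a box-counting estimate of the fractal dimension of a planar point pattern can be computed as below; this is a plain 2D illustration with made-up box sizes and simulated data, not the network-constrained version developed in the work.

```python
import numpy as np

def box_counting_dimension(points, box_sizes):
    """Estimate the box-counting (fractal) dimension of a 2D point pattern.

    points: array of shape (n, 2) with coordinates rescaled to [0, 1]^2.
    box_sizes: sequence of box side lengths.
    Returns the slope of log N(eps) versus log(1/eps).
    """
    counts = []
    for eps in box_sizes:
        # Index of the box containing each point, then count occupied boxes.
        idx = np.floor(points / eps).astype(int)
        counts.append(len({tuple(row) for row in idx}))
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(box_sizes)), np.log(counts), 1)
    return slope

rng = np.random.default_rng(1)
pts = rng.uniform(size=(5000, 2))          # complete spatial randomness
sizes = np.array([1 / 2, 1 / 4, 1 / 8, 1 / 16])
print(box_counting_dimension(pts, sizes))  # close to 2 for a dense CSR pattern at these scales
```

Under the null of complete randomness the estimated dimension matches that of the supporting space, while clustered or inhibitory patterns deviate from it; the test described in the abstract builds on this contrast, computed along the network rather than in the plane.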
Sequential Thinning subsampling method for learning Large-Size Latent-Marked Point Processes
Authors: François d’Alayer de Costemore d’Arc, Edith Gabriel, Samuel Soubeyrand
Abstract: In numerous fields, understanding the characteristics of a large-size marked point process is crucial, particularly when the marks are latent and can only be inferred through sampling. To this end, we need to gather an efficient sample, which is a thinning of the original point process. To address this challenge, we introduce a sequential thinning approach tailored to diverse goals, such as estimating the mark distribution. For each objective, a loss function is defined on the space of thinnings of the complete process. Since this loss relies on unknown elements such as the marks or the complete process, it remains unknown itself. We minimize this loss by sequentially minimizing an estimate of it. This involves coupling two distinct sequential thinning processes: initially employing uniform sampling for error estimation, followed by non-uniform sampling aimed at minimizing this estimated error. We illustrate our approach through a case study in plant disease surveillance. Here, the point process denotes the spatial locations of plants, with a binary mark indicating infection status. Our method facilitates the estimation of the spatial mark distribution and the delineation of infected and non-infected regions.
Magnitude-weighted likelihood scores and residuals for earthquake forecasts
Authors: Frederic Schoenberg
Abstract: An assortment of goodness-of-fit tests and residual methods have been proposed and implemented in prospective earthquake forecasting experiments such as the Collaboratory for the Study of Earthquake Predictability (CSEP). Unfortunately, current methods, such as tests based on the log-likelihood of the model, essentially reward models equivalently for accurately forecasting small earthquakes or large earthquakes. As a result, models such as the Epidemic-Type Aftershock Sequence (ETAS) model frequently seem to offer the best fit among the proposed models for earthquake occurrences. However, ETAS has very little value for forecasting the largest events, which are of primary interest in practice. The current paper explores alternative measures, such as variants of the Brier score or information gain restricted to, or weighted toward, the subset of earthquakes of most concern. We also explore graphical residual methods, such as variants of Voronoi or superthinned residuals, tailored specifically toward rewarding models for accurately forecasting the largest events.
Integrating Spatial Modeling and Machine Learning for Plant Health Surveillance
Authors: Edith Gabriel, Camille Portes
Abstract: Understanding the health of plants in a specific area is crucial for confirming the absence of regulated harmful organisms. Official monitoring procedures typically involve visual inspections, sampling, and subsequent sample analysis. If harmful organisms are detected, health authorities collaborate on implementing collective control measures to prevent their establishment in the area, thereby safeguarding the surrounding territory. Risk-based surveillance is now a well-established paradigm in epidemiology, involving the strategic allocation of sampling efforts across time, space, and populations, considering various risk factors. To assess and map the risk of the presence of the bacterium Xylella fastidiosa, we combine spatial modeling and machine learning. As a first approach, we consider selecting factors from an ensemble method and fit a Bayesian spatial model for prediction using the INLA and SPDE approaches. As a second approach, we propose an adapted XGBoost model that integrates spatial components, both by considering spatially weighted factors and by basing model selection on block environmental cross-validation. The different approaches are compared using observations of Xylella fastidiosa in the French Mediterranean Basin, both in situations of interpolation and extrapolation.
Nonparametric estimation of the variogram and the effective sample size
Authors: Jonathan Acosta
Abstract: This work introduces a novel approach for the nonparametric estimation of the variogram and the Effective Sample Size (ESS) in the context of spatial statistics. The variogram is crucial in modeling intrinsically stationary random fields, particularly in spatial prediction using kriging equations. Existing nonparametric variogram estimators often lack the guarantee of producing a conditionally negative definite function. To address this, we propose a new valid variogram estimator based on a linear combination of functions within a specified class, ensuring the satisfaction of critical properties. A penalty parameter that avoids overfitting is incorporated, thus eliminating spurious fluctuations in the estimated variogram function.
Additionally, we extend the notion of the ESS, a crucial measure in spatial regression processes, to a nonparametric setting. The proposed nonparametric ESS relies on the reciprocal of the average correlation and is estimated using a plug-in approach. The proposed estimators’ theoretical properties and consistency are discussed, and numerical experiments demonstrate their performance.
This work not only enhances the robustness and flexibility of spatial statistical analyses through nonparametric methods but also extends the discussion to the application of these methods to spatiotemporal data, for both the nonparametric estimation of the variogram and the ESS, thereby broadening the potential impact of our research.
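The linear-combination idea for the variogram can be illustrated as follows: fitting an empirical variogram with nonnegative weights on a dictionary of valid basis variograms (here exponential variograms with a few fixed ranges) keeps the fitted function conditionally negative definite. The basis, ranges, toy data and omission of the penalty term are simplifying assumptions for illustration, not the estimator actually proposed.

```python
import numpy as np
from scipy.optimize import nnls

def exp_variogram(h, a):
    """Exponential variogram with unit sill and range parameter a."""
    return 1.0 - np.exp(-h / a)

def fit_variogram(h, gamma_hat, ranges):
    """Fit gamma(h) ~ sum_j w_j * exp_variogram(h, a_j) with w_j >= 0.

    Nonnegative weights on valid basis variograms keep the fitted function
    itself a valid (conditionally negative definite) variogram.
    """
    A = np.column_stack([exp_variogram(h, a) for a in ranges])
    w, _ = nnls(A, gamma_hat)
    return w

# Toy empirical variogram (hypothetical values at a few lags).
h = np.array([0.5, 1.0, 2.0, 3.0, 5.0, 8.0])
gamma_hat = np.array([0.35, 0.58, 0.82, 0.92, 0.98, 1.00])
ranges = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
w = fit_variogram(h, gamma_hat, ranges)
fitted = np.column_stack([exp_variogram(h, a) for a in ranges]) @ w
print(np.round(w, 3), np.round(fitted, 3))
```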
Statistical inference for random T-tessellations models: application to agricultural landscape modeling
Authors: Katarzyna Adamczyk-Chauvat, Mouna Kassa, Julien Papaïx, Kiên Kiêu, Radu S. Stoica
Abstract: The Gibbsian T-tessellation models allow the representation of a wide range of spatial patterns. In this talk we present statistical tools for these models and illustrate their application to the comparison of three agricultural landscapes in France. Model parameters are estimated via Monte Carlo Maximum Likelihood based on an adapted Metropolis-Hastings-Green dynamics. In order to reduce the computational costs, a pseudolikelihood estimate is used for the initialization of the likelihood optimization. Model assessment is based on global envelope tests applied to the set of functional statistics of tessellation.
A flexible space-time model for extreme rainfall data
Authors: Lorenzo Dell’Oro, Carlo Gaetan
Abstract: Extreme value analysis is critical for understanding rare and extreme events, whose study is of significant interest in various fields, notably in environmental sciences. The most common modeling approach for spatio-temporal extremes relies on asymptotic models, such as max-stable and r-Pareto processes, respectively for block maxima and peaks over high thresholds. However, the lack of flexibility that characterizes the dependence structure of these models makes them unable to capture the empirical pattern of extreme events becoming more localized in space and time as their severity increases. To address this limitation, some authors have recently focused on models capable of flexibly describing the “sub-asymptotic” dependence of data, many of which utilize a random scale construction.
This study presents an extension of the spatial random scale mixture model proposed by Huser and Wadsworth (2019, JASA) to the spatio-temporal domain, providing a comprehensive framework for characterizing the dependence structure of extreme events across both dimensions. Indeed, the model is able, through parametric inference, to discriminate between asymptotic dependence and independence simultaneously in space and time.
Due to the high complexity of the likelihood function for the proposed model, parameter estimation relies on a simulation approach based on neural networks, which leverages summaries of the sub-asymptotic dependence present in the data.
The effectiveness of the model in assessing the limiting dependence structure of spatio-temporal processes is shown both via simulation studies and through an application to rainfall datasets.
Semi-parametric spatio-temporal Hawkes process for modelling road accidents in Rome
Authors: Marco Mingione, Pierfrancesco Alaimo Di Loro, Paolo Fantozzi
Abstract: We propose a semi-parametric spatio-temporal Hawkes process with periodic components to model the occurrence of car accidents in a given spatio-temporal window. The overall intensity is split into the sum of a background component capturing the spatio-temporal varying intensity and an excitation component accounting for the possible triggering effect between events. The spatial background is estimated and evaluated on the road network, allowing the derivation of accurate risk maps of road accidents. We constrain the spatio-temporal excitation to preserve an isotropic behavior in space and we generalize it to account for the effect of covariates. The estimation is pursued by maximizing the expected complete data log-likelihood using a tailored version of the stochastic reconstruction algorithm that adopts ad-hoc boundary correction strategies. An original application analyzes the car accidents that occurred on the Rome road network in 2019, 2020, and 2021. Results highlight that car accidents of different types exhibit varying degrees of excitation, ranging from no triggering to a 10% chance of triggering further events.
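The split of the conditional intensity into background and excitation components referred to above is, schematically,

\[
\lambda(s, t \mid \mathcal{H}_t) \;=\; \mu(s, t) \;+\; \sum_{i\,:\, t_i < t} g\!\left(s - s_i,\; t - t_i\right),
\]

where \mu(s,t) is the (periodic, network-evaluated) background and g the triggering kernel. The “degree of excitation” reported in the results can be read as the expected number of offspring generated per event, i.e. the integral of g over space and time.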
stopp: An R package for spatio-temporal point pattern analysis
Authors: Nicoletta D’Angelo, Giada Adelfio
Abstract: stopp is a novel R package specifically designed for the analysis of spatio-temporal point patterns which might have occurred in a subset of the Euclidean space or on some specific linear network, such as roads of a city. It represents the first package providing a comprehensive modelling framework for spatio-temporal Poisson point processes. While many specialized models exist in the scientific literature for analyzing complex spatio-temporal point patterns, we address the lack of general software for comparing simpler alternative models and their goodness of fit. The package’s main functionalities include modelling and diagnostics, together with exploratory analysis tools and the simulation of point processes. A particular focus is given to local first-order and second-order characteristics. We aim to welcome many further proposals and extensions from the R community.
Sampling Design for Binary Geostatistical Data - Application to inspection actions of the fishing activity in Portugal
Authors: Belchior Miguel, Paula Simões, Rui Gonçalves de Deus, Isabel Natário
Abstract: The definition of surveillance routes is a very important but complex issue. The Navy, in its usual mode of operation, is in charge of the Naval Standard device, which is distributed across the various areas of the country. Enforcement actions can involve very high costs, so a good plan for the sampling designs used is in order, so as to maximize the efficiency of obtaining information from the data of the actions carried out over the area under consideration. The main objective of this study is to propose sampling design criteria, based on geostatistical models in the context of binary data, that are advantageous for the optimization of maritime surveillance actions, in terms of the efforts employed in their execution, in the Portuguese maritime area of responsibility.
Two sampling design selection criteria are proposed: maximization of the estimated risk and maximization of the variability associated with the estimated risk. These are then compared to a simple random design by the root mean square error (RMSE). A comparison of the designs at different sample sizes is made: for sample sizes of 50 and 100 points, the estimated risk variability maximization design presents the best RMSE value, while for a 200-point sample size, the estimated risk maximization design gives the best RMSE value. The proposed sampling designs may assist in the creation of alternative enforcement routes for the Portuguese Navy, optimizing the scheduling so as to maximize the probability of finding a higher number of presumed fishing perpetrators with fewer resources.
A Bayesian spatio-temporal Poisson auto-regressive model for the disease infection rate: application to COVID-19 cases in England
Authors: Pierfrancesco Alaimo Di Loro, Sujit K. Sahu, Dankmar Boehning
Abstract: The COVID-19 pandemic provided many modeling challenges to investigate the evolution of an epidemic process over areal units. A suitable encompassing model must describe the spatio-temporal variations of the disease infection rate of multiple areal processes while adjusting for local and global inputs. We develop an extension to Poisson Auto-Regression that incorporates spatio-temporal dependence to characterize the local dynamics while borrowing information among adjacent areas. The specification includes up to two sets of space-time random effects to capture the spatio-temporal dependence and a linear predictor depending on an arbitrary set of covariates. Adopted in a fully Bayesian framework and implemented through a novel sparse-matrix representation in Stan, the proposed model provides a framework for evaluating local policy changes over the whole spatial and temporal domain of the study. It has been validated through a substantial simulation study and applied to the weekly COVID-19 cases observed in the English local authority districts between May 2020 and March 2021. The model detects substantial spatial and temporal heterogeneity and allows a full evaluation of the impact of two alternative sets of covariates: the level of local restrictions in place and the value of the Google Mobility Indices. The paper also formalizes various novel model-based investigation methods for assessing additional aspects of disease epidemiology.
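A generic Poisson auto-regression with a spatio-temporal linear predictor, in the spirit of (but not necessarily identical to) the specification described above, is

\[
Y_{it} \mid \lambda_{it} \;\sim\; \mathrm{Poisson}(\lambda_{it}),
\qquad
\log \lambda_{it} \;=\; \rho \log\!\left(Y_{i,t-1} + 1\right) + \mathbf{x}_{it}^{\top}\boldsymbol{\beta} + \phi_{i} + \psi_{it},
\]

where the auto-regressive term carries the local epidemic dynamics, x_{it} collects the covariates (restriction levels, mobility indices), and \phi_i, \psi_{it} are spatially and spatio-temporally structured random effects that borrow information across adjacent areas.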
Spatial and Temporal Analysis of COVID-19 Cases in Relation to human activity
Authors: Poshan Niraula, Edzer Pebesma
Abstract: Infectious diseases, including COVID-19, can spread through various channels, notably anthropogenic activities like human mobility and interaction, which play a pivotal role in disease transmission. As a result, when analyzing the patterns of spread of such diseases, human activities must be taken into consideration. However, limitations in data accessibility and privacy lead to aggregated data on disease transmission across different spatial and temporal scales. Moreover, predictive factors like socio-demographic indicators sourced from different places vary in granularity across these scales. This study examines the link between COVID-19 infection rates and human activity indicators. While COVID-19 case counts are widely available (albeit potentially inaccurate), data on human activities pose challenges due to measurement complexities and privacy concerns. To overcome this, alternative proxy measures are explored, such as an anonymized human activity index from Mapbox. This index, a computed and normalized metric reflecting activity levels over time and across geographic areas, serves as an approximation for interactions that could facilitate disease transmission. A challenge encountered is that the datasets are not available at the same level of granularity, both spatially and temporally. The infection cases are aggregated to zip code level / census polygons and reported weekly, whereas the human activities (a possible predictor) are aggregated on a 100-meter grid and computed daily. To overcome this challenge, we aggregated the human activity data to zip code level / census polygons. Our approach employs spatial analysis techniques, specifically generalized linear mixed models, to assess how activity, location, and time are associated with the COVID-19 distribution.
Spatio-temporal modelling of fish species distribution
Authors: Raquel Menezes, Daniela Silva, Susana Garrido
Abstract: Scientific tools capable of identifying species distribution patterns are crucial, as they contribute to advancing our understanding of the factors driving species fluctuations. Species distribution data often exhibit residual spatial autocorrelation and temporal variability, making both components essential for studying the evolution of species distribution from an ecological perspective. Fishery data typically originate from two primary sources: fishery-dependent data, often obtained from commercial fleets, and fishery-independent data, typically collected through research surveys. Research surveys are conducted once or twice a year over a broader spatial region and involve standardized sampling designs that cover fewer spatial locations. In contrast, data collected from commercial fleets often exhibit a higher recurrence, with more sampled locations within a smaller region due to a preferential selection of these locations.
While these two data sources may offer distinct yet valuable information, they can be used complementarily. Jointly modeling these two sources requires an approach capable of accommodating the differing sampling designs. Classical tools excel at handling standardized sampling designs but are ill-equipped to address the preferential nature of commercial data.
The current presentation shares preliminary results on a proposed joint model capable of handling both preferential and non-preferential sampling designs. Furthermore, when modeling fish species distribution, it is essential to consider zero-inflated data, a common occurrence in research survey data. The discussed models, addressing these challenges, are designed to provide a comprehensive understanding of species distribution patterns and fluctuations, offering a promising tool for ecological research in the context of fisheries management.
A concordance coefficient for areal data analysis
Authors: Ronny Vallejos, Clemente Ferrer
Abstract: In this presentation, we introduce an index for measuring concordance in areal data, motivated by the study of poverty in Chile. The primary data source for poverty statistics is the National Socio-Economic Characterization Survey (CASEN), conducted by the Chilean government every two or three years. This survey is crucial for informing the design and evaluation of public policies. Recognizing the need for a modern methodology to enhance the precision of community-level estimates, a committee was convened in 2011 to implement small-area estimation (SAE) methods. The prevailing statistic used was the Horvitz-Thompson nonparametric estimator. However, since then, no studies have examined the agreement between this method and subsequent methodologies. To address this gap, we propose a concordance coefficient for comparing two sequences measured on the same areal units. Our approach is grounded in bivariate Conditional Autoregressive (CAR) processes and inherits several desirable properties for agreement measures. We illustrate the efficacy and limitations of our coefficient through numerical experiments and by analyzing poverty trends in Chile based on the CASEN survey data.
Gaussian process regression on an incomplete and large spatio-temporal grid
Authors: Sahoko Ishida, Wicher Bergsma
Abstract: Reducing the cubic time complexity in Gaussian process (GP) regression has been a popular area of research in spatial statistics and machine learning. One such attempt to make GP regression scalable is to exploit a Kronecker product structure in the covariance matrix, which is applicable when the data of interest has a multi-dimensional grid/panel structure commonly seen in spatio-temporal analysis. The structured covariance matrix allows for efficient evaluation of the likelihood and posterior. However, the Kronecker method generally requires the data to form a complete grid, i.e., no missing values in the response. While this may be a reasonable assumption for applications such as image analysis, many real-world data sets exhibit missing values, including those from environmental monitoring. In some cases, inference on missing values may be the main interest. The existing approaches fail when the missingness mechanism is not at random. To address this, we propose a stochastic EM algorithm which enables sequential model parameter estimation and missing-grid imputation. The algorithm can also handle a large number of missing values by combining Gibbs sampling in the stochastic step of the algorithm. This expands the applicability of Kronecker GP methods to real-world scenarios where missing values are common.
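To illustrate why the Kronecker structure helps on a complete grid, the sketch below evaluates a GP marginal log-likelihood on an n_s x n_t grid using only the eigendecompositions of the separate space and time kernels, never forming the full covariance matrix. The kernels, data and noise level are hypothetical, and the stochastic EM / Gibbs machinery for missing cells described in the abstract is not shown.

```python
import numpy as np

def rbf_kernel(x, lengthscale):
    """Squared-exponential kernel matrix for 1-d inputs x."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def kron_gp_loglik(Y, Ks, Kt, noise_var):
    """Gaussian log-likelihood of a complete grid Y (n_s x n_t) under
    covariance Ks (x) Kt + noise_var * I, via per-factor eigendecompositions."""
    ws, Us = np.linalg.eigh(Ks)
    wt, Ut = np.linalg.eigh(Kt)
    lam = np.outer(ws, wt) + noise_var   # eigenvalues of the full covariance, on the grid
    Ytil = Us.T @ Y @ Ut                 # data rotated into the joint eigenbasis
    n = Y.size
    quad = np.sum(Ytil**2 / lam)
    logdet = np.sum(np.log(lam))
    return -0.5 * (quad + logdet + n * np.log(2 * np.pi))

# Hypothetical complete grid: 60 spatial sites x 48 time points.
rng = np.random.default_rng(0)
s = np.linspace(0, 1, 60)
t = np.linspace(0, 1, 48)
Ks = rbf_kernel(s, 0.2)
Kt = rbf_kernel(t, 0.1)
Y = rng.normal(size=(60, 48))
print(kron_gp_loglik(Y, Ks, Kt, noise_var=0.5))
```

The cost is dominated by the two small eigendecompositions rather than a factorization of the full n_s n_t covariance, which is the efficiency the abstract refers to; missing cells break this structure, which is what the proposed stochastic EM imputation restores.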
Density estimation for spatio-temporal point patterns on complicated domains
Authors: Simone Panzeri, Eleonora Arnone, Blerta Begu, Michelle Carey, Aldo Clemente, Laura Maria Sangalli
Abstract: Our research interest lies in spatio-temporal point patterns evolving over time on complicated spatial domains, including irregular planar regions and curved surfaces. Given pairs of locations and time of occurrence, we aim to study the associated distribution, jointly capturing spatial and temporal dependencies. We model data as independent and identically distributed realizations drawn from a distribution with unknown spatio-temporal density to estimate. This density estimation problem is equivalent to the intensity estimation problem in the modelling framework of spatio-temporal Poisson point processes, under conditions of inhomogeneity and nonseparability. We propose a novel nonparametric method for estimating the unknown density or intensity associated with spatio-temporal point patterns. We achieve this by penalizing the maximum likelihood estimator with roughness penalties, based on differential operators in space and time. We establish some important theoretical properties of the considered estimator, including its asymptotic consistency. We develop a flexible estimation procedure that leverages advanced techniques from numerical analysis, employing the finite element method in space and cubic B-spline basis functions defined over the considered time interval. This flexibility enables the proposed method to efficiently accommodate various types of spatial domains, possibly endowed with non-trivial geometries. In order to keep the overall computational costs sustainable, we resort to known optimization routines. We introduce uncertainty quantification tools to address statistical inference, providing estimates with confidence intervals. The presented method demonstrates its capability to capture complex multi-modal and strongly anisotropic signals. The obtained results highlight significant advantages over state-of-the-art techniques, in all the scenarios considered in this work.
Advantages of nonparametric testing in spatial statistics
Authors: Tomás Mrkvicka, Jirí Dvorák
Abstract: Parametric methods in spatial statistics are well-developed, but these methods often lead to problems when the model is wrongly specified. The pitfalls of wrong specification may come from misspecification of the autocorrelation structure of random fields, misspecification of the form of point interactions, misspecification of the model of dependence of one spatial object on other spatial objects, or misspecification of the dependence of data on the sample locations, etc. Such misspecifications may lead to test liberality higher than 0.25 when 0.05 is the target nominal level. In addition, parametric methods are usually based on asymptotic approximations, which for real data can be far from reality due to long-range dependencies. These issues can lead to slight liberality, which can exceed 0.08 according to our simulation studies even when all the assumptions of the parametric models are met. On the other hand, nonparametric methods are free of these assumptions, and they are designed to work correctly even in non-asymptotic situations thanks to resampling strategies. Therefore, we built the R package NTSS, which collects Nonparametric Tests in Spatial Statistics. The package contains tools for selecting the relevant spatial covariates influencing a point pattern, based on a test of independence between the point pattern and a covariate in the presence of nuisance covariates. Further, it contains tools for disentangling the dependence between points, marks, and a covariate, and tests of independence between two point patterns and between two random fields.
Posters
A fully non-separable structure for log-Gaussian Cox processes
Authors: Adriana Medialdea, José Miguel Angulo, Jorge Mateu, Giada Adelfio
Abstract: Log-Gaussian Cox processes define a flexible class of spatio-temporal models which allow the description of a wide variety of dependency effects in point patterns. The clustering structure observed in these patterns can be described by the inclusion of random heterogeneities in an unobservable intensity function. In this work, we propose a model with a double non-separable structure that combines a non-separable first-order intensity function and a non-separable correlation structure for the underlying random field. This approach allows us to reflect the interaction between the spatial and temporal dimensions present in the pattern. We apply the proposed model to a dataset of forest fires in Nepal and generate risk maps for future events. The predictive performance of the model is compared to that obtained with a separable structure and a mixed model which combines a separable deterministic component and a non-separable covariance structure. This evaluation is carried out using global and local weighted second-order statistics.
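For reference, a log-Gaussian Cox process models the random intensity as
\[
\Lambda(\mathbf{s}, t) = \exp\{\mu(\mathbf{s}, t) + Z(\mathbf{s}, t)\},
\]
where $\mu$ is the first-order (deterministic) component and $Z$ is a zero-mean Gaussian random field; separability would correspond to a covariance of the form $C\big((\mathbf{s},t),(\mathbf{s}',t')\big) = C_S(\mathbf{s},\mathbf{s}')\, C_T(t,t')$, which the fully non-separable specification above deliberately avoids. This is the generic formulation; the particular choices of $\mu$ and of the covariance of $Z$ are those proposed in the abstract.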
This research is partially supported by grants PID2021-128077NB-I00 (A. Medialdea, J.M. Angulo) and PID2022-141555OB-I00 (J. Mateu) funded by MCIN / AEI/10.13039/501100011033 / ERDF A way of making Europe, EU, and grant CEX2020-001105-M funded by MCIN/AEI/10.13039/501100011033 (J.M. Angulo).
Extending Inlabru to spatio-temporal Hawkes Processes
Authors: Alba Bernabeu, Finn Lindgren, Jorge Mateu, Francesco Serafini
Abstract: Hawkes processes are a type of stochastic process used to model the occurrence of events over time. These processes, originally introduced by the statistician Alan G. Hawkes in the 1970s (Hawkes, 1971), are characterized by their “self-exciting” nature, meaning that each event increases the likelihood of future events occurring in the short term, creating a clustering effect. We propose a novel parameter estimation method for Hawkes processes. Our approach involves a two-step procedure: first, we utilize the Expectation-Maximization (EM) algorithm within a Bayesian framework to estimate the parameters; second, we employ the INLA (Integrated Nested Laplace Approximation) approach using the inlabru package to derive the posterior distribution of the parameters. This method offers improved accuracy and computational efficiency compared to traditional estimation techniques, providing a robust framework for analyzing temporal event data.
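For reference, a temporal Hawkes process has conditional intensity
\[
\lambda(t \mid \mathcal{H}_t) = \mu + \sum_{t_i < t} g(t - t_i),
\]
where $\mu > 0$ is the background rate and $g \ge 0$ is the triggering kernel responsible for the self-exciting, clustering behaviour; this is the generic form of the model, independent of the EM/INLA estimation strategy described above.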
Risk-based spatiotemporal hotspot dynamics assessment
Authors: Ana E. Madrid, José M. Angulo
Abstract: The analysis of hotspots is an important technical aspect of application studies oriented to spatiotemporal risk assessment, particularly for risk mapping in diverse fields such as Ecology, Epidemiology, Environmental Sciences, and Security. The most basic form, common in practice, consists of depicting the areas where the magnitude of interest exceeds a certain critical threshold level. In this work, we adopt a broader concept of hotspot, associated with extremal values of general risk measures defined on different structural indicators of threshold exceedance sets, such as the exceedance area and the excess volume. In particular, we propose a methodological approach based on an empirical characterization of the indicator probability distributions from simulation conditional on observations, calculated at local scales determined by sliding windows. The procedure allows a flexible specification of various tuning parameters.
For illustration, we consider an extension of a spatiotemporal diffusive blur-generated autoregressive model involving a dynamical spatial deformation. Evolutionary assessment of hotspots is addressed under different scenarios, with reference to well-known quantile-based risk measures. This research is partially supported by grant PID2021-128077NB-I00 funded by MCIN / AEI/10.13039/501100011033 / ERDF A way of making Europe, EU, and grant CEX2020-001105-M funded by MCIN/AEI/10.13039/501100011033.
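As a point of reference for the structural indicators mentioned above (a generic formulation, not the authors' exact definitions), for a random field $X$ on a domain $D$ and threshold $u$, the exceedance set, its area and the excess volume can be written as
\[
A_u = \{\mathbf{s} \in D : X(\mathbf{s}) \ge u\}, \qquad |A_u| = \int_D \mathbf{1}\{X(\mathbf{s}) \ge u\}\, d\mathbf{s}, \qquad V_u = \int_{A_u} \big(X(\mathbf{s}) - u\big)\, d\mathbf{s},
\]
and the risk measures are then evaluated on the distributions of such indicators.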
Statistical methods for evaluating the environmental sustainability of urban infrastructures
Authors: Andrea Gilardi, Francesca Ieva, Laura Sangalli, Piercesare Secchi
Abstract: Air pollution is emerging as a pressing global and national concern, especially in areas of Italy in and around the Po Valley. During the last few years, the European Union (EU) promoted several projects targeting climate actions, such as waste collection, the development of renewable energy sources, or, as discussed here, carbon abatement. Only a handful of papers investigate the impact of EU funds in terms of the reduction of air pollutants and, additionally, these studies mainly focus on a single air pollutant (e.g. PM10 or NO2) at a time, ignoring multivariate dynamics and chemical interactions. Therefore, starting from a database provided by the European Environment Agency, we introduce a series of preliminary analyses and results regarding the joint study of multiple air pollutants. We encode the multivariate data as covariance matrices that summarise the correlations among air pollutants at different monitoring sites and times. Furthermore, considering that covariance matrices lie on a Riemannian manifold, we showcase the benefits of using Riemannian geometry in the analysis of multi-site, multi-pollutant atmospheric monitoring data, discussing an extension of classical kriging techniques to objects on Riemannian manifolds.
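One common (though not the only) way to endow covariance matrices with a Riemannian structure is the log-Euclidean metric, under which the distance between two covariance matrices is
\[
d(\Sigma_1, \Sigma_2) = \left\| \log \Sigma_1 - \log \Sigma_2 \right\|_F,
\]
with $\log$ the matrix logarithm and $\|\cdot\|_F$ the Frobenius norm; distances of this kind replace Euclidean distances when kriging is extended to manifold-valued data. The specific metric adopted in the analysis above may differ.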
Model-based disease mapping using primary care registry data
Authors: Arne Janssens, Bert Vaes, Gijs Van Pottelbergh, Pieter J. K. Libin, Thomas Neyens
Abstract: Background: Spatial modeling of disease risk using primary care registry data is promising for public health surveillance. However, it remains unclear to what extent challenges such as spatially disproportionate sampling and practice-specific reporting variation affect statistical inference.
Methods: Using lower respiratory tract infection data from the INTEGO registry, modeled with a logistic model incorporating patient characteristics, a spatially structured random effect at municipality level, and an unstructured random effect at practice level, we conducted a case study and a simulation study to assess the impact of these challenges on spatial trend estimation.
Results: Even with spatial imbalance and practice-specific reporting variation, the model performed well. Performance improved with increasing spatial sample balance and decreasing practice-specific variation.
Conclusion: Our findings indicate that, with correction for reporting efforts, primary care registries are valuable for spatial trend estimation. The diversity of patient locations within practice populations plays an important role.
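A schematic version of the model described in the Methods, in hypothetical notation, is
\[
\operatorname{logit} \Pr(Y_{ij} = 1) = \mathbf{x}_{ij}^\top \boldsymbol{\beta} + u_{m(i)} + v_j,
\]
where $Y_{ij}$ is the infection indicator for patient $i$ in practice $j$, $\mathbf{x}_{ij}$ collects the patient characteristics, $u_{m(i)}$ is a spatially structured random effect for the municipality of patient $i$, and $v_j$ is an unstructured practice-level random effect.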
Multiple imputation of time series for high-dimensional rainfall networks
Authors: Brian O’Sullivan, Gabrielle Kelly
Abstract: Climate monitoring is commonly limited by the substantial amount of missing entries in precipitation data. Further analysis typically requires a complete data set, indicating the need for robust imputation methods to estimate missing values. We present two novel imputation techniques for incomplete monthly data, Elastic-Net Chained Equations (ENCE) and Multiple Imputation by Chained Equations with Direct use of Regularised Regression by elastic-net (MICE DURR). Both methods have been applied to incomplete monthly rainfall collected in the Republic of Ireland from 1981 to 2010, with time series available from over 1,100 rain gauge stations. The number of stations surpasses the length of each series, resulting in a high-dimensional dataset, a challenge that both methods are tailored to address. They are constructed using a series of regularised regression models, with MICE DURR additionally applying multiple imputation.
The performance of ENCE and MICE DURR is evaluated across a variety of validation procedures at different levels of missingness, and both methods are shown to outperform methods from the literature such as spatio-temporal kriging and multiple linear regression. When imputing series that are at least 50% complete during the study period, an RMSE of 14.16 mm and 14.17 mm per month is reported for ENCE and MICE DURR, respectively. MICE DURR begins to outperform ENCE at higher levels of missingness, highlighting the value of multiple imputation for increasingly sparse data.
O’Sullivan, B., Kelly, G., Infilling of high-dimensional rainfall networks through Multiple Imputation by Chained Equations, International Journal of Climatology
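A minimal sketch of the basic idea, chained-equations imputation with an elastic-net regressor, is given below; it uses scikit-learn rather than the authors' implementation, and the data, column layout and tuning parameters are illustrative assumptions.

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
# Toy monthly "rainfall" matrix: rows = months, columns = stations, with random gaps.
X = rng.gamma(shape=2.0, scale=40.0, size=(120, 30))
X[rng.random(X.shape) < 0.2] = np.nan  # roughly 20% missing at random

# Chained equations: each station is regressed on the others with an elastic-net penalty.
imputer = IterativeImputer(
    estimator=ElasticNet(alpha=1.0, l1_ratio=0.5),
    max_iter=10,
    random_state=0,
)
X_completed = imputer.fit_transform(X)
print(X_completed.shape, np.isnan(X_completed).sum())

Note that this produces a single completed dataset; proper multiple imputation, as in MICE DURR, would repeat the procedure with different random seeds or posterior draws and pool the results.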
A further look for trajectories in space-time point patterns
Authors: Carles Comas, Jorge Mateu
Abstract: Contemporary data collection methodologies facilitate the continuous monitoring of objects in space and time, affording not only real-time positional insights but also space-time tracking capabilities. A trajectory pattern is considered a set of tracks of moving objects that may interact with each other. Examples of such trajectories include animal movements and real-time car positions recorded with GPS technology. In this work, we propose a new algorithm to simulate stochastic space-time point pattern trajectories, and we define a new second-order characteristic, based on and adapted from Ripley’s K function, to analyse the space-time structure of such trajectories. Our methods are applied to taxi trajectory data from Beijing, China.
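For reference, the classical Ripley’s K function that the new second-order characteristic builds on is defined, for a stationary point process with intensity $\lambda$, as
\[
K(r) = \frac{1}{\lambda}\, \mathbb{E}\big[\text{number of further points within distance } r \text{ of a typical point}\big],
\]
with $K(r) = \pi r^2$ under complete spatial randomness; the adaptation proposed above extends this idea from static point patterns to sets of space-time trajectories.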
Evaluation of the effects of the “ProAire” program on respiratory diseases in Mexico City. An interrupted spatio-temporal series analysis
Authors: Carlos Diaz-Avalos, Pablo Juan, Somnath Chaudhuri, Chris Nnanatu, Marc Saez
Abstract: The Metropolitan Area of the Valley of Mexico has historically faced significant challenges in terms of air quality, public health and environmental protection. To address this problem, various Air Quality Management Programs, known as “ProAire”, have been implemented, involving different levels of government, the private sector, society and the educational field. In this study, we focus on the timeframe spanning from 2011 to 2020, a period marked by a strong emphasis on healthcare prioritization in Mexico. We will explore an innovative methodology currently in development, known as Bayesian Interrupted Spatio-Temporal Series Analysis, which incorporates spatial and temporal effects within the widely recognized interrupted time series framework. The data that will be used in the study include spatial and temporal socio-economic indicators of the region, along with particulate matter (PM2.5) concentrations for the entire time frame. Additionally, records of respiratory diseases from various health centers across Mexico City will be included in the analysis. As a preliminary result, we have found substantial spatial variability, probably associated with the socioeconomic level of the area. Although we have observed a reduction in air pollutant levels after the ProAire implementation, the effect of the program on the incidence of respiratory diseases is not yet clear across the whole territory. A Bayesian framework using INLA with SPDE will be used, allowing us to examine the precise temporal and spatial instances of shifts in public health in Mexico City and to determine whether there have been actual improvements in air quality.
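For orientation, the classical (non-spatial) interrupted time series regression that the proposed spatio-temporal analysis builds on can be written, in hypothetical notation, as
\[
Y_t = \beta_0 + \beta_1 t + \beta_2\, \mathbf{1}\{t \ge t_0\} + \beta_3\,(t - t_0)\, \mathbf{1}\{t \ge t_0\} + \varepsilon_t,
\]
where $t_0$ is the time of the intervention (here, the ProAire implementation), $\beta_2$ captures the level change and $\beta_3$ the slope change; the Bayesian spatio-temporal extension adds spatially and temporally structured random effects, fitted with INLA and the SPDE approach.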
Spatiotemporal cluster analysis of ESG scores of European firms
Authors: Simone Boccaletti, Paolo Maranzano, Caterina Morelli, Philipp Otto
Abstract: This paper investigates and traces the spatial and temporal patterns of the sustainability evaluations of European firms, focusing on the environmental aspect and the carbon emission assessment. As sustainability is becoming an increasingly important topic for companies and stakeholders, data providers assign ESG (Environmental, Social and Governance) scores to firms as an evaluation of the company’s sustainability commitment. The economic and financial literature provides plenty of evidence describing the benefits related to high ESG scores, as many stakeholders prefer companies that respect sustainability principles. Using the annual ESG, Environmental and Carbon Emission scores, together with the latitude-longitude coordinates of firms located in Western European countries between 2013 and 2023, we implement both spatial and spatio-temporal hierarchical cluster analyses to highlight regularities in the ESG score dynamics. Specifically, we combine the dissimilarity matrices of the features with the geographical distances and run a Ward-like hierarchical clustering, proposing a new algorithm to find the combination of the distance matrices and the number of clusters that maximises the explained variability. Moreover, we propose a further algorithm that combines more than two matrices, thus including the time series of more than one variable and extending the baseline methodology to the spatiotemporal case. Both the economic and the statistical fields will benefit from the obtained results. We provide a solid basis for future studies on ESG scores by identifying geographical patterns and dynamics, and we present a novel algorithm to combine spatiotemporal information in hierarchical clustering.
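A minimal sketch of the basic building block, combining a feature dissimilarity matrix with a geographical distance matrix before a Ward-like hierarchical clustering, is given below; the toy data, the convex-combination form and the mixing weight are illustrative assumptions, not the authors' algorithm.

import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
scores = rng.normal(size=(50, 11))   # toy ESG score trajectories: 50 firms x 11 years
coords = rng.uniform(size=(50, 2))   # toy firm coordinates (longitude, latitude)

d_feat = squareform(pdist(scores))   # feature dissimilarity matrix
d_geo = squareform(pdist(coords))    # geographical distance matrix
alpha = 0.5                          # mixing weight, to be tuned (e.g. by explained variability)
d_mix = alpha * d_feat / d_feat.max() + (1 - alpha) * d_geo / d_geo.max()

# Ward linkage formally assumes Euclidean distances; applied to a mixed
# dissimilarity it is "Ward-like", in the spirit of the approach above.
Z = linkage(squareform(d_mix, checks=False), method="ward")
labels = fcluster(Z, t=4, criterion="maxclust")
print(np.bincount(labels))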
Advances in Geostatistical Disease Mapping: Multivariate Poisson Cokriging
Authors: David Payares-Garcia, Frank Osei, Jorge Mateu, Alfred Stein
Abstract: Mapping the spatial patterns of disease occurrence is crucial for public health research. However, traditional geostatistical methods encounter challenges when dealing with spatially correlated count data, such as heterogeneity, zero-inflation, and unreliable estimation, leading to difficulties in estimating spatial dependence and to poor predictions. This study introduces a novel approach, multivariate Poisson cokriging, for predicting and filtering disease risk. This method incorporates pairwise correlations between the target variable and multiple ancillary variables, enabling accurate disease risk estimation that captures fine-scale variation. Through a simulation experiment and an application to human immunodeficiency virus (HIV) incidence and sexually transmitted disease (STD) data in Pennsylvania, this work demonstrates the superiority of Poisson cokriging over ordinary Poisson kriging in terms of prediction and smoothing. The simulation study revealed a reduction in mean square prediction error (MSPE) of up to 50% when utilizing auxiliary correlated variables. The real data analysis further corroborated these findings, with Poisson cokriging yielding a 74% drop in MSPE relative to Poisson kriging, highlighting the value of incorporating secondary information. The findings underscore the potential of Poisson cokriging in disease mapping and surveillance, offering richer risk predictions, a better representation of spatial interdependencies, and identification of high-risk and low-risk areas. This method presents a promising approach for public health researchers and policymakers in their efforts to mitigate the impact of diseases through targeted interventions and resource allocation.
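In generic cokriging notation (not the exact estimator derived in the paper), the risk at an unsampled location $\mathbf{s}_0$ is predicted as a weighted combination of the observed primary rates and the ancillary variables,
\[
\hat{R}(\mathbf{s}_0) = \sum_{i=1}^{n} \lambda_i\, z(\mathbf{s}_i) + \sum_{k=1}^{K} \sum_{j=1}^{m_k} \nu_{kj}\, y_k(\mathbf{s}_{kj}),
\]
where the weights $\lambda_i$ and $\nu_{kj}$ solve a cokriging system built from the direct and cross (co)variograms and, in the Poisson setting, also account for the population sizes underlying the observed counts.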
Spatio-temporal Modelling Using Wastewater for Norovirus Surveillance
Authors: Ella White, Marta Blangiardo, Monica Pirani
Abstract: Wastewater-based epidemiology is a valuable surveillance tool that has recently emerged as a cost-effective method for the early detection and surveillance of viral outbreaks. Extensive research on this approach was conducted during the COVID-19 pandemic. However, there remains a notable gap in the development of spatially explicit models to predict wastewater concentrations of other pathogens, such as norovirus, at fine spatio-temporal resolutions covering entire regions or countries. We consider norovirus, the most common cause of acute gastroenteritis globally. Norovirus surveillance in the UK relies on clinical samples from confirmed outbreaks in hospitals, excluding mild and asymptomatic cases, which underestimates the true disease burden. Wastewater-based epidemiology can overcome this issue through being virtually free from selection bias, thereby improving estimates of norovirus activity. In this study, we address this by specifying a geostatistical model that quantifies the relationship between fortnightly norovirus concentrations in sewage treatment works’ (STWs) catchment areas and relevant covariates, including viral genogroup, indices of deprivation, demographic factors (including the proportion of Black, Asian, and Minority Ethnic populations and age structure), land use and population mobility. We used data on fortnightly averages of flow-normalized norovirus concentration, reported as the number of viral gene copies per 100 000 people, collected from 152 STWs between 27-5-2021 and 30-3-2022. We accounted for spatial and temporal correlations to map fortnightly norovirus concentrations at desired levels of spatial resolution. We then extended the model to predict norovirus activity using public health surveillance data, which is important for policy makers.
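Schematically, and in hypothetical notation, the geostatistical model relates the concentration at STW catchment $\mathbf{s}$ in fortnight $t$ to the covariates and structured residuals,
\[
\log Y(\mathbf{s}, t) = \mathbf{x}(\mathbf{s}, t)^\top \boldsymbol{\beta} + \omega(\mathbf{s}) + \gamma_t + \varepsilon(\mathbf{s}, t),
\]
where $\omega$ is a spatially correlated Gaussian process, $\gamma_t$ a temporally structured term and $\varepsilon$ unstructured noise; this is a generic form, not necessarily the exact specification fitted in the study.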
A spatial statistical overview on landslide hazard across Japan via a marked point process equipped with the Barrier model
Authors: Erin Bryce, Luigi Lombardo, Janine Illian, Hakan Tanyas, Daniela Castro-Camilo
Abstract: In this work, a marked log-Gaussian Cox process (LGCP) approximation with the Barrier model is proposed to model, through the theory of spatial point processes, the rate of landslide occurrence simultaneously with landslide size as the associated mark of the process. The model uses the Matérn covariance function, based on the stochastic partial differential equation (SPDE) approach, to define the random spatial effect of the data through the Barrier model. The approximation arises from the resolution of the grid used to define the LGCP under the SPDE approach. Spatial covariate information is incorporated through a generalised additive modelling structure within a Bayesian framework. By ensuring that the model belongs to the class of latent Gaussian models, inference can be performed with a numerical technique named the integrated nested Laplace approximation.
This modelling approach is applied to Japan, where a substantial historical landslide inventory is available and where exceptional climatic conditions can bring about many landslides. The results show that the model can accurately estimate where landslides occur and at what rate, as well as their associated size, thus highlighting areas of Japan that should be targeted for mitigation measures.
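For reference, the stationary Matérn field underlying this construction is the solution of the SPDE
\[
(\kappa^2 - \Delta)^{\alpha/2}\, \tau\, u(\mathbf{s}) = \mathcal{W}(\mathbf{s}),
\]
where $\mathcal{W}$ is Gaussian white noise, $\kappa$ controls the spatial range and $\tau$ the marginal variance; the Barrier model modifies this equation near physical barriers, such as coastlines, so that correlation does not propagate across them.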
Bayesian Hawkes process approximation with inlabru
Authors: Francesco Serafini, Mark Naylor, Finn Lindgren, Maximilian Werner
Abstract: Hawkes processes are point processes useful for modelling phenomena with a self-exciting nature, where an event has the ability to induce or trigger additional events. They have been widely used in different fields such as seismology, epidemiology, and finance. In many of these problems, it is important to correctly quantify the uncertainty around the parameters and to study the effect of covariates. Bayesian methods for Hawkes process models exist; however, they usually suffer from correlation between parameters, do not scale well with the number of observations, and need to be tailored to the problem when additional parameters or covariates are introduced. We have developed a novel approximation method for Bayesian Hawkes process models which relies on the Integrated Nested Laplace Approximation (INLA) and is implemented using the R-package inlabru. The method is based on a new log-likelihood decomposition in a form that can be handled by INLA. We focus on the temporal Epidemic-Type Aftershock Sequence (ETAS) model used to model earthquake occurrence in time, and we compare the results of our new method with an existing MCMC algorithm on synthetic data and on data from the Amatrice seismic sequence in Italy. We show that our method provides results similar to MCMC but is faster and scales better as the amount of data increases. We extend our approach to the spatio-temporal ETAS model, and to a spatio-temporal ETAS model with covariates for modelling the number of offspring generated by each observation. We apply these models to the Amatrice and L’Aquila seismic sequences.
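For reference, the temporal ETAS model referred to above is commonly written with conditional intensity
\[
\lambda(t \mid \mathcal{H}_t) = \mu + \sum_{t_i < t} K\, e^{\alpha (m_i - M_0)}\, (t - t_i + c)^{-p},
\]
where $\mu$ is the background rate, $m_i$ are the magnitudes of past events, $M_0$ is the magnitude cut-off of the catalogue, and $(K, \alpha, c, p)$ are the triggering parameters whose posterior the INLA-based approximation targets.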
Inequalities maps for educational infrastructures
Authors: Giacomo Milan
Abstract: This contribution aims to model the spatial distribution of educational infrastructure across Italian municipalities in relation to demographic and economic indicators, namely population density and average income. School locations are treated as the outcome of an inhomogeneous Poisson point process with a log-linear intensity driven by the demographic covariates. The estimated model predicts the expected school density, which is then contrasted with the observed data. The final results are presented as a LISA map of the residual difference between the fitted density and the observed density. LISA maps cluster discrepancies in educational infrastructure on a municipal basis, revealing territorial social inequalities that might be explained by urban settlement and territorial administration. They provide an effective guide for policy planning.
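In schematic form, with hypothetical notation, the fitted process has log-linear intensity
\[
\lambda(\mathbf{s}) = \exp\{\beta_0 + \beta_1\, \mathrm{popdens}(\mathbf{s}) + \beta_2\, \mathrm{income}(\mathbf{s})\},
\]
so that the expected number of schools in a municipality $A$ is $\int_A \lambda(\mathbf{s})\, d\mathbf{s}$; the residual difference between this expectation and the observed counts is what the LISA map summarises.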
Using nonparametric replication to test for isotropy in spatial point patterns
Authors: Jakub Pypkowski, Adam Sykulski, James Martin
Abstract: Spatial point pattern analysis often assumes isotropy of the underlying point process. This assumption simplifies the analysis, but when it is incorrect it may lead to erroneous conclusions. The need to reliably detect anisotropy in point patterns has led to the development of several isotropy hypothesis testing methods, for which the main limiting factor has been the difficulty of obtaining the distribution of a test statistic under the null hypothesis. A prevailing approach has been to repeatedly compute the test statistic on patterns simulated from an isotropic null model, which relies on the user’s ability to select a point process model and specify its parameters. If this is done incorrectly, the test power may be reduced or the size violated. To circumvent this issue, we propose a novel, more general testing procedure based on nonparametric replication of point patterns. We consider replication techniques such as tiling, the marked point method, and stochastic reconstruction. Via a large-scale simulation study, we inspect the performance of such tests combined with different descriptive metrics (e.g. the directional K-function or the direction spectrum) when applied to point processes of both clustered and repulsive types.
Stochastic partial differential equation for spatio-temporal modeling of crimes. A Bayesian framework.
Authors: Julia Calatayud, Marc Jornet, Javier Platero, Jorge Mateu
Abstract: We propose a stochastic partial differential equation model for referenced data in the plane, with spatially correlated noise and temporal log-normal evolution. Discretization in space allows the model to be formulated in a finite-dimensional framework, reducing it to a set of stochastic differential equations coupled by correlated Wiener processes. The correlations are considered time-varying and stochastic, with a transformed log-normal distribution. The final model is hierarchical, and parameter inference can be conducted within the Bayesian framework. The statistical methodology is illustrated by analyzing crime activity in the city of Valencia, Spain.
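A minimal sketch of the kind of finite-dimensional system obtained after spatial discretization, consistent with the log-normal temporal evolution described above (hypothetical notation, not the exact model), is
\[
dY_i(t) = \mu_i\, Y_i(t)\, dt + \sigma_i\, Y_i(t)\, dW_i(t), \qquad \operatorname{Corr}\big(dW_i(t), dW_j(t)\big) = \rho_{ij}(t), \qquad i = 1, \dots, n,
\]
with the correlations $\rho_{ij}(t)$ themselves time-varying and stochastic, as in the hierarchical formulation above.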
A Randomness Test Based on the Variance of the Nearest-Neighbour Distance for Spatial Point Processes
Authors: Juan F. Rodríguez-Berrio, Ramón Giraldo, Francisco J. Rodríguez-Cortés
Abstract: We present a statistical test to assess complete spatial randomness (CSR) based on the variance of the nearest-neighbour distance. The null distribution of the test statistic is obtained using Monte Carlo methods. We validate the test’s efficacy using simulated data, and in all cases the test consistently demonstrates robust performance. We apply the proposed methodology to real data.
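A minimal sketch of a Monte Carlo test of this type, using the sample variance of nearest-neighbour distances as the test statistic, is given below; the unit-square window, the number of simulations and the two-sided p-value convention are illustrative assumptions.

import numpy as np

def nn_distance_variance(points):
    # Variance of the nearest-neighbour distances of a point pattern (n x 2 array).
    diff = points[:, None, :] - points[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)
    return d.min(axis=1).var()

def csr_mc_test(points, n_sim=999, window=(1.0, 1.0), seed=0):
    # Monte Carlo test of complete spatial randomness in a rectangular window.
    rng = np.random.default_rng(seed)
    n = len(points)
    obs = nn_distance_variance(points)
    sims = np.array([
        nn_distance_variance(rng.uniform((0.0, 0.0), window, size=(n, 2)))
        for _ in range(n_sim)
    ])
    lower = (1 + np.sum(sims <= obs)) / (n_sim + 1)
    upper = (1 + np.sum(sims >= obs)) / (n_sim + 1)
    return min(1.0, 2 * min(lower, upper))  # two-sided Monte Carlo p-value

# Example: a clustered toy pattern should give a small p-value.
rng = np.random.default_rng(1)
centres = rng.uniform(0, 1, size=(5, 2))
pts = np.clip(np.vstack([c + 0.02 * rng.standard_normal((20, 2)) for c in centres]), 0, 1)
print(csr_mc_test(pts))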
Socioeconomic inequalities in excess mortality attributable to extreme heat during the summer of 2022 in Catalonia, Spain. A Bayesian spatio-temporal model
Authors: Manuel A. Moreno, Maria A. Barceló, Pablo Juan, Marc Saez
Abstract: Introduction: In 2022, excess mortality in Spain was the third highest, only surpassed by 2020 and 2015. However, a maximum of 25% of the excess has been directly attributed to extreme heat. The problem is that these estimates are based on models that could present biases and limitations. Objectives: We first aimed to estimate the excess mortality attributable to extreme heat in Catalonia in the summer of 2022, using much smaller units of analysis and temperature predictions obtained from a spatio-temporal Bayesian model. Second, we assessed the existence of socioeconomic inequalities in the excess mortality attributable to extreme heat. Methods: We used a longitudinal ecological design, from 2015 to 2022, with information on daily mortality and temperatures at the level of 288 basic health areas (ABS). We used a sample covering 6.3 million people, 81.6% of the population of Catalonia. Results: The excess mortality attributed to extreme heat during the summer months of 2022 was 49.41% according to our model. This excess was four times greater than that attributed to extreme heat in the summers from 2015 to 2019. In rural areas and in the most socioeconomically deprived urban ABS, the excess mortality was higher than in the rest. Conclusions: The difference between our model and the standard one could be because the latter suffers from the modifiable areal unit problem and contains measurement errors in exposure to extreme temperatures. Socioeconomic inequalities in the excess mortality attributed to extreme heat could have existed at the ABS level.
Data Integration & Multiscale Animal Movement Modelling in Inlabru
Authors: Megan Morton
Abstract: Animal movement studies vary widely in spatiotemporal resolutions, model types, and data collection methods, resulting in numerous windows of insight into multiscale habitat preferences and movement processes. For example, telemetry tagging provides insight into local habitat preferences of individuals, whereas survey data can uncover global-scale preferences of the population as a whole. Different data types and modelling methodologies have their own pros and cons, as well as varying sources of bias and perspectives on the underlying processes of interest. Using data integration and joint modelling, we can strengthen estimation of selection parameters, improving overall understanding of animal habitat preferences.
This poster presents an ongoing project in which a joint model of two animal movement modelling perspectives (resource selection functions and step selection functions) is implemented using the R package inlabru. Survey and telemetry data are jointly modelled, combining point process methodology with Langevin diffusion-based movement modelling. The method makes use of the Integrated Nested Laplace Approximation (INLA) methodology for computationally efficient spatiotemporal modelling and the Stochastic Partial Differential Equation (SPDE) approach to incorporate a Gaussian Random Field (GRF) into the model. By accounting for spatial autocorrelation, the risk of spurious significance in the estimation of selection parameters is reduced. The joint modelling framework is implemented in inlabru with user-friendly and concise code.
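For reference, the Langevin movement model mentioned above is the diffusion whose stationary distribution is the animal's utilisation distribution $\pi$ (a standard formulation, not specific to this project),
\[
d\mathbf{X}_t = \frac{\gamma^2}{2}\, \nabla \log \pi(\mathbf{X}_t)\, dt + \gamma\, d\mathbf{W}_t,
\]
where, in a resource-selection setting, $\pi(\mathbf{s}) \propto \exp\{\mathbf{x}(\mathbf{s})^\top \boldsymbol{\beta}\}$ links movement to the habitat covariates and $\mathbf{W}_t$ is Brownian motion.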
Modelling traffic incidents forecast to recommend safe alternative routes
Authors: Vicente R. Tomás, Somnath Chaudhuri, Luis A. García, Ivan Monzón, Ana M. Lluch, Pablo Juan
Abstract: Traffic accidents represent a pressing public health issue worldwide, in urban and non-urban areas. This study explores spatial and temporal patterns of road traffic accidents to develop predictive models and road network risk maps in order to recommend safe alternative routes to end-users. The study employs Bayesian methodology and compares two approaches: a spatio-temporal modelling technique designed specifically for linear road networks, and a recently proposed class of Gaussian processes for compact metric graphs. Integrated Nested Laplace Approximations (INLA) combined with stochastic partial differential equations (SPDE) are utilized in both approaches. Traditionally, SPDE methods triangulate the entire study area, which poses challenges when fitting models to discrete spatial events, such as traffic accidents, that occur exclusively on road networks. To address this, the study proposes designing the SPDE triangulation specifically for road networks and extends Gaussian fields with Matérn covariance functions to non-Euclidean metric graph settings. These methodologies are compared and explored using data from the Rotterdam Harbour in the Netherlands. The resulting network mesh facilitates localized prediction of accident likelihood, enhancing the accuracy and relevance of risk maps. Routing recommendations incorporating incident predictions are made, suggesting traffic itineraries that bypass risky areas, thus reducing congestion and improving road safety.
EarthquakeNPP: Benchmarking Neural Point Processes for Earthquake and Aftershock Forecasting
Authors: Samuel Stockman, Daniel J. Lawson, Maximilian J. Werner
Abstract: Point processes have been dominant in modeling the evolution of seismicity for decades, with the epidemic-type aftershock sequence (ETAS) model being the most popular. Recent advances in machine learning have produced highly flexible point process models that use neural networks to improve upon existing parametric models. While the development of these models is accelerating within the machine learning community, the extent to which these advancements enhance earthquake predictability remains uncertain. In response, we introduce EarthquakeNPP: an ongoing benchmarking experiment aimed at evaluating the efficacy of Neural Point Process (NPP) models for earthquake and aftershock forecasting. Our study compares various spatio-temporal NPPs, including those based on normalizing flows, diffusion models, latent Gaussian processes, and integral networks, as showcased at machine learning conferences. These NPP models are rigorously benchmarked against established earthquake forecasting models developed over decades within the seismology community. Our earthquake models encompass a range of spatio-temporal Hawkes processes, including variants of the ETAS model that account for missing data, alongside Bayesian approaches that utilize the Integrated Nested Laplace Approximation (INLA). We present preliminary findings from pseudo-prospective forecasting experiments conducted on earthquake catalogs from diverse global tectonic regions and outline the future of the benchmarking exercise.
Cluster model selection for stationary spatial point patterns
Authors: Yeison Yovany Ocampo-Naranjo, Tomáš Mrkvička, Jorge Mateu, Francisco J. Rodríguez-Cortés
Abstract: Selecting a suitable model that fits a dataset, for making inferences about parameter estimation, is a pivotal goal in statistics. This task becomes particularly challenging in the realm of spatial point processes, where multiple candidate models imply a significant theoretical and computational hurdle due to the large heterogeneity of spatial configurations. While formal and graphical tests can help discern between random, clustered, or regular patterns, the essence of characterizing a point pattern lies in fitting a specific model through a goodness-of-fit test. However, when two or more models pass this test, the question arises of which model is the most suitable. In this work, we propose a statistical test based on a Monte Carlo method for cluster model selection for stationary spatial point patterns. We evaluate the power of the test and its performance concerning Type I error through an extensive simulation study. Finally, we apply the proposed methodology to Paleolithic tools uncovered at an archaeological dig in Tanzania.