Anomaly Detection

Anomaly detection is a vital data analysis task where algorithms identify outliers or deviations from expected patterns, signaling potential issues. Detecting anomalies prevents data misrepresentation and addresses underlying problems. Effective anomaly detection delivers business value, including cost reduction, customer retention, and optimized resource allocation. Machine learning (ML) techniques enhance anomaly detection by improving speed and accuracy, enabling timely insights.

 

In machine learning, anomaly detection applies models to quickly identify anomalies in complex datasets. This ensures data integrity, which is crucial for reliable analysis and informed decision-making. High-quality data allows businesses to derive accurate insights, leading to better strategies.

Anomaly detection: A functional analysis perspective

We develop a new approach for detecting anomalies in the behavior of stochastic processes and random fields. The approach uses tensor product representations, in particular the multivariate Karhunen–Loève (KL) expansion on complex domains. From the associated eigenspaces of the covariance operator a series of nested function spaces are constructed, allowing detection of signals lying in orthogonal subspaces. In particular this can succeed even if the stochastic behavior of the signal changes either in a global or local sense. A mathematical approach is developed to locate and measure sizes of extraneous components based on construction of multilevel nested subspaces. We show examples in on the real line and on a spherical domain. However, the method is flexible, allowing the detection of orthogonal signals on general topologies, including spatio-temporal domains.

deFOREST: Fusing Optical and Radar satellite data for Enhanced Sensing of Tree-loss

In this paper we developed a deforestation detection pipeline that incorporates optical and Synthetic Aperture Radar (SAR) data. A crucial component of the pipeline is the construction of anomaly maps of the optical data. By using the residual space of a Karhunen-Loeve expansion anomaly detection to high accuracy is achieved. For the nominal state of the forest concentration bounds on the distribution of the residual components are also achieved. Furthermore, this approach does not require knowledge of the distribution of the data. This is contrast to statistical parametric methods that assume the knowledge of the distribution of the data, which is an impractical assumption, in particular for high dimensional data. The optical anomaly maps are then combined with SAR data and the state of the forest is classified by using a Hidden Markov Model (HMM). We test our approach with Sentinel-1 (SAR) and Sentine-2 (data) data on a 100 km×100 km region. The results show that both the Hybrid optical-radar and optical method achieve high accuracy. However, not only does the Hybrid method achieve superior accuracy, but it is also significantly more robust towards the dearth of optical data.

Stochastic Functional Analysis and Multilevel Vector Field Anomaly Detection

Massive vector field datasets are common in multi-spectral optical and radar sensors, among many other emerging areas of application. In this paper we develop a novel stochastic functional (data) analysis approach for detecting anomalies based on the covariance structure of nominal stochastic behavior across a domain. An optimal vector field Karhunen-Loeve expansion is applied to such random field data. A series of multilevel orthogonal functional subspaces is constructed from the geometry of the domain, adapted from the KL expansion. Detection is achieved by examining the projection of the random field on the multilevel basis. In addition, reliable hypothesis tests are formed that do not require prior assumptions on probability distributions of the data. The method is applied to the important problem of deforestation and degradation in the Amazon forest. This is a complex non-monotonic process, as forests can degrade and recover. Using multi-spectral satellite data from Sentinel-2, the multilevel filter is constructed and anomalies are treated as deviations from the initial state of the forest. Forest anomalies are quantified with robust hypothesis tests. Our approach shows the advantage of using multiple bands of data in a vectorized complex, leading to better anomaly detection beyond the capabilities of scalar-based methods.

CONTACT

Stochastic Machine Learning Group

© 2024 – 2025, Stochastic Machine Learning Group

Scroll to Top