
AI Learns to Focus on What Matters in Time Series Data

A new method helps AI models predict future trends by intelligently compressing historical data, reducing noise and computational costs while improving accuracy.

AI Research
April 04, 2026
4 min read

In fields like finance, weather forecasting, and energy management, predicting future trends based on historical data is crucial for decision-making. However, simply feeding AI models more historical data often backfires, as longer lookback windows introduce irrelevant noise and computational burdens that can degrade predictive accuracy. Researchers from Beijing Jiaotong University have developed a novel framework called Dynamic Semantic Compression (DySCo) that addresses this paradox by teaching AI to intelligently compress long time series, retaining only the most informative segments while discarding redundant information. This approach not only enhances the ability of models to capture long-term dependencies but also significantly reduces computational costs, making it a versatile tool for real-world applications where efficiency and accuracy are paramount.

The core of DySCo is its ability to dynamically identify and preserve high-entropy segments—those rich in unpredictable, critical information—while compressing predictable, low-entropy trends. Unlike traditional methods that rely on fixed heuristics, such as assuming older data is less important, DySCo uses a learnable mechanism called Entropy-Guided Dynamic Sampling (EGDS) to evaluate the semantic value of each data segment. For example, in a time series spanning thousands of steps, EGDS can detect an anomaly from the distant past and allocate dense sampling to preserve its details, overriding the typical bias that favors recent data. This ensures that essential precursors, regardless of their temporal distance, are retained, enabling more accurate long-term forecasts without the noise accumulation that plagues conventional approaches.
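To make the idea concrete, here is a minimal toy sketch of entropy-guided compression. It is not the paper's EGDS: where DySCo trains a small neural scorer, this sketch scores each segment by its variance as a crude stand-in for "unpredictability," keeps the top-scoring segments at full resolution, and collapses the rest to their mean. The function name and parameters are illustrative, not from the paper.

```python
import numpy as np

def entropy_guided_sample(series, seg_len=8, keep_ratio=0.25):
    """Toy sketch: keep high-variance segments dense, compress the rest.

    The paper's EGDS uses a learnable importance scorer; variance is
    only a simple proxy for segment 'entropy' here.
    """
    n_seg = len(series) // seg_len
    segs = series[: n_seg * seg_len].reshape(n_seg, seg_len)
    scores = segs.var(axis=1)                    # stand-in importance score
    n_keep = max(1, int(n_seg * keep_ratio))
    keep = set(np.argsort(scores)[-n_keep:])     # highest-scoring segments
    out = []
    for i, seg in enumerate(segs):
        if i in keep:
            out.extend(seg)                      # preserved at full resolution
        else:
            out.append(seg.mean())               # compressed to one value
    return np.array(out)

# A smooth trend with one sharp anomaly far in the past:
t = np.linspace(0, 1, 64)
x = np.sin(2 * np.pi * t)
x[4:8] += 5.0                                    # distant-past anomaly
compressed = entropy_guided_sample(x, seg_len=8, keep_ratio=0.25)
print(len(x), "->", len(compressed))             # 64 -> 22
```

Note that the anomalous segment is retained in full even though it sits at the oldest end of the window—the opposite of a recency-biased heuristic that would downsample old data first.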

To achieve this, DySCo employs a three-component methodology. First, the Hierarchical Frequency-Enhanced Decomposition (HFED) separates the input sequence into multiple scales using low-pass filters with varying bandwidths. For instance, with a lookback window of 2440 steps, HFED creates representations at horizons of 720, 1440, and 2440 steps, where longer horizons are smoothed to capture global trends and shorter ones preserve high-frequency details like sharp variations. Next, EGDS divides these filtered sequences into segments and uses a learnable importance scorer—a small neural network—to assign importance scores, guiding dynamic compression. Finally, the Cross-Scale Interaction Mixer (CSIM) replaces simple linear aggregation with a gating network that dynamically weights predictions from different scales, ensuring the final forecast benefits from both stable long-term trends and sensitive short-term details. This multi-scale decomposition approach shares conceptual similarities with frameworks like TimeMixer, which also leverages disentangled multiscale series for forecasting.
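The three-stage pipeline can be sketched end to end on synthetic data. Everything below is a simplified stand-in, not the paper's implementation: moving averages replace HFED's filters, uniform truncation to length T replaces EGDS's learned importance-based sampling, and a fixed softmax gate over naive persistence forecasts replaces CSIM's learned gating network.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=512))                # synthetic random-walk series

# 1) HFED-style decomposition: low-pass filters of varying bandwidth
#    (moving averages stand in for the paper's filter bank).
def low_pass(series, width):
    kernel = np.ones(width) / width
    return np.convolve(series, kernel, mode="same")

scales = {w: low_pass(x, w) for w in (4, 16, 64)}  # fine -> coarse

# 2) EGDS-style compression: truncate each scale to T points as a crude
#    placeholder for learned segment scoring and dynamic sampling.
T = 96
compressed = {w: s[-T:] for w, s in scales.items()}

# 3) CSIM-style mixing: softmax gate weights blend per-scale forecasts;
#    each "forecast" here is just persistence of the scale's last value.
gate = np.exp(np.array([0.5, 1.0, 0.3]))
gate /= gate.sum()
forecast = sum(g * c[-1] for g, c in zip(gate, compressed.values()))
print(round(float(forecast), 3))
```

In DySCo proper, both the segment scorer in step 2 and the gate in step 3 are trained jointly with the downstream forecaster, so the compression adapts to what the model actually needs.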

Results, as detailed in the paper, demonstrate DySCo's effectiveness across seven datasets, including electricity consumption, traffic, and weather data. When integrated into mainstream models like TimeMixer, PatchTST, iTransformer, and Linear, DySCo consistently boosted performance. For example, on the Electricity dataset, DySCo-integrated TimeMixer achieved a mean squared error (MSE) of 0.141, compared to 0.201 for the vanilla version, representing a significant reduction in prediction error. As shown in Table II, DySCo models with a fixed lookback window of 2440 often outperformed basic models even when the latter were tuned to their optimal window lengths. Visualization in Figure 5 further illustrates that on datasets with strong long-term trends, such as ETTh1, DySCo accurately captures underlying patterns while baseline models struggle with noise.

Beyond accuracy, DySCo offers substantial computational advantages. The paper's theoretical analysis reveals that by compressing sequences from length L to a sparse representation T (where T is much smaller than L), DySCo reduces parameter complexity. For instance, with L=2440 and T=336, DySCo cuts parameters by approximately 58.7% compared to a standard Linear model. In Transformer-based architectures, this compression leads to a 94.3% reduction in attention-related computations, as shown in Figure 4, which compares GPU memory consumption and training time. This efficiency makes DySCo particularly valuable for applications in resource-constrained environments, such as real-time financial forecasting or edge computing in smart grids, where reducing computational overhead without sacrificing accuracy is critical. Implementations of the baseline models used for comparison are available in the Time-Series-Library.
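The scaling intuition behind those savings can be checked with a back-of-the-envelope count. This is a simplified illustration, not the paper's exact accounting: a direct L-to-H linear map has L×H weights, and vanilla self-attention cost grows as O(L²), so the naive percentages below differ from the paper's reported 58.7% and 94.3%, which account for the overhead of DySCo's own modules and implementation details.

```python
# Simplified cost comparison for compressing the input from L to T.
L, T, H = 2440, 336, 720           # lookback, compressed length, forecast horizon

linear_params_full = L * H          # weight matrix of a direct L -> H linear map
linear_params_comp = T * H          # same map after compression to length T
attn_cost_full = L * L              # vanilla self-attention scales as O(L^2)
attn_cost_comp = T * T

print(f"linear params: {1 - linear_params_comp / linear_params_full:.1%} fewer")
print(f"attention cost: {1 - attn_cost_comp / attn_cost_full:.1%} fewer")
```

Even this crude count shows why the quadratic attention term benefits most: shrinking the sequence length by a factor k cuts attention cost by roughly k², while linear-layer parameters shrink only by k.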

Despite its strengths, DySCo has limitations noted in the paper. The framework's performance gain is more pronounced on datasets with long-term evolutionary trends, such as ETTh1, where it effectively filters noise. However, on datasets dominated by short-term periodicity, like Electricity, the accuracy improvement is marginal, though the computational savings remain substantial. Additionally, DySCo introduces hyperparameters, such as the number of scales and the semantic sensitivity weight, which require tuning for optimal performance. The paper's ablation studies in Table IV confirm that each component—HFED, EGDS, and CSIM—contributes to overall performance, and removing any one degrades results, highlighting the framework's integrated design. Future work may explore adapting DySCo to non-stationary data or integrating it with emerging AI architectures to further enhance its versatility.

Original Source

Read the complete research paper

View on arXiv

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.

Connect on LinkedIn