AIResearch
Science

AI Learns to Handle Unbalanced Data in Crop Monitoring

A new method trains AI models on artificially varied data distributions, making them more robust for real-world agricultural tasks where some crops are rare and others dominate.

AI Research
March 27, 2026
4 min read

Accurate crop monitoring is crucial for food security, but real-world agricultural data presents a major challenge for artificial intelligence: severe class imbalance. Common crops like wheat dominate landscapes, while rare ones like parsley are heavily underrepresented. This imbalance, combined with the high cost of labeling data, forces researchers to use few-shot learning—training models with only a handful of examples per class. However, these training sets are often artificially balanced to stabilize learning, creating a mismatch with the skewed distributions found in real-world test scenarios. This mismatch, known as prior shift, degrades the model's ability to generalize, leading to poor performance when deployed. A new study addresses this issue by introducing Dirichlet Prior Augmentation (DirPA), a method that proactively simulates unknown label distribution skews during training, making AI models more robust without needing knowledge of the actual test distribution.
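To make the prior-shift problem concrete, here is a minimal sketch with hypothetical numbers (not from the paper): a classifier calibrated on an artificially balanced training prior can be corrected via the standard Bayes prior-ratio reweighting, but only if the skewed test prior is known—which is exactly the knowledge DirPA avoids requiring.

```python
import numpy as np

# Hypothetical 3-class example: a model trained on a balanced few-shot set
# is deployed where one class dominates the landscape.
train_prior = np.array([1/3, 1/3, 1/3])     # artificially balanced training set
test_prior  = np.array([0.80, 0.15, 0.05])  # skewed real-world distribution

# The model's predicted posterior for one sample (under the balanced prior).
p_train = np.array([0.30, 0.45, 0.25])

# Post-hoc correction: reweight by the prior ratio and renormalize.
p_test = p_train * (test_prior / train_prior)
p_test /= p_test.sum()

print(np.argmax(p_train))  # 1 -- predicted class under the balanced prior
print(np.argmax(p_test))   # 0 -- the prediction flips once the true prior is known
```

The flipped prediction illustrates why a model that ignores prior shift underperforms at deployment; post-hoc corrections like this one need the test distribution, whereas DirPA bakes robustness in during training.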

The researchers found that DirPA consistently improves model accuracy and robustness across various few-shot settings. In experiments using the EuroCropsML dataset from Estonia, which includes 102 crop types with a strong imbalance (grassland grass represents 46% of samples), DirPA enhanced performance in low-data regimes. For a randomly initialized model with cross-entropy loss, DirPA achieved higher accuracy and Cohen's kappa scores across all few-shot tasks from 1 to 200 shots, with the largest gains in low-shot scenarios (k ≤ 20). For instance, in the 10-shot setting, accuracy improved from 0.390 to 0.4594, and kappa from 0.199 to 0.275. When fine-tuning a model pretrained on Latvian data, DirPA also boosted metrics, with cross-entropy DirPA achieving higher accuracy in every few-shot task, such as increasing from 0.360 to 0.491 in the 5-shot setting. These results, detailed in Table 1 and visualized in Figure 4, show that DirPA acts as a dynamic feature regularizer, stabilizing predictions where data is sparse.
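Cohen's kappa, the robustness metric reported alongside accuracy, discounts agreement that would occur by chance, which is why it is more informative than raw accuracy on imbalanced data. A minimal pure-Python sketch with toy labels (not the paper's data) shows why:

```python
from collections import Counter

def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_obs = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: probability both label sources pick the same class at random.
    p_exp = sum(true_counts[c] * pred_counts.get(c, 0) for c in true_counts) / n**2
    return (p_obs - p_exp) / (1 - p_exp)

# Toy imbalanced example: always guessing the majority class scores 80% accuracy
# but zero kappa, because all of that agreement is explained by chance.
y_true = [0]*8 + [1]*2
y_pred = [0]*10
print(cohens_kappa(y_true, y_pred))  # 0.0
```

A majority-class guesser looks strong on accuracy alone; kappa exposes it, which is why DirPA's kappa gains (e.g. 0.199 to 0.275 at 10 shots) matter on a dataset where one class holds 46% of the samples.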

The methodology behind DirPA involves modeling the real-world label distribution as Dirichlet-distributed random variables. At each training step, the method samples a pseudo-prior vector from a symmetric Dirichlet distribution, whose concentration parameter controls the degree of imbalance—values of α < 1 yield highly skewed distributions. This sampled prior is then used to adjust the model's logits via a scaling factor τ, effectively augmenting the training distribution with diverse class priors. The process, outlined in Algorithm 1, forces the model to learn a feature representation that is invariant to the class prior, without requiring any knowledge of the test distribution. The experiments used a Transformer encoder architecture with sinusoidal positional encoding, trained on time-series data from Sentinel-2 satellite observations, with models evaluated in few-shot scenarios ranging from 1 to 500 shots.
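A sketch of what one DirPA-style augmentation step could look like, assuming the logit adjustment takes the common additive form logits + τ·log(prior); the paper's Algorithm 1 may differ in details, and τ and α here are stand-in hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def dirpa_adjust_logits(logits, alpha=0.5, tau=1.0, rng=rng):
    """One DirPA-style augmentation step (sketch, not the paper's exact code).

    Samples a pseudo-prior from a symmetric Dirichlet(alpha) and shifts the
    model's logits by tau * log(prior), so every training step optimizes
    against a different simulated class imbalance.
    """
    n_classes = logits.shape[-1]
    # alpha < 1 concentrates mass on a few classes -> highly skewed priors.
    pseudo_prior = rng.dirichlet(alpha * np.ones(n_classes))
    return logits + tau * np.log(pseudo_prior + 1e-12)

logits = np.zeros((4, 102))           # batch of 4; 102 crop classes as in EuroCropsML
adjusted = dirpa_adjust_logits(logits)
print(adjusted.shape)                 # (4, 102)
```

Because a fresh pseudo-prior is drawn every step, the loss never settles on any single class distribution, which is what pushes the learned features toward prior-invariance.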

Analysis of the results reveals that DirPA's benefits are most pronounced in low-shot regimes, where data scarcity exacerbates the prior shift problem. DirPA improved overall accuracy and Cohen's kappa across both randomly initialized and pretrained models, with the largest relative advantages observed in settings with 20 shots or fewer. For example, in the pretrained model with cross-entropy loss, DirPA increased accuracy from 0.470 to 0.594 in the 20-shot setting. The researchers note that DirPA does not degrade performance as the number of samples increases; in higher-shot or full-data conditions, all methods converge to similar results, as the training prior aligns with the empirical data distribution. However, the study also highlights a trade-off: while macro-averaged F1 scores showed inferior performance for DirPA (as seen in Figure 4), this concentrated loss on stable, high-shot classes is justified by the consistent gain in overall system reliability, as indicated by improved kappa scores.

The implications of this research extend beyond crop-type classification to any few-shot learning task suffering from discrepancies between training and test label distributions. In agriculture, robust AI models can enhance monitoring efforts, supporting food security by accurately identifying rare crops and adapting to regional variations. The proactive approach of DirPA eliminates the need for post-hoc corrections at inference time, which often require explicit knowledge of the test distribution. Future work will explore applying pseudo-priors from asymmetric Dirichlet distributions, extensive hyperparameter tuning, and testing on additional European Union countries to further enhance class-specific performance. This study demonstrates a practical step toward making AI more adaptable to real-world complexities, where data imbalances are the norm rather than the exception.

Original Source

Read the complete research paper

View on arXiv

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.

Connect on LinkedIn