In IT service management (ITSM), support tickets must be categorized quickly and accurately into hierarchical taxonomies to route issues efficiently, but existing AI models often struggle with the adaptability, interpretability, and speed demands of real-world enterprise environments. A new dual-embedding centroid framework addresses these challenges by combining semantic and lexical representations, achieving performance competitive with established methods like Support Vector Machines (SVM) while significantly boosting operational efficiency. This approach is particularly valuable for organizations that need to adapt their classification systems frequently without sacrificing transparency or incurring high computational costs.
The researchers found that their method achieves a hierarchical F1-score of 0.731, slightly outperforming Support Vector Machines (SVM) at 0.727, and a top-3 accuracy of 0.681 compared to 0.675 for SVM, as shown in Table 3 of the paper. These results indicate that the framework maintains competitive classification accuracy on a dataset of 8,968 ITSM tickets across 123 categories, where hierarchical relationships matter because a prediction that lands near the correct category earns partial credit. The framework uses leaf-only scoring for this dataset, where over 95% of samples are at leaf nodes, but it supports configurable strategies such as weighted or simple-average scoring for datasets with different depth distributions. This flexibility allows it to adapt to various hierarchical structures without requiring complex retraining.
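To make the idea of partial credit concrete, here is a minimal sketch of one common way to compute a hierarchical F1-score: each label is expanded to include its ancestors, and precision and recall are computed on the overlap. This is an assumption for illustration; the article does not state the paper's exact formula, and the function names and category paths below are hypothetical.

```python
def ancestors_inclusive(path):
    """Return every prefix of a category path, including the path itself.

    ("Hardware", "Laptop", "Battery") expands to the three nodes
    Hardware, Hardware/Laptop, and Hardware/Laptop/Battery.
    """
    return {tuple(path[:i]) for i in range(1, len(path) + 1)}


def hierarchical_f1(true_paths, pred_paths):
    """Micro-averaged hierarchical F1 over ancestor-augmented label sets.

    Assumed definition (not taken from the paper): a prediction earns
    credit for every taxonomy node it shares with the true label.
    """
    overlap = pred_total = true_total = 0
    for true_path, pred_path in zip(true_paths, pred_paths):
        true_set = ancestors_inclusive(true_path)
        pred_set = ancestors_inclusive(pred_path)
        overlap += len(true_set & pred_set)
        pred_total += len(pred_set)
        true_total += len(true_set)
    if overlap == 0:
        return 0.0
    precision = overlap / pred_total
    recall = overlap / true_total
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the prediction is wrong at the leaf but right
# about the two levels above it, so it earns partial credit.
truth = [("Hardware", "Laptop", "Battery")]
guess = [("Hardware", "Laptop", "Screen")]
score = hierarchical_f1(truth, guess)
```

Under this definition a flat accuracy metric would score the example as a complete miss, while the hierarchical score rewards the two correct upper levels.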
The methodology involves maintaining separate centroid representations for each category using two types of embeddings: semantic embeddings from SBERT and lexical features from TF-IDF. During training, centroids are computed by averaging the embeddings of training samples assigned to each category; optional multi-centroid clustering is available but was disabled in this study for efficiency. At inference time, the framework generates independent rankings from both embedding types and fuses them using reciprocal rank fusion, a technique that combines ranked lists to exploit their complementary strengths. This dual-embedding architecture, detailed in Algorithm 1 and Figure 1 of the paper, ensures that both the broader meaning and the specific terminology of tickets are considered, which helps accuracy in ITSM contexts where technical jargon is common.
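The pipeline described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the paper's Algorithm 1: small hand-written vectors stand in for real SBERT and TF-IDF features, cosine similarity is assumed as the ranking measure, and the fusion constant k=60 is a commonly used default rather than a value reported in the article. All function names are hypothetical.

```python
import math
from collections import defaultdict


def build_centroids(vectors, labels):
    """Average the vectors of each category into a single centroid."""
    grouped = defaultdict(list)
    for vector, label in zip(vectors, labels):
        grouped[label].append(vector)
    return {
        label: [sum(col) / len(vs) for col in zip(*vs)]
        for label, vs in grouped.items()
    }


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_categories(query, centroids):
    """Order category labels from most to least similar to the query."""
    return sorted(centroids, key=lambda c: cosine(query, centroids[c]), reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each list adds 1 / (k + rank) to a label's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, label in enumerate(ranking, start=1):
            scores[label] += 1.0 / (k + rank)
    return sorted(scores, key=lambda c: scores[c], reverse=True)


# Toy stand-ins for the two embedding spaces (assumed data).
labels = ["network", "network", "email"]
semantic_train = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
lexical_train = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

semantic_centroids = build_centroids(semantic_train, labels)
lexical_centroids = build_centroids(lexical_train, labels)

# Rank independently in each space, then fuse the two rankings.
fused = reciprocal_rank_fusion([
    rank_categories([0.8, 0.2], semantic_centroids),
    rank_categories([0.9, 0.1, 0.0], lexical_centroids),
])
```

Because the fusion step uses only rank positions, the two similarity scales never need to be calibrated against each other, which is one reason rank fusion is a convenient way to combine heterogeneous embeddings.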
Analysis of the results shows that the framework excels in computational efficiency, with a training time of 1.81 seconds compared to 10.66 seconds for SVM, a 5.9 times speedup as reported in Table 4. For incremental updates, which are critical in evolving taxonomies, the framework achieves speedups of 8.6 to 8.8 times for batch sizes of 100 to 1000 samples, and even higher speedups for smaller batches, such as 151.9 times for single samples, as shown in Table 5. This efficiency stems from recomputing only the affected centroids rather than retraining the entire model, making it suitable for production environments where categories are frequently added or modified. The paper notes that per-sample inference is slower than SVM (2.70 ms vs 0.051 ms, roughly 53 times slower), but argues that this overhead is negligible in typical ITSM workflows and is offset by the interpretability benefits.
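The reason updates are cheap can be illustrated with a running-mean sketch: if each category stores the sum and count of its vectors, adding a ticket touches only that category's centroid. This is one possible way to implement "recompute only affected centroids", assumed here for illustration; the article does not describe the paper's actual update code, and the class name is hypothetical.

```python
class IncrementalCentroids:
    """Per-category centroids kept as running sums and counts.

    Adding a sample costs O(dimension) and leaves every other
    category untouched, so no full retraining pass is needed.
    """

    def __init__(self):
        self._sums = {}
        self._counts = {}

    def add(self, label, vector):
        """Fold one new sample into its category, creating it if new."""
        if label not in self._sums:
            self._sums[label] = [0.0] * len(vector)
            self._counts[label] = 0
        running_sum = self._sums[label]
        for i, value in enumerate(vector):
            running_sum[i] += value
        self._counts[label] += 1

    def centroid(self, label):
        """Return the current mean vector for a category."""
        count = self._counts[label]
        return [value / count for value in self._sums[label]]


# Hypothetical usage: a new category appears without disturbing "vpn".
store = IncrementalCentroids()
store.add("vpn", [1.0, 0.0])
store.add("vpn", [0.0, 1.0])
vpn_before = store.centroid("vpn")
store.add("email", [2.0, 2.0])
vpn_after = store.centroid("vpn")
```

By contrast, a model like SVM generally has to be refit on the full training set when a category is added, which is consistent with the speedup gap growing as batches get smaller.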
The implications of this research are significant for IT service management and other domains with hierarchical classification needs, as the framework provides a practical alternative to black-box models that lack transparency. Its interpretability allows domain experts to trace predictions back to centroid similarities, enabling validation and trust in automated systems. Its efficiency in training and updates reduces operational costs and downtime, addressing the dynamic nature of enterprise taxonomies where business needs evolve rapidly. By demonstrating that simple, centroid-based methods can rival more complex algorithms, the study challenges the assumption that advanced AI always requires heavy computational resources, offering a scalable solution for high-volume ticket processing.
Limitations of the study include its evaluation on a dataset with strong depth concentration, where most samples are at leaf nodes, so the results may not generalize to all hierarchical structures without further testing. The paper acknowledges that future work should assess the framework on datasets with balanced depth distributions to determine optimal scoring strategies. Additionally, while the framework supports multi-centroid clustering for categories with high internal variance, this feature was disabled in the current study, potentially limiting performance on more diverse datasets. The researchers also note that the approach has not been tested on hierarchical taxonomies beyond ITSM, suggesting a need for broader validation to confirm its versatility across different applications.
About the Author
Guilherme A.
Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.