AIResearch

AI Models Learn Tasks, Not Users, for Better Privacy

A new federated learning method trains specialized AI models for specific tasks across distributed data, improving performance by up to 136% when handling multiple or unseen tasks without compromising privacy.

AI Research
April 02, 2026
4 min read

In the world of artificial intelligence, training large language models often requires vast amounts of data, but accessing sensitive information from multiple sources while preserving privacy remains a major hurdle. Federated learning has emerged as a solution by allowing models to learn from decentralized data without sharing raw information, yet it struggles when clients have diverse or conflicting tasks. A new approach called FedRouter shifts the focus from personalizing models for each user to specializing them for each task, addressing key challenges in data heterogeneity and generalization. This could enhance applications in healthcare, mobile devices, and law, where data privacy is critical but performance on varied tasks is essential.

FedRouter tackles two core problems identified in personalized federated learning: generalization and intra-client task interference. Generalization refers to the difficulty models face when making predictions on unseen tasks or dealing with changes in data distributions at test time, often leading to significant performance drops. Intra-client task interference occurs when a single client's dataset contains multiple tasks that conflict during training, similar to trying to optimize for contradictory goals simultaneously. The researchers found that traditional client-centric methods degrade under these conditions, but FedRouter's task-centric design mitigates these issues by creating specialized models for each task rather than each client, as shown in their experiments.

The methodology behind FedRouter involves a three-component system: local clustering, global clustering, and an evaluation router mechanism. First, each client computes embeddings from its local data using a pre-trained base model and applies local clustering, such as K-Means, to partition the data into task-specific subsets, training a specialized adapter for each cluster. These adapters and centroids are then sent to a server, which performs global clustering to group similar tasks across different clients and aggregates the corresponding adapters through averaging. This process is coordinated in a round-robin fashion to manage communication and computation costs, with clients retraining adapters based on global centroids in subsequent rounds. For inference, an evaluation router uses either local or global centroids to route new samples to the most appropriate adapter, enabling both personalized and generalized evaluation modes depending on whether test-time distribution shifts occur.
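The client-side steps above can be sketched with a toy example. This is a minimal sketch, not the paper's implementation: the embeddings are synthetic vectors rather than outputs of a pre-trained base model, K-Means is replaced by a tiny farthest-point variant for determinism, and `route` is a hypothetical stand-in for the evaluation router that picks an adapter by nearest centroid.

```python
import numpy as np

def local_cluster(embeddings, k, iters=10):
    """Toy K-Means over a client's sample embeddings: partition the
    local data into k task-specific clusters and return the centroids."""
    # farthest-point initialisation keeps this toy example deterministic
    centroids = [embeddings[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(embeddings - c, axis=1) for c in centroids], axis=0)
        centroids.append(embeddings[np.argmax(d)])
    centroids = np.stack(centroids)
    for _ in range(iters):
        # assign every sample to its nearest centroid, then re-estimate
        labels = np.argmin(
            np.linalg.norm(embeddings[:, None] - centroids[None], axis=-1), axis=1)
        centroids = np.stack([
            embeddings[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)])
    return centroids, labels

def route(sample, centroids):
    """Evaluation router: choose the adapter whose task centroid is
    closest to the incoming sample's embedding."""
    return int(np.argmin(np.linalg.norm(centroids - sample, axis=1)))

# two well-separated synthetic "tasks" on one client
rng = np.random.default_rng(1)
emb = np.vstack([rng.normal(0.0, 0.1, size=(50, 8)),   # task A
                 rng.normal(5.0, 0.1, size=(50, 8))])  # task B
centroids, labels = local_cluster(emb, k=2)

# a new sample near task B is routed to task B's specialised adapter
chosen = route(np.full(8, 5.0), centroids)
```

In the full method, each cluster would additionally get its own trained adapter, and the server would cluster the uploaded centroids globally, average adapters that fall into the same group, and send the result back for the next round.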

Results from extensive experiments demonstrate FedRouter's superior performance in challenging scenarios. In evaluations using a subset of four tasks from the FLAN dataset—QQP, WebNLG, Samsum, and GigaWord—with ROUGE-1 as the metric, FedRouter outperformed baselines like FedIT, FedDPA, and FedSA. Under task interference, where clients have multiple tasks, FedRouter achieved up to 3.5% absolute improvement (approximately 6.1% relative) compared to other methods, as summarized in Table 1. For generalization, when tested on unseen tasks, FedRouter showed a 33.6% absolute improvement (approximately 136% relative), as reported in Table 3, highlighting its robustness to distribution shifts. Additional analyses, including t-SNE visualizations in Figure 4, confirmed effective task separation, with clustering accuracy reaching up to 100% in simpler scenarios.

The implications of this research extend to real-world applications where data privacy and task diversity are paramount. By enabling AI models to specialize in tasks rather than users, FedRouter could improve performance in fields like mobile computing, where devices handle varied functions, or healthcare, where patient data must remain confidential but models need to adapt to multiple medical tasks. The method's scalability was validated through ablation studies, showing consistent performance across model sizes from 1 billion to 8 billion parameters, as illustrated in Figure 5, and improved with more clients due to increased data availability, as seen in Figure 6. This approach not only enhances efficiency but also supports broader adoption of federated learning in sensitive environments.

Despite its strengths, FedRouter has limitations that the paper acknowledges. The approach relies on clustering algorithms like K-Means, and its performance can be affected by the accuracy of local and global clustering, particularly in scenarios with many overlapping tasks, where clustering accuracy dropped to 95.4% in the "All" scenario. Hyperparameter selection, such as the number of clusters, is critical and was addressed using the Silhouette Score, as shown in Figures 7 and 8, but this may not generalize to all datasets. Future work could explore scenarios with even more tasks and cross-task collaboration mechanisms to further enhance robustness and applicability in complex, real-world settings.
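Silhouette-based selection of the cluster count can be illustrated with a short sketch. This is my own minimal NumPy implementation for illustration, not the paper's code (in practice one would typically use scikit-learn's `silhouette_score`): the score rewards tight, well-separated clusters, so the candidate number of clusters that maximises it is chosen.

```python
import numpy as np

def kmeans(x, k, iters=10):
    """Tiny K-Means with farthest-point initialisation; returns labels."""
    cents = [x[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(x - c, axis=1) for c in cents], axis=0)
        cents.append(x[np.argmax(d)])
    cents = np.stack(cents)
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(x[:, None] - cents[None], axis=-1), axis=1)
        cents = np.stack([x[labels == c].mean(axis=0) if np.any(labels == c) else cents[c]
                          for c in range(k)])
    return labels

def silhouette(x, labels):
    """Mean silhouette coefficient: (b - a) / max(a, b) per sample, where
    a is the mean intra-cluster distance and b the mean distance to the
    nearest other cluster (singletons score 0, as in scikit-learn)."""
    dist = np.linalg.norm(x[:, None] - x[None], axis=-1)
    vals = []
    for i, lab in enumerate(labels):
        same = labels == lab
        same[i] = False
        if not same.any():
            vals.append(0.0)
            continue
        a = dist[i, same].mean()
        b = min(dist[i, labels == o].mean() for o in np.unique(labels) if o != lab)
        vals.append((b - a) / max(a, b))
    return float(np.mean(vals))

# three well-separated synthetic tasks; the score should peak at k = 3
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(m, 0.1, size=(30, 4)) for m in (0.0, 4.0, 8.0)])
scores = {k: silhouette(x, kmeans(x, k)) for k in (2, 3, 4, 5)}
best_k = max(scores, key=scores.get)
```

Merging two true clusters (k too small) inflates intra-cluster distances, while splitting one (k too large) shrinks the distance to the nearest other cluster; both push the score down, which is why the maximum tends to sit at the true task count.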

Original Source

Read the complete research paper

View on arXiv

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.