
New AI Framework Optimizes Complex Scientific Computations

A Julia-based system automatically restructures computational graphs to speed up tasks like particle physics simulations by up to 7 times, making high-performance computing more accessible.

AI Research
March 26, 2026
4 min read

High-performance computing often faces a fundamental challenge: complex scientific problems, such as simulating particle interactions in quantum electrodynamics (QED), consist of many smaller tasks with vastly different hardware requirements. To run efficiently, each subtask should be scheduled on the best-suited hardware, whether a CPU, GPU, or other accelerator, while managing dependencies and data transfers. Traditionally, this requires manual optimization or specialized frameworks that lack flexibility. A new software framework, developed by researchers at Helmholtz-Zentrum Dresden-Rossendorf and the Center for Advanced Systems Understanding, addresses this by automatically optimizing computational workflows represented as directed acyclic graphs (DAGs). This approach, implemented in the Julia programming language, enables significant speedups—up to 7 times faster on GPUs—for tasks like calculating scattering processes in particle physics, which are critical for Monte Carlo simulations in fields like high-energy physics.

The core innovation lies in representing computations as Computable DAGs (CDAGs), where nodes correspond to compute tasks (like matrix multiplications) or data tasks (handling data movement), and edges denote dependencies. The framework, called ComputableDAGs.jl, analyzes these graphs and applies optimizations such as node reduction, which merges multiple nodes performing the same computation into a single node to reduce redundant work. For example, in QED calculations, many Feynman diagrams share similar subcomputations, and merging them can cut down on floating-point operations and data transfers. The system is domain-agnostic, meaning it can be applied to various scientific fields, but the researchers demonstrated its effectiveness with an implementation for QED, a quantum field theory known for producing highly accurate predictions in physics.
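The node-reduction idea can be illustrated with a small sketch (written here in Python for brevity; the class and function names are illustrative, not the library's actual API). Nodes that perform the same operation on the same inputs are merged into one canonical node, so a shared subcomputation is evaluated only once:

```python
# Minimal sketch of DAG node reduction: nodes performing the same
# computation on identical inputs are merged bottom-up, so shared
# subexpressions are computed once. Names are illustrative only.

class Node:
    def __init__(self, op, inputs=()):
        self.op = op                 # e.g. "matmul", "vertex", "input:x"
        self.inputs = tuple(inputs)  # dependency edges

def reduce_nodes(outputs):
    """Return deduplicated output nodes of the graph."""
    seen = {}  # (op, input identities) -> canonical node

    def canon(node):
        inputs = tuple(canon(i) for i in node.inputs)
        key = (node.op, tuple(id(i) for i in inputs))
        if key not in seen:
            seen[key] = Node(node.op, inputs)
        return seen[key]

    return [canon(o) for o in outputs]

# Two diagram-like expressions sharing an identical subcomputation:
x = Node("input:x")
d1 = Node("propagator", [Node("vertex", [x])])
d2 = Node("propagator", [Node("vertex", [x])])  # duplicate structure

reduced = reduce_nodes([d1, d2])
# After reduction, both outputs collapse to the same canonical node.
```

In the QED setting this corresponds to many Feynman diagrams sharing identical subdiagrams: merging them trims both floating-point work and the data tasks that would otherwise move duplicate intermediate results around.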

To build this framework, the researchers chose Julia for its meta-programming capabilities, which allow dynamic code generation and compilation without leaving the language environment. This enables static scheduling—where the entire computation graph is optimized and compiled before execution—rather than dynamic scheduling at runtime. The system includes modules for graph generation, cost estimation, optimization, scheduling, and code generation. For instance, in the QED application, a generator creates a CDAG from a scattering process definition, optimizers apply node reductions based on estimated costs, and a scheduler maps tasks to hardware devices. The final step generates executable code, which can run on CPUs or GPUs seamlessly, thanks to packages like KernelAbstractions.jl that support multiple architectures.
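The static-scheduling step can be sketched as a cost-driven device assignment over a topologically ordered graph. This toy version (Python, using only the standard library) is an assumption-laden illustration: the device names, cost numbers, and flat transfer penalty are invented, and the real framework's estimators and scheduler are more sophisticated.

```python
# Toy static scheduler: before execution, walk the DAG in topological
# order and place each task on the device with the lowest estimated
# cost (compute time plus a penalty for inputs held on another device).
from graphlib import TopologicalSorter

# task -> list of dependencies
dag = {"load": [], "kernel_a": ["load"], "kernel_b": ["load"],
       "combine": ["kernel_a", "kernel_b"]}

# invented per-(task, device) compute-cost estimates
compute = {"load":     {"cpu": 1.0, "gpu": 5.0},
           "kernel_a": {"cpu": 9.0, "gpu": 1.0},
           "kernel_b": {"cpu": 8.0, "gpu": 1.0},
           "combine":  {"cpu": 2.0, "gpu": 1.5}}
TRANSFER = 0.5  # flat cost per input living on a different device

placement = {}
for task in TopologicalSorter(dag).static_order():
    def total(dev, task=task):
        moves = sum(1 for dep in dag[task] if placement[dep] != dev)
        return compute[task][dev] + TRANSFER * moves
    placement[task] = min(("cpu", "gpu"), key=total)

print(placement)
```

Because the whole plan is fixed before execution, the generated code can be compiled once and reused for every sample, which is exactly what makes the static approach attractive for Monte Carlo workloads.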

The results, detailed in the paper, show compelling performance gains. For QED scattering processes like e⁻ nγ → e⁻ γ, where n is the number of incoming photons, the CDAG size scales super-exponentially with n, but optimizations significantly reduce execution time. On an NVIDIA A30 GPU, optimized CDAGs ran up to 7 times faster than unoptimized versions for processes with four incoming photons, as shown in Figure 9. CPU performance also improved, with speedups of about 5.5 times. The researchers measured execution times using BenchmarkTools.jl, with GPU execution being 200-400 times faster than CPU for smaller processes, converging to a factor of 200 for larger ones. Importantly, the optimization time pays off quickly: for a process with two incoming photons, the break-even point—where the time saved from optimization outweighs the optimization cost—was reached after just 6 samples on CPU and 179 on GPU, as illustrated in Figure 10.
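The break-even logic itself is simple arithmetic: the one-off optimization cost divided by the per-sample time saved gives the sample count at which optimization starts paying for itself. The timings in this sketch are placeholders, not measurements from the paper.

```python
# Amortization of a one-time optimization cost over repeated samples:
# break_even = ceil(optimize_cost / (t_unoptimized - t_optimized)).
import math

def break_even_samples(optimize_cost, t_unopt, t_opt):
    """Smallest sample count at which cumulative per-sample savings
    cover the one-time optimization cost."""
    saving = t_unopt - t_opt
    if saving <= 0:
        return float("inf")  # optimization never pays off
    return math.ceil(optimize_cost / saving)

# Hypothetical numbers (seconds): a 1.5 s optimization pass that cuts
# each sample from 0.5 s to 0.25 s amortizes after 6 samples.
print(break_even_samples(1.5, 0.5, 0.25))  # -> 6
```

This is why the gains matter most for Monte Carlo workloads: with billions of samples, even a modest per-sample saving dwarfs the optimization overhead.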

This framework has broad implications for scientific computing, particularly in fields requiring repetitive, compute-intensive simulations. In particle physics, for instance, matrix element calculations for scattering processes are used in Monte Carlo event generation, which often requires billions of samples. By automating optimization and enabling portability across hardware, ComputableDAGs.jl could make high-performance computing more accessible to researchers without deep expertise in parallel programming. The domain-agnostic design means it could be adapted for other applications, such as data processing or simulations in climate science or bioinformatics, where workflows can be represented as DAGs. Future work may focus on adding heterogeneous scheduling to use multiple device types simultaneously, more sophisticated estimators that account for memory usage, and operations like node vectorization for better parallelization.

However, the approach has limitations. The CDAG representation requires that the graph structure be static and known before execution, which may not suit problems with dynamic dependencies. Additionally, very large CDAGs can lead to long compilation times; for example, generating a function for an 8-photon Compton scattering process took 895 seconds, as noted in Table I. The optimization benefits also depend on the use case: they are most valuable when many samples are needed, as the overhead of optimization must be amortized over repeated executions. The paper acknowledges that further improvements are needed to handle even larger graphs, possibly by splitting generated functions into smaller pieces to aid compilers. Despite these limitations, the framework represents a significant step toward more efficient and portable scientific computing, leveraging graph optimization techniques to harness modern hardware effectively.

Original Source

Read the complete research paper

View on arXiv

About the Author

Guilherme A.

Former dentist (MD) from Brazil, 41 years old, husband, and AI enthusiast. In 2020, he transitioned from a decade-long career in dentistry to pursue his passion for technology, entrepreneurship, and helping others grow.

Connect on LinkedIn