A New Paradigm for Decentralized AI Training: Prime Intellect and Pluralis Explore Cutting-Edge Technology

2025-08-06 13:33:33

The Holy Grail of Crypto AI: Cutting-Edge Exploration of Decentralized Training

Across the full AI value chain, model training is the most resource-intensive and technically demanding stage, and it directly determines a model's capability ceiling and real-world performance. Compared with the lightweight calls of the inference phase, training requires sustained large-scale compute, complex data processing pipelines, and intensive optimization algorithms, making it the true "heavy industry" of AI system construction. From an architectural perspective, training methods fall into four categories: centralized training, distributed training, federated learning, and decentralized training, the last of which is the focus of this article.

Centralized training is the most common traditional method: a single institution completes the entire training process within a local high-performance cluster. All components, from hardware and underlying software to the cluster scheduling system and training framework, are coordinated by a unified control system. This tightly integrated architecture maximizes the efficiency of memory sharing, gradient synchronization, and fault tolerance, making it well suited to training large-scale models such as GPT and Gemini, with the advantages of high efficiency and controllable resources. However, it also raises issues such as data monopolization, resource barriers, energy consumption, and single points of failure.

Distributed training is the mainstream method for training large models. Its core idea is to decompose the training task and distribute it across multiple machines for collaborative execution, overcoming the compute and storage bottlenecks of a single machine. Although it is physically "distributed", the whole system is still controlled and scheduled by a centralized institution and usually runs in a high-speed local network, with a master node coordinating the sub-tasks over fast interconnects such as NVLink. Mainstream methods include:

Data parallelism: each node trains on a different shard of the data while holding a replica of the same parameters, so the model weights must be kept in sync.
Model parallelism: different parts of the model are deployed on different nodes to achieve strong scalability.
Pipeline parallelism: the model is executed in serial stages to improve throughput.
Tensor parallelism: matrix computations are split at a fine granularity to increase the degree of parallelism.
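To make the first item concrete, here is a minimal sketch of one data-parallel update on a toy one-parameter model. The function names (`grad_mse`, `data_parallel_step`) and the data are illustrative assumptions; a real system would use a framework collective (an all-reduce) rather than a Python loop.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]

def grad_mse(w: float, batch: Batch) -> float:
    """Gradient of the mean squared error for the toy model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def data_parallel_step(w: float, shards: List[Batch], lr: float) -> float:
    """One synchronous data-parallel step (equal-size shards assumed)."""
    # Each worker holds the same w and computes a gradient on its own shard.
    local_grads = [grad_mse(w, shard) for shard in shards]
    # Stand-in for an all-reduce: average the local gradients.
    global_grad = sum(local_grads) / len(local_grads)
    # Every worker applies the same update, so the replicas stay identical.
    return w - lr * global_grad

data: Batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # y = 2x
shards = [data[:2], data[2:]]  # two hypothetical workers

w = 0.0
for _ in range(50):
    w = data_parallel_step(w, shards, lr=0.05)
print(w)  # approaches the true slope of 2
```

With equal-size shards, the averaged gradient equals the full-batch gradient, which is why the replicas can train on different data and still agree on the weights.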

Distributed training combines "centralized control + distributed execution", analogous to a single boss remotely directing employees in multiple "offices" to complete a task together. Currently, almost all mainstream large models are trained this way.

Decentralized training represents a more open and censorship-resistant path for the future. Its core feature is that multiple mutually distrusting nodes collaborate to complete training tasks without a central coordinator, usually with protocols driving task distribution and cooperation and with cryptographic incentive mechanisms ensuring honest contributions. The main challenges of this model include:

Device heterogeneity and task partitioning: heterogeneous devices are hard to coordinate, and task partitioning is inefficient.
Communication efficiency bottleneck: network communication is unstable, and gradient synchronization is a clear bottleneck.
Lack of trusted execution: without a trusted execution environment, it is difficult to verify whether nodes actually performed the computation.
Lack of unified coordination: with no central dispatcher, task distribution and exception rollback mechanisms are complex.

Decentralized training can be understood as a group of global volunteers contributing computing power to train a model collaboratively. However, truly feasible large-scale decentralized training remains a systems engineering challenge that spans system architecture, communication protocols, cryptographic security, economic mechanisms, and model validation. Work on achieving "effective collaboration + honest incentives + correct results" is still at an early prototype stage.

Federated learning, a transitional form between distributed and decentralized training, emphasizes local data retention and centralized aggregation of model parameters, making it suitable for scenarios that prioritize privacy compliance. It has the engineering structure and local collaboration capabilities of distributed training, while also benefiting from the data-dispersion advantage of decentralized training. However, it still relies on a trusted coordinator and is not fully open or censorship-resistant. It can be seen as a "controlled decentralization" solution for privacy-compliant scenarios: relatively moderate in its training tasks, trust structure, and communication mechanisms, and therefore better suited as a transitional deployment architecture for industry.

Decentralized training: boundaries, opportunities, and realistic paths

From the perspective of training paradigms, decentralized training is not suitable for every type of task. In some scenarios, complex task structures, extremely high resource requirements, or significant collaboration difficulty make a task inherently unsuited to efficient completion among heterogeneous, trustless nodes. For example, large model training often depends on high memory, low latency, and high bandwidth, which makes it difficult to partition and synchronize effectively over an open network; tasks with strong data privacy and sovereignty restrictions are bound by legal compliance and ethical constraints and cannot be shared openly; and tasks without a basis for collaborative incentives give outside participants no motivation to join. Together, these boundaries define the current practical limits of decentralized training.

However, this does not mean that decentralized training is a false proposition. For task types that are structurally lightweight, easy to parallelize, and incentivizable, decentralized training shows clear application prospects. These include, but are not limited to, LoRA fine-tuning, post-training tasks for behavior alignment, crowdsourced data training and labeling, training of small foundation models with controlled resource needs, and collaborative training involving edge devices. Such tasks generally feature high parallelism, low coupling, and tolerance for heterogeneous compute, making them well suited to collaborative training over P2P networks, Swarm protocols, and distributed optimizers.

Analysis of representative decentralized training projects

Currently, at the frontier of decentralized training and federated learning, the representative blockchain projects are Prime Intellect, Pluralis.ai, Gensyn, Nous Research, and Flock.io. In terms of technological innovation and engineering difficulty, Prime Intellect, Nous Research, and Pluralis.ai have proposed many original ideas in system architecture and algorithm design and represent the frontier of current theoretical research, whereas Gensyn and Flock.io have relatively clear implementation paths, with initial engineering progress already visible. This article analyzes the core technologies and engineering architectures behind these five projects in turn, and further explores their differences and complementary relationships within the decentralized AI training landscape.

Prime Intellect: A pioneer of reinforcement learning collaborative networks with verifiable training trajectories

Prime Intellect is committed to building a trustless AI training network in which anyone can participate in training and earn credible rewards for their computational contributions. It aims to create a verifiable, open, and fully incentivized decentralized AI training system through three main modules: PRIME-RL, TOPLOC, and SHARDCAST.

01. Prime Intellect Protocol Stack Structure and Key Module Value

02. Detailed Explanation of the Key Mechanisms of Prime Intellect Training

#PRIME-RL: Decoupled Asynchronous Reinforcement Learning Task Architecture

PRIME-RL is a task modeling and execution framework that Prime Intellect built for decentralized training scenarios, designed specifically for heterogeneous networks and asynchronous participation. It targets reinforcement learning first and structurally decouples the training, inference, and weight upload processes, so that each training node can complete its task loop locally and independently and interact with the validation and aggregation mechanisms through standardized interfaces. Compared with traditional supervised learning pipelines, PRIME-RL is better suited to flexible training in environments without centralized scheduling, reducing system complexity while laying the foundation for multi-task parallelism and policy evolution.
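One consequence of decoupling inference from training is that samples may be generated by slightly older weights than the ones currently being trained. The sketch below shows a simple staleness filter for that situation. The `Rollout` type, `select_fresh`, and the `max_lag` parameter are hypothetical names for illustration and are not taken from the PRIME-RL codebase.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Rollout:
    policy_version: int  # version of the weights that generated this sample
    reward: float

def select_fresh(rollouts: List[Rollout], current_version: int,
                 max_lag: int) -> List[Rollout]:
    """Keep rollouts produced by a policy at most max_lag versions old."""
    return [
        r for r in rollouts
        if 0 <= current_version - r.policy_version <= max_lag
    ]

# Inference nodes submit rollouts whenever they finish, without waiting
# for the trainer; the trainer only filters on arrival.
buffer = [Rollout(3, 1.0), Rollout(5, 0.5), Rollout(6, 0.0)]
fresh = select_fresh(buffer, current_version=6, max_lag=1)
print([r.policy_version for r in fresh])
```

The design choice here is that no node ever blocks on another: slow nodes simply have more of their work discarded instead of stalling the whole round.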

#TOPLOC: Lightweight Training Behavior Verification Mechanism

TOPLOC is the core training verifiability mechanism proposed by Prime Intellect, used to determine whether a node has actually completed effective policy learning based on its observation data. Unlike heavyweight solutions such as ZKML, TOPLOC does not rely on full model recomputation; instead, it performs lightweight structural verification by analyzing the local consistency of the trajectory between "observation sequence ↔ policy update". It turns the behavioral trajectory of the training process into a verifiable object, which is a key innovation for trustless distribution of training rewards and offers a feasible path toward an auditable, incentivized decentralized collaborative training network.

#SHARDCAST: Asynchronous Weight Aggregation and Propagation Protocol

SHARDCAST is a weight propagation and aggregation protocol designed by Prime Intellect, optimized for real network environments that are asynchronous, bandwidth-constrained, and have variable node states. It combines a gossip propagation mechanism with local synchronization strategies, allowing multiple nodes to continuously submit partial updates while in unsynchronized states, achieving progressive convergence of weights and multi-version evolution. Compared to centralized or synchronous AllReduce methods, SHARDCAST significantly enhances the scalability and fault tolerance of decentralized training, serving as a core foundation for building stable weight consensus and continuous training iterations.
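The gossip idea mentioned above can be illustrated with generic pairwise gossip averaging. This is a textbook sketch, not the SHARDCAST protocol itself; the node count, weight vectors, and `gossip_round` helper are assumptions made for the example.

```python
import random
from typing import List

def gossip_round(weights: List[List[float]], rng: random.Random) -> None:
    """A random pair of nodes replaces both weight vectors with their mean."""
    i, j = rng.sample(range(len(weights)), 2)
    avg = [(a + b) / 2 for a, b in zip(weights[i], weights[j])]
    weights[i] = list(avg)
    weights[j] = list(avg)

rng = random.Random(0)
# Eight hypothetical nodes, each starting from different 2-D weights.
weights = [[float(k), float(-k)] for k in range(8)]
n = len(weights)
target = [sum(w[d] for w in weights) / n for d in range(2)]  # global mean

for _ in range(500):
    gossip_round(weights, rng)

# Largest remaining distance of any node from the global mean.
spread = max(abs(w[d] - target[d]) for w in weights for d in range(2))
print(spread)
```

Each exchange preserves the sum of all weights, so the nodes drift toward the global average without any step in which everyone synchronizes at once.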

#OpenDiLoCo: Sparse Asynchronous Communication Framework

OpenDiLoCo is a communication optimization framework independently implemented and open-sourced by the Prime Intellect team based on the DiLoCo concept proposed by DeepMind. It is specifically designed to address common challenges in decentralized training, such as bandwidth limitations, device heterogeneity, and node instability. Its architecture is based on data parallelism, constructing sparse topologies like Ring, Expander, and Small-World to avoid the high communication overhead of global synchronization, relying only on local neighbor nodes to achieve model collaborative training. By combining asynchronous updates with fault tolerance mechanisms, OpenDiLoCo enables consumer-grade GPUs and edge devices to stably participate in training tasks, significantly enhancing the inclusivity of global collaborative training and making it one of the key communication infrastructures for building decentralized training networks.
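The DiLoCo concept referenced above reduces communication by letting each worker take many local optimizer steps between synchronizations and then exchanging only the resulting parameter change (a "pseudo-gradient"). The following is a minimal sketch on a toy one-parameter model. It assumes a plain SGD outer step for simplicity, whereas the DiLoCo paper uses a momentum-based outer optimizer; all names are illustrative and this is not the OpenDiLoCo implementation.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]

def local_sgd(w: float, batch: Batch, steps: int, lr: float) -> float:
    """Run several SGD steps on one worker for the toy model y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= lr * grad
    return w

def diloco_round(w_global: float, shards: List[Batch], inner_steps: int,
                 inner_lr: float, outer_lr: float) -> float:
    """One outer round: local training, then a single synchronization."""
    # Pseudo-gradient per worker: how far local training moved the weights.
    deltas = [w_global - local_sgd(w_global, s, inner_steps, inner_lr)
              for s in shards]
    pseudo_grad = sum(deltas) / len(deltas)  # the only communication step
    return w_global - outer_lr * pseudo_grad

shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]  # y = 2x
w = 0.0
for _ in range(5):
    w = diloco_round(w, shards, inner_steps=20, inner_lr=0.05, outer_lr=1.0)
print(w)
```

In this run the workers perform 100 local steps each but communicate only 5 times, which is the property that makes the approach attractive on low-bandwidth links.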

#PCCL: Collaborative Communication Library

PCCL is a lightweight communication library tailored for the decentralized AI training environment by Prime Intellect, aimed at addressing the adaptation bottlenecks of traditional communication libraries in heterogeneous devices and low-bandwidth networks. PCCL supports sparse topology, gradient compression, low-precision synchronization, and checkpoint recovery, and can run on consumer-grade GPUs and unstable nodes, serving as the underlying component supporting the asynchronous communication capability of the OpenDiLoCo protocol. It significantly enhances the bandwidth tolerance and device compatibility of the training network, paving the way for building a truly open and trustless collaborative training network by bridging the "last mile" of communication infrastructure.
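Two of the techniques listed above, gradient compression and low-precision synchronization, can be sketched generically as top-k sparsification and int8 quantization. These helpers are illustrative assumptions and do not reflect PCCL's actual API or wire format.

```python
from typing import Dict, List, Tuple

def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] with one shared scale factor."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid a zero scale
    return [round(v / scale) for v in values], scale

def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate floats on the receiving side."""
    return [x * scale for x in q]

def top_k(values: List[float], k: int) -> Dict[int, float]:
    """Keep only the k largest-magnitude entries, as index -> value."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]),
                   reverse=True)
    return {i: values[i] for i in order[:k]}

grad = [0.5, -1.27, 0.003, 0.9]

q, scale = quantize_int8(grad)      # four small integers plus one float
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(grad, restored))

sparse = top_k(grad, k=2)           # only two entries are transmitted
print(q, max_err, sparse)
```

The rounding error per element is bounded by half the scale factor, so the receiver gets a close approximation of the gradient at a fraction of the bandwidth.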

03. Prime Intellect Incentive Network and Role Distribution

Prime Intellect has built a permissionless, verifiable training network with economic incentives, allowing anyone to participate in tasks and receive rewards based on real contributions. The protocol operates based on three core roles:

Task Initiator: defines the training environment, initial model, reward function, and validation criteria.
Training Node: executes local training and submits weight updates and observation trajectories.
Validator Node: uses the TOPLOC mechanism to verify the authenticity of training behavior and participates in reward calculation and policy aggregation.

The core process of the protocol includes task publishing, node training, trajectory verification, weight aggregation, and reward distribution, forming an incentive closed loop centered around "real training behavior."
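The closed loop described above can be sketched as a single round that filters submissions by the validator's verdict, aggregates the accepted updates, and splits a reward pool. Everything here is a simplified assumption for illustration: the `Submission` type, the boolean verdict standing in for TOPLOC, and the equal reward split are not taken from the protocol's specification.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Submission:
    node_id: str
    weight_update: List[float]
    trajectory_ok: bool  # stand-in for the validator's verification verdict

def run_round(global_weights: List[float], submissions: List[Submission],
              reward_pool: float) -> Tuple[List[float], Dict[str, float]]:
    """Verify, aggregate, and reward one round of training submissions."""
    accepted = [s for s in submissions if s.trajectory_ok]
    if not accepted:
        return global_weights, {}
    # Aggregate: element-wise mean of the accepted weight updates.
    columns = zip(*(s.weight_update for s in accepted))
    avg_update = [sum(col) / len(accepted) for col in columns]
    new_weights = [w + u for w, u in zip(global_weights, avg_update)]
    # Reward: equal split among verified contributors (an assumption).
    rewards = {s.node_id: reward_pool / len(accepted) for s in accepted}
    return new_weights, rewards

submissions = [
    Submission("node-a", [1.0, 0.0], True),
    Submission("node-b", [3.0, 2.0], True),
    Submission("node-c", [100.0, 100.0], False),  # fails verification
]
new_weights, rewards = run_round([0.0, 0.0], submissions, reward_pool=100.0)
print(new_weights, rewards)
```

The point of the sketch is the ordering: verification comes before both aggregation and payment, so an unverified update neither moves the model nor earns a reward.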

04. INTELLECT-2: Launch of the first verifiable decentralized training model

Prime Intellect released INTELLECT-2 in May 2025. It is the world's first large reinforcement learning model trained by asynchronous, trustless decentralized nodes, with 32B parameters. INTELLECT-2 was trained collaboratively by over 100 heterogeneous GPU nodes distributed across three continents, using a fully asynchronous architecture, with a training duration of over 400 hours, demonstrating the feasibility and stability of asynchronous collaborative networks. The model not only represents a performance breakthrough but also marks the first systematic implementation of the "training is consensus" paradigm proposed by Prime Intellect. INTELLECT-2 integrates the core protocol modules PRIME-RL, TOPLOC, and SHARDCAST, signifying that a decentralized training network has, for the first time, achieved openness, verifiability, and a closed economic incentive loop in the training process.

In terms of performance, INTELLECT-2 was trained on top of QwQ-32B with specialized RL training in code and mathematics, placing it at the forefront of current open-source RL fine-tuned models.
