Composite quantile regression for a distributed system with non-randomly distributed data

The composite quantile regression estimator is widely acknowledged for its robustness and efficiency, offering a compelling alternative to both ordinary least squares and quantile regression estimators in linear models. However, when data are not randomly distributed across different workers in distributed settings, existing methods for composite quantile regression become statistically inefficient. To address this limitation, we present a novel one-step upgraded pilot composite quantile regression method. Our proposed approach involves two essential steps. In the first step, we obtain a pilot estimator by leveraging a small random sample collected from different workers. Subsequently, in the second step, we perform one-step updating based on the pilot estimator, involving the summarization of sample moment quantities on each worker. The resulting estimator is theoretically proven to be as statistically efficient as the composite quantile regression estimator using the entire sample, without relying on restrictive assumptions about randomness. Furthermore, the resulting estimator inherits the robustness and efficiency advantages of the composite quantile regression estimator, while also being computationally efficient in terms of communication cost and storage usage. To validate the practical performance of our proposed method, we conduct numerical studies using simulated and real data, demonstrating its effectiveness in real-world scenarios.
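The two steps described above can be illustrated with a minimal sketch. It is not the paper's exact estimator. It assumes a single scalar covariate, errors independent of the covariate, a Gaussian kernel with bandwidth N^(-1/5) for the error density, and a pilot search bracket of [-10, 10]. All function names (`pilot_cqr`, `worker_summary`, `one_step_update`) are hypothetical. Each worker sends only a handful of summed moment quantities, and the master combines them in a single Newton-type update of the pilot slope.

```python
import math
import random

TAUS = [k / 6 for k in range(1, 6)]  # assumed: K = 5 equally spaced quantile levels

def check_loss(u, tau):
    return u * (tau - (1.0 if u < 0 else 0.0))

def sample_quantile(sorted_vals, tau):
    # Order statistic ceil(n*tau); it minimizes the check loss over the intercept.
    idx = min(len(sorted_vals) - 1, max(0, math.ceil(tau * len(sorted_vals)) - 1))
    return sorted_vals[idx]

def profile_cqr_loss(beta, xs, ys):
    # CQR objective with the K intercepts profiled out (convex in beta).
    res = sorted(y - x * beta for x, y in zip(xs, ys))
    return sum(check_loss(r - sample_quantile(res, tau), tau)
               for tau in TAUS for r in res)

def pilot_cqr(xs, ys, lo=-10.0, hi=10.0, iters=60):
    # Step 1: golden-section search for the pilot slope on the small sample.
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = profile_cqr_loss(c, xs, ys), profile_cqr_loss(d, xs, ys)
    for _ in range(iters):
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = profile_cqr_loss(c, xs, ys)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = profile_cqr_loss(d, xs, ys)
    beta0 = (a + b) / 2
    res = sorted(y - x * beta0 for x, y in zip(xs, ys))
    return beta0, [sample_quantile(res, t) for t in TAUS]

def worker_summary(xs, ys, beta0, b0, h):
    # Step 2a: each worker returns sums only, never raw observations.
    psi, gx, dens = 0.0, 0.0, [0.0] * len(TAUS)
    for x, y in zip(xs, ys):
        e = y - x * beta0
        for k, (tau, b) in enumerate(zip(TAUS, b0)):
            s = tau - (1.0 if e <= b else 0.0)  # quantile score at the pilot
            psi += s
            gx += x * s
            u = (e - b) / h
            dens[k] += math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    return len(xs), sum(xs), sum(x * x for x in xs), psi, gx, dens

def one_step_update(beta0, summaries, h):
    # Step 2b: the master adds the worker sums and takes one Newton-type step.
    n = sum(s[0] for s in summaries)
    sx = sum(s[1] for s in summaries)
    sxx = sum(s[2] for s in summaries)
    psi = sum(s[3] for s in summaries)
    gx = sum(s[4] for s in summaries)
    f_sum = sum(sum(s[5]) for s in summaries) / (n * h)  # sum_k f_hat(b_k)
    score = gx - (sx / n) * psi        # score with the covariate centered
    centered_ss = sxx - sx * sx / n    # sum of squared centered covariates
    return beta0 + score / (f_sum * centered_ss)

# Simulated data with true slope 2, allocated to workers non-randomly (sorted by x).
random.seed(0)
N, M = 3000, 10
xs = [random.gauss(0, 1) for _ in range(N)]
ys = [2.0 * x + random.gauss(0, 1) for x in xs]
order = sorted(range(N), key=lambda i: xs[i])
workers = [order[m * N // M:(m + 1) * N // M] for m in range(M)]

# Pilot sample: 20 randomly chosen observations from every worker.
pilot_idx = [i for w in workers for i in random.sample(w, 20)]
beta0, b0 = pilot_cqr([xs[i] for i in pilot_idx], [ys[i] for i in pilot_idx])

h = N ** -0.2  # assumed bandwidth rule for unit-scale errors
summaries = [worker_summary([xs[i] for i in w], [ys[i] for i in w], beta0, b0, h)
             for w in workers]
beta1 = one_step_update(beta0, summaries, h)
print("pilot slope:", beta0, "one-step slope:", beta1)
```

Because the worker summaries are plain sums over all observations, the update does not depend on how the data were allocated to workers. Only the pilot sample needs to be drawn at random.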

Recommended citation: Jin J, Hao C, and Chen Y. (2025). "Composite quantile regression for a distributed system with non-randomly distributed data." Statistical Papers. 66(1): 1.

Yewen Chen