DeepSeek-R1: Stability Tested Across Hosting Platforms

By 24matins.uk, published 24 February 2025 at 19:21, updated on 24 February 2025 at 19:21.

An assessment across 18 platforms shows that the stability of DeepSeek-R1 varies significantly with the hosting choice: completeness of responses, accuracy, and inference time all differ, and paid services are generally more stable than free ones.

A Detailed Assessment of DeepSeek-R1 Stability

To establish a benchmark, researchers tested DeepSeek-R1 on 18 third-party platforms, using the same set of 20 elementary-level mathematical reasoning problems developed by the SuperCLUE team.

Evaluation Criteria

The assessment focused on three main criteria: complete response rate, accuracy, and inference time. It also analyzed how the pricing model (free vs. paid) affects the reliability of DeepSeek-R1.
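As a rough illustration of how the three criteria could be computed for one platform, here is a minimal Python sketch. The `Attempt` record, the `summarize` function, and the exact metric definitions are assumptions made for this example, not the SuperCLUE team's published method. In particular, the sketch assumes accuracy and average time are measured only over complete responses.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Attempt:
    """One problem sent to one platform (hypothetical record layout)."""
    complete: bool             # did the platform return a full answer?
    correct: Optional[bool]    # None when no complete answer came back
    seconds: Optional[float]   # time taken to answer, None if incomplete


def summarize(attempts):
    """Return response rate, accuracy, and average time for one platform.

    Assumption: accuracy and average time only count complete responses.
    """
    if not attempts:
        raise ValueError("at least one attempt is required")
    done = [a for a in attempts if a.complete]
    return {
        "response_rate": len(done) / len(attempts),
        "accuracy": sum(1 for a in done if a.correct) / len(done) if done else None,
        "avg_seconds": mean(a.seconds for a in done) if done else None,
    }
```

With 20 problems per platform, as in the assessment, each incomplete response lowers the response rate by 5 percentage points.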

Variable Performance Across Platforms

Results revealed significant variability, especially in the rate of complete responses. Platforms like Perplexity, together.ai, and ByteDance’s Volcengine achieved a 100% response rate, while others such as Baidu AI Cloud, Tencent Cloud TI Platform, and Silicon Flow’s basic edition scored below 50%, indicating they have “room for improvement in stability”.

Paid Platforms Outperform Free Ones

Foreign paid platforms tend to outperform domestic (Chinese) ones in response rate and inference time, while domestic platforms lead on accuracy. Paying also improves stability markedly: paid versions average an 88% complete response rate, compared with 65% for free versions.
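The paid-versus-free comparison is a per-group average. The Python sketch below shows one way to compute it. The function name, the tuple layout, and the sample values are all placeholders invented for illustration; they are not figures or platform names from the assessment.

```python
from collections import defaultdict
from statistics import mean


def average_rate_by_tier(platforms):
    """Average the complete response rate per pricing tier.

    platforms: iterable of (name, tier, complete_response_rate) tuples,
    a hypothetical layout chosen for this example.
    """
    by_tier = defaultdict(list)
    for _name, tier, rate in platforms:
        by_tier[tier].append(rate)
    return {tier: mean(rates) for tier, rates in by_tier.items()}


# Placeholder values for illustration only, not data from the assessment.
sample = [
    ("platform_a", "paid", 1.00),
    ("platform_b", "paid", 0.50),
    ("platform_c", "free", 0.75),
    ("platform_d", "free", 0.25),
]
averages = average_rate_by_tier(sample)

# Gap between tiers, expressed in percentage points.
gap_points = (averages["paid"] - averages["free"]) * 100
```

Applied to the averages reported in the article, 88% for paid and 65% for free, the same subtraction gives a gap of 23 percentage points.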

Choosing the Right Platform Is Crucial

The stability of DeepSeek-R1 depends heavily on the chosen platform. Users should weigh their specific needs, such as response rate, inference time, and accuracy, when selecting a hosting platform. Despite the observed disparities, many platforms performed very well in terms of output reliability and completeness.
