<?xml version='1.0'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss" xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
	<title><![CDATA[PublMe - Space: Posted Reaction by PublMe bot in PublMe]]></title>
	<link>https://publme.space/reactions/v/49831</link>
	<atom:link href="https://publme.space/reactions/v/49831" rel="self" type="application/rss+xml" />
	<description><![CDATA[]]></description>
	
	<item>
	<guid isPermaLink="true">https://publme.space/reactions/v/49831</guid>
	<pubDate>Mon, 27 Jan 2025 22:00:11 +0100</pubDate>
	<link>https://publme.space/reactions/v/49831</link>
	<title><![CDATA[Posted Reaction by PublMe bot in PublMe]]></title>
	<description><![CDATA[
<p>New Open Source DeepSeek V3 Language Model Making Waves</p>
<div><img width="800" height="467" src="https://hackaday.com/wp-content/uploads/2025/01/deepseek-v3_benchmark.png?w=800" alt="DeepSeek-V3 benchmark results compared to other large language models" srcset="https://hackaday.com/wp-content/uploads/2025/01/deepseek-v3_benchmark.png 1702w, https://hackaday.com/wp-content/uploads/2025/01/deepseek-v3_benchmark.png?resize=250,146 250w, https://hackaday.com/wp-content/uploads/2025/01/deepseek-v3_benchmark.png?resize=400,234 400w, https://hackaday.com/wp-content/uploads/2025/01/deepseek-v3_benchmark.png?resize=800,467 800w, https://hackaday.com/wp-content/uploads/2025/01/deepseek-v3_benchmark.png?resize=1536,897 1536w" data-attachment-id="756523" data-permalink="https://hackaday.com/2025/01/27/new-open-source-deepseek-v3-language-model-making-waves/deepseek-v3_benchmark/" data-orig-file="https://hackaday.com/wp-content/uploads/2025/01/deepseek-v3_benchmark.png" data-orig-size="1702,994" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="deepseek-v3_benchmark" data-image-description="" data-image-caption="" data-medium-file="https://hackaday.com/wp-content/uploads/2025/01/deepseek-v3_benchmark.png?w=400" data-large-file="https://hackaday.com/wp-content/uploads/2025/01/deepseek-v3_benchmark.png?w=800"></div><p>In the world of large language models (LLMs) there have been relatively few upsets since OpenAI barged onto the scene with its transformer-based GPT models a few years ago, yet now it seems that the Chinese company DeepSeek has upended the status quo. 
Its new <a rel="nofollow" href="https://github.com/deepseek-ai/DeepSeek-V3" target="_blank">DeepSeek-V3 model</a> is not only open source, but is also claimed to have been trained with only a fraction of the effort required by competing (open &amp; closed source) models, while <a rel="nofollow" href="https://arxiv.org/abs/2412.19437v1" target="_blank">performing significantly better</a>.</p><p>The full training of DeepSeek-V3&#8217;s 671B parameters is claimed to have taken only 2.788M GPU hours on NVidia H800 (<a rel="nofollow" href="https://www.techpowerup.com/gpu-specs/h800-sxm5.c3975" target="_blank">Hopper-based</a>) GPUs, almost a factor of ten less than comparable models. Naturally this has the LLM industry in a mild panic, but those who are not investors in LLM companies or NVidia can simply partake in this new OSS model, which has been released under the MIT license along with the <a rel="nofollow" href="https://github.com/deepseek-ai/DeepSeek-R1" target="_blank">DeepSeek-R1 reasoning model</a>.</p><p>Both models can be run locally on AMD and NVidia GPUs, or used via the online APIs. If these models do indeed perform as efficiently as claimed, they stand to massively reduce the hardware and power required not only to train LLMs but also to query them.</p>]]></description>
	<dc:creator>PublMe bot</dc:creator>
</item>

</channel>
</rss>