

2025-06-27 00:30:15

DeepSeek has released a new paper, with co-founder Liang Wenfeng credited as a contributor, detailing how its latest large language model, DeepSeek-V3, achieves efficient training and inference using only 2,048 H800 GPUs, significantly fewer than the tens of thousands typically required. The team attributes this efficiency to four key innovations: memory optimization through multi-head latent attention (MLA), computational savings via a Mixture-of-Experts (MoE) design with FP8 precision, communication improvements using a multi-plane network topology, and faster inference through multi-token prediction (MTP). With MLA, KV cache memory usage is cut to just 70KB per token, as little as one-seventh that of competing models. The MoE architecture activates only 37 billion of the model's 671 billion parameters per forward pass, reducing training costs by 90% compared to dense models. FP8 training further halves compute and memory usage, with minimal accuracy tradeoff. Beyond the model, the paper also outlines five future directions for AI hardware design, advocating for tighter integration between software and hardware to address memory, compute, and networking bottlenecks. [36Kr, in Chinese]
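To put the quoted figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers stated above (70KB of KV cache per token, 37 billion of 671 billion parameters active). The 32,768-token context length and the reading of "KB" as 1,000 bytes are illustrative assumptions, not figures from the paper.

```python
# Figures quoted in the brief; everything else is an illustrative assumption.
KV_CACHE_BYTES_PER_TOKEN = 70 * 1000   # "70KB per token", assuming 1 KB = 1,000 bytes
TOTAL_PARAMS = 671_000_000_000         # 671 billion total parameters
ACTIVE_PARAMS = 37_000_000_000         # 37 billion activated per forward pass


def kv_cache_bytes(context_tokens: int) -> int:
    """KV cache size in bytes for one sequence of the given length."""
    return context_tokens * KV_CACHE_BYTES_PER_TOKEN


def active_fraction() -> float:
    """Share of the model's parameters used in a single forward pass."""
    return ACTIVE_PARAMS / TOTAL_PARAMS


if __name__ == "__main__":
    ctx = 32_768  # hypothetical context length, chosen only for illustration
    print(f"KV cache for {ctx} tokens: {kv_cache_bytes(ctx) / 1e9:.2f} GB")  # 2.29 GB
    print(f"Active parameters per forward pass: {active_fraction():.1%}")    # 5.5%
```

Under these assumptions, a single 32,768-token sequence would need roughly 2.29 GB of KV cache, and each forward pass touches about 5.5% of the model's parameters.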



