DeepSeek has gone viral.
DeepSeek, a Chinese AI lab, burst into the mainstream consciousness last week when its chatbot app topped the Apple App Store charts. DeepSeek's AI models, which were trained using compute-efficient techniques, have led Wall Street analysts and technologists alike to question whether the United States can maintain its lead in the AI race and whether demand for AI chips will hold up.
But where did DeepSeek come from, and how did it swiftly gain international fame?
DeepSeek’s trader origins
DeepSeek is backed by High-Flyer Capital Management, a Chinese quantitative hedge fund that uses AI to inform its trading decisions. Liang Wenfeng, an AI enthusiast, co-founded High-Flyer in 2015 and in 2019 launched High-Flyer Capital Management as a hedge fund focused on developing and deploying AI trading algorithms. In 2023, High-Flyer established DeepSeek as a lab devoted to researching AI tools separate from its financial business. The lab was later spun off into its own company, also named DeepSeek, with High-Flyer remaining an investor.
From the outset, DeepSeek built its own data center clusters for model training. But like other AI firms in China, DeepSeek has been affected by U.S. hardware export restrictions. To train one of its more recent models, the company was forced to use Nvidia H800 chips, a less powerful version of the H100 chip that is available to American businesses.
DeepSeek's technical team is said to skew young. The company reportedly recruits PhD AI researchers aggressively from top Chinese universities. According to The New York Times, DeepSeek also hires people without computer science backgrounds to help its technology better understand a wide range of subjects.
DeepSeek’s strong models
DeepSeek released its first set of models, including DeepSeek Coder, DeepSeek LLM, and DeepSeek Chat, in November 2023. But it wasn't until the following spring, when the startup unveiled its next-generation DeepSeek-V2 family of models, that the AI industry began to take notice.
DeepSeek-V2, a general-purpose text- and image-analysis system, performed well on a number of AI benchmarks and was far cheaper to run than comparable models at the time. It forced ByteDance and Alibaba, two of DeepSeek's domestic rivals, to cut usage prices for some of their models and make others entirely free.
The release of DeepSeek-V3 in December 2024 only added to DeepSeek's reputation.
According to DeepSeek's internal benchmark testing, DeepSeek-V3 outperforms both downloadable, publicly available models such as Meta's Llama and "closed" models that can only be accessed through an API, such as OpenAI's GPT-4o.
DeepSeek's R1 "reasoning" model is equally impressive. Released in January, R1 beats OpenAI's o1 model on key benchmarks, according to DeepSeek.
Because it is a reasoning model, R1 effectively fact-checks itself, which helps it avoid some of the common mistakes that trip up other models. Reasoning models take a little longer to arrive at solutions than standard non-reasoning models, typically seconds to minutes longer. The upside is that they tend to be more reliable in domains such as math, physics, and other sciences.
However, R1, DeepSeek-V3, and DeepSeek's other models have drawbacks. Because the AI was developed in China, it is subject to benchmarking by China's internet regulator to ensure its responses "embody core socialist values." In DeepSeek's chatbot app, for instance, R1 won't answer questions about Tiananmen Square or Taiwan's autonomy.
A disruptive approach
It's unclear exactly what DeepSeek's business model is, if it has one. The company gives some of its products and services away for free and prices others well below market value.
DeepSeek says breakthroughs in efficiency have allowed it to maintain exceptional cost competitiveness, though some experts dispute the figures the company has provided.
In any event, developers have embraced DeepSeek's models, which are available under permissive licenses that allow commercial use but aren't open source in the traditional sense. According to Clem Delangue, CEO of Hugging Face, one of the platforms hosting DeepSeek's models, developers have built more than 500 "derivative" models of R1 on Hugging Face, and those models have racked up 2.5 million downloads combined.
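For a sense of what that open access looks like in practice, here is a minimal sketch of loading one of DeepSeek's downloadable checkpoints from Hugging Face with the transformers library. It assumes the deepseek-ai/DeepSeek-R1-Distill-Qwen-7B checkpoint (a smaller R1 distillation published on Hugging Face), a GPU, and the accelerate package; other published checkpoints would load the same way.

```python
# Minimal sketch: load a DeepSeek checkpoint from Hugging Face and generate a reply.
# Assumes the deepseek-ai/DeepSeek-R1-Distill-Qwen-7B checkpoint and the
# transformers + accelerate packages; other published checkpoints work the same way.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick the dtype stored in the checkpoint
    device_map="auto",    # place weights on available GPU(s) via accelerate
)

# Ask a question; a reasoning model will typically emit its chain of thought
# before the final answer.
messages = [{"role": "user", "content": "What is 17 * 24? Explain briefly."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```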
DeepSeek's success against larger, more established rivals has been described as "upending AI" and ushering in "a new era of AI brinkmanship." The company's success was at least partly responsible for Nvidia's stock price dropping 18% on Monday, and it prompted a public response from OpenAI CEO Sam Altman.
It's unclear what the future holds for DeepSeek. Better models are inevitable. But the U.S. administration appears to be growing warier of what it regards as harmful foreign influence.
SOURCE: TECH CRUNCH