OpenAI improves the reasoning power of its o3 multimodal model

On June 4, OpenAI’s o3 reasoning model pushed past the limits of the traditional text-only chain of thought: for the first time, a multimodal model integrated images directly into its reasoning process.

According to reports, o3 not only “looks at pictures” but also “thinks with pictures”, opening up a problem-solving approach that tightly integrates visual and textual reasoning. This “Thinking with Images” ability has enabled o3 to reach 95.7% accuracy on the visual reasoning benchmark V* Bench, raising the ceiling for multimodal reasoning.

DeepSeek R1 model update
Separately, DeepSeek recently announced that it had completed a minor-version trial upgrade of its R1 model and invited users to test it on the official website, app, and mini-program. The API interface and usage remain unchanged.

R1 builds on the capabilities of the DeepSeek-V3 model, while R2 may have to wait until V4 is successfully developed. The new version uses an average of 23K tokens per question, up sharply from 12K in the old version. Nvidia CEO Jensen Huang predicts that agentic AI will drive demand for computing power up by at least 100 times.

In addition, DeepSeek distilled the model DeepSeek-R1-0528-Qwen3-8B, which ranked second only to DeepSeek-R1-0528 on the AIME 2024 math test, surpassing Qwen3-8B and matching the accuracy of Qwen3-235B. After enhanced post-training, the hallucination rate fell by 45% to 50%; R1 had previously been criticized for its high hallucination rate.

DeepSeek said that this update deepened the model’s thinking and strengthened its reasoning through post-training, although capabilities such as tool calling still have room to improve. Tencent (TCEHY) responded quickly to the R1 update, integrating DeepSeek-R1-0528 into many of its products.

Today, open source and open protocols have emerged as new sources of competitiveness in AI. DeepSeek’s open source success has pushed the industry toward open source, and OpenAI is also considering it. Many companies have already launched open source strategies. At the same time, open protocols for large models play a role similar to that of HTTP on the Internet: they allow large models to call tools easily and complete a wide range of tasks.

WiMi opens up a new industrial pattern
According to available data, 5G+AI vision company WiMi Hologram Cloud Inc. (WIMI) has accelerated both the iteration of its large model technology and its industrial deployment. Centering its strategy on large models, it has comprehensively upgraded its AI matrix, adopted a dual-track approach of “self-research + embrace open source”, focused on multimodal large models (native fusion of text, image, audio, and video), and plans to provide a real-time multimodal AI model experience.

Within the industry ecosystem, WiMi is improving its multimodal data processing capabilities, strengthening the application potential of commercial scenarios, and working to accelerate the integration of “model + application”. For developers, it addresses multimodal interaction and prediction needs and offers integrated open source software and hardware solutions, which are expected to bring further advances in application fields such as full-sensory interaction, scenario memory, and distributed collaboration. At the same time, it uses low-cost, high-performance multimodal models to lower the barrier to entry for developers and help the application ecosystem grow.

Conclusion
It is worth mentioning that, according to many industry professionals, technology giants at home and abroad have been betting on AI agents since the start of this year. Technology, ecosystem, market, policy, and other factors have shifted the current focus of AI development from large models to intelligent agents.

In short, open source technology has accelerated the development of the industry ecosystem. It not only lowers the barrier to training, but also significantly improves generalization ability and overall performance, providing a practical new path for multimodal intelligence in the open world. In addition, these trends in the second half of the large model race point the way for technological development and industry change. Enterprises and developers need to keep up with the trend, seize opportunities, meet challenges, find their position in the new era driven by large models, and achieve innovative development.

Eric Lee

Eric Lee is a speaker, business advisor, and authority in the financial field. In his diverse and accomplished career, he has been a trainee at a small company during high school and college, worked as an economic analyst at a listed company, and worked in management as a VP at the corporate level, overseeing agencies throughout North America.