OpenAI o3 or DeepSeek-R1: Which Is the Better Reasoning Model?

Compare OpenAI o3 and DeepSeek-R1 on reasoning tasks such as coding, logic, and problem-solving to analyze their performance.

In previous lessons, we compared various aspects of DeepSeek models against competitors such as OpenAI, Gemini, Llama, and Mistral. In this lesson, we will run our own experiments, pitting DeepSeek's R1 against OpenAI's o3-mini (high), currently among the best models for coding and reasoning according to those earlier comparisons.

We will run multiple experiments to evaluate both models in coding, logical reasoning, and STEM-based problem-solving. For each task, we will provide the same prompt to both models and analyze their responses.

Coding

Let’s start with a coding example. We want to create an interactive physics-based animation using JavaScript. The animation will simulate a galaxy of stars moving under the influence of gravity while incorporating dynamic behaviors such as merging, color blending, and supernova explosions.

The prompt is given below:

Prompt:

Generate a JavaScript animation that simulates a galaxy of stars moving in a gravitational field inside a container, with the following features:

  • Randomly placed stars with different masses and colors (white, blue, yellow, green, and red)

  • Gravity simulation: Stars attract each other based on a simple Newtonian gravity model

  • Star merging: If two stars get close enough, they merge into a larger star, blending their colors using additive color mixing

  • Supernova effect: When a star reaches a certain mass threshold, it explodes into multiple smaller stars

  • Smooth physics updates with realistic-looking gravitational motion

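To make the task concrete, here is a minimal sketch of the physics core the prompt asks for. This is our own illustration, not either model's output; the `makeStar`, `blendColors`, and `step` helpers, and the constants `G`, `MERGE_DIST`, and `SUPERNOVA_MASS`, are hypothetical names and values chosen for readability:

```javascript
// Illustrative physics core for the prompt above (not either model's output).
const G = 0.05;              // gravitational constant, tuned for animation (not SI)
const MERGE_DIST = 4;        // stars closer than this distance merge
const SUPERNOVA_MASS = 50;   // mass threshold that triggers an explosion

function makeStar(x, y, mass, color) {
  return { x, y, vx: 0, vy: 0, mass, color }; // color is {r, g, b}
}

// Additive color mixing: channel-wise sum, clamped to 255.
function blendColors(a, b) {
  return {
    r: Math.min(255, a.r + b.r),
    g: Math.min(255, a.g + b.g),
    b: Math.min(255, a.b + b.b),
  };
}

function step(stars, dt) {
  // Pairwise Newtonian attraction: F = G * m1 * m2 / d^2, softened near d = 0.
  for (let i = 0; i < stars.length; i++) {
    for (let j = i + 1; j < stars.length; j++) {
      const a = stars[i], b = stars[j];
      const dx = b.x - a.x, dy = b.y - a.y;
      const d2 = dx * dx + dy * dy + 1e-4; // softening term avoids division by zero
      const d = Math.sqrt(d2);
      const f = (G * a.mass * b.mass) / d2;
      // Accelerate each star toward the other (acceleration = F / m).
      a.vx += (f * dx / d / a.mass) * dt;
      a.vy += (f * dy / d / a.mass) * dt;
      b.vx -= (f * dx / d / b.mass) * dt;
      b.vy -= (f * dy / d / b.mass) * dt;
    }
  }

  // Integrate positions.
  for (const s of stars) { s.x += s.vx * dt; s.y += s.vy * dt; }

  // Merge close pairs, conserving momentum and additively blending colors.
  for (let i = 0; i < stars.length; i++) {
    for (let j = stars.length - 1; j > i; j--) {
      const a = stars[i], b = stars[j];
      if (Math.hypot(b.x - a.x, b.y - a.y) < MERGE_DIST) {
        const m = a.mass + b.mass;
        a.vx = (a.vx * a.mass + b.vx * b.mass) / m;
        a.vy = (a.vy * a.mass + b.vy * b.mass) / m;
        a.mass = m;
        a.color = blendColors(a.color, b.color);
        stars.splice(j, 1);
      }
    }
  }

  // Supernova: a star past the mass threshold explodes into smaller fragments.
  for (let i = stars.length - 1; i >= 0; i--) {
    const s = stars[i];
    if (s.mass >= SUPERNOVA_MASS) {
      stars.splice(i, 1);
      for (let k = 0; k < 6; k++) {
        const angle = (k / 6) * 2 * Math.PI;
        const frag = makeStar(s.x, s.y, s.mass / 6, s.color);
        frag.vx = s.vx + Math.cos(angle) * 2;
        frag.vy = s.vy + Math.sin(angle) * 2;
        stars.push(frag);
      }
    }
  }
}
```

A renderer would call `step(stars, dt)` once per frame and then draw each star on a canvas; the drawing code is omitted here since the experiment focuses on the physics logic the models produce.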
First, in terms of response time, o3-mini (high) took around 30 seconds to generate a response, whereas DeepSeek-R1 took almost 6 minutes, repeatedly rethinking the prompt in its reasoning trace. Responses this slow might frustrate some users.
