
DeepSeek V3 (671B)

DeepSeek's 671B MoE model with 37B active parameters. Matches GPT-4o on many benchmarks.

Tags: open-source, moe, reasoning, code, 128k-context
Architecture: deepseek
Parameters: 671B
Context Length: 131,072 tokens
License: DeepSeek License
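
To make the specs above concrete, here is a minimal usage sketch. It assumes the weights are published on the Hugging Face Hub under an id like deepseek-ai/DeepSeek-V3 and are loadable through transformers' AutoModelForCausalLM; the official checkpoint may instead require trust_remote_code or a dedicated inference stack such as vLLM or SGLang, so treat this as illustrative rather than the vendor's recommended path.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical Hub id; check the actual repository name and license terms first.
model_id = "deepseek-ai/DeepSeek-V3"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # let transformers pick the dtype stored in the checkpoint
    device_map="auto",    # shard the 671B weights across available GPUs
    trust_remote_code=True,
)

# The 131,072-token context applies to prompt plus generated tokens combined.
prompt = "Explain Mixture-of-Experts routing in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))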

About DeepSeek V3

DeepSeek V3 uses a Mixture-of-Experts architecture with auxiliary-loss-free load balancing: each token activates 37B of the 671B total parameters. Trained on 14.8T tokens with a multi-token prediction objective, it achieves state-of-the-art results among open-source models on many benchmarks.
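
To illustrate what "auxiliary-loss-free load balancing" means in practice, the sketch below shows a toy top-k router in the same spirit: a per-expert bias is added to the routing scores only when selecting experts, and is nudged between steps so overloaded experts are chosen less often, instead of adding a balancing loss term. All sizes and names here are toy assumptions; the real DeepSeek V3 router and update rule differ in detail.

import torch

# Toy sizes for illustration only; not the real 671B / 37B-active configuration.
num_experts, top_k, d_model = 8, 2, 16

router = torch.nn.Linear(d_model, num_experts, bias=False)
expert_bias = torch.zeros(num_experts)   # adjusted instead of an auxiliary balancing loss

def route(x):
    """Pick top_k experts per token; the bias shifts selection but not the combine weights."""
    scores = torch.sigmoid(router(x))                      # token-to-expert affinities
    _, idx = (scores + expert_bias).topk(top_k, dim=-1)    # biased scores drive selection
    weights = torch.gather(scores, -1, idx)                # unbiased scores weight the outputs
    return idx, weights / weights.sum(-1, keepdim=True)

def rebalance(idx, step=1e-3):
    """Nudge the bias so experts overloaded in this batch are picked less often next time."""
    load = torch.bincount(idx.flatten(), minlength=num_experts).float()
    expert_bias.sub_(step * torch.sign(load - load.mean()))

idx, weights = route(torch.randn(4, d_model))   # route 4 toy tokens
rebalance(idx)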

Author: Community
Category: text generation
Downloads: 1,567,890
