
    Mixtral 8x22B

    Sparse mixture-of-experts model with 141B total / 39B active parameters. Outstanding efficiency.

    Tags: open-source, moe, efficient, apache-2.0, code
    Architecture: mistral
    Parameters: 141B
    Context Length: 65,536 tokens
    License: Apache 2.0

    About Mixtral 8x22B

    Mixtral 8x22B uses a sparse mixture-of-experts (MoE) architecture with 8 expert feed-forward networks per layer, of which a router activates only 2 per token. Although the model holds 141B parameters in total, only about 39B are active for any given token, so it delivers the quality of a much larger dense model at the inference cost of a far smaller one.
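
    The sketch below illustrates the top-2 routing idea in isolation: a small router scores all 8 experts for each token, keeps the 2 highest-scoring ones, and mixes their outputs with softmax weights. The class name, layer sizes, and dense dispatch loop are illustrative assumptions for readability, not Mixtral's actual implementation.

```python
# Minimal sketch of top-2 mixture-of-experts routing (illustrative only;
# names and dimensions are assumptions, not Mixtral's real code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # One feed-forward "expert" per slot; Mixtral has 8 per layer.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])
        # Router that scores every expert for every token.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x):                             # x: (batch, seq, d_model)
        scores = self.router(x)                        # (batch, seq, num_experts)
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_vals, dim=-1)          # normalize over the 2 chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            idx = top_idx[..., slot]                   # expert picked for this slot, per token
            w = weights[..., slot].unsqueeze(-1)
            for e, expert in enumerate(self.experts):
                mask = (idx == e).unsqueeze(-1).to(x.dtype)
                # For clarity every expert runs on every token here; real MoE
                # kernels dispatch only the tokens routed to each expert.
                out = out + mask * w * expert(x)
        return out

if __name__ == "__main__":
    layer = SparseMoELayer()
    tokens = torch.randn(1, 4, 512)
    print(layer(tokens).shape)                         # torch.Size([1, 4, 512])
```

    Because only 2 of the 8 experts run per token, the compute per token scales with the active parameter count (~39B) rather than the total (141B), which is the source of the efficiency claim above.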

    Author: Community
    Category: text generation
    Downloads: 1,876,543
