The Arabic RAG Leaderboard
The only leaderboard you will require for your RAG needs 🏆
For technical details, check our blog post here.
Evaluation Status
| Model | License | Revision | Precision | Params (M) |
|---|---|---|---|---|
| Omartificial-Intelligence-Space/Arabic-MiniLM-L12-v2-all-nli-triplet | cc-by-nc-4.0 | dcf86e284785c825570c5fd512ddd682b386fa3d | bf16 | 2209 |
| ALJIACHI/bte-base-ar | mit | main | f32 | 149 |
| Abdelkareem/abjd | | main | f32 | 438 |
| Abdelkareem/ara-qwen3-18 | | main | f32 | 438 |
| Abdelkareem/zaraah_jina_v3_64D | mit | main | f32 | 16 |
| Abdelkareem/zaraah_jina_v3 | mit | main | f32 | 64 |
| AhmedZaky1/DIMI-embedding-v2 | | main | f32 | 305 |
| AhmedZaky1/DIMI-embedding-v4 | | main | f32 | 305 |
| AhmedZaky1/arabic-bert-nli-matryoshka | apache-2.0 | main | f32 | 135 |
| AhmedZaky1/arabic-bert-sts-matryoshka | | main | f32 | 135 |
| AhmedZaky1/dimi-sts-matryoshka-arabic-v2 | | main | f32 | 135 |
| Alibaba-NLP/gme-Qwen2-VL-2B-Instruct | apache-2.0 | main | f32 | 2209 |
| Alibaba-NLP/gte-multilingual-base | apache-2.0 | main | f16 | 305 |
| NAMAA-Space/AraModernBert-Base-STS | apache-2.0 | main | f32 | 149 |
| OmarAlsaabi/e5-base-mlqa-finetuned-arabic-for-rag | | main | f32 | 278 |
| Omartificial-Intelligence-Space/Arabert-all-nli-triplet-Matryoshka | apache-2.0 | main | f32 | 135 |
| Omartificial-Intelligence-Space/Arabic-MiniLM-L12-v2-all-nli-triplet | apache-2.0 | main | f32 | 118 |
| Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2 | apache-2.0 | main | f32 | 135 |
| Omartificial-Intelligence-Space/Arabic-labse-Matryoshka | apache-2.0 | main | f32 | 471 |
| Omartificial-Intelligence-Space/GATE-AraBert-v1 | Open | main | f16 | |
| Qwen/Qwen3-Embedding-0.6B | apache-2.0 | main | bf16 | 596 |
| Snowflake/snowflake-arctic-embed-l-v2.0 | apache-2.0 | dcf86e284785c825570c5fd512ddd682b386fa3d | f32 | 568 |
| Snowflake/snowflake-arctic-embed-m-v2.0 | apache-2.0 | 95c2741480856aa9666782eb4afe11959938017f | f32 | 305 |
| ibm-granite/granite-embedding-107m-multilingual | apache-2.0 | 5c793ec061753b0d0816865e1af7db3f675d65af | bf16 | 107 |
| ibm-granite/granite-embedding-278m-multilingual | apache-2.0 | 6ecb2d4c423c03f21c3a511f4227ee7d10d0facd | bf16 | 278 |
| ibm-granite/granite-embedding-278m-multilingual | apache-2.0 | main | bf16 | 278 |
| intfloat/multilingual-e5-base | mit | main | i64 | 278 |
| intfloat/multilingual-e5-large-instruct | mit | main | f16 | 560 |
| jinaai/jina-colbert-v2 | cc-by-nc-4.0 | main | bf16 | 559 |
| jinaai/jina-embeddings-v3 | cc-by-nc-4.0 | main | bf16 | 572 |
| metga97/Modern-EgyBert-Base | | main | f32 | 160 |
| metga97/Modern-EgyBert-Embedding | | main | f32 | 159 |
| mixedbread-ai/mxbai-embed-large-v1 | apache-2.0 | main | f16 | 335 |
| mohamed2811/Muffakir_Embedding_V2 | | main | f32 | 362 |
| mohamed2811/Muffakir_Embedding | | main | f32 | 135 |
| nomic-ai/nomic-embed-text-v2-moe | apache-2.0 | main | f32 | 475 |
| omarelshehy/Arabic-Retrieval-v1.0 | apache-2.0 | main | f32 | 135 |
| omarelshehy/Arabic-STS-Matryoshka-V2 | Open | main | f16 | |
| omarelshehy/Arabic-STS-Matryoshka-V2 | open | main | f32 | 135 |
| omarelshehy/Arabic-STS-Matryoshka | apache-2.0 | main | f32 | 560 |
| omarelshehy/arabic-english-sts-matryoshka-v2.0 | open | main | f32 | 560 |
| omarelshehy/arabic-english-sts-matryoshka | apache-2.0 | main | f32 | 560 |
| sayed0am/arabic-english-bge-m3 | mit | main | f32 | 362 |
| sentence-transformers/LaBSE | apache-2.0 | main | i64 | 471 |
| sentence-transformers/all-MiniLM-L6-v2 | apache-2.0 | main | i64 | 23 |
| sentence-transformers/all-mpnet-base-v2 | apache-2.0 | main | i64 | 109 |
| silma-ai/silma-embeddding-matryoshka-v0.1 | apache-2.0 | main | f32 | 135 |
| silma-ai/silma-embeddding-sts-v0.1 | apache-2.0 | main | f32 | 135 |
| Abdelkareem/zaraah_jina_v3_256D_int8 | mit | main | f32 | 475 |
| Abdelkareem/khatib-v0.1 | | main | f32 | 475 |
| Abdelkareem/zaraah_jina_v3_16D_int8 | mit | main | i8 | 4 |
| Abdelkareem/zaraah_jina_v3_256D_int8 | mit | main | i8 | 64 |
| Abdelkareem/zaraah_jina_v3_32D_int8 | mit | main | i8 | 8 |
| Abdelkareem/zaraah_jina_v3_4D_int8 | mit | main | i8 | 1 |
| Abdelkareem/zaraah_jina_v3_64D_int8 | mit | main | i8 | 16 |
| Abdelkareem/zaraah_jina_v3_8D_int8 | mit | main | i8 | 4 |
| AhmedZaky1/DIMI-embedding-v1 | | main | f32 | 305 |
About Retrieval Evaluation
The retrieval evaluation assesses a model's ability to find and retrieve relevant information from a large corpus of Arabic text. Models are evaluated on:
Web Search Dataset Metrics
- MRR (Mean Reciprocal Rank): Measures the ranking quality by focusing on the position of the first relevant result
- nDCG (Normalized Discounted Cumulative Gain): Evaluates the ranking quality considering all relevant results
- Recall@5: Measures the proportion of relevant documents found in the top 5 results
- Overall Score: Combined score, calculated as the average of MRR, nDCG, and Recall@5 (a small worked sketch follows this list)
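To make the overall-score calculation concrete, here is a minimal sketch in plain Python with binary relevance and a hypothetical ranked list; it illustrates the averaging described above rather than the leaderboard's actual evaluation code:

```python
import math

def mrr(ranked_ids, relevant_ids):
    """Reciprocal rank of the first relevant document (0 if none is found)."""
    for pos, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / pos
    return 0.0

def ndcg(ranked_ids, relevant_ids):
    """Binary-relevance nDCG over the ranked list."""
    dcg = sum(1.0 / math.log2(pos + 1)
              for pos, doc_id in enumerate(ranked_ids, start=1)
              if doc_id in relevant_ids)
    ideal = sum(1.0 / math.log2(pos + 1)
                for pos in range(1, min(len(relevant_ids), len(ranked_ids)) + 1))
    return dcg / ideal if ideal > 0 else 0.0

def recall_at_5(ranked_ids, relevant_ids):
    """Fraction of the relevant documents that appear in the top-5 results."""
    hits = len(set(ranked_ids[:5]) & set(relevant_ids))
    return hits / len(relevant_ids) if relevant_ids else 0.0

# Hypothetical single query: doc ids ranked by a model, plus the gold relevant ids.
ranked = ["d3", "d7", "d1", "d9", "d2", "d5"]
relevant = {"d1", "d2"}

scores = {"MRR": mrr(ranked, relevant),
          "nDCG": ndcg(ranked, relevant),
          "Recall@5": recall_at_5(ranked, relevant)}
scores["Overall"] = sum(scores.values()) / 3  # simple average, per the description above
print(scores)
```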
Model Requirements
- Must support Arabic text embeddings
- Should handle queries of at least 512 tokens
- Must work with the `sentence-transformers` library (see the usage sketch below)
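As a quick sanity check for the last requirement, your model should load and encode Arabic text through the standard `sentence-transformers` API; the model id below is a placeholder:

```python
from sentence_transformers import SentenceTransformer

# Placeholder model id; substitute your own public HuggingFace Hub model.
model = SentenceTransformer("your-org/your-arabic-embedding-model")

queries = [
    "ما هي عاصمة المملكة العربية السعودية؟",   # "What is the capital of Saudi Arabia?"
    "كيف أتعلم البرمجة بلغة بايثون؟",          # "How do I learn Python programming?"
]
embeddings = model.encode(queries)  # shape: (2, embedding_dim)
print(embeddings.shape)
```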
Evaluation Process
- Models process Arabic web search queries
- Retrieved documents are evaluated using:
  - MRR for first relevant result positioning
  - nDCG for overall ranking quality
  - Recall@5 for top results accuracy
- Metrics are averaged to calculate the overall score
- Models are ranked based on their overall performance
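Sketched below is roughly what this process looks like end to end using the built-in semantic search from `sentence-transformers`; the model id, corpus, and query are placeholders, and the leaderboard's own pipeline may differ in detail:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("your-org/your-arabic-embedding-model")  # placeholder id

corpus = ["نص المستند الأول", "نص المستند الثاني", "نص المستند الثالث"]
query = "استعلام بحث عربي"

corpus_emb = model.encode(corpus, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

# Cosine-similarity search over the corpus; top_k=5 mirrors the Recall@5 cutoff.
hits = util.semantic_search(query_emb, corpus_emb, top_k=5)[0]
for hit in hits:
    print(hit["corpus_id"], round(hit["score"], 3))
```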
How to Prepare Your Model
- Ensure your model is publicly available on the HuggingFace Hub (we don't support private model evaluations yet)
- Model should output fixed-dimension embeddings for text
- Support batch processing for efficient evaluation (this is the default if you use `sentence-transformers`); a quick self-check is sketched below
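A quick pre-submission self-check for the last two points might look like this (placeholder model id; it asserts that batch encoding yields fixed-dimension vectors):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("your-org/your-arabic-embedding-model")  # placeholder id

batch = ["جملة قصيرة", "جملة أخرى للتجربة", "نص ثالث أطول قليلاً من الجملتين السابقتين"]
embeddings = model.encode(batch, batch_size=32)

# All inputs should map to vectors of the same, fixed dimensionality.
assert embeddings.shape == (len(batch), model.get_sentence_embedding_dimension())
print("OK, embedding dimension:", embeddings.shape[1])
```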
Evaluation Status
| Model | License | Revision | Precision | Params (M) |
|---|---|---|---|---|
| Omartificial-Intelligence-Space/Arabic-MiniLM-L12-v2-all-nli-triplet | cc-by-nc-4.0 | main | bf16 | 150 |
| ALJIACHI/Mizan-Rerank-v1 | apache-2.0 | main | f32 | 150 |
| AhmedZaky1/DIMI-embedding-v4 | | main | f32 | 305 |
| Alibaba-NLP/gte-multilingual-reranker-base | apache-2.0 | main | f16 | 306 |
| BAAI/bge-reranker-v2-m3 | apache-2.0 | main | f32 | 568 |
| Lajavaness/bilingual-embedding-large | apache-2.0 | main | f32 | 560 |
| NAMAA-Space/GATE-Reranker-V1 | apache-2.0 | main | f32 | 135 |
| NAMAA-Space/Namaa-ARA-Reranker-V1 | apache-2.0 | main | f32 | 568 |
| OmarAlsaabi/e5-base-mlqa-finetuned-arabic-for-rag | Open | main | f16 | |
| OmarAlsaabi/e5-base-mlqa-finetuned-arabic-for-rag | | main | f32 | 278 |
| Omartificial-Intelligence-Space/Arabic-MiniLM-L12-v2-all-nli-triplet | apache-2.0 | main | f32 | 118 |
| Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2 | apache-2.0 | main | f32 | 135 |
| Omartificial-Intelligence-Space/Arabic-all-nli-triplet-Matryoshka | apache-2.0 | main | f32 | 278 |
| Omartificial-Intelligence-Space/Arabic-labse-Matryoshka | apache-2.0 | main | f32 | 471 |
| OrdalieTech/Solon-embeddings-large-0.1 | mit | main | f32 | 560 |
| Qwen/Qwen3-Reranker-0.6B | apache-2.0 | main | bf16 | 596 |
| Snowflake/snowflake-arctic-embed-l-v2.0 | apache-2.0 | main | f32 | 568 |
| anondeb/arabertv02_reranker_2021 | cc-by-nc-4.0 | main | f32 | 135 |
| asafaya/bert-base-arabic | | main | f32 | 111 |
| aubmindlab/bert-base-arabert | | main | f32 | 136 |
| aubmindlab/bert-large-arabertv2 | | main | i64 | 371 |
| colbert-ir/colbertv2.0 | mit | main | i64 | 110 |
| cross-encoder/ms-marco-MiniLM-L-12-v2 | apache-2.0 | main | i64 | 33 |
| intfloat/multilingual-e5-large-instruct | mit | main | f16 | 560 |
| mohamed2811/Muffakir_Embedding_V2 | | main | f32 | 362 |
| mohamed2811/Muffakir_Embedding | | main | f32 | 135 |
| oddadmix/arabic-reranker-v1 | | main | f32 | 135 |
| omarelshehy/Arabic-Retrieval-v1.0 | apache-2.0 | main | f32 | 135 |
| sentence-transformers/LaBSE | apache-2.0 | main | i64 | 471 |
| silma-ai/silma-embeddding-matryoshka-v0.1 | apache-2.0 | main | f32 | 135 |
| Abdelkareem/zaraah_jina_v3_256D_int8 | cc-by-nc-4.0 | main | bf16 | 475 |
| Abdelkareem/khatib-v0.1 | | main | f32 | 475 |
| Abdelkareem/zaraah_jina_v3_16D_int8 | mit | main | i8 | 4 |
| Abdelkareem/zaraah_jina_v3_256D_int8 | mit | main | i8 | 64 |
| Abdelkareem/zaraah_jina_v3_32D_int8 | mit | main | i8 | 8 |
| Abdelkareem/zaraah_jina_v3_4D_int8 | mit | main | i8 | 1 |
| Abdelkareem/zaraah_jina_v3_64D_int8 | mit | main | i8 | 16 |
| Abdelkareem/zaraah_jina_v3_8D_int8 | mit | main | i8 | 4 |
| jinaai/jina-embeddings-v3 | cc-by-nc-4.0 | main | bf16 | 572 |
About Reranking Evaluation
The reranking evaluation assesses a model's ability to improve search quality by reordering initially retrieved results. Models are evaluated across multiple unseen Arabic datasets to ensure robust performance.
Evaluation Metrics
- MRR@10 (Mean Reciprocal Rank at 10): Measures ranking quality by focusing on the position of the first relevant result in the top 10
- NDCG@10 (Normalized DCG at 10): Evaluates the ranking quality of all relevant results in the top 10
- MAP (Mean Average Precision): Measures the overall precision across all relevant documents
All metrics are averaged across multiple evaluation datasets to provide a comprehensive assessment of model performance.
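For reference, here is a minimal plain-Python sketch of the three metrics for a single reranked list with binary relevance; the leaderboard's exact implementation may differ:

```python
import math

def mrr_at_10(ranked, relevant):
    """Reciprocal rank of the first relevant document within the top-10."""
    for pos, doc in enumerate(ranked[:10], start=1):
        if doc in relevant:
            return 1.0 / pos
    return 0.0

def ndcg_at_10(ranked, relevant):
    """Binary-relevance nDCG over the top-10 reranked documents."""
    dcg = sum(1.0 / math.log2(pos + 1)
              for pos, doc in enumerate(ranked[:10], start=1) if doc in relevant)
    ideal = sum(1.0 / math.log2(pos + 1)
                for pos in range(1, min(len(relevant), 10) + 1))
    return dcg / ideal if ideal else 0.0

def average_precision(ranked, relevant):
    """Mean of precision values at each rank where a relevant document appears."""
    hits, precisions = 0, []
    for pos, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / pos)
    return sum(precisions) / len(relevant) if relevant else 0.0

ranked = ["d4", "d1", "d6", "d2", "d5"]   # hypothetical reranked order
relevant = {"d1", "d2"}
print(mrr_at_10(ranked, relevant), ndcg_at_10(ranked, relevant), average_precision(ranked, relevant))
```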
Model Requirements
- Must accept query-document pairs as input
- Should output a relevance score for each pair, typically via cross-attention or a similar query-document matching mechanism (see the scoring sketch below)
- Must support Arabic text processing
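With `sentence-transformers`, a cross-encoder reranker that meets these requirements can be exercised roughly as follows (model id and texts are placeholders):

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("your-org/your-arabic-reranker")  # placeholder model id

query = "ما هي فوائد القراءة اليومية؟"
candidates = [
    "القراءة اليومية تحسن التركيز وتوسع المفردات.",
    "كرة القدم هي الرياضة الأكثر شعبية في العالم.",
]

# One relevance score per (query, document) pair; higher means more relevant.
scores = model.predict([(query, doc) for doc in candidates])
print(scores)
```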
Evaluation Process
- Models are tested on multiple unseen Arabic datasets
- For each dataset:
  - Initial candidate documents are provided
  - Model reranks the candidates
  - MRR@10, NDCG@10, and MAP are calculated
- Final scores are averaged across all datasets
- Models are ranked based on overall performance
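A compact sketch of this per-dataset loop is shown below; the dataset shape, model id, and the metric helpers (from the metric sketch above) are assumptions for illustration:

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("your-org/your-arabic-reranker")  # placeholder model id

def evaluate_dataset(samples):
    """samples: iterable of (query, candidate_docs, relevant_docs) tuples (assumed shape)."""
    per_query = {"MRR@10": [], "NDCG@10": [], "MAP": []}
    for query, candidates, relevant in samples:
        # Score every (query, document) pair and sort candidates by descending score.
        scores = reranker.predict([(query, doc) for doc in candidates])
        reranked = [doc for _, doc in sorted(zip(scores, candidates), key=lambda x: -x[0])]
        # Metric helpers are the ones sketched in the Evaluation Metrics section above.
        per_query["MRR@10"].append(mrr_at_10(reranked, relevant))
        per_query["NDCG@10"].append(ndcg_at_10(reranked, relevant))
        per_query["MAP"].append(average_precision(reranked, relevant))
    return {name: sum(vals) / len(vals) for name, vals in per_query.items()}

# Final scores would then be averaged across all evaluation datasets.
```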
How to Prepare Your Model
- Model should be public on the HuggingFace Hub (private models are not supported yet)
- Make sure it works with the `sentence-transformers` library