Recently Updated Benchmarks (39)
- MHPP May 19, 2024
- EvoCodeBench Mar 31, 2024
- LiveCodeBench Mar 12, 2024
- CodeAgentBench Jan 14, 2024
- OOP Jan 12, 2024
- DevEval Jan 12, 2024
- CRUXEval Jan 5, 2024
- TACO Dec 22, 2023
- CodeFuseEval Nov 24, 2023
- SWE-bench Oct 10, 2023
- HumanEval+ May 2, 2023
- CoderEval Feb 1, 2023
- HumanEval-ET, MBPP-ET Jan 22, 2023
- ODEX Dec 20, 2022
- Vgen Dec 13, 2022
- DS-1000 Nov 18, 2022
- SecurityEval Nov 9, 2022
- TorchDataEval Oct 31, 2022
- MBXP Oct 26, 2022
- MBXP-MathQA Oct 26, 2022
- MBXP-HumanEval Oct 26, 2022
- HumanEval-X Sep 19, 2022
- MultiPL-MBPP Aug 17, 2022
- MultiPL-HumanEval Aug 17, 2022
- HumanEval-Infilling Jul 28, 2022
- PandasEval Jul 14, 2022
- NumpyEval Jul 14, 2022
- CodeComplex Jun 1, 2022
- GSM8K-Python Apr 5, 2022
- BIG-Bench Apr 5, 2022
- MTPB Mar 25, 2022
- MCoNaLa Mar 16, 2022
- CodeContests Feb 8, 2022
- DSP Jan 30, 2022
- JigsawDataset Dec 6, 2021
- MBPP Aug 16, 2021
- HumanEval Jul 7, 2021
- APPS May 20, 2021
- Metrics Jan 1, 2000