AI Aims For Perfect Coding And With Multiple Solutions

Meet pass@k

And so the AI bots are fighting with each other to be the one that gets everything right. No matter what problem we give them, they are competing to show that they can cope with our complex world of learning. The two core datasets used for the problems are HumanEval and MBPP, and results on each are typically reported as pass@1 accuracy.

With this, we have the pass@k metric, here applied to HumanEval, a dataset of hand-written programming problems [1]. The AI bot reads the challenge and creates code. This code is then run within a sandboxed environment against the problem’s unit tests. The k element [2] relates to the generation of k code samples for every problem, and a problem counts as solved if at least one of those samples passes. Thus pass@1 allows one solution, while pass@10 allows 10 solutions.
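As a rough illustration of how a pass@k score can be computed, here is a minimal Python sketch of the estimator described by Chen et al. [1]: generate n samples per problem, count the c samples that pass the tests, and estimate the chance that at least one of k randomly chosen samples is correct. The function names `pass_at_k` and `benchmark_pass_at_k`, and the toy numbers, are assumptions made for this example and are not taken from any leaderboard.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which passed."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # Fewer than k failing samples: every choice of k samples contains a pass.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k samples (drawn without
    # replacement from the n generated) are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-problem estimate over (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical results for three problems: (samples generated, samples passed).
    toy = [(10, 3), (10, 0), (10, 10)]
    print(benchmark_pass_at_k(toy, k=1))
    print(benchmark_pass_at_k(toy, k=10))
```

On this toy data the score rises from about 0.43 at k=1 to about 0.67 at k=10, because the first problem is only solved some of the time with a single attempt but always within 10 attempts.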

Three examples of coding challenges from HumanEval are:

At the current time, the leaderboard is [here]:

The leader is Reflexion (from “Reflexion: Language Agents with Verbal Reinforcement Learning”), which is based on GPT-4 [here]:

From the leaderboard, we can see that the top contenders do well at getting a single solution right, but CODE-T produces the best score for 10 solutions (pass@10). With this, its pass@1 rate is not as high, but it is more likely to produce a correct solution when 10 are created for every problem [here]:
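A back-of-the-envelope model shows why a pass@10 score can sit well above a pass@1 score. Assume, purely for illustration, that each generated sample solves a given problem independently with probability p. The value p = 0.3 below is made up and is not taken from the leaderboard:

```latex
\[
\text{pass@}k = 1 - (1 - p)^{k}, \qquad
\text{pass@}1 = 0.3, \qquad
\text{pass@}10 = 1 - 0.7^{10} \approx 0.972
\]
```

Real samples from one model are not independent, so this is only a sketch of the trend and not a prediction for any particular system.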

And for 100 solutions (pass@100), we see that PaLM 2-S has the best results [here]:

MIT recently found that GPT-4 achieved 100% success for every question in the module assessments:

GPT-4 Is The Perfect Student: Will Education As We Know It, Come Crumbling Down? (medium.com)

The whole code writing integration will thus make development so much easier:

Conclusions

It is happening. It’s Us v Them. I think I know who is going to win.

References

[1] Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H. P. D. O., Kaplan, J., … & Zaremba, W. (2021). Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374.

[2] Kulal, S., Pasupat, P., Chandra, K., Lee, M., Padon, O., Aiken, A., & Liang, P. S. (2019). SPoC: Search-based pseudocode to code. Advances in Neural Information Processing Systems, 32.