What would you do if you wanted to convert a piece of code from one programming language to another? It’s simple enough if the code is small: you can just rewrite the core logic in the other language. But what about large companies whose codebases run to millions of lines? They can’t just hire someone to convert all of that easily! The process is expensive and can take several years and millions of dollars. Sometimes, though, it is necessary. For example, if a company has a codebase in an older language, it may need to move it to a newer and more relevant one. In fact, the Commonwealth Bank of Australia spent $750 million over five years, starting in 2012, to convert its codebase from COBOL to Java.
It would have been much easier if they could have used a transcompiler to automatically convert code from one programming language to another instead of starting from scratch. But that’s not easy! Programming languages differ in syntax, variable types, standard library functions, and more, so converting code automatically is no piece of cake. Luckily for companies looking to convert a codebase from a legacy language to a more modern one, Facebook has announced TransCoder, a system that can translate code between C++, Java, and Python.
What is TransCoder AI?
TransCoder AI uses unsupervised machine learning to translate code between C++, Java, and Python. The algorithm relies on common elements, known as tokens, that appear in both the input and output languages. These tokens include keywords such as “for”, “if”, “else”, “while”, and “try”, as well as digits and mathematical operators that are the same no matter the language. Other shared tokens are the strings that appear in the code itself.
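To make the idea of shared tokens concrete, here is a minimal sketch in Python. It is not TransCoder's tokenizer: the regular expression, the function names `tokenize` and `shared_tokens`, and the two sample snippets are all assumptions made for illustration. It simply splits two versions of the same function into tokens and lists the ones that appear in both.

```python
import re

# Deliberately crude tokenizer (an assumption for this sketch): identifiers
# and keywords, integers, a few two-character operators, then any single
# punctuation character. Real tokenizers are language-specific.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|\+=|-=|\+\+|--|[^\s\w]")

def tokenize(code):
    """Split source code into a flat list of token strings."""
    return TOKEN_RE.findall(code)

def shared_tokens(code_a, code_b):
    """Return the set of tokens that occur in both pieces of code."""
    return set(tokenize(code_a)) & set(tokenize(code_b))

python_src = """
def total(n):
    s = 0
    for i in range(n):
        if i % 2 == 0:
            s += i
    return s
"""

java_src = """
static int total(int n) {
    int s = 0;
    for (int i = 0; i < n; i++) {
        if (i % 2 == 0) { s += i; }
    }
    return s;
}
"""

common = shared_tokens(python_src, java_src)
print(sorted(common))
```

Keywords such as `for`, `if`, and `return`, the digits, and operators such as `==` and `+=` show up in both versions, while `def` and `static` appear in only one. Tokens of the first kind act as anchor points that look the same in either language.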
The algorithm also uses back-translation to improve translation quality. A source-to-target model and a target-to-source model are trained together. The target-to-source model takes real code in the target language and produces a version of it in the source language, which may contain errors. The source-to-target model is then trained to turn that version back into the original target-language code. This process is repeated, and both models improve as the round trip gets closer to reproducing the original code.
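The data flow of back-translation can be sketched in a few lines of Python. This is only a toy: the two "models" below are hypothetical string rewriters (`toy_java_to_python` and `toy_python_to_java`) standing in for neural networks, and `back_translation_pairs` is a name invented for this example. What it shows is how real target-language code gets paired with a machine-generated source version to form training examples.

```python
def back_translation_pairs(target_samples, target_to_source):
    """Pair each real target-language sample with a generated source version.

    Each pair is (generated source, original target): the source-to-target
    model would be trained to map the first element to the second.
    """
    return [(target_to_source(sample), sample) for sample in target_samples]

# Hypothetical stand-ins for the two models (plain string rewriting only).
def toy_java_to_python(java_stmt):
    """Pretend target-to-source model: Java statement -> Python statement."""
    return java_stmt.rstrip(";").replace("int ", "")

def toy_python_to_java(py_stmt):
    """Pretend source-to-target model: Python statement -> Java statement."""
    return "int " + py_stmt + ";"

# Real code in the target language (Java); no parallel Python is needed.
java_samples = ["int x = 1;", "int y = x + 2;"]

pairs = back_translation_pairs(java_samples, toy_java_to_python)

# Round trip: translate the generated Python back to Java and compare
# it with the original Java code.
round_trip = [toy_python_to_java(source) for source, _ in pairs]

for (source, target), rebuilt in zip(pairs, round_trip):
    print(f"{target!r} -> {source!r} -> {rebuilt!r}")
```

In actual training, the round trip would not match at first, and the mismatch is what the source-to-target model learns from. The toy rewriters here happen to invert each other exactly, so the loop only illustrates the direction of each step.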
Performance of TransCoder AI
Facebook researchers trained the TransCoder AI algorithm on code from more than 2.8 million open-source GitHub repositories, focusing on translating code at the function level from one programming language to another. The researchers created around 6,000 tokens, or common elements in the programming languages, and used these to train the algorithm to translate functions.
After training the algorithm, Facebook researchers tested its accuracy using 852 parallel functions in C++, Java, and Python from GeeksforGeeks! Since GeeksforGeeks has solutions to many problems in multiple languages such as C, C++, Java, C#, and Python, it was a natural place to obtain a function in a source language along with a reference version in the target language. A new metric known as computational accuracy was used to gauge the algorithm: it checks whether the translated function returns the same output as the reference function when both are given the same inputs.
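A metric of this kind can be sketched as follows. This is an illustrative Python version, not the researchers' evaluation code: the function `computational_accuracy`, the sample functions, and the test inputs are all assumptions. Each case holds a reference function, a translated function, and the inputs to try; a translation counts as correct only if it agrees with the reference on every input.

```python
def computational_accuracy(cases):
    """Fraction of translations that behave like their reference.

    `cases` is a list of (reference, translated, inputs) tuples, where
    `inputs` is a list of argument tuples to call both functions with.
    """
    def same_behavior(reference, translated, inputs):
        for args in inputs:
            try:
                if translated(*args) != reference(*args):
                    return False
            except Exception:
                # A crash counts as a failed translation in this sketch.
                return False
        return True

    passed = sum(same_behavior(ref, hyp, inputs) for ref, hyp, inputs in cases)
    return passed / len(cases)

# Reference and "translated" versions of two small functions (made up here).
def ref_sum_to(n):
    return sum(range(n + 1))

def hyp_sum_to(n):
    # Written differently from the reference, but computes the same values.
    return n * (n + 1) // 2

def ref_max(a, b):
    return a if a > b else b

def hyp_max(a, b):
    # Buggy translation: always returns the first argument.
    return a

cases = [
    (ref_sum_to, hyp_sum_to, [(0,), (1,), (10,)]),
    (ref_max, hyp_max, [(1, 2), (5, 3)]),
]

accuracy = computational_accuracy(cases)
print(f"Computational accuracy: {accuracy:.1%}")
```

Note that `hyp_sum_to` passes even though its text looks nothing like the reference, while `hyp_max` fails on the input `(1, 2)`. That is the point of judging translations by behavior rather than by textual similarity.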
The computational accuracy obtained when translating between C++, Java, and Python is given here:
- Computational Accuracy of C++ to Java: 74.8%
- Computational Accuracy of C++ to Python: 67.2%
- Computational Accuracy of Java to C++: 91.6%
- Computational Accuracy of Java to Python: 68.7%
- Computational Accuracy of Python to Java: 56.1%
- Computational Accuracy of Python to C++: 57.8%
Facebook researchers concluded that while many functions translated by the TransCoder AI algorithm were not perfectly accurate, the computational accuracy was still relatively high compared to previous attempts. TransCoder was able to understand and differentiate between the syntax of each language, and it mapped the data structures, methods, and libraries used in the source language to their correct counterparts in the target language. The researchers also claimed that TransCoder could easily be generalized to programming languages other than C++, Java, and Python without any expert knowledge. All in all, the experiment was a big success, and TransCoder outperformed the existing commercial solutions for converting code from one language to another.