Searching code on GitHub is currently limited to keyword search. This assumes either the user knows the syntax, or can anticipate what keywords might

Towards Natural Language Semantic Code Search | The GitHub Blog

submited by
Style Pass
2021-07-02 23:30:05

Searching code on GitHub is currently limited to keyword search. This assumes either the user knows the syntax, or can anticipate what keywords might be in comments surrounding the code they are looking for. Our machine learning scientists have been researching ways to enable the semantic search of code.

To fully grasp the concept of semantic search, consider the below search query, “ping REST api and return results”:

Note that the demonstrated semantic search returns reasonable results even though there are no keywords in common between the search query and the text (the code & comments found do not contain the words “Ping”, “REST” or “api”)! The implications of augmenting keyword search with semantic search are profound. For example, such a capability would expedite the process of on-boarding new software engineers onto projects and bolster the discoverability of code in general.

In this post, we want to share how we are leveraging deep learning to make progress towards this goal. We also share an open source example with code and data that you can use to reproduce these results!

Leave a Comment