Sumy - module for automatic summarization of text documents and HTML pages

Submitted by
Style Pass
2021-05-30 05:30:05

Sumy grew out of my diploma thesis and the need to shorten articles written in Czech and Slovak. Although its source code has always been publicly available on GitHub, I didn’t expect so many people to adopt it. Don’t get me wrong: I am happy about it, but it also explains why the documentation is lacking and why some features hardcoded for Czech and Slovak can still be found in the codebase. Because the thesis is written in Slovak, I will try to describe some of its practical parts here for the people using the library.

Sumy creates extractive summaries. That means it tries to find the most significant sentences in the document(s) and composes them into a shortened text. The other approach, abstractive summarization, requires understanding the topic and writing new, shorter text about it. That is outside the scope of Sumy’s current capabilities.
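To make the extractive idea concrete, here is a minimal sketch in plain Python. It is not one of Sumy’s algorithms: the functions `split_sentences` and `summarize` are hypothetical names made up for this illustration, and the scoring (average word frequency per sentence) is an assumption chosen for brevity. The point is only that the summary consists of sentences copied from the input, kept in their original order.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): a sentence ends with ., ! or ?
    # followed by whitespace. Real tokenizers are language-aware.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, sentence_count=1):
    """Return the `sentence_count` highest-scoring sentences, in document order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document.
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"\w+", sentence.lower())
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:sentence_count])  # restore original order
    return [sentences[i] for i in chosen]


text = "Cats sleep a lot. Cats chase mice and cats climb trees. Dogs bark."
print(summarize(text, sentence_count=2))
```

Nothing in the output is newly written text, which is exactly what separates this approach from abstractive summarization.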

Even though I focused on Czech and Slovak in my work, I wanted Sumy to be extensible to other languages from the start. That’s why I designed it as a set of independent objects that a user of the library can replace to add new or better capabilities.
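The design of replaceable objects can be sketched roughly as follows. This is an illustration only: `RegexTokenizer` and `FrequencySummarizer` are hypothetical classes written for this example, not Sumy’s actual API, and the frequency-based scoring is an assumption. What it shows is that the language-specific pieces (tokenizer, stemmer, stop words) are passed in from outside, so each can be swapped without touching the summarizer.

```python
import re
from collections import Counter


class RegexTokenizer:
    """Hypothetical tokenizer; replace it with a language-specific one."""

    def to_sentences(self, text):
        return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    def to_words(self, sentence):
        return re.findall(r"\w+", sentence)


class FrequencySummarizer:
    """Hypothetical summarizer built from replaceable parts."""

    def __init__(self, tokenizer, stemmer=str.lower, stop_words=frozenset()):
        self.tokenizer = tokenizer          # splits text for a given language
        self.stemmer = stemmer              # any callable: word -> normalized word
        self.stop_words = frozenset(stop_words)

    def _terms(self, sentence):
        stems = (self.stemmer(w) for w in self.tokenizer.to_words(sentence))
        return [s for s in stems if s not in self.stop_words]

    def __call__(self, text, sentence_count):
        sentences = self.tokenizer.to_sentences(text)
        terms = [self._terms(s) for s in sentences]
        freq = Counter(t for ts in terms for t in ts)
        scores = [sum(freq[t] for t in ts) / len(ts) if ts else 0.0 for ts in terms]
        best = sorted(range(len(sentences)),
                      key=lambda i: scores[i], reverse=True)[:sentence_count]
        return [sentences[i] for i in sorted(best)]


text = "The dog saw the cat and the bird. Summaries keep summaries short. It rained."

plain = FrequencySummarizer(RegexTokenizer())
filtered = FrequencySummarizer(RegexTokenizer(), stop_words={"the", "and", "it"})

print(plain(text, 1))     # dominated by the frequent word "the"
print(filtered(text, 1))  # stop words removed, a different sentence wins
```

Supporting a new language in a design like this means supplying a different tokenizer, stemmer, and stop-word list, while the summarization logic stays unchanged.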
