This project is made to make it as easy as possible to scrape presentations (pptx files) from the internet and extract their text (body and notes). It

yazeed44 / pptxer

submited by
Style Pass
2021-06-17 18:30:02

This project is made to make it as easy as possible to scrape presentations (pptx files) from the internet and extract their text (body and notes). It can be used in Python or command line.

This mode scrapes presentations that contains a specific keywords from search engines, and downloads them to a directory. The texts within these files will be extracted automatically unless otherwise specified.

This mode extracts texts from pptx files and outputs a dict with each slide body and note texts. If command line is used then a json file will be outputted.

As of now, we're using third-party search engines to look up files, and almost all search engines throttle or soft ban if they detected automated queries coming from your IP. The soft ban usually lasts about a day, and you will not be able to use pptxer in meanwhile, but you can use any search engines on your browser normally.

Leave a Comment
Related Posts