Got The Shirts – Page Translator

submitted by
Style Pass
2021-07-21 07:30:07

We needed a way of extracting page information from a website (including meta tags and the main information on a page) and then translating that information into many languages.

We encountered a number of issues. The first: how do we extract just the information pertaining to the page itself, and not all the header information? The solution was to leverage the existing NuGet package BoilerPipe for .NET. This package parses the page and removes the clutter surrounding the main body of text, such as navigation bars and advertisements.
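To make the extraction step concrete, here is a minimal sketch in Python using only the standard library. It is not BoilerPipe's algorithm and not the code used in Page Translator: it simply collects the title and meta tags and drops text inside tags that usually hold clutter. The names `PageExtractor`, `extract_page` and `SKIP_TAGS` are hypothetical and chosen for illustration only.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as clutter in this simplified sketch.
# (Assumption: a real boilerplate remover uses much smarter heuristics.)
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "noscript"}


class PageExtractor(HTMLParser):
    """Collects the <title>, <meta> name/content pairs, and visible body text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.meta = {}
        self.text_parts = []
        self._skip_depth = 0    # greater than 0 while inside a clutter element
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Covers both <meta name="..."> and Open Graph <meta property="...">.
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key.lower()] = attrs["content"]
        elif tag == "title":
            self._in_title = True
        elif tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_page(html):
    """Return the title, meta tags, and main text of an HTML document."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.title.strip(),
        "meta": parser.meta,
        "text": "\n".join(parser.text_parts),
    }
```

Calling `extract_page(html)` on a downloaded page returns a dictionary with `title`, `meta` and `text` keys. Tag-based filtering like this will miss clutter that lives in plain `div` elements, which is exactly the kind of case a dedicated package handles better.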

Next was the challenge of translating the page information. Enter Amazon Translate, which incidentally allows translation of two million characters per month for free with the AWS Free Tier. If you wish to use the translation feature of this software, you will need to create an AWS account and then create an AWSAccessKey and an AWSSecretKey. Enter these two keys as the login and password in the Page Translator software and press the ‘Save’ button. (Note: this is optional. If you do not need the translation services, there is no need for any AWS keys.)
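For readers who want to see what the translation call looks like outside the tool, here is a hedged Python sketch using the AWS SDK for Python (boto3) and its `translate_text` operation. It is an illustration, not Page Translator's own code. Amazon Translate limits the size of each real-time request, so the sketch splits the text on line breaks first; the 4,500-byte chunk size is a deliberately conservative assumption rather than the documented limit. The helper names `split_into_chunks` and `translate_page_text`, and the default source language of `en`, are likewise assumptions.

```python
def split_into_chunks(text, max_bytes=4500):
    """Split text on line breaks so each chunk stays within max_bytes of UTF-8.

    Note: a single line longer than max_bytes is passed through unsplit
    in this sketch.
    """
    chunks, current = [], ""
    for paragraph in text.split("\n"):
        candidate = paragraph if not current else current + "\n" + paragraph
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def translate_page_text(text, target_languages, client=None, source_language="en"):
    """Translate text into each target language; returns {language_code: text}."""
    if client is None:
        # Third-party AWS SDK, imported only when no client is supplied.
        # boto3 picks up the access key and secret key from its usual
        # configuration (environment variables or the AWS credentials file).
        import boto3
        client = boto3.client("translate")

    results = {}
    for lang in target_languages:
        translated = [
            client.translate_text(
                Text=chunk,
                SourceLanguageCode=source_language,
                TargetLanguageCode=lang,
            )["TranslatedText"]
            for chunk in split_into_chunks(text)
        ]
        results[lang] = "\n".join(translated)
    return results
```

For example, `translate_page_text(page_text, ["fr", "de", "es"])` would return one translated string per language code. Every character sent counts once per target language, so translating a page into many languages uses up the monthly free allowance correspondingly faster.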

From there, enter the URL of the page you would like to download information from in the URL box at the top, then press the Download button.
