I use Grafana to graph various business and performance metrics for Bank Statement Converter. One of the graphs tracks the number of Internal Server Errors the server returns to its clients. I do this by writing a record into the database whenever a 500 is sent to the client. This graph has been really helpful for ironing out bugs I didn’t anticipate. Last Thursday at 12:55 AM HKT, my servers started throwing Java’s infamous OutOfMemoryErrors.
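To make the error tracking concrete, here is a minimal sketch of the idea in plain Java. It is not my production code: the class name `ServerErrorTracker`, the `ErrorRecord` fields, and the in-memory list standing in for the database table are all made up for illustration.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class ServerErrorTracker {
    // Hypothetical row shape; the real table's columns may differ.
    public record ErrorRecord(Instant occurredAt, String path, String errorType) {}

    // Stand-in for the database table. A real server would INSERT a row instead.
    private final List<ErrorRecord> rows = new ArrayList<>();

    // Runs a request handler. If it throws, one record is stored and 500 is returned.
    public int handle(String path, Supplier<Integer> handler) {
        try {
            return handler.get();
        } catch (Throwable t) {
            // Throwable (not just Exception) is caught so that Errors such as
            // OutOfMemoryError are counted too. That choice is an assumption.
            rows.add(new ErrorRecord(Instant.now(), path, t.getClass().getSimpleName()));
            return 500;
        }
    }

    // Read-only copy of everything recorded so far.
    public List<ErrorRecord> rows() {
        return List.copyOf(rows);
    }

    public static void main(String[] args) {
        ServerErrorTracker tracker = new ServerErrorTracker();
        int ok = tracker.handle("/ok", () -> 200);
        int failed = tracker.handle("/broken", () -> {
            throw new IllegalStateException("boom");
        });
        System.out.println(ok + " " + failed + " recorded=" + tracker.rows().size());
    }
}
```

With rows like these in a table, a Grafana panel can chart a count of records per time bucket, which is roughly what my Internal Server Error graph shows.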
A few months ago I ran into some of these errors and handled them by upgrading my servers from 1 GB instances to 4 GB instances. Back then I was running Tesseract on my servers to OCR image-based PDFs. Tesseract uses quite a bit of RAM, so I figured I needed servers with more of it. I recently replaced Tesseract with Amazon Textract, so I no longer need extra RAM for OCRing images. When I saw these errors last week I thought, “I’m running out of RAM on a 4 GB server? Surely 4 GB is enough to process a PDF file.” I decided to optimise my code instead of throwing more hardware at the problem.
When a user uploads a PDF and presses the convert button, the UI moves to the /converted page. On this page the UI calls an API that tries to automatically detect the transaction data in the PDF. If that API fails to find the transaction data, the UI moves to the /previewPDF page, where the user can select regions to extract. There’s a small bug here that causes the activity in the image below.