A while back I was researching the most efficient way to check if a number is prime. This lead me to find the following piece of code: I was intrigued

Demystifying The Regular Expression That Checks If A Number Is Prime

submited by

Style Pass

2024-10-31 11:30:15

A while back I was researching the most efficient way to check if a number is prime. This lead me to find the following piece of code:

I was intrigued. While this might not be the most efficient way, it’s certainly one of the less obvious ones, so my curiosity kicked in. How on Earth could a match for the .?|(..+?)\1+ regular expression tell that a number is not prime (once it’s converted to its unary representation)?

If you’re interested, read on, I’ll try to dissect this regular expression and explain what’s really going on. The explanation will be programming language agnostic, I will, however, provide Python, JavaScript and Perl versions of the Java code above and explain why they are slightly different.

I will explain how the regular expression ^.?$|^(..+?)\1+$ can filter out any prime numbers. Why this one and not .?|(..+?)\1+ (the one used in Java code example above)? Well, this has to do with the way String.matches() works, which I’ll explain later.

While there are some blog posts on this topic, I found them to not go deep enough and give just a high level overview, not explaining some of the important details well enough. Here, I’ll try to lay it out with enough detail so that anyone can follow and understand. The goal is to make it simple to understand for any one - whether you are a regular expression guru or this is the first time you’ve heard about them, anyone should be able to follow along.