Compute Visual Similarity of Top-Level Domains

submitted by
Style Pass
2024-04-23 14:00:09

Compute the visual similarity between a possible new Generic Top-Level Domain (TLD) and other proposed TLDs, current TLDs, and reserved words.

This web site describes experimental software developed at the National Institute of Standards and Technology (NIST). No algorithms, code, or descriptions in whole or in part are recommended, used, or endorsed by the Internet Corporation for Assigned Names and Numbers (ICANN) or any other entity.

Computers on the Internet locate one another using numeric IP addresses. People, however, navigate the World Wide Web with strings ending in short segments like .edu, .uk, .tv, and .com, called Top-Level Domains (TLDs). Ensuring the ongoing security and stability of the Domain Name System is one activity of the Internet Corporation for Assigned Names and Numbers (ICANN). As the Web grows, many new TLDs may be introduced. As one method to implement the recommendation that "Strings must not be confusingly similar to an existing top-level domain ...", this web page invokes an algorithm developed at NIST "to provide an open, objective, and predictable mechanism for assessing the degree of visual confusion" between proposed or existing TLDs.

The algorithm takes into account varying degrees of similarity between some 60 pairs of characters or character sequences, such as 0 and O (zero and oh), 1 and l (one and L), Z and 2, h and n, rn (R and N) and m, and w and vv (v v). Even insertions or deletions may cause confusion; for example, .aaaah and .aaaaah look very much alike. The task is all the more challenging because domain names are not case sensitive.
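
One way to picture this kind of scoring is a weighted edit distance in which confusable pairs are cheap to substitute. The Python sketch below is illustrative only and is not the NIST algorithm: the function names, the six-entry cost table, and every cost value are assumptions made up for the example, and it simply lowercases both strings rather than reasoning about how each case is rendered.

```python
# Hypothetical substitution costs (0 = identical, 1 = unrelated).
# The values are illustrative assumptions, not the NIST table of ~60 pairs.
CONFUSABLE = {
    ("0", "o"): 0.1,
    ("1", "l"): 0.1,
    ("2", "z"): 0.4,
    ("h", "n"): 0.5,
    ("rn", "m"): 0.3,   # two characters that resemble one
    ("vv", "w"): 0.3,
}


def pair_cost(x, y):
    """Cost of reading block x as block y, or None if they cannot be aligned."""
    if x == y:
        return 0.0
    cost = CONFUSABLE.get((x, y))
    if cost is None:
        cost = CONFUSABLE.get((y, x))
    if cost is not None:
        return cost
    if len(x) == 1 and len(y) == 1:
        return 1.0          # ordinary substitution of unrelated characters
    return None             # multi-character blocks must be listed explicitly


def visual_distance(s, t, indel=0.6):
    """Weighted edit distance; `indel` is an assumed insert/delete cost."""
    # Simplification: compare lowercase forms without the leading dot.
    s, t = s.lower().lstrip("."), t.lower().lstrip(".")
    n, m = len(s), len(t)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(d[i - 1][j], d[i][j - 1]) + indel
            # Align a block of 1 or 2 characters on each side (handles rn/m).
            for a in (1, 2):
                for b in (1, 2):
                    if a <= i and b <= j:
                        c = pair_cost(s[i - a:i], t[j - b:j])
                        if c is not None:
                            best = min(best, d[i - a][j - b] + c)
            d[i][j] = best
    return d[n][m]


def visual_similarity(s, t):
    """Map the distance to a score in [0, 1]; 1 means identical."""
    s_len, t_len = len(s.lstrip(".")), len(t.lstrip("."))
    longest = max(s_len, t_len)
    if longest == 0:
        return 1.0
    return 1.0 - visual_distance(s, t) / longest
```

With these assumed costs, `visual_similarity(".aaaah", ".aaaaah")` comes out at about 0.9 because the strings differ by a single insertion, and `visual_distance(".corn", ".com")` is 0.3 because "rn" is aligned with "m" as one block. A real table would need a measured cost for each pair, which this sketch does not attempt.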
