A fingerprint is a molecular representation that omits certain kinds of structural information with the goal of increasing computational speed. The success of this approach is evidenced by numerous modern applications ranging from structure search to property prediction. A good fingerprint trades just enough structural information to achieve the desired computational goal, so flexibility matters. Given the versatility and power of fingerprints, it's not surprising to find a continuous line of research extending back more than five decades. What is surprising, however, is how much of the early research has been forgotten — and perhaps prematurely so. This article presents a case in point.
A Penny code, first described by Robert Penny in 1965, represents the extended connectivity environment around an individual atom within a molecular graph. All of the atoms located within a three-atom radius of perception contribute to the code, but in different ways. A molecular Penny code is the set of all atomic Penny codes for a given molecular graph.
Penny codes take as their starting point the graph theoretical construct of the planted tree. A planted tree is a tree whose root node has degree one. The planted trees used to compute Penny codes are comprised of up to four generations: one parent (root), one child, zero or more grandchildren, and zero or more great-grandchildren.