The T-cell receptor (TCR) repertoire of peripheral T cells is sculpted from the stochastically generated TCRs of developing thymocytes by a series of selection steps in the thymus. These steps eliminate cells whose TCRs fail to recognize self-peptide–MHC (pMHC) complexes or recognize them with high affinity, processes known as positive selection and negative selection, respectively. Two models have been proposed to explain how TCR signals determine the outcome of selection. In the threshold model, thymocytes bearing TCRs that signal above a negative selection threshold are deleted, while those experiencing low to intermediate TCR signaling strengths survive through positive selection. In the sustained signaling model, high-affinity and low-affinity interactions between TCRs and pMHC complexes trigger biochemically distinct signaling cascades: low-affinity TCRs induce sustained signaling, whereas signaling after high-affinity stimulation is intense but short-lived. Little is known about how the TCR sequence determines the outcome of selection. In the current issue of Genes & Immunity, Ostmeyer et al. use machine learning to develop an approach for identifying how TCR protein sequences influence selection fate.
A large number of functional T lymphocytes in the periphery contain both productive and non-productive TCR genes. The authors use the productive and non-productive TCRB genes from mature T cells to define selected and unselected repertoires, respectively, on the assumption that the protein sequence encoded by a non-productive TCR gene closely resembles that of a TCR that has not undergone selection. To this end, the authors develop an algorithm that computationally repairs non-productive TCR genes to obtain productive copies with the fewest alterations, thereby maximally preserving the original biological sequences. This approach allows known biases from VDJ recombination in the unselected repertoire to be excluded from the model. Moreover, it does not rely on obtaining the repertoire of thymocytes expressing solely the TCRB gene, which potentially gives the approach broader applicability. The authors use both sets of TCR protein sequences to train a machine-learning model. The model assigns a probability, PSURVIVE, to any TCRB sequence: with PSURVIVE > 0.5, the TCRB gene is predicted to be a productive one, and with PSURVIVE < 0.5, it is predicted to be a repaired one.
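The decision rule on PSURVIVE can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `classify_tcrb`, the label strings, and the example scores are hypothetical and chosen only for illustration, and the handling of a probability of exactly 0.5 is an assumption, since that case is not defined in the description of the model.

```python
def classify_tcrb(p_survive: float) -> str:
    """Map a PSURVIVE value to the predicted class of a TCRB sequence.

    Hypothetical helper: it encodes only the 0.5 cutoff, not the trained model.
    """
    if not 0.0 <= p_survive <= 1.0:
        raise ValueError("p_survive must lie in [0, 1]")
    if p_survive > 0.5:
        return "productive"    # predicted to come from the selected repertoire
    if p_survive < 0.5:
        return "repaired"      # predicted to come from the repaired, unselected repertoire
    return "undetermined"      # assumption: exactly 0.5 is left unclassified


# Hypothetical probabilities for illustration only; these are not data from the study.
example_scores = {"seq_A": 0.82, "seq_B": 0.31, "seq_C": 0.5}
labels = {name: classify_tcrb(p) for name, p in example_scores.items()}
print(labels)
```

In this sketch a sequence is labeled only by which side of 0.5 its probability falls on; how far the value lies from 0.5 could additionally be read as the confidence of the prediction, although that interpretation is also an assumption here.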