Powerful new technologies are reshaping the biomedical research landscape, enabling scientists to map and decipher the 3 billion chemical letters that make up the human genome and to unravel the molecular mysteries of many diseases. It is becoming possible to identify the hundreds of environmental stimuli and chemicals people are exposed to each day. Electronic medical records contain warehouses of patient information, and clinical databases house details on genomic and environmental variants that can affect disease susceptibility.

While these technologies produce massive amounts of data, the pace of data generation has largely outstripped researchers’ ability to make sense of the results. Ideally, scientists could easily mine data from different sources to gain new insights into disease causes and biology, as well as into the relationship between disease biology and clinical signs and symptoms. However, disconnected data sources and a lack of understanding of how disparate data types (such as genomic, cellular and patient data) relate to each other have slowed progress.

To address these obstacles, NCATS launched the Biomedical Data Translator program, called “Translator” for short. This multiyear, iterative effort will culminate in the development of a relational, N-dimensional Biomedical Data Translator that integrates multiple types of existing data sources, including objective signs and symptoms of disease, drug effects, and intervening types of biological data relevant to understanding pathophysiology.