BaseCamp Research is a London-based company whose work revolves around its proprietary biology database, BaseData™, built specifically for AI training [1]. Founded in 2019 by two University of Oxford alumni, the company has received backing from major technology companies, including NVIDIA and Microsoft.
The database was designed to overcome the limitations of public science databases, which were not built with AI modelling in mind. These limitations include restricted size and diversity, poor standardisation, sparse metadata, missing biological context, and unclear data sourcing [2]. BaseCamp sourced the data from a network of over 150 partners across the globe through equal, benefit-sharing partnerships. Data collection follows a standardised protocol designed with scalability, diversity, and quality in mind. The data are annotated with taxonomic and evolutionary context, enabling the models to learn from natural evolution.
BaseData™ is the largest biological database to date, ten times larger than its closest competitor [2]. It is also the fastest-growing and continues to expand. At present, BaseCamp grants access to selected researchers and commercial partners. In the future, the company is likely to adopt a pay-for-access model.
Crucially, the company is using its own database to develop foundation AI models. These are models trained on large, rich datasets to learn the underlying patterns and structure of the data, which they then apply to a broad range of tasks. In this context, BaseCamp’s AI models learn the rules governing evolution and can perform a range of different tasks for bioscience R&D.
Several models have been developed so far. These include ZYMCTRL for enzyme design, BASEFOLD for predicting protein folding, and HIFI-NN for genome annotation. Most notably, the latest EDEN family of AI models can design therapeutics across multiple modalities and has received industry attention for its potential. One feat was the de novo design of enzymes capable of programmable gene insertion: the ability to insert large DNA fragments into user-defined sites in the genome. In lab tests, these enzymes were able to target over 10,000 disease-related sites [3]. If their clinical efficacy and safety are proven, these enzymes could offer a novel gene therapy option, potentially treating previously untreatable diseases. EDEN has also designed several antimicrobial peptides that could help combat the most clinically critical multidrug-resistant pathogens.
BaseCamp Research is an interesting example of how entrepreneurs and businesses could adapt in the age of AI. Developing and commercialising a database for AI training is akin to the ‘sell shovels during a gold rush’ strategy. But BaseCamp did not stop at selling shovels. The company identified the flaws of traditional shovels, designed an ergonomic electric shovel, and now uses that shovel to dig for gold itself.
That said, the company is still at an early stage of development. As of this writing, all scientific publications relating to the database and models report pre-clinical work and are preprints (not yet peer-reviewed). Watchers of the space should keep a close eye on this venture for further developments.