The scientific community, industry, and general public have become increasingly concerned about a lack of
replicability among published discoveries. Prominent institutions and journals have published policies on the
problem, yet there is still confusion and debate regarding solutions. This proposal presents a statistical
approach to the challenge of research replicability, one that avoids the extensive, costly efforts and the
delays in initial reporting of important findings that conventional solutions entail, while
facilitating changes in how scientists evaluate and communicate research. Existing solutions include
standardization with proof of replication across laboratories, systematic variation and heterogenization of
experiments within laboratories, aggregation of convergent evidence, and meta-analysis of highly
heterogeneous data. These solutions are not always practical or even feasible, adding significant cost, time
and complexity to the execution of experimental work. Furthermore, even with rigorous standardization,
results can differ between laboratories because of normal, unavoidable variation among the
laboratories in which studies are executed. We propose a solution that involves community data
sharing built around the Mouse Phenome Database (MPD) to estimate and model the impact of laboratory
variation on replicability. This project uses data-driven and informatics-based approaches that exploit public,
large-scale, heterogeneous and complex data. The proposed project will advance knowledge by using
practical and rigorous quantitative and statistical approaches to develop methods for investigators to
evaluate replicability of their results prior to publication. The project is intended to provide an approach,
guidelines and publicly available data resources to reduce the number of irreproducible studies that are
published and improperly used as foundational research, ultimately restoring confidence in the public's
investment in research through timely, cost-effective improvements in the scientific process. Validation
studies will include analysis of data from multi-lab replication of behavioral genetics experiments, such as
those that are being generated by addiction scientists in NIDA Centers of Excellence. The approach is
readily extended to research in the natural and physical sciences far beyond behavioral genetics. The
publicly available datasets and methods will enable other biostatisticians, including trainees and early career
scientists, to model the problem of laboratory variation and replicability within their own research endeavors.
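The modeling idea at the heart of the proposal, treating the laboratory as a random effect and asking how much of the total variance it explains, can be sketched in a few lines. The sketch below is an illustration only, not the project's actual methodology: it uses simulated data in place of MPD records, and `variance_components` is a hypothetical helper that applies textbook method-of-moments estimators for a balanced one-way random-effects model.

```python
import random
import statistics

def variance_components(groups):
    """Method-of-moments estimates for the one-way random-effects model
    y_ij = mu + b_i + e_ij, where b_i is a lab effect and e_ij is residual
    noise. `groups` is a list of lists, one list of measurements per lab.
    Assumes a balanced design (equal replicates per lab) for simplicity."""
    k = len(groups)                    # number of labs
    n = len(groups[0])                 # replicates per lab
    grand = statistics.mean(y for g in groups for y in g)
    lab_means = [statistics.mean(g) for g in groups]
    # Mean squares from a one-way ANOVA decomposition
    ms_between = n * sum((m - grand) ** 2 for m in lab_means) / (k - 1)
    ms_within = sum((y - m) ** 2
                    for g, m in zip(groups, lab_means)
                    for y in g) / (k * (n - 1))
    sigma2_e = ms_within                                  # within-lab variance
    sigma2_lab = max((ms_between - ms_within) / n, 0.0)   # between-lab variance
    icc = sigma2_lab / (sigma2_lab + sigma2_e)            # share of variance due to labs
    return sigma2_lab, sigma2_e, icc

# Simulate 10 labs x 20 animals with known lab-to-lab variation
random.seed(1)
labs = []
for _ in range(10):
    b = random.gauss(0, 2.0)           # this lab's effect, sd = 2
    labs.append([50 + b + random.gauss(0, 1.0) for _ in range(20)])

s2_lab, s2_e, icc = variance_components(labs)
```

A high intraclass correlation (`icc`) signals that lab identity drives much of the observed variation, which is exactly the situation in which a single-laboratory result is unlikely to replicate elsewhere; in practice the project would fit richer mixed models to real multi-lab data rather than this balanced toy design.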