Project Summary
Increasing attention is being paid to the high list prices of health care services, given that uninsured patients
may have to pay such prices on their own. The recent emergence of the price transparency movement in medicine
has contributed to a surge of interest in the prices charged for, and the payments made for, health care services.
It is therefore important for both insured and uninsured patients to have first-hand information on the
charge-to-payment ratios of health care services.
The Physician and Other Supplier Public Use File (PUF) provides information on services and procedures provided
to Medicare beneficiaries by physicians and other healthcare professionals. The Physician and Other Supplier PUF
contains information on utilization, payment (allowed amount and Medicare payment), and submitted charges,
organized by National Provider Identifier (NPI), Healthcare Common Procedure Coding System (HCPCS) code, and
place of service. The data currently available in the Physician and Other Supplier PUF cover calendar years 2012
through 2015. This large and growing body of data provides an unprecedented opportunity to examine the
charge-to-payment ratios of health care services.
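As a minimal sketch of how a charge-to-payment ratio could be aggregated from PUF-style rows, the function below computes a service-weighted ratio per HCPCS code: total submitted charges divided by total Medicare payments, where each row's totals are its average amounts multiplied by its service count. It assumes each (NPI, HCPCS, place of service) row reports a service count together with an average submitted charge and an average Medicare payment; the dictionary keys (`hcpcs`, `n_services`, `avg_charge`, `avg_payment`) are hypothetical placeholders, not the PUF's actual column names, and using the Medicare payment rather than the allowed amount as the denominator is likewise an assumption.

```python
from collections import defaultdict


def charge_to_payment_ratios(records):
    """Service-weighted charge-to-payment ratio per HCPCS code (hypothetical helper).

    Each record is one (NPI, HCPCS, place of service) row with hypothetical keys
    'hcpcs', 'n_services', 'avg_charge', 'avg_payment'.
    """
    total_charge = defaultdict(float)
    total_payment = defaultdict(float)
    for r in records:
        n = r["n_services"]
        # Convert per-service averages back to row totals before summing.
        total_charge[r["hcpcs"]] += n * r["avg_charge"]
        total_payment[r["hcpcs"]] += n * r["avg_payment"]
    # Skip codes with no recorded payment to avoid division by zero.
    return {
        code: total_charge[code] / total_payment[code]
        for code in total_charge
        if total_payment[code] > 0
    }
```

Weighting by service count keeps a provider that billed a code thousands of times from counting the same as one that billed it once; an unweighted mean of per-row ratios would be a different, and arguably less representative, summary.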
With opportunity come challenges. The 2012 dataset is 1.7 GB in size and contains more than 9 million records;
the 2013, 2014, and 2015 datasets are 1.7 GB, 1.9 GB, and 2.0 GB and contain about 9, 9, and 10 million records,
respectively. New datasets will become available in the years to come. However, scalable statistical methods for
analyzing such large and growing data are lacking. In this project, we will develop novel scalable statistical
methods and scalable inference procedures for analyzing large-scale data that keep growing over time. In
particular, we will develop quantile regression approaches based on stochastic gradient descent algorithms, along
with scalable inference procedures based on random perturbation. Moreover, we will propose algorithms for their
computational implementation and derive their theoretical properties.
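To illustrate the kind of procedure described above, here is a minimal sketch, not the project's method, of averaged stochastic (sub)gradient descent for linear quantile regression on a single pass over a data stream, paired with one possible random-perturbation scheme for inference. Each update moves the coefficients along the subgradient of the check loss, beta <- beta + lr * (tau - 1{y < x'beta}) * x, and the running average of the iterates serves as the estimate. For inference, B perturbed copies are updated with the same data but with step sizes multiplied by i.i.d. Exponential(1) weights (mean 1, variance 1), and the spread of their averaged iterates is used as a standard-error estimate. The step-size schedule gamma0 * t^(-alpha), the exponential weights, the synthetic data generator, and all function names are assumptions for illustration.

```python
import random
import statistics


def _sgd_step(beta, x, y, tau, lr):
    # Subgradient step on the check loss rho_tau(y - x'beta):
    # beta <- beta + lr * (tau - 1{y < x'beta}) * x
    pred = sum(b * xj for b, xj in zip(beta, x))
    g = tau - (1.0 if y < pred else 0.0)
    for j in range(len(beta)):
        beta[j] += lr * g * x[j]


def _update_mean(bar, beta, t):
    # Running (Polyak-Ruppert) average of the first t iterates.
    for j in range(len(bar)):
        bar[j] += (beta[j] - bar[j]) / t


def qr_sgd_perturbed(stream, tau=0.5, n_boot=100, gamma0=1.0, alpha=0.6, seed=0):
    """One pass over (x, y) pairs; returns (averaged estimate, perturbation SEs).

    Hypothetical helper: the n_boot perturbed copies use step sizes scaled by
    i.i.d. Exponential(1) weights, an assumed perturbation scheme.
    """
    if n_boot < 2:
        raise ValueError("n_boot must be at least 2")
    rng = random.Random(seed)
    beta = bar = None
    t = 0
    for x, y in stream:
        if beta is None:  # start all iterates at zero once the dimension is known
            p = len(x)
            beta, bar = [0.0] * p, [0.0] * p
            boots = [[0.0] * p for _ in range(n_boot)]
            boot_bars = [[0.0] * p for _ in range(n_boot)]
        t += 1
        lr = gamma0 * t ** (-alpha)  # decaying step size with 0.5 < alpha < 1
        _sgd_step(beta, x, y, tau, lr)
        _update_mean(bar, beta, t)
        for b in range(n_boot):
            w = rng.expovariate(1.0)  # random weight with mean 1 and variance 1
            _sgd_step(boots[b], x, y, tau, lr * w)
            _update_mean(boot_bars[b], boots[b], t)
    if beta is None:
        raise ValueError("empty stream")
    se = [statistics.stdev([bb[j] for bb in boot_bars]) for j in range(len(bar))]
    return bar, se


def simulate_stream(n, seed=1):
    # Synthetic toy data: y = 1 + 2 z + N(0, 1) noise, so the true median line is (1, 2).
    rng = random.Random(seed)
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        yield [1.0, z], 1.0 + 2.0 * z + rng.gauss(0.0, 1.0)
```

For example, `est, se = qr_sgd_perturbed(simulate_stream(5000), n_boot=50)` should return estimates close to (1, 2), and an approximate 95% interval for coefficient j is `est[j] ± 1.96 * se[j]`. Because each record is visited once and only O(B·p) numbers are kept in memory, the per-record cost does not grow with the number of records already processed, which is what makes this style of procedure attractive for data that keep arriving year after year.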
The results of this project will benefit 27 million uninsured Americans by providing them with the
charge-to-payment ratios of health care services. At the same time, the project will expose graduate students in
the Department of Mathematical Sciences at New Jersey Institute of Technology to research on large-scale data and
big data analysis, and it will strengthen the Master's Program in Data Science, a new program jointly established
by the Departments of Mathematical Sciences and Computer Science at New Jersey Institute of Technology.