Data Mining Medicare Data – What Can We Find?
Medicare released detailed reimbursement data for 2012: $77 billion paid to more than 880,000 health care providers, broken down by doctor and procedure. We take an initial look and find large variances and potential indicators of fraud.
By Ran Bi, Apr 23, 2014.
The Medicare reimbursement data was publicly released this month.
It covers 2012 and includes information on the 100 most common inpatient services, the 30 most common outpatient services, and all physician and other supplier procedures and services performed on 11 or more Medicare beneficiaries. The data details the payments individual providers received from Medicare for the treatments they administered to America’s seniors and to certain younger people with disabilities.
It shows how the Medicare program paid out $77 billion to more than 880,000 health care providers. For every provider, the data also lists name, address, specialty, billing rates, average reimbursed amount, number of Medicare beneficiaries served, and number of services provided.
The map I created with CartoDB shows the distribution of payments by state, with Florida, California, Texas, and Alabama receiving the highest reimbursements. Link: http://cdb.io/1f2oLWY
Releasing Medicare data can lead to a better understanding of health costs and help providers deliver better care. Making the data public should also improve the accuracy and transparency of the reimbursement process.
One clear finding is that the variance in reimbursements is very large. Payments range from nearly $21 million to a single Florida ophthalmologist to $2,984 for the average certified nurse midwife. The average amount paid per provider also varies widely across specialties. The three specialties with the highest averages were clinical laboratory ($1,758,701), radiation therapy ($1,293,347), and portable X-ray ($707,306).
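To make the per-specialty comparison concrete, here is a minimal Python sketch of how averages like these could be computed. It assumes a hypothetical row layout with one row per provider and procedure; the field names (`npi`, `specialty`, `avg_payment`, `service_count`) and all sample values are illustrative, not the real file's schema or figures. It also assumes a provider's total payment can be estimated as average payment times number of services, summed over that provider's rows.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: one per (provider, procedure). Field names and values
# are illustrative assumptions, not the actual Medicare file schema.
ROWS = [
    {"npi": "A1", "specialty": "Ophthalmology", "avg_payment": 120.0, "service_count": 500},
    {"npi": "A1", "specialty": "Ophthalmology", "avg_payment": 800.0, "service_count": 50},
    {"npi": "B2", "specialty": "Ophthalmology", "avg_payment": 90.0, "service_count": 100},
    {"npi": "C3", "specialty": "Certified Nurse Midwife", "avg_payment": 60.0, "service_count": 20},
]

def provider_totals(rows):
    """Estimate each provider's total payment as sum(avg_payment * service_count)."""
    totals = {}
    specialty_of = {}
    for r in rows:
        totals[r["npi"]] = totals.get(r["npi"], 0.0) + r["avg_payment"] * r["service_count"]
        specialty_of[r["npi"]] = r["specialty"]
    return totals, specialty_of

def average_by_specialty(rows):
    """Average total payment per provider within each specialty, highest first."""
    totals, specialty_of = provider_totals(rows)
    by_spec = defaultdict(list)
    for npi, total in totals.items():
        by_spec[specialty_of[npi]].append(total)
    averages = {spec: mean(vals) for spec, vals in by_spec.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    for spec, avg in average_by_specialty(ROWS):
        print(f"{spec}: ${avg:,.2f}")
```

Averaging over provider totals rather than over individual rows matches the "average amount paid per provider" framing used above.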
The data could also be used to identify potential fraud or waste. Although the numbers alone do not tell the whole story, insiders can see whether a physician’s billings are out of line with the rest of the industry. Some individual physicians received particularly high sums, which may reflect fraudulent behavior or perverse incentives that lead doctors to overuse a procedure.
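The article does not say how "out of line" billings would be detected. One common option, swapped in here purely as an illustration, is the modified z-score of Iglewicz and Hoaglin, which compares each provider to the median of its own specialty and is less distorted by extreme values than a mean-based z-score. This sketch assumes per-provider totals have already been computed; the provider IDs, amounts, and the 3.5 cutoff are hypothetical.

```python
from statistics import median

# Hypothetical per-provider totals (USD) and specialties; illustrative only.
TOTALS = {
    "P1": 100_000.0, "P2": 110_000.0, "P3": 95_000.0,
    "P4": 105_000.0, "P5": 2_000_000.0,
    "M1": 2_500.0, "M2": 3_000.0, "M3": 3_400.0,
}
SPECIALTY = {
    "P1": "Ophthalmology", "P2": "Ophthalmology", "P3": "Ophthalmology",
    "P4": "Ophthalmology", "P5": "Ophthalmology",
    "M1": "Certified Nurse Midwife", "M2": "Certified Nurse Midwife",
    "M3": "Certified Nurse Midwife",
}

def flag_outliers(totals, specialty_of, threshold=3.5):
    """Return (provider, specialty, score) for providers billing far above peers.

    Score is the modified z-score 0.6745 * (x - median) / MAD, computed
    within each specialty; only high-side outliers are flagged.
    """
    groups = {}
    for npi, total in totals.items():
        groups.setdefault(specialty_of[npi], []).append((npi, total))
    flagged = []
    for spec, members in groups.items():
        values = [total for _, total in members]
        med = median(values)
        mad = median([abs(v - med) for v in values])
        if mad == 0:
            continue  # no spread within this specialty to compare against
        for npi, total in members:
            score = 0.6745 * (total - med) / mad
            if score > threshold:
                flagged.append((npi, spec, score))
    return flagged

if __name__ == "__main__":
    for npi, spec, score in flag_outliers(TOTALS, SPECIALTY):
        print(f"{npi} ({spec}): modified z = {score:.1f}")
```

A flag only marks a provider as unusual relative to specialty peers; it is a lead for review, not evidence of fraud.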
But there are also concerns that the public may misinterpret the data. Because the data do not include patients’ diagnoses or the dates of procedures, the raw numbers may mislead patients. A high cost may simply reflect a specialist who treats sicker patients and performs necessary but costly surgeries.

Ran Bi is a master’s student in the Data Science program at New York University. During her studies at NYU she has completed several projects in machine learning, deep learning, and big data analytics. With an undergraduate background in Financial Engineering, she is also interested in business analytics.
See also
- Lawyers start mining the Medicare data for clues to fraud
- First look at Medicare data in 35 years - analyze data by state