In my previous posts analyzing the DocGraph dataset, I looked at the geographic connections between referrals and at out-of-state referrals. In this post I decided to move away from geographic analysis and focus instead on the different types of providers involved in patient referrals. In particular, I wanted to take a crack at answering the question: “What type of doctor will you see next?”
As always, the first step to answering this question was to enrich the DocGraph data with data from the NPI database. In particular, we need the taxonomy code for each of the nodes in the DocGraph dataset. Once a taxonomy code has been added to every node, we aggregate all referrals by taxonomy code, which shrinks the dataset down to something that is easily managed on a single machine. To achieve this with a short iteration time I used Amazon EMR and Apache Hive. The distributed nature of Hadoop and Hive enabled me to join the large NPI database with the even larger DocGraph dataset and perform the necessary aggregation in under 20 minutes at a cost of only $1.04. You can find the Hive script I used to perform the join and aggregation on GitHub.
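To make the join and aggregation concrete, here is a minimal single-machine sketch of the same idea in Python. It is not the Hive script linked above. The row layout `(from_npi, to_npi, patient_count)`, the sample NPIs, and the function name `aggregate_by_taxonomy` are all assumptions made for illustration, and the taxonomy codes are only there as sample values.

```python
from collections import defaultdict

# Hypothetical sample of DocGraph-style rows: (from_npi, to_npi, patient_count).
referrals = [
    ("1000000001", "1000000002", 25),
    ("1000000001", "1000000003", 10),
    ("1000000004", "1000000002", 5),
    ("1000000005", "1000000002", 7),  # this NPI has no taxonomy entry below
]

# Hypothetical NPI -> taxonomy code lookup, standing in for the NPI database.
npi_taxonomy = {
    "1000000001": "207R00000X",
    "1000000002": "2085R0202X",
    "1000000003": "207RC0000X",
    "1000000004": "207R00000X",
}


def aggregate_by_taxonomy(referrals, npi_taxonomy):
    """Attach a taxonomy code to both ends of each referral, then sum
    patient counts per (from_code, to_code) pair."""
    totals = defaultdict(int)
    for from_npi, to_npi, count in referrals:
        from_code = npi_taxonomy.get(from_npi)
        to_code = npi_taxonomy.get(to_npi)
        if from_code is None or to_code is None:
            # Inner-join behavior: skip referrals with an unknown provider.
            continue
        totals[(from_code, to_code)] += count
    return dict(totals)


taxonomy_totals = aggregate_by_taxonomy(referrals, npi_taxonomy)
```

At the scale of the real datasets this lookup-and-sum is what the distributed join and `GROUP BY` do across many machines; the sketch only shows the shape of the computation.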
Once the DocGraph dataset had been aggregated by taxonomy code, it was a simple matter of converting each taxonomy code to its human-readable provider type. This was done using the Health Care Provider Taxonomy dataset and some R code that you can find on GitHub. After a little more aggregation to account for the multiple levels of taxonomy codes, including specialization, the data needed to answer our question was ready. I also removed any referrals where both nodes were of the same provider type, as these are most likely noise caused by the billing method through which the DocGraph data was collected.
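The labeling and filtering steps described above might look roughly like the following Python sketch. It is not the R code linked above. The lookup table `taxonomy_names`, the sample counts, and the rule that a blank specialization is labeled "General" are assumptions, chosen so the output resembles the provider type labels used in this post.

```python
from collections import defaultdict

# Hypothetical excerpt of a taxonomy lookup: code -> (classification, specialization).
taxonomy_names = {
    "207R00000X": ("Internal Medicine", ""),
    "207RC0000X": ("Internal Medicine", "Cardiovascular Disease"),
    "2085R0202X": ("Radiology", "Diagnostic Radiology"),
}

# Hypothetical referral totals keyed by (from_code, to_code).
code_totals = {
    ("207R00000X", "2085R0202X"): 30,
    ("207R00000X", "207RC0000X"): 10,
    ("207R00000X", "207R00000X"): 4,  # same provider type on both ends
}


def provider_type(code):
    """Build a 'Classification - Specialization' label for a taxonomy code.
    Assumption: a blank specialization is labeled 'General'."""
    classification, specialization = taxonomy_names[code]
    return f"{classification} - {specialization or 'General'}"


def referrals_by_provider_type(code_totals):
    """Roll code-level totals up to provider types, drop same-type pairs,
    and return the pairs sorted by patient count, largest first."""
    totals = defaultdict(int)
    for (from_code, to_code), count in code_totals.items():
        first, second = provider_type(from_code), provider_type(to_code)
        if first == second:
            continue  # treated as likely billing noise, as described above
        totals[(first, second)] += count
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranked = referrals_by_provider_type(code_totals)
```

Slicing the result, for example `ranked[:20]`, gives a top 20 list in the same shape as the table in this post.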
Below is a table showing the top 20 referrals between provider types. Not surprisingly, the most common referrals involve Radiology, where patients are sent for various types of tests such as X-rays, CT scans, and MRIs. They are then referred back to an Internal Medicine doctor, who I hypothesize is the physician acting as primary care. Another interesting, but not all that surprising, relationship is the number of referrals between Emergency Medicine and Internal Medicine. Here I hypothesize that patients are being seen for some emergency medical condition and then receive follow-up care from their primary care provider.
Perhaps the most interesting observation from this top 20 list is the number of times Internal Medicine - Cardiovascular Disease appears. I always knew that America had a problem with heart disease but I was still a bit surprised at the volume of this type of referral. I would love to hear if anyone else has a hypothesis for why there are so many referrals related to Internal Medicine - Cardiovascular Disease.
| Provider Type Seen First | Provider Type Seen Second | Number of Patients |
|---|---|---:|
| Radiology - Diagnostic Radiology | Internal Medicine - General | 115,602,860 |
| Internal Medicine - General | Radiology - Diagnostic Radiology | 91,632,055 |
| Internal Medicine - Cardiovascular Disease | Internal Medicine - General | 54,260,749 |
| Radiology - Diagnostic Radiology | Internal Medicine - Cardiovascular Disease | 49,406,691 |
| Internal Medicine - Cardiovascular Disease | Radiology - Diagnostic Radiology | 47,820,945 |
| Internal Medicine - General | Internal Medicine - Cardiovascular Disease | 47,351,852 |
| Radiology - Diagnostic Radiology | Family Medicine - General | 45,078,839 |
| Family Medicine - General | Radiology - Diagnostic Radiology | 40,181,846 |
| Emergency Medicine - General | Radiology - Diagnostic Radiology | 33,797,598 |
| Emergency Medicine - General | Internal Medicine - General | 32,236,140 |
| Radiology - Diagnostic Radiology | Specialist - General | 27,710,610 |
| Specialist - General | Internal Medicine - General | 26,478,301 |
| Internal Medicine - General | Specialist - General | 24,876,128 |
| Specialist - General | Radiology - Diagnostic Radiology | 23,929,823 |
| Radiology - Diagnostic Radiology | Emergency Medicine - General | 23,424,750 |
| Internal Medicine - General | Family Medicine - General | 22,561,522 |
| Radiology - Diagnostic Radiology | Internal Medicine - Nephrology | 22,479,825 |
| Radiology - Diagnostic Radiology | Internal Medicine - Pulmonary Disease | 21,796,186 |
| Family Medicine - General | Internal Medicine - General | 20,872,086 |
| Internal Medicine - Cardiovascular Disease | Family Medicine - General | 19,047,613 |
If you would like to see more than just the top 20 referrals by provider type, you can download the complete list here.
Finally, I can’t resist a sexy visualization that helps to convey the elegance of the DocGraph dataset. Below you will find a visualization of the referrals between provider types. The thickness of each edge reflects the number of patients referred between the two provider types. To create the visualization I used Gephi, the open-source graph visualization platform.
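For anyone who wants to try something similar, one way to get aggregated referrals into Gephi is an edge list CSV. The sketch below is an assumption about a workable workflow, not a record of how the graph in this post was built. It relies on Gephi's spreadsheet importer recognizing `Source`, `Target`, and `Weight` columns, and the function name `write_gephi_edges` and the two sample rows (copied from the table in this post) are illustrative.

```python
import csv
import io


def write_gephi_edges(ranked, out):
    """Write (first_type, second_type) -> patients pairs as an edge list
    with Source, Target, and Weight columns."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (first, second), patients in ranked:
        writer.writerow([first, second, patients])


# Two rows taken from the table in this post, used as sample input.
ranked = [
    (("Radiology - Diagnostic Radiology", "Internal Medicine - General"), 115602860),
    (("Internal Medicine - General", "Radiology - Diagnostic Radiology"), 91632055),
]

# Write to an in-memory buffer here; pass an open file object to save to disk.
buffer = io.StringIO()
write_gephi_edges(ranked, buffer)
edges_csv = buffer.getvalue()
```

With the weight column imported, edge thickness in the rendered graph can be scaled by the number of patients.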
I hope you have enjoyed my analysis. I am always open to feedback and would love to collaborate on analysis related to DocGraph or open health data in general. If you are interested in collaborating, please email me at ryan [at] weald.com or message me on Twitter @rweald.