Ryan Weald's Blog

Analyzing DocGraph - What Type of Doctor Will You See Next?


In my previous posts analyzing the DocGraph dataset I looked at the geographic connections between referrals and at out-of-state referrals. In this post I decided to move away from geographic analysis and focus instead on the different types of providers involved in patient referrals. In particular, I wanted to take a crack at answering the question: “What type of doctor will you see next?”

As always, the first step to answering this question was to enrich the DocGraph data with data from the NPI database. In particular, we need the taxonomy code for each node in the DocGraph dataset. Once every node has a taxonomy code, we aggregate all referrals by taxonomy code, which shrinks the dataset to something more easily managed on a single machine. To do this with a short iteration time I used Amazon EMR and Apache Hive. The distributed nature of Hadoop and Hive let me join the large NPI database with the even larger DocGraph dataset and perform the necessary aggregation in under 20 minutes, at a cost of only $1.04. You can find the Hive script I used to perform the join and aggregation on GitHub.
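For readers who want to experiment on a small sample without a cluster, the join and aggregation described above can be sketched in plain Python. This is not the Hive script, just a minimal illustration of the same idea. The row layout `(from_npi, to_npi, patient_count)`, the function name `aggregate_by_taxonomy`, and the sample NPIs are all assumptions made for the example.

```python
from collections import defaultdict


def aggregate_by_taxonomy(referrals, npi_to_taxonomy):
    """Sum patient counts per (from_taxonomy, to_taxonomy) pair.

    referrals: iterable of (from_npi, to_npi, patient_count) rows (assumed layout).
    npi_to_taxonomy: dict mapping an NPI to its taxonomy code.
    Rows where either NPI has no taxonomy code are skipped, like an inner join.
    """
    totals = defaultdict(int)
    for from_npi, to_npi, patient_count in referrals:
        from_code = npi_to_taxonomy.get(from_npi)
        to_code = npi_to_taxonomy.get(to_npi)
        if from_code is None or to_code is None:
            continue  # no match in the NPI lookup
        totals[(from_code, to_code)] += patient_count
    return dict(totals)


# Made-up sample data, for illustration only.
sample_referrals = [
    ("1000000001", "1000000002", 40),
    ("1000000003", "1000000002", 25),
    ("1000000002", "1000000001", 10),
    ("1000000001", "9999999999", 5),  # NPI missing from the lookup, dropped
]
sample_npi_to_taxonomy = {
    "1000000001": "207R00000X",
    "1000000002": "2085R0202X",
    "1000000003": "207R00000X",
}

aggregated = aggregate_by_taxonomy(sample_referrals, sample_npi_to_taxonomy)
```

The output has one row per pair of taxonomy codes rather than one row per pair of providers, which is what makes the result small enough to handle on a single machine.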

Once the DocGraph dataset had been aggregated by taxonomy code, it was a simple matter to convert each taxonomy code to its human-readable provider type. I did this using the Health Care Provider Taxonomy dataset and some R code that you can find on GitHub. After a little more aggregation to account for the multiple levels of taxonomy codes, including specialization, the data needed to answer our question was ready. I also removed any referrals where both nodes were of the same provider type, as these are most likely noise caused by the billing method through which the DocGraph data was collected.
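The labeling, roll-up, and same-type filtering steps can be sketched as follows. This is a Python illustration under stated assumptions, not the R code mentioned above: the function name `label_and_filter` is hypothetical, the label strings simply mirror the style of the table below, and `HYPOTHETICAL1` is an invented code used only to show two codes rolling up into one provider type.

```python
from collections import defaultdict


def label_and_filter(pair_counts, taxonomy_labels):
    """Replace taxonomy codes with provider-type labels and re-aggregate.

    Codes that map to the same label are merged, and pairs whose two
    labels are identical are dropped as likely billing noise.
    Codes without a label fall back to the raw code (an assumption).
    """
    totals = defaultdict(int)
    for (from_code, to_code), patients in pair_counts.items():
        from_label = taxonomy_labels.get(from_code, from_code)
        to_label = taxonomy_labels.get(to_code, to_code)
        if from_label == to_label:
            continue  # same provider type on both ends
        totals[(from_label, to_label)] += patients
    return dict(totals)


# Illustrative lookup; HYPOTHETICAL1 is not a real taxonomy code.
sample_labels = {
    "207R00000X": "Internal Medicine - General",
    "207RC0000X": "Internal Medicine - Cardiovascular Disease",
    "2085R0202X": "Radiology - Diagnostic Radiology",
    "HYPOTHETICAL1": "Radiology - Diagnostic Radiology",
}
sample_pair_counts = {
    ("207R00000X", "2085R0202X"): 65,
    ("207R00000X", "HYPOTHETICAL1"): 5,   # merges with the row above
    ("207R00000X", "207R00000X"): 12,     # same type, removed
    ("207RC0000X", "207R00000X"): 30,
}

labeled = label_and_filter(sample_pair_counts, sample_labels)
```

Filtering after labeling, rather than before, matters here: two different codes can resolve to the same provider type, and those pairs should be treated as same-type referrals too.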

Below is a table showing the top 20 referrals between provider types. Not surprisingly, Radiology dominates the list: Diagnostic Radiology appears in 12 of the 20 pairs, including the two largest, presumably because patients are being referred for various types of tests such as X-rays, CT scans, and MRIs. They are then referred back to an Internal Medicine doctor, who I hypothesize is the physician acting as primary care. Another interesting, but not all that surprising, relationship is the number of referrals between Emergency Medicine and Internal Medicine. Here I hypothesize that patients are being seen for some emergency medical condition and then receive follow-up care from their primary care provider.

Perhaps the most interesting observation from this top 20 list is the number of times Internal Medicine - Cardiovascular Disease appears. I always knew that America had a problem with heart disease, but I was still a bit surprised at the volume of this type of referral. I would love to hear if anyone else has a hypothesis for why there are so many referrals related to Internal Medicine - Cardiovascular Disease.

| Provider Type Seen First | Provider Type Seen Second | Number of Patients |
|---|---|---|
| Radiology - Diagnostic Radiology | Internal Medicine - General | 115,602,860 |
| Internal Medicine - General | Radiology - Diagnostic Radiology | 91,632,055 |
| Internal Medicine - Cardiovascular Disease | Internal Medicine - General | 54,260,749 |
| Radiology - Diagnostic Radiology | Internal Medicine - Cardiovascular Disease | 49,406,691 |
| Internal Medicine - Cardiovascular Disease | Radiology - Diagnostic Radiology | 47,820,945 |
| Internal Medicine - General | Internal Medicine - Cardiovascular Disease | 47,351,852 |
| Radiology - Diagnostic Radiology | Family Medicine - General | 45,078,839 |
| Family Medicine - General | Radiology - Diagnostic Radiology | 40,181,846 |
| Emergency Medicine - General | Radiology - Diagnostic Radiology | 33,797,598 |
| Emergency Medicine - General | Internal Medicine - General | 32,236,140 |
| Radiology - Diagnostic Radiology | Specialist - General | 27,710,610 |
| Specialist - General | Internal Medicine - General | 26,478,301 |
| Internal Medicine - General | Specialist - General | 24,876,128 |
| Specialist - General | Radiology - Diagnostic Radiology | 23,929,823 |
| Radiology - Diagnostic Radiology | Emergency Medicine - General | 23,424,750 |
| Internal Medicine - General | Family Medicine - General | 22,561,522 |
| Radiology - Diagnostic Radiology | Internal Medicine - Nephrology | 22,479,825 |
| Radiology - Diagnostic Radiology | Internal Medicine - Pulmonary Disease | 21,796,186 |
| Family Medicine - General | Internal Medicine - General | 20,872,086 |
| Internal Medicine - Cardiovascular Disease | Family Medicine - General | 19,047,613 |

If you would like to see more than just the top 20 referrals by provider type, you can download the complete list here.
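Given a complete list of pair counts, a top-N ranking like the one in the table can be reproduced with a few lines of Python. This is only a sketch: the function name `top_referrals` and the dictionary layout are assumptions, and the sample uses four rows copied from the table above.

```python
import heapq


def top_referrals(pair_counts, n=20):
    """Return the n (pair, patients) items with the largest patient counts."""
    return heapq.nlargest(n, pair_counts.items(), key=lambda item: item[1])


# Four rows taken from the table above.
sample_counts = {
    ("Radiology - Diagnostic Radiology", "Internal Medicine - General"): 115_602_860,
    ("Internal Medicine - General", "Radiology - Diagnostic Radiology"): 91_632_055,
    ("Family Medicine - General", "Internal Medicine - General"): 20_872_086,
    ("Emergency Medicine - General", "Internal Medicine - General"): 32_236_140,
}

top_three = top_referrals(sample_counts, n=3)

# Format each row with thousands separators, as in the table.
formatted = [
    f"{first} -> {second}: {patients:,}"
    for (first, second), patients in top_three
]
```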

Finally, I can’t resist a sexy visualization that helps to convey the elegance of the DocGraph dataset. Below you will find a visualization of the referrals between provider types. The thickness of each edge reflects the number of patients referred between the two provider types. To create the visualization I used the open-source Gephi graph visualization platform.

[Figure: visualization of referrals by provider]
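To get aggregated referral counts into Gephi, one option is to write them out as an edge list. The sketch below assumes Gephi's spreadsheet (CSV) import with `Source`, `Target`, and `Weight` columns, where the weight can then drive edge thickness. The function name `write_gephi_edges` is hypothetical, and this is not necessarily how the visualization above was produced.

```python
import csv
import io


def write_gephi_edges(pair_counts, out_file):
    """Write (source, target) -> patients as a Source,Target,Weight CSV."""
    writer = csv.writer(out_file)
    writer.writerow(["Source", "Target", "Weight"])
    # Sort by pair so the output order is stable between runs.
    for (source, target), patients in sorted(pair_counts.items()):
        writer.writerow([source, target, patients])


# Two rows taken from the table above.
sample_edges = {
    ("Radiology - Diagnostic Radiology", "Internal Medicine - General"): 115602860,
    ("Internal Medicine - General", "Radiology - Diagnostic Radiology"): 91632055,
}

# An in-memory buffer stands in for a real file opened with newline="".
buffer = io.StringIO()
write_gephi_edges(sample_edges, buffer)
edge_csv = buffer.getvalue()
```

Because referrals have a direction (seen first, seen second), the two rows for a pair of provider types stay separate, so the graph should be imported as directed.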

I hope you have enjoyed my analysis. I am always open to feedback and would love to collaborate on analysis related to DocGraph or open health data in general. If you are interested in collaborating, please email me at ryan [at] weald.com or message me on Twitter @rweald.