Unlocking the data together – HHS’ Community Health Data Initiative

Although I never tire of promoting the advantages of Washington, DC, the #epicenter of health care transformation, there are some things we don’t have, such as a fancy report card about the performance of our hospitals, like the Californians do (CalHospitalCompare.org), courtesy of the California Healthcare Foundation.

A lot of the hospital data for the District is available; we just don’t have a Washington, DC version of CHCF to liberate it for us. So why not reduce the barrier to making this data usable so enterprising people can do it themselves?

Enter the Community Health Data Initiative. Make the available data usable in multiple modern Web 2.0 formats, add in a contest (I am a fan of the contest), and maybe the crowd, the people who live the impact of great or poor quality care, can help this data work for society.

I decided to see if I could make sense of part of this, by collaborating with a patient (of course), in this case @ReginaHolliday, to prepare an entry for the Design for America contest, which is due by May 17, 2010.

Since Regina’s family experience involved hospital care, I took a closer look at the hospital datasets, although I combed the long list available at the interim work page to see what was relevant.

Data: Lots of it, all over the place, but efforts to make it more useful

There’s a lot of data out there. And, it’s quite the scavenger hunt to figure out where data has come from and how to apply it to one’s personal experience. HHS has its work cut out for it.

All of that said, I can tell from the interim work page that there’s been a concerted effort to package the raw data in ways that developers and ordinary citizens can use. Several of the ZIPped files have “CDHI” in their filename – this tells me that these are fresh compilations that didn’t exist before. This is a good thing.

Web2.0-izing at Medicare

The hospital experience data has really been tuned for Web 2.0, over at data.medicare.gov. There, you can slice, dice, tweet, and access the data via API, using multiple scripting/programming approaches. What this means is that with some programming experience, a person doesn’t have to download the static dataset to their computer every time they want to do something cool with it. Their web/smartphone app can query the server efficiently in real time, a la Twitter, to create whatever visualization a person wants.

Say a person wanted to create a TripAdvisor-type app on an iPhone for rating hospitals by their quality. Now they could.
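For the programmers reading, here is a minimal sketch of what "query the server" could look like. The endpoint path, the dataset ID (EXAMPLE-ID), and the field names (state, condition) are all assumptions made up for illustration; the real ones would have to come from the dataset's page on data.medicare.gov.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical URL pattern and dataset ID, for illustration only.
BASE_URL = "https://data.medicare.gov/resource/EXAMPLE-ID.json"


def build_query(state, condition):
    """Build a URL asking only for one state's rows for one condition.

    The parameter names are assumed, not taken from the real dataset.
    """
    params = {"state": state, "condition": condition}
    return BASE_URL + "?" + urlencode(params)


def fetch_rows(url):
    """Fetch the URL and decode the JSON response into Python objects."""
    with urlopen(url) as response:
        return json.load(response)


# The app asks for just the slice it needs instead of downloading everything.
dc_pneumonia_url = build_query("DC", "Pneumonia")
```

An app would pass dc_pneumonia_url to fetch_rows and draw whatever chart it wants from the rows that come back, without ever storing the full dataset.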

And here’s an example of pneumonia process of care data for Washington, DC hospitals. See if you can tell which hospitals are better at delivering infection-fighting antibiotics within 6 hours of admission for pneumonia:

Pneumonia process of care – DC

(Isn’t this cool?)

I have to say, I wonder if some families might see this and wonder if, in some hospitals, they should offer to drive to another hospital to pick up an IV and bring it to their loved one. I’m being facetious, but 6 hours is a really long time, and some hospitals are able to start infection fighting pretty quickly, so why not learn from them?

I believe that, to an extent, any hospitalization is a potentially devastating situation; maybe making this information more widely available would make it less so.

Next steps

I’m going to do my best to help Regina get to a great submission to the Design for America contest, because I have learned from her the power of art in telling a story that words cannot.

And speaking of art, it’s useful to remember that changing the world is art and science. There’s no dataset in here that tells us that some of these hospitals charge 73 cents a page for medical records with a 21-day wait. We needed Regina for that.


Ted, this *is* cool (to us health data geeks).

The Pew Internet Project recently released an in-depth look at "Government Online" in which we found that 40% of internet users (age 18+) have gone online for raw data about government spending and activities. Just 16% of internet users have visited a site like data.gov to actually download & work with government data, however. Please see:

Some would say this is just the beginning. Indeed, that is why we wanted to get a survey in & out of the field now, hoping to pick the topic while it's ripening.

Others would say there's a limit to the number of people who will ever want to crunch data. See, for example, Adam Bosworth's speech, in which he says most Americans don't want data per se, but want to know how it fits in to their lives: http://e-patients.net/archives/2010/05/health-gee

Then again, I am struck (once again) by the power of one person to make a difference. Carlos Rizo posted this very intriguing tweet on May 2:

"The power of open data: To find problems in complicated environments, and possibly even to prevent them from emerging."

The accompanying link led me to this essay:

Case Study: How Open data saved Canada $3.2 Billion

Here's the quote that is sticking with me:

"When data is made readily available in machine readable formats, more eyes can look at it. This means that someone on the ground, in the community (like, say, Toronto) who knows the sector, is more likely to spot something a public servant in another city might not see because they don't have the right context or bandwidth."

I'm going to keep thinking about this — thanks so much for adding fuel to this fire.

Amy Romano (@midwifeamy) sent me over here.

The CalHospitalCompare site looks very fancy but it paints a highly inaccurate picture of maternity care in California. There's a huge discrepancy between the cesarean rates presented on CHC and those available on the OSHPD web site (where some of the data is scrubbed before being reported and some isn't, which is very inconsistent).

My particular focus these days is what consumers can access, and it's dismal.

I am pretty sure that Cal Hospital Compare publishes the nulliparous, term, singleton, vertex cesarean rate (NTSV, a proxy for low-risk first time moms) but doesn't tell the user what the denominator is so we're left thinking that it's an overall cesarean rate. I actually favor NTSV rates because I think they are a more reliable measure of maternity care quality and are used by NQF, Joint Commission, and Healthy People, but don't sell it as the overall c-section rate, California!
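To make the denominator point concrete, here is a minimal sketch of how the same set of births can produce two very different "cesarean rates" depending on who is counted. The eight birth records are invented for illustration and do not describe any real hospital; the Birth class and function names are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Birth:
    first_birth: bool  # nulliparous: mother's first birth
    term: bool         # baby born at term
    singleton: bool    # one baby, not twins or more
    vertex: bool       # head-down position
    cesarean: bool     # delivered by cesarean section


def cesarean_rate(births):
    """Percent of the given births that were cesareans."""
    return 100.0 * sum(b.cesarean for b in births) / len(births)


def ntsv_rate(births):
    """Cesarean rate with the denominator limited to NTSV births."""
    ntsv = [b for b in births
            if b.first_birth and b.term and b.singleton and b.vertex]
    return cesarean_rate(ntsv)


# Invented records: four NTSV births plus four higher-risk or repeat births.
births = [
    Birth(True, True, True, True, False),
    Birth(True, True, True, True, False),
    Birth(True, True, True, True, False),
    Birth(True, True, True, True, True),
    Birth(False, True, True, True, True),   # not a first birth
    Birth(True, True, True, False, True),   # breech
    Birth(False, True, False, True, True),  # twins
    Birth(False, True, True, True, False),  # not a first birth
]

overall = cesarean_rate(births)  # denominator: all 8 births
ntsv = ntsv_rate(births)         # denominator: the 4 NTSV births
```

With these made-up records the overall rate is 50% while the NTSV rate is 25%. Neither number is wrong, but a reader who is not told the denominator cannot know which one they are looking at.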

Ted, I'm interested in whether you have any tips for nudging HHS to free data that isn't Medicare data. Most of the conditions, procedures, and outcomes of interest to childbearing women (the population Jill and I are active advocates for) are covered by private insurance and Medicaid, with Medicare contributing only a tiny proportion. We have plenty of "data appliers" at the ready but, as Jill's amazing work has shown, tracking down maternity care quality data is, as she calls it, "a fool's errand."

Hi Amy, Jill, and Susannah!

If I could combine the sentiments in your insightful comments, I would say that not every American needs to process raw data. However, some Americans need it processed differently so that they can make good decisions, and I think the example of cesarean rates is a great one.

In a world where raw data is truly available, why not have a user-generated app or website that sits alongside other report cards, with clear explanations, so people can make informed decisions?

Amy and Jill, the folks at California Healthcare Foundation are definitely interested in the health of all Californians – have you discussed this with them and/or would you like me to forward your comments?

On the issue of tips for nudging: the first thing I suggest is to ask! You could start by posting a comment on their blog about this issue ( http://www.hhs.gov/open/plan/opengovernmentplan/i… ). It sounds like this data is not reported to the level of accuracy that you feel is best for women. I might comb through what's there now with your data appliers, to demonstrate what's possible and why it falls short. This creates a compelling case for the next step. It's easier to tell people, "We did our best with what you gave us, if you just gave us a little more…" than "we can't do anything until you give us something else."

I am not familiar with your work to date on this issue so I don't want to presume that you haven't already done this. Feel free to add details here as you'd like. I think the goal, regardless of what the data is, is for people to be able to make it actionable to improve the health of communities.


Thanks for your response, Ted. I am headed to California next week and have a meeting with a colleague at the agency that developed the NTSV measure (California Maternal Quality Care Collaborative) and they are closely aligned with the California Healthcare Foundation (which I agree is certainly on the side of improving health and doing amazing work – let my flip comment earlier not lead any readers to believe that I think otherwise!) Anyway, all that is to say that I will suggest to my colleague that she make that recommendation to the foundation, although if you want to forward this comment stream I am certainly happy for you to do that.

I did make a recommendation to the OpenHHS team to publish the NTSV Cesarean rates when they were initially collecting input for priority areas. (here is my comment) I used good old social media to recruit others and they came along and supported my suggestion. So I'm hopeful that it is on someone's radar at HHS, and will keep beating my drum.

The maternity care community has many, many passionate volunteers who are willing to dig up the data and many brilliant leaders who know good data from bad and understand how to communicate about risk and decision making with consumers. What I think we are lacking is a partner who can help us build consumer tools and applications with those data. I think childbearing women as a group have many unique characteristics that make them an ideal population with which to "do" transparency. They are generally healthy and have a long interval of time between their "diagnosis" and their hospitalization. They're also online and generally pretty savvy with social media. Etc.

Anyway, thanks for this thought provoking post. I work for Lamaze International and consult for a few other organizations working on maternity care issues, so if anyone reading this blog wants to partner up with the technology expertise, I would be more than happy to connect them with the advocates on the ground pushing for consumer-driven quality improvement.

There are lots of ways better data can be used. As this article points out, some people have the skill and interest to use it; most of us don't. But many of us will be interested in bits and pieces. If I need a particular operation, I would like to know more about the comparative performance of hospitals in my area as shown in the example. If I were an employer, I would like to know more about the hospitals my insurance plan allows my family and my employees to use. If I read about a medical problem at a local hospital I would like to know whether that is an isolated incident or more serious. I might not take the time to find out, but my local newspaper can. People making large donations will want to donate to the hospital that will make the best use of their money and reflect well on their generosity.

And there is the pressure that will be put on the executive suite at local hospitals, just from knowing that the impact of their decisions will not only affect patients and staff, as it does today, but will also be visible to the community and the plaintiffs' bar.


Agreed, and I think you've done a nice job of putting the request in the public record. I'm happy to forward this thread to the California Healthcare Foundation and will do so now. My assumption, and hope, would be that a movement toward openness would potentiate movement among dedicated (and at times, vulnerable) patient populations to use data to empower their communities to make safe, informed health care choices, and I can tell that's what you're after.


Agreed with you and thanks for your comment. I have always said that if there's something I'm not doing well as a doctor, I'd like to know about it, so I can stop doing it not well.


The California HealthCare Foundation (CHCF) appreciates your comments about CalHospitalCompare.org. Consumers can read more about the c-section measure presented on CalHospitalCompare.org by clicking on the "?" to the right of the word "Maternity" when they are looking at an individual hospital. We put the longer explanations under the "?" link because we have learned from many years of experience that the more text you include on a health care web site, the more difficult it is for consumers to use. Based on your comments, we made some edits to the c-section measure explanation; we hope it is clearer now. CalHospitalCompare.org and OSHPD report the same c-section metrics (OSHPD calls what we report the "Primary c-section rates age adjusted"). We appreciate your comments about the NTSV rate. The California Hospital Assessment and Reporting Taskforce (CHART), which is the multistakeholder collaborative that provides the data for CalHospitalCompare.org, directed by R. Adams Dudley, MD, MBA at UCSF, agrees that using the NTSV rate would be best. However, CHART does not think that the extra refinement is worth either: 1) the confusion that being different from OSHPD might cause, or 2) the resources CHART and hospitals would have to expend to get the data about whether or not the mom is a first time mom. Should you have any other questions/comments about the website, feel free to contact me directly. We welcome your feedback. Sincerely, Stephanie Teleki, Ph.D., Senior Program Officer, California HealthCare Foundation

Hi Stephanie,

“…agrees that using the NTSV rate would be best.”

It is arguably the best comparative measure across the board but remains very misleading to the average consumer, in my opinion, as it appears to be “the” cesarean rate. Many other states have no problem making total cesarean rates available on their public websites with an explanation of their methodology. I urge my state of California to follow suit as soon as possible.

“However, CHART does not think that the extra refinement is worth either: 1) the confusion that being different from OSHPD might cause, or 2) the resources CHART and hospitals would have to expend to get the data about whether or not the mom is a first time mom.”

I’m really sorry to hear that CHART doesn’t feel that transparency and sharing data is worth their resources, but we all heard the governor’s latest “low hanging fruits” remarks. As a former State of California employee, I am well aware of the perpetual tug of war for funding. Nevertheless, it’s really not that hard to explain the difference between the OSHPD data and what the state chooses to share.

The risk that you take, of which I’m sure you are aware from a PR perspective, is that it looks like you are underreporting cesarean rates. The unfortunate assumption that people make is that California is deliberately trying to conceal its cesarean rates from the public.

That said, the fact that ANY rates are available still puts California ahead of many other states that report nothing to the public at all.

Here are the two blurbs I found on CalHospitalCompare.org regarding cesarean reporting:

Maternity Data

The Cesarean-section measure is not assigned a performance rating because there is no generally agreed upon approach for rating this data. However, the data can be useful for an expecting mother or couple. If you want to do everything possible to have a vaginal delivery, you may prefer to use a hospital with a low C-section rate. You should discuss this concern with your obstetrician.


C-Section Rate

This is the percentage of mothers whose babies were delivered by Cesarean section (C-section). C-section is the surgical removal of the baby through the mother's abdomen. Whether or not this procedure is necessary and appropriate depends largely on each individual's clinical characteristics. The decision is usually a joint one between the patient and her doctor. Because C-sections are more common if the baby is in the breech position (buttocks first), is coming too early, or has died in the womb; or if the mother is having multiple babies or has had a prior C-section, we exclude all these instances when calculating the C-section rates reported here. Hospitals that serve as referral centers for high-risk pregnancies, those with intensive care units for very sick babies, and those serving mothers who have not had the benefit of prenatal care can be expected to have higher C-section rates. The C-section measure is not assigned a performance rating because there is no generally agreed upon approach for rating the data. However, the data can be useful for expectant parents. A woman who wants to do everything possible to have a vaginal delivery may prefer a hospital with a low C-section rate. She should discuss this concern with her obstetrician.

Let’s look at Cedars-Sinai in L.A.

CalHospitalCompare.org says their rate is 21% and the state average is 17%.

OSHPD data shows an overall 36.7% rate and a state average of 32.7%.

Or Corona Regional Med Ctr (Main):

CalHospitalCompare.org says 26%

OSHPD says 49.8%

Furthermore, searching for the five hospitals with the highest cesarean rates on CalHospitalCompare.org yielded a “Chose not to participate” message. So the women planning to give birth at Corona Regional Med Ctr (70.5%, 1414 total births), Los Angeles Community Hospital (64.9%, 510 total births), Community and Mission Hospital of Huntington Park (56.6%, 260 total births), East Valley Hospital Med Ctr (51.6%, 401 total births) and El Centro Regional Med Ctr (51.6%, 1579 total births) are not served by the site in any way.
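The size of the discrepancy is easy to work out from the figures quoted above. The following is a small sketch that does only that arithmetic, subtracting the CalHospitalCompare.org number from the OSHPD number; the variable and function names are just illustrative.

```python
# (CalHospitalCompare.org rate, OSHPD rate), in percent, as quoted above.
rates = {
    "Cedars-Sinai": (21.0, 36.7),
    "State average": (17.0, 32.7),
    "Corona Regional Med Ctr (Main)": (26.0, 49.8),
}


def gap_in_points(site_rate, oshpd_rate):
    """Difference between the two published rates, in percentage points."""
    return round(oshpd_rate - site_rate, 1)


gaps = {name: gap_in_points(*pair) for name, pair in rates.items()}

for name, gap in gaps.items():
    print(f"{name}: OSHPD is {gap} points higher")
```

For these three rows the gap comes to 15.7, 15.7, and 23.8 percentage points, which is the kind of difference a consumer would want explained right next to the number.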

Ted Eytan, MD