The Importance of Understanding Data Before Using It

I was recently analyzing cost report data. At first pass, the cost reports filed in 2021 suggested that hospitals in the United States had made a huge profit. Looking deeper, however, I noticed a few hospitals whose net income in 2021 was many times larger than in 2020, unreasonably so. On digging further, I found that a few hospitals had not entered correct data.

When working with large datasets, particularly those that rely on self-reported or manually entered information, it is common to find issues with the data that affect the analysis.

Here are a few steps you can take before and after you find issues with data you are using:

  1. Data Validation: Build validation rules into your analysis to identify potential errors in the data. This could be as simple as looking for values that are several standard deviations away from the mean, or comparing net income year over year and flagging changes that exceed a certain threshold.
  2. Contact the Data Source: If the data comes from a specific provider, like a government agency or a private company, you might want to notify them of the potential errors. They may be able to correct the issue at the source, or at least provide you with more accurate data. In this case, that would mean reporting the issue to the Centers for Medicare & Medicaid Services (CMS).
  3. Data Cleaning: Depending on the nature and scale of the inaccuracies, you might be able to correct them yourself. If it’s just a few outliers, you might be able to replace them with more reasonable estimates. If it’s a systemic issue, you might need to apply some kind of transformation to the data to account for it.
  4. Sensitivity Analysis: In some cases, you might want to run your analysis both with and without the questionable data to see how much it affects your results. This can give you a sense of how sensitive your conclusions are to these potential errors.
  5. Disclose Uncertainty: Finally, when you present your results, be sure to disclose these potential inaccuracies. It’s always better to be upfront about the limitations of your analysis than to overstate your confidence in the results.
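
Steps 1 and 4 above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive method: the hospital IDs and dollar amounts are invented, and the function names and thresholds (a 500% year-over-year change, 2 standard deviations) are assumptions you would tune to your own data.

```python
from statistics import mean, pstdev

# Hypothetical records: (hospital_id, net_income_2020, net_income_2021), in dollars.
# Invented for illustration; this is not real cost report data.
RECORDS = [
    ("H001", 1_200_000, 1_350_000),
    ("H002", 800_000, 760_000),
    ("H003", 2_500_000, 2_900_000),
    ("H004", 950_000, 95_000_000),  # roughly a 100x jump; looks like a keying error
    ("H005", 1_100_000, 1_050_000),
    ("H006", -300_000, -150_000),
]


def flag_yoy_change(records, max_change=5.0):
    """Flag hospitals whose net income moved by more than max_change times
    the prior-year amount (5.0 means a 500% change). Threshold is an assumption."""
    flagged = set()
    for hospital_id, prior, current in records:
        if prior == 0:
            flagged.add(hospital_id)  # no baseline to compare; send to manual review
        elif abs(current - prior) / abs(prior) > max_change:
            flagged.add(hospital_id)
    return flagged


def flag_zscore(records, max_z=2.0):
    """Flag hospitals whose current-year net income is more than max_z
    population standard deviations from the mean. Threshold is an assumption."""
    values = [current for _, _, current in records]
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return set()  # all values identical; nothing stands out
    return {hid for hid, _, current in records if abs(current - mu) / sigma > max_z}


def total_net_income(records, exclude=frozenset()):
    """Sum current-year net income, skipping any hospital IDs in exclude."""
    return sum(current for hid, _, current in records if hid not in exclude)


# Step 1: validation. Combine both rules into one review list.
flagged = flag_yoy_change(RECORDS) | flag_zscore(RECORDS)

# Step 4: sensitivity analysis. Compare the result with and without flagged records.
total_all = total_net_income(RECORDS)
total_clean = total_net_income(RECORDS, exclude=flagged)

print("Flagged for review:", sorted(flagged))
print(f"Total 2021 net income, all records:      {total_all:,}")
print(f"Total 2021 net income, flagged excluded: {total_clean:,}")
```

One caution on the standard deviation rule: in a small sample, a single extreme value inflates the standard deviation itself, so a strict cutoff can miss the very outlier you are looking for. The year-over-year comparison is often the more direct check for this kind of error. Either way, a flag is a prompt to review the record, not proof that it is wrong.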

Remember that the goal of data analysis is not to produce the perfect answer, but to make the best possible use of the information available. By being diligent and transparent in your work, you can help ensure that your conclusions are as reliable and useful as possible.

Always tell the people who receive your analysis where the data came from and whether you had to adjust it.
