Major Challenges and Opportunities in Data and Analytics Strategy and Execution for Large Scale Participant Engaged Health Research

November 22, 2024 | By Divy Kangeyan, PhD

Over the past decade, the percentage of U.S. adults who have a great deal or a fair amount of confidence in medical scientists and scientists to act in the best interest of the public has decreased, according to Pew Research Center. There are many reasons for this decline. As scientists, however, we bear the responsibility to reverse this trend and rebuild public confidence, especially since taxpayers provide the means for the research conducted at agencies such as the National Institutes of Health (NIH). One mechanism for alleviating this mistrust is to include the public as participants and treat them as stakeholders in scientific research. Large participant-engaged health research initiatives, such as the “All of Us” program, are primary examples of such endeavors. One of the primary objectives of these studies is to be inclusive and to make participants feel that the research is being conducted with them instead of on them.

Various breakthroughs in technology, such as Artificial Intelligence (AI), Digital Health Technologies (DHT), and efficient data storage systems, make it possible to study a wide range of outcomes in a community. These advancements, when coupled with the features of community-based research, can present many challenges. Addressing those challenges could, however, lead to numerous opportunities. In this article, we will explore three emerging issues in participant-engaged research: data privacy, novel datasets and the methods for analyzing them, and disseminating knowledge of data and analytics within the community.

Information has become one of the most valuable assets of this century. Many institutions have made fortunes from their data and stand as testament to the saying that whoever possesses the data controls the narrative. This has also shed light on one of the cornerstones of data, namely data privacy. Data collected in community-engaged research is often used for secondary purposes beyond the purpose for which it was originally collected. Proper regulation might be in place for the primary use; however, as secondary and tertiary users access the data, the privacy protections that apply might not be clear. Secondary use of data could also harm the community if the data were used to target it. Therefore, there needs to be a clear specification of data ownership and of the primary and any further uses of the data. Individual consent is crucial in this setting. In addition, Sabatello et al. recommend community-level consent, where the privacy of the data is determined at the community level as well as at the individual level. A clear data sharing plan should also be put together with the input of the community, and it should include language that gives the community the power to veto data sharing when needed.

With evolving technology, the way we collect data has also improved dramatically. Various markers can now be measured via wearables and tracking devices, consultation sessions can be conducted virtually, and the language barrier has become ever smaller with AI-enabled technologies. All of these developments have propelled participant-engaged health research further. However, many of the devices and software products are proprietary and are often accompanied by lengthy terms and conditions. Therefore, when using such technologies, research and community leaders have to work together to ensure that the well-being and health of the community are upheld as one of the highest priorities in these studies.

Over the past century, statistical and analysis methods have evolved in response to the applications they serve. Ronald Fisher, a towering figure in modern statistics, developed many statistical methods through his work in botany and statistical genetics. This has been the general trend in statistics, and now in machine learning and AI. Almost all analysis methods are built on assumptions. For example, regression methods and many statistical tests assume that the samples we assess are independent of each other. Data collected from the same community will have some level of correlation, and that correlation cannot be ignored during analysis for the sake of convenience. Therefore, there is a need to develop new methods and techniques to analyze the data and generate insights. Other interesting scenarios include mixed-language responses, where non-native English speakers use both English and their native language to answer questions; unique types of missingness, censoring, and truncation that arise in community-level and individual-level datasets; and high volumes of unstructured data that require verified data pipelines and quality control measures. Although this might be a technical topic, requesting feedback from the community and providing feedback to it is essential in this task as well. A qualitative study by Han et al. has shown that researchers are less likely to engage with the community after data collection, although communities prefer to receive feedback and provide input at all stages of research.
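To make the independence point concrete, the short Python sketch below uses the classical design effect for clustered samples, DEFF = 1 + (m - 1) * ICC, where m is the number of participants per community and ICC is the intraclass correlation. This is a standard textbook approximation rather than a method taken from the studies cited here, and it assumes equal cluster sizes. The function names and the numbers in the example (20 communities of 50 participants, ICC of 0.05) are hypothetical and chosen only for illustration.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation from clustering, assuming equal cluster sizes.

    DEFF = 1 + (m - 1) * ICC, where m is the cluster size and ICC is the
    intraclass correlation (restricted here to 0..1 for simplicity).
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must be between 0 and 1")
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_total: float, cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is worth."""
    return n_total / design_effect(cluster_size, icc)


def adjusted_standard_error(naive_se: float, cluster_size: float, icc: float) -> float:
    """Inflate a standard error that was computed as if samples were independent."""
    return naive_se * math.sqrt(design_effect(cluster_size, icc))


# Hypothetical study: 20 communities with 50 participants each, ICC = 0.05.
n_total = 1000
participants_per_community = 50
icc = 0.05

deff = design_effect(participants_per_community, icc)
n_eff = effective_sample_size(n_total, participants_per_community, icc)

print(f"Design effect: {deff:.2f}")
print(f"Effective sample size: {n_eff:.0f} of {n_total} participants")
```

Under these assumed numbers, even a modest within-community correlation leaves the 1,000 participants carrying roughly the information of 290 independent ones, so an analysis that ignores clustering would report standard errors that are too small. Mixed-effects models or generalized estimating equations are common ways to account for this kind of correlation directly.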

One of the interesting observations made in Grayson et al.’s paper, through their surveys, was that some community members found the term Big Data to be elitist and exclusionary. This perception likely stems from not providing sufficient context to community participants in the study and not engaging them in data analysis or in discussions about data storage, privacy, and other related topics. Mitigating the negative perception that exists regarding data is essential for conducting participant-engaged research that benefits the community. Workshops on data-related topics would be a great first step for all community members. Providing additional training for interested participants would equip them to analyze data and formulate questions and hypotheses.

Receiving input from the community early and often is essential to addressing the challenges associated with data and analytics, as it is in all other facets of community-engaged research. Many of the major challenges require strong collaboration between subject-matter experts and the community in which the study is conducted.

References

Kennedy, B. (2023, November 14). Americans’ trust in scientists, positive views of science continue to decline. Pew Research Center Science & Society. https://www.pewresearch.org/science/2023/11/14/americans-trust-in-scientists-positive-views-of-science-continue-to-decline/ 

Sabatello, M., Martschenko, D. O., Cho, M. K., & Brothers, K. B. (2022). Data sharing and community-engaged research. Science, 378(6616), 141-143.

Han, H. R., Xu, A., Mendez, K. J., Okoye, S., Cudjoe, J., Bahouth, M., … & Dennison-Himmelfarb, C. (2021). Exploring community engaged research experiences and preferences: a multi-level qualitative investigation. Research Involvement and Engagement, 7(1), 1-9.