Protecting the Integrity of Online Patient Engagement Surveys
Julia B. Kim
National Minority Quality Forum
George Washington University School of Public Health
Surveys are an important tool for gathering diverse patient insights and engaging patients in healthcare decisions. With the ubiquitous use of the internet, there are greater opportunities for patients to engage with researchers, but also greater opportunities for scammers to interfere with response collection and results. This article describes the importance of protecting survey instruments as a tool for patient engagement, along with best practices for reducing vulnerability to fraudulent online activity.

What is the Problem?

Surveys are useful instruments for obtaining data such as the perspectives, attitudes, and insights of various populations, including patients, caregivers, and providers. As with any scientific study, these efforts may require incentives to elicit voluntary responses. However, incentives may attract scammers, who capitalize on the anonymity of online surveys and deploy “bots,” software programs that perform repetitive tasks and appear as traffic on the internet. Consider the following reported incidents:

  • A 2020 psychological study in which a researcher promoted a questionnaire on Twitter to obtain the perceptions of LGBTQ+ youth regarding eating behaviors and eating disorders. The incentive was 15 USD. Within 24 hours of launch, the questionnaire received 386 responses, but after data cleaning, only 11 responses could be used for analysis. The survey was quickly closed and had to be relaunched with direct participant recruitment.
  • A 2021 study assessing the impacts of COVID-19 on cancer survivors reported bot activity after the survey was shared on social media platforms. This led to 1408 fraudulent responses out of a total sample of 1977 (about 71%). The study offered a small financial incentive of 25 USD, which served as a magnet for scammers.

Fraudulent bot activity is not specific to one field of study. Neither is it limited by geography. Researchers are finding that IP addresses associated with bot activity can be traced back to countries outside of the U.S., indicating that online bot activity is a global phenomenon.

Researchers, patient advocates, and others seeking input from patient and community populations need to follow precautionary measures to protect the integrity of their online survey data and validity of their findings.

Background

Online or web-based surveys offer many benefits over other methods of data collection, including low implementation cost, wide coverage of respondents, and ease of data collection and analysis. These benefits come at the cost of lower response rates and more item nonresponse (questions left unanswered) than paper surveys. Other challenges include potential breaches of confidentiality and technological difficulties for participants.

Researchers have identified additional problems with implementing online surveys in the age of social media. One concern with online surveys that are available to the general public is fraudulent activity that lowers the validity of study findings. Fraudulent activity in surveys can be thought of as knowing or willful deception by participants in order to receive a reward (e.g., financial incentives such as gift cards or nonfinancial incentives such as free pens or books). The risk of bot activity associated with online survey instruments increases when financial incentives are offered. This risk is especially high for surveys distributed publicly on social media, where anyone can access the survey. The anonymity of online surveys also makes them prone to fraudulent activity.

A few publications have outlined the challenges of fraudulent activity in online surveys and offered best practices for safeguarding them while still ensuring a robust response. Recommended precautions can be built into the survey design, dissemination, and data cleaning phases.

Precautions Related to Survey Design
The selection and design of survey instruments is important in any public health study because participants, including patients, vary in how they engage and how accurately they respond. Security measures need to be weighed against participant accessibility. If security precautions are too stringent or make the survey too cumbersome, they may alienate some participants and lead to response bias.

Survey design measures to inhibit fraudulent activity may relate to access (such as CAPTCHA verification), question type and order, response requirements, and time stamps. These are described in more detail below.

Access Measures

  • CAPTCHA verification: A Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is designed to distinguish humans from machines in order to reduce online spam and fraudulent activity. CAPTCHA options are included in many online survey software programs. While useful, CAPTCHA is no longer sufficient on its own to prevent fraudulent activity, since bots have been seen to bypass this measure and submit multiple responses.
  • Commercial survey software: Using commercially available survey software can provide an additional layer of security. Some software can monitor responses in real time and track respondent behavior, such as time spent on a section, answer patterns, or attempts to skip questions, and can end a survey attempt early if needed. This helps not only to identify bots but also to prevent excess traffic that could keep valid participants from submitting responses. Other measures include reCAPTCHA v3, which is integrated seamlessly into a survey and, unlike a CAPTCHA question, requires no additional interaction from the user. Instead, it assigns each interaction a score from 0.0 to 1.0 indicating how likely it is to come from a human rather than a bot.
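The scoring step can be illustrated with a short Python sketch of how a survey back end might act on a reCAPTCHA v3 result. It assumes the server has already sent the submission's token to Google's siteverify endpoint and parsed the JSON reply, which contains `success`, `score`, and `action` fields. The 0.5 threshold follows Google's suggested starting point; the function name, the `survey_submit` action label, and the middle "review" band are hypothetical choices, not features of reCAPTCHA itself.

```python
from typing import Any, Mapping

# Google suggests 0.5 as a starting threshold; each survey should tune it.
DEFAULT_THRESHOLD = 0.5


def classify_recaptcha(
    verify_reply: Mapping[str, Any],
    expected_action: str = "survey_submit",  # hypothetical action label
    threshold: float = DEFAULT_THRESHOLD,
    review_margin: float = 0.2,  # hypothetical width of the manual-review band
) -> str:
    """Return 'accept', 'review', or 'reject' for one submission.

    `verify_reply` is assumed to be the parsed JSON returned by the
    reCAPTCHA v3 siteverify endpoint for this submission's token.
    """
    # A failed verification (invalid or expired token) is treated as a bot.
    if not verify_reply.get("success", False):
        return "reject"
    # A token generated for a different page action should not count here.
    if verify_reply.get("action") != expected_action:
        return "reject"
    score = float(verify_reply.get("score", 0.0))
    if score >= threshold:
        return "accept"
    # Borderline scores go to manual review instead of being dropped outright.
    if score >= threshold - review_margin:
        return "review"
    return "reject"
```

Routing borderline scores to manual review rather than rejecting them reflects the accessibility concern raised earlier: a stricter threshold catches more bots but risks excluding legitimate participants.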

Question Type and Order

  • “Honeypot” items are questions programmed into the survey so that they are not visible to a human participant but would still be answered by a bot. A response with a completed honeypot question indicates a nonhuman respondent.
  • Skip logic presents follow-up questions based on a previous response. A bot may not recognize this logic and may answer questions indiscriminately, producing responses that do not follow the expected path.
  • “Hard” and “soft” check questions can also serve as benchmarks for determining whether a survey response is valid. Failing a hard check results in immediate elimination, whereas failing a soft check flags a response for further verification. For example, a hard check could ask the participant to choose a specific answer (e.g., please select the response “strongly agree”) or repeat a question in different sections (e.g., select your gender) to see whether the answers are consistent. A soft check could be an open-ended question whose response is analyzed for grammatical consistency and content to ensure that an eligible participant is taking the survey. To avoid confusion, a disclaimer at the beginning of the survey should state that there are attention checks throughout.
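A minimal Python sketch of how the honeypot and hard/soft check rules could be applied to a single exported response is shown below. The field names (`hp_website` for the hidden honeypot item, `attn_1` for the instructed-response item, `gender_a`/`gender_b` for the duplicated question, `open_comment` for the open-ended item) are hypothetical, and the 20-character minimum is an assumed stand-in for a human reviewer's judgment of grammar and content.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ScreeningResult:
    status: str  # "reject", "flag", or "pass"
    reasons: List[str] = field(default_factory=list)


def screen_response(resp: Dict[str, str]) -> ScreeningResult:
    """Apply honeypot, hard-check, and soft-check rules to one response."""
    reasons: List[str] = []

    # Honeypot: hidden from humans, so any answer suggests a bot.
    if resp.get("hp_website", "").strip():
        reasons.append("honeypot answered")

    # Hard check 1: instructed-response item ("please select 'strongly agree'").
    if resp.get("attn_1") != "strongly agree":
        reasons.append("failed instructed-response item")

    # Hard check 2: the same question asked in two sections must match.
    if resp.get("gender_a") != resp.get("gender_b"):
        reasons.append("inconsistent duplicate question")

    if reasons:
        return ScreeningResult("reject", reasons)

    # Soft check: a very short open-ended answer is flagged for a person
    # to verify, not eliminated automatically (assumed 20-character minimum).
    if len(resp.get("open_comment", "").strip()) < 20:
        return ScreeningResult("flag", ["open-ended answer too short to assess"])

    return ScreeningResult("pass")
```

Keeping hard-check failures and soft-check flags as separate outcomes mirrors the distinction drawn above between immediate elimination and further verification.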

Response Requirements

  • Require answers to open-ended questions.
  • Create a question asking participants to type out a specific word.
  • Place a duplicate question (e.g., gender) at another point in the survey to see if participants respond differently.
  • Require participants to provide an email address, or use survey programs that track paradata such as IP addresses, to establish some degree of accountability.
  • Include a statement of acknowledgement at the end of the survey stating that fraudulent or multiple submissions will not receive the incentive.
  • Require participants to provide a valid phone number or email address so that they can be reached to verify eligibility for the incentive.
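Several of these requirements can be checked automatically once responses are exported. The Python sketch below assumes hypothetical field names (`open_comment`, `typed_word`, `email`, `phone`, `ack_no_fraud`) and a hypothetical target word, "harbor". The email pattern checks only the shape of an address; confirming that an email or phone number is genuinely valid for incentive purposes still requires contacting the participant.

```python
import re
from typing import Dict, List

# Deliberately simple pattern: checks the shape of an address, not that it exists.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def missing_requirements(resp: Dict[str, str], target_word: str = "harbor") -> List[str]:
    """List the response requirements a submission fails to meet."""
    problems: List[str] = []

    # The required open-ended question must not be blank.
    if not resp.get("open_comment", "").strip():
        problems.append("open-ended answer missing")

    # Typed-word item: compare case-insensitively, ignoring surrounding spaces.
    if resp.get("typed_word", "").strip().lower() != target_word.lower():
        problems.append("typed word does not match")

    # Contact detail: a plausible email, or a phone number with at least 10 digits.
    email = resp.get("email", "").strip()
    phone_digits = re.sub(r"\D", "", resp.get("phone", ""))
    if not EMAIL_PATTERN.match(email) and len(phone_digits) < 10:
        problems.append("no usable email or phone number")

    # Acknowledgement that fraudulent or multiple submissions forfeit the incentive.
    if resp.get("ack_no_fraud") != "yes":
        problems.append("acknowledgement not accepted")

    return problems
```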

Time Stamps

For the overall survey, a time stamp should be recorded for each submitted response in order to weed out “speeding” responses, which likely come from inattentive participants, and to check for bot activity. Bots tend to attack a survey all at once and provide multiple submissions with similar completion times. With time stamps, it is easier to pinpoint when suspicious activity started while the survey is still running, so that corrective action can be taken.
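As an illustration of using time stamps to find when suspicious activity started, the Python sketch below scans submissions in time order and reports the first window in which many responses arrive close together with nearly identical completion times. The window length (10 minutes), the minimum count (20), and the 30-second spread limit are hypothetical values that each study would need to calibrate against its own expected traffic.

```python
from datetime import datetime, timedelta
from statistics import pstdev
from typing import List, Optional, Tuple

Submission = Tuple[datetime, float]  # (submission time, completion time in seconds)


def first_suspicious_burst(
    subs: List[Submission],
    window: timedelta = timedelta(minutes=10),  # hypothetical
    min_count: int = 20,  # hypothetical
    max_duration_spread: float = 30.0,  # hypothetical, in seconds
) -> Optional[datetime]:
    """Return the start time of the first burst of many, similarly timed submissions.

    A burst is at least `min_count` submissions inside `window` whose
    completion times have a population standard deviation of at most
    `max_duration_spread` seconds. Returns None if no burst is found.
    """
    ordered = sorted(subs, key=lambda s: s[0])
    start = 0
    for end in range(len(ordered)):
        # Slide the left edge forward until the window spans at most `window`.
        while ordered[end][0] - ordered[start][0] > window:
            start += 1
        batch = ordered[start : end + 1]
        if len(batch) >= min_count:
            durations = [seconds for _, seconds in batch]
            if pstdev(durations) <= max_duration_spread:
                return batch[0][0]
    return None
```

Run periodically during data collection, a check like this gives a concrete time from which to pause the survey or set aside responses for closer review.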

Precautions Related to Dissemination
When distributing questionnaires and surveys, the most common methods are phone, mail, and email. Researchers have compared response rates across these three methods and found that telephone surveys elicit the highest response rates in patient populations. One concern for social health studies that use surveys is low response rates and the resulting risk of nonresponse bias. Nonresponse bias occurs when the people who do not respond to a survey, or to specific questions in it, differ systematically from those who do, skewing the data collected. Survey response rates are also typically higher among nonphysicians than among physicians, although physicians tend to respond better to online surveys than to mailed surveys. These differences show that response varies with the target population, which is important to consider for patient engagement purposes.

To reduce the chances of bot activity, the best method is to keep bots from finding the survey in the first place. To this end, it is important to restrict the visibility of the survey to the relevant participant pool. Dissemination strategies include using more targeted outreach channels, such as membership listservs, and directly targeting specific panels or populations through validated mailing lists.

Researchers also recommend using personalized survey links as a safeguard against inappropriate access. Social media outreach and promotion should be handled carefully, since a public post potentially opens the survey to anyone who sees it and makes it more vulnerable to fraudulent activity. Bot activity not only inflates the apparent response rate but also costs researchers extra time and resources to sift through and validate responses.
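One way personalized links can work is sketched below in Python: each invited participant receives a link containing a random token, and each token is accepted only once. The base URL and the `PersonalLinks` class are hypothetical; most commercial survey platforms generate personal links themselves, so this only illustrates the idea. `secrets.token_urlsafe` is part of Python's standard library and draws from a cryptographically secure source, so tokens cannot practically be guessed.

```python
import secrets
from typing import Dict, Iterable, Optional, Set

# Hypothetical survey address; real platforms issue their own personal links.
BASE_URL = "https://survey.example.org/s"


class PersonalLinks:
    """Issue one unguessable, single-use survey link per invited participant."""

    def __init__(self) -> None:
        self._token_to_invitee: Dict[str, str] = {}
        self._used: Set[str] = set()

    def issue(self, invitees: Iterable[str]) -> Dict[str, str]:
        """Return a mapping from each invitee to a personal survey link."""
        links: Dict[str, str] = {}
        for invitee in invitees:
            # 16 random bytes, encoded with URL-safe characters.
            token = secrets.token_urlsafe(16)
            self._token_to_invitee[token] = invitee
            links[invitee] = f"{BASE_URL}?t={token}"
        return links

    def redeem(self, token: str) -> Optional[str]:
        """Return the invitee for a valid, unused token; otherwise None."""
        invitee = self._token_to_invitee.get(token)
        if invitee is None or token in self._used:
            return None  # unknown (e.g., guessed) or already-used token
        self._used.add(token)
        return invitee
```

Because a token works only once and only for the person it was issued to, a link that leaks onto social media cannot be reused to submit many responses.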

If social media promotion is necessary, financial incentives for participation should be avoided to deter fraudulent activity. Another measure is to open the survey to a smaller pool of participants initially and then broaden access only as needed to reach the target sample size. This staggered launch approach helps restrict access to the survey to relevant participants.

Eliminating Fraudulent Responses During Data Cleaning

There are useful methods to check for suspicious activity during the data cleaning stage:

  • Check time stamps in aggregate to identify recurring patterns or periods of suspiciously high activity.
  • For individual responses, conduct a pre-survey (pilot) run to determine the typical completion time. Then compare each response's completion time with this standard and establish a cutoff point to eliminate “speeding” responses.
  • Assess free-response questions for proper context and grammar.
  • Analyze multiple-choice questions for overarching response patterns (e.g., a “zig-zag” pattern, or choosing the first option for every question).
  • Check for overlapping or patterned email addresses and phone numbers across individual responses.
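The individual-level checks in the list above could be combined into a simple flagging pass, sketched here in Python. It assumes each exported response has hypothetical `id`, `seconds` (completion time), `grid` (numeric answers to a block of multiple-choice items), `email`, and `phone` fields. The speed cutoff of half the pilot median is an assumed rule, not a standard. The pattern check treats identical answers (straight-lining) and answers that move exactly one scale point at every step (one form of zig-zag) as suspicious. Flags mark responses for review rather than deleting them automatically.

```python
from collections import Counter
from statistics import median
from typing import Any, Dict, List, Sequence, Set

Response = Dict[str, Any]  # hypothetical export format, one dict per response


def speed_cutoff(pilot_seconds: Sequence[float], fraction: float = 0.5) -> float:
    """Assumed rule: anything faster than `fraction` of the pilot median is a speeder."""
    return fraction * median(pilot_seconds)


def is_patterned(answers: Sequence[int]) -> bool:
    """Flag straight-lining or a stair-step zig-zag (e.g., 1,2,3,4,5,4,3)."""
    if len(answers) < 4:
        return False  # too few items to judge a pattern
    if len(set(answers)) == 1:
        return True  # same answer to every item
    steps = [abs(b - a) for a, b in zip(answers, answers[1:])]
    return all(step == 1 for step in steps)


def _norm_email(value: str) -> str:
    return value.strip().lower()


def _norm_phone(value: str) -> str:
    return "".join(ch for ch in value if ch.isdigit())


def shared_contacts(responses: List[Response]) -> Set[str]:
    """IDs of responses whose normalized email or phone appears more than once."""
    emails = Counter(_norm_email(r.get("email", "")) for r in responses)
    phones = Counter(_norm_phone(r.get("phone", "")) for r in responses)
    flagged: Set[str] = set()
    for r in responses:
        email = _norm_email(r.get("email", ""))
        phone = _norm_phone(r.get("phone", ""))
        # Blank contact fields are ignored rather than counted as matches.
        if (email and emails[email] > 1) or (phone and phones[phone] > 1):
            flagged.add(r["id"])
    return flagged


def cleaning_flags(responses: List[Response], pilot_seconds: Sequence[float]) -> Dict[str, List[str]]:
    """Map each response ID to the reasons it was flagged (empty list means no flags)."""
    cutoff = speed_cutoff(pilot_seconds)
    duplicates = shared_contacts(responses)
    flags: Dict[str, List[str]] = {}
    for r in responses:
        reasons: List[str] = []
        if r["seconds"] < cutoff:
            reasons.append("speeder")
        if is_patterned(r["grid"]):
            reasons.append("patterned answers")
        if r["id"] in duplicates:
            reasons.append("shared contact")
        flags[r["id"]] = reasons
    return flags
```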

Conclusion

Online surveys are important tools for health researchers, patients, providers, and other stakeholders eager to engage diverse patients, collect insights, and deliver meaningful interventions to improve care and outcomes across populations. The best defense against bots for online surveys is prevention. Once bots infiltrate an online survey, they not only skew the raw data but also call into question the integrity of the overall findings. Online surveys offer benefits in terms of low cost and ease of compilation and distribution. However, the potential for bot activity needs to be carefully considered during the study design and planning stage. Avoiding social media platforms or large financial incentives during dissemination will help lower the chances of bot activity. Although security measures are recommended, they should be tailored to the individual survey based on patient engagement factors such as ease of access and estimated time spent on the survey.
References available upon request.