Quality is once again becoming a central issue in data collection and sample fulfilment. It is often overlooked, or the responsibility is assumed to lie with another layer of the market research supply chain. The ultimate client entrusts an agency or boutique, which then almost always entrusts the data collection to a supplier, and so forth.
Quality should arguably be the responsibility of everyone in the chain; everyone should do their part to ensure quality in the outcomes they can control. After four months and over 100 customer and supplier meetings across Asia and Australia & New Zealand, three distinct problem segments emerged. Those we met included representatives from traditional global MRAs, branding and media agencies, large-scale independents, and progressive, technology-enabled consultancies and end clients.
Not surprisingly, while the same data collection challenges appeared in both regions, the proportion of specific challenges was very much in line with the maturity of the online research offering in those markets. Fraud and survey bots, for example, were more prevalent in markets that have embraced online/mobile data collection for more than 10 years, whereas the lack of red herrings and gamification, and the presence of leading questions, were more apparent in markets that adopted online methods later. This is not a statement of fact so much as a reflection of the sample of meetings. Emerging markets are adopting technology and techniques faster than any of those that came before, despite the impact of COVID-19, so for the sake of an overview, no regional distinction is given.
When we developed our core offering, it was with quality and automation in mind. Along with the objective of making the jobs of market researchers easier, it was important to have a simple approach to quality control for the continued democratisation of insight buying. But providers, platforms, and clients operating at the forefront of research technology now face a multitude of challenges. These challenges have been segmented as follows.
In the design and creation stage, steps should be taken to prevent panellist fatigue, fraud, and ghost activity. There should also be some responsibility for combatting survey bots and, the industry’s favourite, professional respondents. How can this be achieved?
Survey design should consider the attention span of the audience.
- Is the survey too long? Limiting survey length is the source of much debate, but it is widely held that data quality deteriorates after about 20 minutes.
- Is it utilising proven gamification solutions to break up the monotony?
- Is proper termination logic in place?
- Are you leading the respondent with your screeners upfront?
- Are you using well-placed traps (red herrings) and attention questions?
If these considerations are not being addressed, then the problem is simply being passed further along the chain to those delivering the fieldwork.
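As a rough illustration of two of the checks above, the sketch below flags failed trap questions and an over-long survey. The question ids, the expected answers, and the function names are hypothetical, and the 20-minute limit simply mirrors the figure quoted above; this is a minimal sketch under those assumptions rather than a recommended implementation.

```python
from statistics import median

# Hypothetical red-herring / attention questions and their only valid answers.
EXPECTED_TRAP_ANSWERS = {
    "q_trap_brand": "none of these",
    "q_attention": "strongly disagree",
}


def failed_traps(answers):
    """Return the ids of trap questions answered incorrectly or skipped."""
    return [
        question_id
        for question_id, expected in EXPECTED_TRAP_ANSWERS.items()
        # Compare case-insensitively and ignore stray whitespace.
        if answers.get(question_id, "").strip().lower() != expected
    ]


def survey_too_long(durations_minutes, limit_minutes=20):
    """True when the median interview length exceeds the chosen limit."""
    return median(durations_minutes) > limit_minutes
```

A respondent with any failed trap could be queued for review rather than removed automatically, and a survey whose soft-launch median exceeds the limit is a candidate for shortening before full launch.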
Given the number of providers in APAC markets (40+ in Australia alone), multi-panel membership is an accepted reality. Having the ability to identify unique respondents regardless of their provider affiliation is key. Bots, raised mostly in discussions with our Australia-based clients, are once again causing a multitude of problems in the wider region, and it is important that steps are taken to stem this surge.
The most common methods for identifying unique respondents are:
- Device fingerprinting for deduplication
- Fraud prevention checks for VPNs, proxy networks, and GeoIP location
- Link hashing to ensure that only respondents with a valid signature in the redirect will count towards a survey’s completes
These are very effective measures, but they need to be constantly refreshed and updated with new technologies. Asking and knowing what prevention measures are in place is imperative to avoid any doubt when collecting data internally or via a third party.
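To make link hashing concrete, here is a minimal sketch using an HMAC-SHA256 signature over the redirect's query parameters. The secret key, the `sig` parameter name, and the example URL are assumptions made for illustration; real platforms define their own parameter names and signing rules, so treat this as one possible shape of the technique rather than any provider's actual scheme.

```python
import hashlib
import hmac
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical shared secret, agreed between survey platform and provider.
SECRET_KEY = b"replace-with-a-shared-secret"


def _signature(items):
    # Sort the (key, value) pairs so the result ignores parameter order.
    message = urlencode(sorted(items)).encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def sign_redirect(base_url, params):
    """Append an HMAC-SHA256 signature to the redirect's query string."""
    items = list(params.items())
    return f"{base_url}?{urlencode(items + [('sig', _signature(items))])}"


def is_valid_redirect(url):
    """Recompute the signature and compare it in constant time."""
    items = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    supplied = [value for key, value in items if key == "sig"]
    unsigned = [(key, value) for key, value in items if key != "sig"]
    if len(supplied) != 1:
        return False  # missing or duplicated signature
    return hmac.compare_digest(
        supplied[0].encode("utf-8"), _signature(unsigned).encode("utf-8")
    )
```

For example, `sign_redirect("https://example.com/complete", {"rid": "abc123", "status": "complete"})` yields a link that `is_valid_redirect` accepts, while the same link with an edited `rid` or no `sig` parameter is rejected and would not count towards completes.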
At PureSpectrum we wanted to go further than the accepted standards. Building on industry predecessors like TrueSample, we developed PureScore™.
PureScore™ is a machine learning model that finds patterns in the profiles and behaviours of respondents. Its predictive quality rating system is built on respondent behaviour, and it uses over 18 billion data points a year to improve quality and maximise feasibility. The model works by finding patterns in behaviour, including screening consistency, completion rate, and group behaviour based on demography. In short, this means that we rate the respondents themselves rather than designating providers as good or bad.
Moving forward, the use of honey-pot questions and NLP algorithms for open-ended responses will strengthen the ‘in-survey’ tools at our disposal, but there will always be room for improvement.
The post-survey analysis reveals the work done in the earlier stages of the cycle. At this stage the work has been delivered, and the raw data may contain poor or incoherent open-ended responses (OEs). This is why there is a need to oversample, why agencies need the ability to reconcile unwanted respondents, and why providers need to work closely with supply sources.
Double opt-in panels/sources, as well as deeply profiled audiences, are the ‘norm’; however, this too perhaps requires review given the cumbersome nature of what is now a 20+-year standard.
Every time a standard is accepted, it becomes a target for finding weaknesses. Now that our industry is technology-based, this should be no surprise, and it is why dedication to solving the problem is so important at all stages of the chain.
Data quality should be front and centre in every leadership, product, data, and IT meeting. It should present itself in every discussion with clients, every internal roadmap, every release meeting, and be a living conversation in order to improve. Quality is a constantly moving goal and we should all strive for it.