The Consumer Data Research Centre (CDRC), established by the ESRC, held the CDRC Data Partner Forum on the 6th May at the Saïd Business School, University of Oxford. The key aim of the CDRC is to help organisations maximise the potential of innovation by opening up their data to trusted researchers, so that they can provide solutions that drive economic growth and improve our society. During the day, the presentations were based around three themes: missing data, data sources and research design.
For the retail demand modellers, three things were seen as important: including seasonal demand (especially for seaside locations), accounting for natural barriers, and basing travel times on real journey times. It was useful to see how different data values were being clustered to form classifications, as this is something that needs to be done with the footfall data available to IPM in the big data project we are just about to start with Springboard.
Data representation was an important theme. Missing data, both spatial and temporal, was identified as a challenge, and a number of techniques were presented to ‘fill in’ the gaps. A recurring problem was the use of census data, which is fixed at a point in time, alongside other data that is updated far more frequently. Also identified was the questionable accuracy of outcome codes supplied by end users, in this case the reasons recorded for failed deliveries.
With any spatial and temporal data, there is the challenge of providing a digestible visual display. With so much data available, most of the presenters using geographical mappings acknowledged that they faced this problem.
As a data source, supermarket loyalty cards were discussed. Interestingly, it was found that loyalty card usage was least likely to occur for small and frequent purchases, no matter what type of store was visited or the socio-demographic classification of the customer. The map of users of a store showed a more dispersed geographical spread around the UK than expected. This highlighted the problem of customers failing to update their home address details when moving home and the subsequent difficulties in interpreting loyalty card spatial data.
However, when problems in the data were identified, this fed into the recurring observation that so-called bad data, that is, data identified statistically as problematic, should not always be removed or cleansed using missing data techniques. Instead, this so-called bad data could be the most interesting data of all for a researcher and/or commercial organisation. For example, people who don’t update their loyalty card details could lead to some very useful insights into such customers. Perhaps they are a very profitable segment?
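To make the idea of filling gaps and flagging, rather than deleting, unusual values a little more concrete, here is a minimal Python sketch. It is my own illustration and not a method shown at the forum: the footfall counts are made up, the function names `infill_linear` and `flag_unusual` are hypothetical, and linear interpolation and a median-based threshold are just simple stand-ins for the more sophisticated techniques the presenters described.

```python
from statistics import median


def infill_linear(values):
    """Fill None gaps by linear interpolation; return (filled, was_missing)."""
    filled = list(values)
    was_missing = [v is None for v in values]
    known = [i for i, v in enumerate(values) if v is not None]
    for i, missing in enumerate(was_missing):
        if not missing:
            continue
        # Nearest observed neighbours on either side of the gap.
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None and right is None:
            continue  # nothing observed at all, so leave the gap as it is
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled, was_missing


def flag_unusual(values, threshold=3.0):
    """Flag values far from the median, measured in median absolute deviations."""
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        return [False] * len(values)
    return [abs(v - centre) / mad > threshold for v in values]


# Made-up hourly footfall counts; None marks a missing reading.
counts = [120, None, 140, 135, None, None, 150, 900]
filled, was_missing = infill_linear(counts)
unusual = flag_unusual(filled)  # flags are kept alongside the data, nothing is dropped
```

Keeping `was_missing` and `unusual` next to the filled series means the infilled and unusual readings can still be picked out and studied later, which is the point made above about so-called bad data.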
Useful resources identified during the presentations included http://maps.cdrc.ac.uk, which offers views of geodemographic, retail and general metrics for the larger towns and cities. Of the various views provided, one that seemed a useful barometer of high street health was the retail view, which for some towns (presumably only a few have the data available) shows changes to retailer types and vacancy rates over a set period of time.
Overall, it was a very good day. The presentations were very interesting and there was also the opportunity to meet and mix with other academics and business representatives.
About Me (Ed Dargan): For the last 10 years, I have worked in multi-channel organisations. In 2012, I started an MSc in Internet Retailing at Manchester Metropolitan University (MMU), and it was here that I met Professor Cathy Parker, who led the marketing strategy module. I support local food producers and retailers, and Cathy provided the opportunity to take a look at some footfall data for a number of locations throughout the UK. At the time, the data sample was too small to investigate statistically, but the initial view of the data revealed some interesting monthly patterns. For my MSc dissertation, I had the choice of either pursuing the footfall data or looking into internet options for local food retailers and producers, and I took the latter option. However, my interest in the footfall data remained, so when the opportunity came to investigate it as part of a PhD, after a slight hesitation and sanity check, I grabbed it. My PhD is part-time; I’ve just finished the first-year researcher course at MMU and am raring to get going exploring the data.
Below is a list of the sessions and presentations:
Session 1: Missing Data and Missing People
• Thomas Waddington – Modelling the temporal variation in supermarket revenue estimates
• Eusebio Odiari – Infilling missing values in consumer Big Data
• Michail Pavlis – The geography of non-delivery
• Emily Sheard – Enumerating the ambient population in the context of crime
Session 2: Novel Data Sources and their Geographic Integration
• Hai Nguyen, Oliver O’Brien – Naming conventions and ethnicity
• Anastasia Ushakova – Temporal patterns of energy consumption and vulnerable consumers
• Tim Rains – Data linkage of store loyalty cards
Session 3: Big Data and Research Design
• Mark Birkin – Spatial microsimulation, big data and policy analysis: an example from the UK travel market consumer data
• Phani Chintakayala – Do green attitudes and demographics drive sustainable product consumption?