I. Introduction
A social network is a structure consisting of individual users connected by a social relationship such as friendship. Social networking services build on real-world social networks through online platforms, providing ways for users to share ideas, activities, events, and interests. For example, users can share location-tagged images with their friends (e.g., Instagram), rate restaurants and bars and recommend them to their friends (e.g., Foursquare), or log jogging and bicycle trails for sports analysis and experience sharing (e.g., Bikely). This dimension of location bridges the gap between the physical world and online social networking services. As location is one of the most important components of user context, extensive knowledge about an individual’s interests, behaviors, and relationships with others can be learned from their locations. These kinds of location-tagged and location-driven social structures are known as location-based social networks (LBSNs). Research on LBSNs has become an active topic in the last decade, enabled by many practical applications (surveyed in Section II) and rooted in the mobile data management community (e.g., [37], [49], [59]–[61], [67], [73], [91], [103], [107]). Publicly available real-world data sets have been the driving force for LBSN research in recent years, but such data sets exhibit certain weaknesses:
Data sparsity: LBSN data exhibits an extreme long-tail distribution of user behavior. In all existing available data sets, the vast majority of users have fewer than ten check-ins [43]. Moreover, the number of locations at which a user checks in is usually only a small portion of all locations that user has visited. As a result, the density of the data used in experimental studies on LBSNs is usually only around 0.1% [43].
Small data sets: Existing data sets used to train models are small, as detailed in Section II-B. They tend to only cover a short period of time, a small number of users, or a small number of check-ins.
Privacy concerns: Most LBSN data was published by users who consented to its public use. However, some users may later revoke this consent, for instance, by deleting their LBSN account. Such changes are not reflected in existing LBSN data sets, which creates severe privacy concerns.
No ground truth: In existing LBSN data, there is no way to assess whether check-ins are missing or whether the social network is correct and complete. Without knowing the ground truth, it is difficult to assess the accuracy and robustness of existing experimental results using LBSN data.
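To make the density figure cited under data sparsity concrete, the following minimal sketch computes the density of a check-in data set. It assumes that density means the fraction of (user, location) pairs with at least one check-in, which is the usual definition for a user-location matrix but is not spelled out here. The function name `checkin_density` and the toy records are hypothetical and serve only as illustration.

```python
def checkin_density(checkins):
    """Fraction of (user, location) pairs that have at least one check-in.

    Assumption: density = distinct observed pairs / (users * locations),
    where users and locations are those appearing in the data set.
    """
    pairs = set(checkins)                      # repeated check-ins count once
    users = {user for user, _ in pairs}
    locations = {location for _, location in pairs}
    if not users or not locations:
        return 0.0                             # empty data set
    return len(pairs) / (len(users) * len(locations))


# Toy records invented for illustration; not taken from any real data set.
toy_checkins = [
    ("u1", "cafe"), ("u1", "cafe"), ("u1", "gym"),
    ("u2", "cafe"),
    ("u3", "park"),
]
density = checkin_density(toy_checkins)
print(f"density = {density:.3f}")
```

Under this definition, a density of around 0.1% means that roughly one in a thousand possible (user, location) pairs is observed, so the toy data above is far denser than real LBSN data.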