Yue Qin (Indiana University Bloomington & Central University of Finance and Economics), Yue Xiao (Indiana University Bloomington & IBM Research), Xiaojing Liao (Indiana University Bloomington)
A significant challenge in privacy compliance research is comparing the specific data items that appear in actual data usage practices with the privacy data defined in laws, regulations, or policies. The task is complex because applications use a wide variety of data items and because jurisdictions interpret privacy data differently. To address this challenge, privacy data taxonomies have been constructed to capture the relationships between privacy data types at different levels of granularity, which facilitates privacy compliance analysis. However, existing taxonomy construction approaches rely on manual effort or heuristic rules, which hinders their ability to incorporate new terms from diverse domains. In this paper, we present the design of GRASP, a scalable and efficient methodology for automatically constructing and expanding privacy data taxonomies. GRASP incorporates a novel hypernym prediction model based on granularity-aware semantic projection, which outperforms existing state-of-the-art hypernym prediction methods. Additionally, we design and implement Tracy, an assistant for privacy professionals that recognizes and interprets private data in incident reports to support GDPR-compliant data breach notification. We evaluate Tracy in a usability study with 15 privacy professionals, who reported high levels of usability and satisfaction.