
Automating Governing Knowledge Commons and Contextual Integrity (GKC-CI) Privacy Policy Annotations with Large Language Models

Authors :
Chanenson, Jake
Pickering, Madison
Apthorpe, Noah
Publication Year :
2023

Abstract

Identifying contextual integrity (CI) and governing knowledge commons (GKC) parameters in privacy policy texts can facilitate normative privacy analysis. However, GKC-CI annotation has heretofore required manual or crowdsourced effort. This paper demonstrates that high-accuracy GKC-CI parameter annotation of privacy policies can be performed automatically using large language models. We fine-tune 50 open-source and proprietary models on 21,588 GKC-CI annotations from 16 ground truth privacy policies. Our best performing model has an accuracy of 90.65%, which is comparable to the accuracy of experts on the same task. We apply our best performing model to 456 privacy policies from a variety of online services, demonstrating the effectiveness of scaling GKC-CI annotation for privacy policy exploration and analysis. We publicly release our model training code, training and testing data, an annotation visualizer, and all annotated policies for future GKC-CI research.

Comment: 28 pages, 18 figures, 10 tables; revised version

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.2311.02192
Document Type :
Working Paper