Richard Plant, Dimitra Gkatzia, Mario Valerio Giuffrida

Conference on Empirical Methods in Natural Language Processing (2021)

Richard Plant, Dimitra Gkatzia, Valerio Giuffrida (2021). CAPE: Context-Aware Private Embeddings for Private Language Learning. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing.

@inproceedings{plant2021,
    title = "{CAPE}: Context-Aware Private Embeddings for Private Language Learning",
    author = "Plant, Richard  and
      Gkatzia, Dimitra  and
      Giuffrida, Valerio",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.628",
    pages = "7970--7978",
    abstract = "Neural language models have contributed to state-of-the-art results in a number of downstream applications including sentiment analysis, intent classification and others. However, obtaining text representations or embeddings using these models risks encoding personally identifiable information learned from language and context cues that may lead to privacy leaks. To ameliorate this issue, we propose Context-Aware Private Embeddings (CAPE), a novel approach which combines differential privacy and adversarial learning to preserve privacy during training of embeddings. Specifically, CAPE firstly applies calibrated noise through differential privacy to maintain the privacy of text representations by preserving the encoded semantic links while obscuring sensitive information. Next, CAPE employs an adversarial training regime that obscures identified private variables. Experimental results demonstrate that our proposed approach is more effective in reducing private information leakage than either single intervention, with approximately a 3{\%} reduction in attacker performance compared to the best-performing current method.",
}


Abstract

Deep learning-based language models have achieved state-of-the-art results in a number of applications, including sentiment analysis, topic labelling and intent classification. However, obtaining text representations or embeddings from these models risks encoding personally identifiable information, learned from language and context cues, that may threaten reputation or privacy. To ameliorate these issues, we propose Context-Aware Private Embeddings (CAPE), a novel approach which preserves privacy during training of embeddings. To maintain the privacy of text representations, CAPE applies calibrated noise through differential privacy, preserving the encoded semantic links while obscuring sensitive information. In addition, CAPE employs an adversarial training regime that obscures identified private variables. Experimental results demonstrate that the proposed approach reduces private information leakage better than either single intervention.
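
The two interventions named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes, for illustration only, that the calibrated noise is Laplace noise with scale 1/epsilon added to a min-max normalised embedding, and that the adversarial regime can be summarised as an encoder objective that subtracts a weighted adversary loss from the task loss. All function names and the weighting parameter `lam` are hypothetical.

```python
import math
import random


def min_max_normalise(vec):
    """Rescale a vector to [0, 1] so every coordinate has a bounded range."""
    lo, hi = min(vec), max(vec)
    if hi == lo:
        return [0.0 for _ in vec]
    return [(v - lo) / (hi - lo) for v in vec]


def laplace_sample(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two
    independent exponential draws that each have mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatise(vec, epsilon, rng):
    """Add Laplace noise to a normalised embedding.

    Assumption: the noise scale is 1 / epsilon, based on a per-coordinate
    range of 1 after normalisation. Smaller epsilon means more noise.
    """
    scale = 1.0 / epsilon
    return [v + laplace_sample(scale, rng) for v in min_max_normalise(vec)]


def cross_entropy(probs, label):
    """Negative log-probability assigned to the correct class."""
    return -math.log(max(probs[label], 1e-12))


def encoder_objective(task_probs, task_label, adv_probs, private_label, lam):
    """Hypothetical combined objective for the encoder: minimise the task
    loss while maximising the loss of an adversary that tries to predict
    the private variable from the same representation."""
    task_loss = cross_entropy(task_probs, task_label)
    adversary_loss = cross_entropy(adv_probs, private_label)
    return task_loss - lam * adversary_loss


if __name__ == "__main__":
    rng = random.Random(0)
    embedding = [0.2, -1.3, 0.7, 2.1]
    noisy = privatise(embedding, epsilon=1.0, rng=rng)
    print(noisy)
    print(encoder_objective([0.8, 0.2], 0, [0.5, 0.5], 1, lam=1.0))
```

In this sketch the noise is applied to the representation before any classifier sees it, and the encoder is rewarded when the adversary's prediction of the private label is poor. How the noise is calibrated and how the adversary is trained in CAPE itself is described in the paper.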