Offensive comment detection using zero-shot learning

Advisor: Prof. D. Klakow • In collaboration with Eternio GmbH

Nikhil Chilwant
Matriculation no.: 2577689

April 16, 2021


Overview

▶ Text-only offensive comments
▶ Hateful meme detection


The problem statement

▶ eternio.de helps you to remember your loved ones.


The problem statement (cont.)

▶ We want to identify ‘inappropriate’ comments, e.g.:
  • Ich bin glücklich! (I am happy!)
  • Jetzt hat der Tod ihn an der Backe und wir sind ihn zum Glück los
    (Now death is stuck with him, and luckily we are rid of him)
  • Traumhaft, eine Zukunft ohne ihn (Wonderful, a future without him)
▶ We consider such ‘inappropriate’ comments as offensive.
▶ Unfortunately, there is no dedicated dataset of offensive comments about deceased people.


Preliminary experiment

▶ The GermEval 2018 [1] dataset is a good starting point.
▶ The results were no better than a random classifier:

                 Precision  Recall  F1-score  Support
  non-offensive       0.44    1.00      0.61      212
  offensive           0.00    0.00      0.00      268
  accuracy                              0.44      480
  macro avg           0.22    0.50      0.31      480
  weighted avg        0.20    0.44      0.27      480
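The table above follows the layout of scikit-learn's classification report; below is a minimal sketch of how such a report is produced, with placeholder label lists standing in for the classifier's actual predictions on the 480 evaluation comments.

```python
# Minimal sketch of producing such a report with scikit-learn. The label lists
# are placeholders, not the actual predictions from the preliminary experiment.
from sklearn.metrics import classification_report

y_true = ["offensive", "non-offensive", "offensive", "non-offensive"]
y_pred = ["non-offensive", "non-offensive", "non-offensive", "non-offensive"]

print(classification_report(y_true, y_pred, zero_division=0))
```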


Few-shot learning [2]

▶ Few-shot learning (FSL) is a special case of machine learning that targets good learning performance given only limited supervised information in the training set.
▶ When the training set contains no examples with supervised information for the task, FSL becomes zero-shot learning (ZSL).
▶ Transfer learning methods are popular in FSL.
▶ ‘Domain adaptation’ is a type of transfer learning in which the source and target tasks are the same but the source and target domains differ.
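As a concrete illustration of ZSL on this task (not the domain adaptation approach developed in the following slides), an off-the-shelf NLI model can be used as a zero-shot classifier; the multilingual checkpoint and the candidate labels below are illustrative assumptions.

```python
# Zero-shot classification via an off-the-shelf multilingual NLI model.
# Checkpoint and candidate labels are illustrative choices, not the setup
# used in the rest of this work.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="joeddav/xlm-roberta-large-xnli")

comment = "Traumhaft, eine Zukunft ohne ihn"
result = classifier(comment, candidate_labels=["offensive", "non-offensive"])
print(result["labels"][0], result["scores"][0])  # most likely label and its score
```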


The selection of the approach [3]

▶ Previously proposed methods and architectures are mainly either ‘discrepancy-based’ or ‘adversarial-based’.
▶ We should use BERT, one of the best pre-trained language models.
▶ Ma et al. found that the adversarial approach is hard to train and its performance is unsteady.
▶ Ma et al.’s domain adaptation + data selection approach is promising.
▶ It is inspired by the ‘curriculum learning’ and ‘data selection’ techniques.
▶ Curriculum learning uses prior knowledge about the difficulty of the training examples.
▶ Data selection removes irrelevant data points.


Domain adaptation with BERT [3]

▶ Use BERT to select data points from the ‘source domain’ that are similar to the ‘target domain’.
▶ The probability score from a domain classifier quantifies the domain similarity.
▶ Design a ‘learning curriculum’ of progressively harder samples: ‘easy’ → high probability (see the sketch below).
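A minimal sketch of the data selection and curriculum step, assuming a BERT domain classifier has already been fine-tuned to distinguish source from target comments; the checkpoint path, the class-index convention and the example data are placeholders, not Ma et al.'s exact implementation.

```python
# Sketch: score each source-domain comment by how target-like a fine-tuned
# BERT domain classifier considers it, then order the data from easy
# (high target probability) to hard.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

ckpt = "path/to/finetuned-domain-classifier"   # hypothetical source-vs-target classifier
tokenizer = AutoTokenizer.from_pretrained(ckpt)
domain_clf = AutoModelForSequenceClassification.from_pretrained(ckpt)
domain_clf.eval()

def target_probability(text: str) -> float:
    """Probability that a comment comes from the target domain (class index 1 assumed)."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = domain_clf(**enc).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

source_data = [("Beispielkommentar 1", 0), ("Beispielkommentar 2", 1)]  # (text, label)
scored = [(target_probability(text), text, label) for text, label in source_data]

# Learning curriculum: present the most target-like ("easy") samples first;
# data selection can additionally drop samples below a probability threshold.
curriculum = sorted(scored, key=lambda item: item[0], reverse=True)
```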



Domain adaptation with BERT (contd.) [3]

▶ Domain discrepancy will be measured by the Maximum Mean Discrepancy (MMD).
▶ Squared MMD ($d_k$) between the probability distributions P and Q in the reproducing kernel Hilbert space $\mathcal{H}_k$ with kernel k:

  $d_k^2(P, Q) := \lVert \mathbb{E}_P[x] - \mathbb{E}_Q[x] \rVert_{\mathcal{H}_k}^2$   (1)

▶ Domain-regularized training objective (see the sketch below):

  $\min_{\theta} \; \frac{1}{|S|} \sum_{(x_i, y_i) \in S} L(x_i, y_i; \theta) + \lambda \, d_k^2(D_s, D_t; \theta)$   (2)

  L: cross-entropy loss
  S: collection of the labelled source-domain data
  λ: regularization parameter
  k: rational quadratic kernel
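A minimal PyTorch sketch of equations (1) and (2), assuming a single rational quadratic kernel scale and the biased empirical MMD estimate; the hyperparameters are illustrative rather than the exact configuration of [3].

```python
# Sketch of eq. (1) with a rational quadratic kernel and of the regularized
# objective in eq. (2). A single kernel scale and the biased MMD estimate are
# simplifying assumptions for illustration.
import torch
import torch.nn.functional as F

def rq_kernel(x, y, alpha=1.0, lengthscale=1.0):
    # Rational quadratic kernel: k(x, y) = (1 + ||x - y||^2 / (2 * alpha * l^2))^(-alpha)
    d2 = torch.cdist(x, y) ** 2
    return (1.0 + d2 / (2.0 * alpha * lengthscale ** 2)) ** (-alpha)

def squared_mmd(z_src, z_tgt):
    # Empirical (biased) estimate of d_k^2 between source and target batches.
    return (rq_kernel(z_src, z_src).mean()
            + rq_kernel(z_tgt, z_tgt).mean()
            - 2.0 * rq_kernel(z_src, z_tgt).mean())

def domain_regularized_loss(logits_src, labels_src, z_src, z_tgt, lam=0.1):
    # Eq. (2): cross-entropy on labelled source data plus lambda * MMD between
    # the latent source and target representations of the shared encoder.
    return F.cross_entropy(logits_src, labels_src) + lam * squared_mmd(z_src, z_tgt)
```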


Domain adaptation with BERT (contd.) [3]

Figure 1: Setup for BERT domain adaptation with MMD-based domain regularization.
  x_s: labelled source data
  x_t: unlabelled target data
  z_s: predicted label for the source data
  y_s: target label (i.e. the gold label) for the source data
  z'_s: latent domain representation of the source data
  z'_t: latent domain representation of the target data


Domain adaptation with BERT (contd.) [3]

▶ Improve performance by incorporating multiple source domains, e.g. sentiment classification datasets and offensive comment datasets.
▶ Use the multi-task learning (MTL) technique (see the sketch below).

Figure 2: Conceptual overview of MTL [4].
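A minimal sketch of the MTL idea behind Figure 2, assuming a shared BERT encoder with one classification head per task or source domain; the encoder checkpoint and task names are illustrative, and this is not the exact MT-DNN architecture of [5].

```python
# Sketch of the MTL idea: one shared BERT encoder, one lightweight classification
# head per task / source domain.
import torch.nn as nn
from transformers import AutoModel

class SharedEncoderMTL(nn.Module):
    def __init__(self, encoder_name="bert-base-german-cased", task_num_labels=None):
        super().__init__()
        task_num_labels = task_num_labels or {"offensive": 2, "sentiment": 3}
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden, n) for task, n in task_num_labels.items()})

    def forward(self, task, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]   # [CLS] token representation
        return self.heads[task](pooled)

# Training alternates mini-batches across tasks: every batch updates the shared
# encoder, while only the corresponding task head sees that task's labels.
```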


Domain adaptation with BERT (contd.)

Figure 3: The generic MTL algorithm [5]


Hateful meme detection

▶ The user can post a meme.
▶ Dataset? → Facebook's Hateful Memes challenge [6].
▶ It includes ‘benign confounders’ (see the loading sketch below).
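A small sketch of reading the challenge data, assuming the jsonl layout of the public release (one JSON object per line with id, img, text and, for labelled splits, a binary label); treat the field names as an assumption to check against the downloaded files.

```python
# Sketch of reading one split of the hateful memes data, assuming the public
# jsonl layout. Verify field names against the actual download.
import json
from pathlib import Path

def load_split(path):
    rows = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            ex = json.loads(line)
            rows.append({
                "id": ex["id"],
                "image_path": ex["img"],   # relative path to the meme image
                "text": ex["text"],        # caption text overlaid on the meme
                "label": ex.get("label"),  # 1 = hateful, 0 = not; absent for test
            })
    return rows

# train = load_split("data/train.jsonl")  # illustrative path
```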


Performance numbers

(Figure not reproduced.)


The current best approach

▶ Ensembles VL-BERT, UNITER, VILLA and ERNIE-ViL with slight modifications.
▶ Additional data sources:
  • Google Web Entity detection: image context
  • FairFace classifier: gender, race
▶ Extending VL-BERT (see the sketch below):
  • Represent each external label as a special type of text token and link it to a special image region using visual feature embedding.
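A rough sketch of how the external labels can enter the text stream, assuming they are appended as an extra tokenized segment after the meme caption; the tokenizer checkpoint and the tag strings are placeholders, and the linking of these tokens to a special image region via visual feature embedding is omitted here.

```python
# Rough sketch of feeding external labels as extra text tokens: the meme caption
# and the detected entity / FairFace tags are packed into one tokenized sequence.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative choice

caption = "look how happy they are"            # text overlaid on the meme
entity_tags = ["protest", "crowd"]             # assumed web-entity detection output
face_tags = ["woman", "east asian"]            # assumed FairFace output

tags = " ".join(entity_tags + face_tags)
encoding = tokenizer(caption, tags, return_tensors="pt",
                     truncation=True, max_length=128)
# encoding now holds [CLS] caption tokens [SEP] tag tokens [SEP] plus segment ids.
```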


Visual feature embedding [7]

▶ Inspired by OSCAR.
▶ Visual region features and the associated object tag pairs (figure a) are coupled in a shared space (figure b).
▶ Objects act as ‘anchor points’ in the semantic space (figure c).
▶ Train the ‘extended VL-BERT’ using caption text, object entity tags, race tags and image regions.


The current best approach (contd.)

▶ If the text and image do not align, then the meme is probably hateful.
▶ Use UNITER with an ITM (image-text matching) head.
▶ Use ERNIE-ViL without any modification.
▶ AUC-ROC: 0.845, accuracy: 73.20% (a simple ensembling sketch follows).
▶ Next step: analyze the results and try to improve the performance.
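A minimal sketch of score-level ensembling and of computing the reported metrics, assuming uniform averaging of per-model hatefulness probabilities; the actual ensemble may combine the models differently, and the numbers in the sketch are illustrative only.

```python
# Sketch of score-level ensembling and evaluation. Uniform averaging is an
# assumption; the published ensemble may weight or combine models differently.
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

probs = {
    "vl-bert-ext": np.array([0.91, 0.12, 0.40]),
    "uniter-itm":  np.array([0.85, 0.20, 0.55]),
    "ernie-vil":   np.array([0.88, 0.05, 0.35]),
}
labels = np.array([1, 0, 1])  # 1 = hateful

ensemble = np.mean(list(probs.values()), axis=0)        # uniform score averaging
print("AUC-ROC :", roc_auc_score(labels, ensemble))
print("Accuracy:", accuracy_score(labels, ensemble >= 0.5))
```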


Timeline

▶ Implementation of the BERT-based domain adaptation: April
▶ Propose and implement an idea for hateful meme detection: May-June
▶ Thesis write-up: July


Conclusion

▶ The domain adaptation with data selection approach is promising for offensive comment detection.
▶ Would you like to suggest a better approach?


Bibliography

[1] M. Wiegand, GermEval-2018 Corpus (DE), version V1, 2019. doi: 10.11588/data/0B5VML. [Online]. Available: https://doi.org/10.11588/data/0B5VML.
[2] Y. Wang, Q. Yao, J. T. Kwok, and L. M. Ni, “Generalizing from a few examples: A survey on few-shot learning,” ACM Computing Surveys (CSUR), vol. 53, no. 3, pp. 1–34, 2020.
[3] X. Ma, P. Xu, Z. Wang, R. Nallapati, and B. Xiang, “Domain adaptation with BERT-based domain classification and data selection,” in Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019), 2019, pp. 76–83.
[4] S. Ruder, “An overview of multi-task learning in deep neural networks,” arXiv preprint arXiv:1706.05098, 2017.
[5] X. Liu, P. He, W. Chen, and J. Gao, “Multi-task deep neural networks for natural language understanding,” arXiv preprint arXiv:1901.11504, 2019.


Bibliography (cont.)

[6] D. Kiela, H. Firooz, A. Mohan, V. Goswami, A. Singh, P. Ringshia, and D. Testuggine, “The hateful memes challenge: Detecting hate speech in multimodal memes,” arXiv preprint arXiv:2005.04790, 2020.
[7] X. Li, X. Yin, C. Li, P. Zhang, X. Hu, L. Zhang, L. Wang, H. Hu, L. Dong, F. Wei, et al., “Oscar: Object-semantics aligned pre-training for vision-language tasks,” in European Conference on Computer Vision, Springer, 2020, pp. 121–137.
