Hi, I have recently been researching keyphrase generation. People usually tackle this problem with a seq2seq-with-attention model. Specifically, I use this framework: https://github.com/memray/seq2seq-keyphrase-pytorch, which is an implementation of http://memray.me/uploads/acl17-keyphrase-generation.pdf . I have now replaced its encoder with BERT, but the results are not good. A comparison of the two models is in the attachment. Could you give me some advice on whether what I did is reasonable and whether BERT is suitable for this task? Thanks. [RNN vs BERT in Keyphrase generation.pdf](https://github.com/huggingface/pytorch-pretrained-BERT/files/2623599/RNN.vs.BERT.in.Keyphrase.generation.pdf)