AMinerOpen/AWOE

AWOE

Academic WOrd Embeddings (AWOE): word embeddings trained with gensim on AMiner's 2 billion publication data, together with their applications.

Dependencies

  • Python 3
  • gensim
  • spherecluster

Overview

Pre-trained Models

  • English Paper Keywords (EPK): Download
  • Chinese Paper Keywords (CPK): Download
  • Bilingual Transformation Matrix: Download

For details on these models, see docs/word2vec.md. (If you only want to use the models, you can skip it.)

We have prepared a bash download script that you can use as needed. For example, if you only need the Chinese model, just run ./download.sh zh.

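# Make the download script executable
chmod +x download.sh
# Download the Chinese model
./download.sh zh
# Download the English model
./download.sh en
# Download the bilingual (en-to-zh) transformation matrix into tmp/
wget https://lfs.aminer.cn/misc/awoe/W_en2zh.pkl -P tmp/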

Utils

We provide some utilities for working with the above models, including tokenization, keyword extraction, sentence-to-vector conversion, etc. Here are some usage examples.

Before using these modules, download the required models first.

Mono-lingual

Documentation to be completed. For now, you can run test.py.
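As a rough illustration of the sentence-to-vector idea mentioned in the Utils overview, here is a minimal sketch that averages the vectors of the known tokens in a sentence. It does not use this repository's utilities: `TOY_VECTORS`, `sentence_to_vector`, and the whitespace tokenization are all hypothetical stand-ins, and the repository's own method may differ.

```python
from typing import Dict, List, Optional

# Toy 3-dimensional embeddings standing in for a real pre-trained model
# (hypothetical values, not taken from the EPK/CPK models).
TOY_VECTORS: Dict[str, List[float]] = {
    "data": [1.0, 0.0, 0.0],
    "mining": [0.0, 1.0, 0.0],
    "network": [0.0, 0.0, 1.0],
}


def sentence_to_vector(
    sentence: str, vectors: Dict[str, List[float]]
) -> Optional[List[float]]:
    """Average the vectors of all known tokens; return None if none are known."""
    # Naive lowercase + whitespace tokenization (an assumption for this sketch).
    tokens = sentence.lower().split()
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    # Component-wise mean over the known token vectors.
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


print(sentence_to_vector("Data mining rocks", TOY_VECTORS))  # [0.5, 0.5, 0.0]
```

With a real model, the dictionary lookup would presumably be replaced by a lookup in the loaded gensim vectors, and out-of-vocabulary tokens would be skipped in the same way.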

Bi-lingual

Documentation to be completed.
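Until the documentation is available, the sketch below shows one common way a bilingual transformation matrix can be used: map a source-language vector into the target space with a linear transform, then pick the nearest target word by cosine similarity. This is an assumption about how such a matrix is typically applied, not a description of this repository's code. The toy vectors, the matrix `W`, the helper names, and the row-vector convention `x @ W` are all hypothetical; the actual layout of W_en2zh.pkl is not documented here.

```python
import math
from typing import Dict, List

Vector = List[float]


def apply_matrix(x: Vector, w: List[Vector]) -> Vector:
    """Compute x @ w for a row vector x and a matrix w given as a list of rows."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def translate(word: str, src: Dict[str, Vector], tgt: Dict[str, Vector],
              w: List[Vector]) -> str:
    """Map a source word into the target space and return the closest target word."""
    mapped = apply_matrix(src[word], w)
    return max(tgt, key=lambda t: cosine(mapped, tgt[t]))


# Toy 2-dimensional embeddings and a toy matrix that swaps the two axes
# (hypothetical values chosen so the mapping is easy to follow).
EN = {"network": [1.0, 0.0], "data": [0.0, 1.0]}
ZH = {"网络": [0.0, 1.0], "数据": [1.0, 0.0]}
W = [[0.0, 1.0], [1.0, 0.0]]

print(translate("network", EN, ZH, W))  # 网络
```

In practice the matrix product and similarity search would normally be done with numpy over the full vocabulary; plain lists are used here only to keep the sketch dependency-free.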

Citation

If our work helps you in some way, please consider citing the following publication(s):

  • Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, and Zhong Su. ArnetMiner: Extraction and Mining of Academic Social Networks. In Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD’2008).
