This repository contains a series of machine learning experiments for link prediction within social networks.
We first implement and apply a variety of link prediction methods to each of the ego networks contained within the SNAP Facebook dataset and to various random networks generated using networkx, and then calculate and compare the ROC AUC, Average Precision, and runtime of each method.
- (Variational) Graph Auto-Encoders: An end-to-end trainable convolutional neural network model for unsupervised learning on graphs
- Node2Vec/DeepWalk: A skip-gram based approach to learning node embeddings from random walks within a given graph
- Spectral Clustering: Using spectral embeddings to create node representations from an adjacency matrix
- Baseline Indexes: Adamic-Adar, Jaccard Coefficient, Preferential Attachment
- python 2.7
- TensorFlow (1.0 or later)
- networkx
- gensim
- scikit-learn
- scipy
- jupyter notebook
python setup.py installfacebook/: Original Facebook ego networks dataset, with added .allfeats files (with both ego and alter features)fb-processed/: Pickle dumps of (adjacency_matrix, feature_matrix) tuples for each ego network, and for combined networkvisualizations/: Visualizations of each network generated by networkx and matplotlibnetwork-statistics/: .txt and .pkl files of pre-calculated network characteristics (with info on connectivity, network size, etc.) for each networktrain-test-splits/: Pickle dumps of pre-processed train-test splits for Facebook ego networks, with varying degrees of visibility (i.e. how many edges are hidden). Includes:adj_train,train_edges,train_edges_false,val_edges,val_edges_false,test_edges,test_edges_falseprocess-ego-networks.py: Script used to process raw Facebook data and generate pickle dumpsprocess-combined-network.py: Script used to combine Facebook ego networks and generate complete network pickle dumpfb-train-test-splits.py: Script used to generate and store train-test splits for each Facebook ego network
link-prediction-baselines.ipynb: Adamic-Adar, Jaccard Coefficient, Preferential Attachmentspectral-clustering.ipynb: Using spectral embeddings for link predictionnode2vec.ipynb: Skip-gram based representation learning for node/edge embeddingsgraph-vae.ipynb: (Variational) Graph Autoencoder, learns node embeddings to recreate adjacency matrix
link_prediction_scores.py: Utility functions for running various link prediction tests
network-visualizations.ipynb: Generate .pdf visualizations for each network, in addition to calculating and storing a variety of network metrics (e.g. transtivity, avg. clustering coefficient, etc.)
nx-graph-experiments.ipynb: Run all link prediction tests on various types of random networks (Erdos-Renyi, Barabasi-Albert, etc.)fb-graph-experiments.ipynb: Run all link prediction tests on each Facebook ego networkrun-all-experiments.py: Run all link prediction experiments (on both Facebook networks and random networkx networks), save results as pickle dumps inresults
results/: Pickle dumps of experiment results, with results (ROC AUC, ROC Curve, Avg. Precision, Runtime) stored for each link prediction method in Python dictionary forminvestigate-results.ipynb: Generate and save bar plots/ROC curve plots for each method and graph type, save network characteristics to.txtfilesresult-plots/: Bar plots for the results of each experiment (ROC AUC, AP, Minimum Runtime), in .pdf form