OptimalTransportDataIntegration.jl

This package implements a statistical matching strategy based on optimal transport theory to integrate different data sources. These data sources relate to the same target population and share a subset of covariates, while each data source has its own distinct subset of variables. After recoding, you get a single data set in which all the variables from the different sources are jointly available.

This package is derived from OTRecod.jl, where the joint distribution of shared and distinct variables is transported within the data sources. Here, the method also transports the distribution of shared and distinct variables, and it estimates a function to predict the missing variables.
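To give a feel for what "transporting" a distribution means, here is a minimal, generic sketch of discrete optimal transport using Sinkhorn iterations (entropy-regularised OT) in plain Julia. It is not the solver used by this package or by OTRecod.jl; the function name `sinkhorn_plan`, the toy marginals `a` and `b`, the cost matrix `C`, and the regularisation value `ε` are all assumptions made for illustration.

```julia
using LinearAlgebra

# Hypothetical helper (not part of the package): entropy-regularised
# optimal transport between two discrete distributions via Sinkhorn iterations.
# a, b : marginal weights (each sums to 1); C : cost matrix; ε : regularisation.
function sinkhorn_plan(a, b, C; ε = 0.5, iters = 500)
    K = exp.(-C ./ ε)              # Gibbs kernel derived from the costs
    u = ones(length(a))
    v = ones(length(b))
    for _ in 1:iters
        u = a ./ (K * v)           # rescale rows towards marginal a
        v = b ./ (K' * u)          # rescale columns towards marginal b
    end
    return Diagonal(u) * K * Diagonal(v)   # transport plan (joint distribution)
end

a = [0.5, 0.5]                     # toy source distribution
b = [0.25, 0.75]                   # toy target distribution
C = [0.0 1.0; 1.0 0.0]             # moving mass between different levels costs 1
P = sinkhorn_plan(a, b, C)
```

The resulting matrix `P` is a joint distribution whose row sums approximate `a` and whose column sums match `b`; each entry says how much mass moves from one level to another.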

Installation

The package runs on Julia 1.1 and above. In a Julia session, switch to pkg> mode to add the package:

julia>] # switch to pkg> mode
pkg> add https://github.com/otrecoding/OptimalTransportDataIntegration.jl
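If you prefer to install from a script rather than the interactive pkg> prompt, the same step can be sketched with the functional `Pkg` API. This assumes a Julia version recent enough to support the `url` keyword of `Pkg.add`, and it needs network access.

```julia
using Pkg

# Add the package straight from its GitHub repository (unregistered install).
Pkg.add(url = "https://github.com/otrecoding/OptimalTransportDataIntegration.jl")
```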

To run an example

using OptimalTransportDataIntegration # import the package

params = DataParameters()  # Create the parameters set

rng = DataGenerator(params)  # Create the random generator

data = generate( rng ) # Generate a dataset 

result = otrecod( data, JointOTBetweenBases() ) # Perform the statistical matching 

println(accuracy(result))  # Print the accuracy for each distinct variable and the overall accuracy

Continuous explanatory variables can be used by passing discrete = false:

rng = DataGenerator(params, discrete = false)

Outcomes are always categorical: Y outcome levels are 1:4 and Z outcome levels are 1:3.
