epn-ml/AL-Drift-Detection

 
 


Detection of magnetopause and bow shock crossings on Mercury based on MESSENGER magnetometer data

The project is intended for detecting concept drift in MESSENGER magnetometer data, sampling the data according to the detected drifts, and training a CRNN model to predict the crossings. This README describes the project's file structure and explains how to use it.

Setup

Before starting drift detection, orbits with initially known drifts need to be prepared. Below are several groups of adjacent orbits; each group can be assigned its own drift label:

  • 232-247
  • 380-399
  • 553-570
  • 607-627
  • 750-764
  • 1383-1396
  • 1460-1474
  • 1489-1505
  • 1507-1526
  • 1560-1577

All files corresponding to these orbits need to be placed in data/drifts/. Orbit files are named df_N.csv, where N is the orbit number. The program was tested with 8 groups of orbits with known drifts, consisting of the following 100 orbits:

  • 232-247: 233, 234, 235, 236, 237, 239, 240, 241, 242, 244, 245, 247
  • 380-399: 380, 381, 382, 384, 387, 388, 389, 390, 391, 394, 396, 397, 398, 399
  • 553-570: 553, 554, 556, 557, 558, 559, 560, 561, 562, 564, 565, 566, 567, 569, 570
  • 607-627: 607, 612, 613, 619, 621, 625, 626, 627
  • 750-764: 751, 752, 753, 754, 755, 756, 757, 758, 759, 760, 761, 762, 763
  • 1383-1396: 1383, 1384, 1385, 1386, 1387, 1388, 1389, 1390, 1391, 1392, 1393, 1394, 1395, 1396
  • 1460-1474: 1460, 1461, 1462, 1464, 1465, 1466, 1467, 1468, 1469, 1470, 1471, 1474
  • 1489-1505: 1489, 1490, 1491, 1492, 1493, 1494, 1495, 1496, 1497, 1498, 1499
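The grouping of orbits into known drifts can be illustrated with a short Python sketch. It assumes that each group from the Setup list gets a 1-based label in list order; the helper name known_drift_label is hypothetical and not part of gan.py.

```python
from typing import Optional

# Groups of adjacent orbits with initially known drifts, as listed in Setup.
KNOWN_DRIFT_GROUPS = [
    (232, 247),
    (380, 399),
    (553, 570),
    (607, 627),
    (750, 764),
    (1383, 1396),
    (1460, 1474),
    (1489, 1505),
    (1507, 1526),
    (1560, 1577),
]


def known_drift_label(orbit: int) -> Optional[int]:
    """Return the 1-based index of the group containing `orbit`, or None.

    Numbering the groups 1, 2, 3, ... in this order is an illustrative
    assumption; the labels used inside gan.py may differ.
    """
    for label, (first, last) in enumerate(KNOWN_DRIFT_GROUPS, start=1):
        if first <= orbit <= last:
            return label
    return None


# Example: which groups the tested orbits fall into.
tested = [233, 380, 553, 607, 751, 1383, 1460, 1489]
print(sorted({known_drift_label(o) for o in tested}))  # [1, 2, 3, 4, 5, 6, 7, 8]
```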

All other orbits for drift detection and crossing prediction need to be put in data/orbits/. Orbits with known drifts from data/drifts/ are included automatically, so their files do not need to be added to data/orbits/ again. The program was tested on datasets ranging from 100 to 3000 orbits.
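A minimal sketch of how the two directories could be combined, assuming only the df_N.csv naming convention described above; orbit_numbers and all_orbits are hypothetical helpers, not functions from util.py.

```python
import re
from pathlib import Path

# Orbit files are named df_N.csv, where N is the orbit number.
ORBIT_FILE = re.compile(r"df_(\d+)\.csv")


def orbit_numbers(directory: Path) -> set[int]:
    """Return the orbit numbers of all df_N.csv files in `directory`."""
    numbers = set()
    for path in directory.glob("df_*.csv"):
        match = ORBIT_FILE.fullmatch(path.name)
        if match:
            numbers.add(int(match.group(1)))
    return numbers


def all_orbits(data_dir: Path = Path("data")) -> list[int]:
    """Sorted union of data/drifts/ and data/orbits/; duplicates count once."""
    drifts = orbit_numbers(data_dir / "drifts")
    orbits = orbit_numbers(data_dir / "orbits")
    return sorted(drifts | orbits)
```

Taking a set union means an orbit file accidentally placed in both directories is still counted only once.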

Drift detection

Drift detection has to be performed before crossing prediction, with the following command:

python gan.py logs/gan 1 cuda

There are 3 command line arguments:

  1. Directory for logs and output - logs/gan
  2. Dataset number - 1 by default
  3. Device name - cuda or cpu

The dataset number is used when performing drift detection on several different dataset samples in a row. By default its value should be 1, which corresponds to the full dataset, meaning that all orbits from data/orbits/ are used.
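The three positional arguments could be read with argparse as in this sketch. It is a hypothetical parser that mirrors the argument list above; it is not taken from gan.py, whose actual argument handling may differ (for example, it may not validate the device name).

```python
import argparse


def parse_gan_args(argv=None) -> argparse.Namespace:
    """Hypothetical parser for `python gan.py <log_dir> <dataset> <device>`."""
    parser = argparse.ArgumentParser(description="Drift detection (sketch)")
    parser.add_argument("log_dir", help="directory for logs and output, e.g. logs/gan")
    parser.add_argument("dataset", type=int,
                        help="dataset number; 1 means the full dataset in data/orbits/")
    parser.add_argument("device", choices=["cuda", "cpu"], help="device name")
    return parser.parse_args(argv)


args = parse_gan_args(["logs/gan", "1", "cuda"])
print(args.log_dir, args.dataset, args.device)  # logs/gan 1 cuda
```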

Another parameter is the set of features used for training. These features are listed in data/features_gan.txt, one per line, and can be changed if needed. Full list of available features: X_MSO, Y_MSO, Z_MSO, BX_MSO, BY_MSO, BZ_MSO, DBX_MSO, DBY_MSO, DBZ_MSO, RHO_DIPOLE, PHI_DIPOLE, THETA_DIPOLE, BABS_DIPOLE, BX_DIPOLE, BY_DIPOLE, BZ_DIPOLE, RHO, RXY, X, Y, Z, VX, VY, VZ, VABS, D, COSALPHA, EXTREMA
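Because data/features_gan.txt holds one feature name per line, it can be loaded and checked against the full list in a few lines. The function read_features is a hypothetical helper; skipping blank lines and rejecting unknown names are assumptions of this sketch rather than documented behaviour of gan.py. The same function would work for data/features_cnn.txt.

```python
from pathlib import Path

# Full list of available features, copied from the paragraph above.
AVAILABLE_FEATURES = {
    "X_MSO", "Y_MSO", "Z_MSO", "BX_MSO", "BY_MSO", "BZ_MSO",
    "DBX_MSO", "DBY_MSO", "DBZ_MSO", "RHO_DIPOLE", "PHI_DIPOLE", "THETA_DIPOLE",
    "BABS_DIPOLE", "BX_DIPOLE", "BY_DIPOLE", "BZ_DIPOLE", "RHO", "RXY",
    "X", "Y", "Z", "VX", "VY", "VZ", "VABS", "D", "COSALPHA", "EXTREMA",
}


def read_features(path: Path) -> list[str]:
    """Read one feature name per line, skipping blank lines, and validate them."""
    features = [line.strip() for line in path.read_text().splitlines() if line.strip()]
    unknown = [name for name in features if name not in AVAILABLE_FEATURES]
    if unknown:
        raise ValueError(f"unknown features in {path}: {unknown}")
    return features
```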

The script ./run-gan.sh is used for performing drift detection on multiple samples of the dataset. It also adds a timestamp to the log directory. The script is executed with the following command:

./run-gan.sh cuda
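The contents of run-gan.sh are not reproduced here, but the workflow it is described as performing (one gan.py run per dataset sample, with a timestamped log directory) can be sketched in Python. The directory naming scheme <base_dir>_<timestamp> and the dataset numbers 1 to 3 in the example are assumptions.

```python
import shlex
import subprocess
from datetime import datetime


def gan_commands(dataset_numbers, device="cuda", base_dir="logs/gan", timestamp=None):
    """Build one gan.py invocation per dataset sample, sharing a timestamped log dir.

    The "<base_dir>_<timestamp>" naming is hypothetical; run-gan.sh may differ.
    """
    stamp = timestamp or datetime.now().strftime("%Y%m%d-%H%M%S")
    log_dir = f"{base_dir}_{stamp}"
    return [["python", "gan.py", log_dir, str(n), device] for n in dataset_numbers]


def run_all(commands):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


for cmd in gan_commands([1, 2, 3], timestamp="20240101-120000"):
    print(shlex.join(cmd))
```

Passing the result to run_all would execute the runs sequentially; building the argument lists separately makes them easy to inspect before anything is launched.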

The datasets themselves are defined on line 939 of gan.py and can be edited if, for example, the sampling needs to be changed or fewer orbits are available in total.

The output of drift detection is stored in two text files: log_set1.txt and drifts_set1.txt. The number at the end corresponds to the dataset sample, so there will be more output files if drift detection is performed with ./run-gan.sh. The file log_set1.txt is updated while drift detection runs; it contains the drift labels assigned to sets of orbits, together with their probabilities. An example looks like this:

113/2312 orbits 2 - 20 (13) -- drift 4, prob 0.9999923706054688
123/2312 orbits 21 - 33 (10) -- drift 7, prob 0.9974427223205566
127/2312 orbits 35 - 39 (4) -- drift 3, prob 0.9999980926513672
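Log lines in the format shown above can be parsed with a regular expression. In the sample, the first number grows by the value in parentheses from one line to the next (113 + 10 = 123, 123 + 4 = 127), which suggests a running count of processed orbits out of 2312 and the number of orbits in each set; these interpretations, the field names and the DriftAssignment class are assumptions of this sketch.

```python
import re
from dataclasses import dataclass

LOG_LINE = re.compile(
    r"(?P<done>\d+)/(?P<total>\d+) orbits (?P<first>\d+) - (?P<last>\d+) "
    r"\((?P<count>\d+)\) -- drift (?P<drift>\d+), prob (?P<prob>[\d.eE+-]+)"
)


@dataclass
class DriftAssignment:
    done: int    # running count of processed orbits (assumed meaning)
    total: int   # total number of orbits (assumed meaning)
    first: int   # first orbit number in the set
    last: int    # last orbit number in the set
    count: int   # number of orbits in the set (assumed meaning)
    drift: int   # assigned drift label
    prob: float  # probability reported for the label


def parse_log(lines):
    """Yield a DriftAssignment for every line that matches the sample format."""
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            g = match.groupdict()
            yield DriftAssignment(
                int(g["done"]), int(g["total"]), int(g["first"]), int(g["last"]),
                int(g["count"]), int(g["drift"]), float(g["prob"]),
            )
```

Passing an open log_set1.txt file object to parse_log yields the assignments written so far, which is convenient because the file keeps growing while drift detection runs.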

The file drifts_set1.txt contains orbit numbers and the drift labels assigned to them, in the format of a Python dictionary. It is intended for passing the results of drift detection to crossing prediction. A sample of these results looks like this:

30 7
31 7
33 7
35 3
37 3
38 3
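The README describes drifts_set1.txt as a Python dictionary, while the sample above shows one orbit-label pair per line. The hypothetical read_drifts helper below accepts either form (using ast.literal_eval for a dict literal), since the exact on-disk format is not confirmed here; orbits_by_drift is likewise a hypothetical helper.

```python
import ast


def read_drifts(text: str) -> dict[int, int]:
    """Map orbit number -> drift label.

    Accepts a Python dict literal such as "{30: 7, 31: 7}" or one
    "orbit label" pair per line, as in the sample above.
    """
    stripped = text.strip()
    if stripped.startswith("{"):
        return {int(k): int(v) for k, v in ast.literal_eval(stripped).items()}
    drifts = {}
    for line in stripped.splitlines():
        if line.strip():
            orbit, label = line.split()
            drifts[int(orbit)] = int(label)
    return drifts


def orbits_by_drift(drifts: dict[int, int]) -> dict[int, list[int]]:
    """Invert the mapping: drift label -> sorted orbit numbers."""
    grouped: dict[int, list[int]] = {}
    for orbit, label in drifts.items():
        grouped.setdefault(label, []).append(orbit)
    return {label: sorted(orbits) for label, orbits in sorted(grouped.items())}
```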

A catalogue of drift predictions for the full set of available orbits is stored in drifts_all.txt.

Crossing prediction

After drift detection is finished, its results need to be moved manually to the data directory: drifts_set1.txt (and drifts_set2.txt and so on) has to be moved from logs/gan to data. This decouples drift detection from crossing prediction, so that the two programs can be executed and tested separately.
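The manual move can also be scripted. This sketch assumes the results sit directly in logs/gan (if run-gan.sh added a timestamp to the directory name, pass that directory instead) and moves only drifts_setN.txt files, leaving drifts_all.txt and the logs in place; move_drift_results is a hypothetical helper.

```python
import shutil
from pathlib import Path


def move_drift_results(src: Path = Path("logs/gan"), dst: Path = Path("data")) -> list[Path]:
    """Move every drifts_setN.txt from `src` to `dst` and return the new paths."""
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    # drifts_all.txt does not match this pattern, so the catalogue stays in src.
    for path in sorted(src.glob("drifts_set*.txt")):
        target = dst / path.name
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```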

Once the files with orbit numbers and drift labels are ready, crossing prediction can be performed with the following command:

python cnn.py logs/cnn 1 23 5

There are 4 command line arguments:

  1. Directory for logs and output - logs/cnn
  2. Dataset number - 1 by default
  3. Plot numbers - 23, which correspond to true and predicted labels of the testing set
  4. Value of max_orbits parameter - 5
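Similarly, the four arguments of cnn.py could be parsed and the plot digits decoded as follows. The parser is a hypothetical sketch, not the code in cnn.py; the digit meanings are the ones listed in this README, and rejecting any other digit is an assumption.

```python
import argparse

# Meaning of each plot digit, as described in this README.
PLOT_TYPES = {
    "0": "true labels of training orbits",
    "1": "predicted labels of training orbits",
    "2": "true labels of testing orbits",
    "3": "predicted labels of testing orbits",
}


def parse_cnn_args(argv=None) -> argparse.Namespace:
    """Hypothetical parser for `python cnn.py <log_dir> <dataset> <plots> <max_orbits>`."""
    parser = argparse.ArgumentParser(description="Crossing prediction (sketch)")
    parser.add_argument("log_dir", help="directory for logs and output, e.g. logs/cnn")
    parser.add_argument("dataset", type=int, help="dataset number, 1 by default")
    parser.add_argument("plots", help="string of plot digits 0-3, e.g. 23")
    parser.add_argument("max_orbits", type=int, help="training orbits per drift label")
    args = parser.parse_args(argv)
    if not args.plots or not set(args.plots) <= set(PLOT_TYPES):
        parser.error(f"plot numbers must be digits 0-3, got {args.plots!r}")
    args.plot_types = [PLOT_TYPES[digit] for digit in args.plots]
    return args


args = parse_cnn_args(["logs/cnn", "1", "23", "5"])
print(args.plot_types)
```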

The dataset number serves the same purpose as in drift detection. The plot numbers are a string of digits indicating which types of plots need to be drawn: 0 - true labels of training orbits, 1 - predicted labels of training orbits, 2 - true labels of testing orbits, 3 - predicted labels of testing orbits. The default is 23 because the prediction results for testing orbits are usually of most interest.

The last argument, max_orbits, defines how many training orbits are sampled per drift label. A value of 5 means that only the 5 orbits with the highest entropy from each drift are selected and used for training the classifier. The total number of training orbits can therefore be estimated as max_orbits multiplied by the number of detected drifts.
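The per-drift sampling controlled by max_orbits can be sketched as a top-k selection. This sketch assumes an entropy score is already available for every orbit (how it is computed is not described here) and that all orbits with a given drift label are candidates; select_training_orbits is a hypothetical name.

```python
from collections import defaultdict


def select_training_orbits(drifts: dict[int, int], entropy: dict[int, float],
                           max_orbits: int = 5) -> dict[int, list[int]]:
    """For each drift label, keep up to `max_orbits` orbits with the highest entropy.

    `entropy` maps orbit number -> a precomputed score; how cnn.py computes
    it is not covered here, so treat these scores as given inputs.
    """
    by_drift = defaultdict(list)
    for orbit, label in drifts.items():
        by_drift[label].append(orbit)
    return {
        label: sorted(orbits, key=lambda o: entropy[o], reverse=True)[:max_orbits]
        for label, orbits in sorted(by_drift.items())
    }
```

With this selection, max_orbits multiplied by the number of drift labels is an upper bound on the number of training orbits; a drift with fewer than max_orbits orbits contributes all of them.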

As in drift detection, the features used for training in crossing prediction are listed in data/features_cnn.txt. The list of available features is the same.

The script ./run-cnn.sh serves the same purpose for crossing prediction as ./run-gan.sh does for drift detection. It is executed with the following command:

./run-cnn.sh 23 5

The values 23 and 5 are the plot numbers and max_orbits, respectively.

The output of crossing prediction consists of a text log and plots. The log shows several things:

  1. Which orbits are selected for training and testing from each drift (drift 1 training orbits, drift 1 testing orbits, ...)
  2. A metric summary for the testing data in the = TESTING = section
  3. Metric values for each orbit in the = EVALUATION = section
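To pull the metrics out of the text log, it can be split at the section headers. This sketch assumes each header such as = TESTING = sits on its own line; the layout of the metric lines themselves is not documented here, so they are returned unparsed. split_sections is a hypothetical helper.

```python
import re

# A header line such as "= TESTING =" or "= EVALUATION =".
SECTION = re.compile(r"^=\s*(\w+)\s*=$")


def split_sections(text: str) -> dict[str, list[str]]:
    """Group log lines under the most recent '= NAME =' header.

    Lines before the first header (e.g. the training/testing orbit lists)
    are collected under the empty key "".
    """
    sections: dict[str, list[str]] = {"": []}
    current = ""
    for line in text.splitlines():
        match = SECTION.match(line.strip())
        if match:
            current = match.group(1)
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return sections
```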

Plots are distributed among subdirectories such as test_true, test_pred and test_all (merged plots), and are stored as PNG files like fig240_drift1.png.
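Plot file names encode two numbers, and fig240_drift1.png most likely refers to orbit 240 with drift label 1; that reading is an assumption. The hypothetical index_plots helper below lists, for each plot subdirectory, the (orbit, drift) pairs found there, assuming the subdirectories sit under logs/cnn/plots as the project tree suggests.

```python
import re
from pathlib import Path

# Assumed meaning of the file name: fig<orbit>_drift<label>.png
PLOT_NAME = re.compile(r"fig(\d+)_drift(\d+)\.png")


def index_plots(plot_dir: Path = Path("logs/cnn/plots")) -> dict[str, list[tuple[int, int]]]:
    """Map each plot subdirectory (e.g. test_pred) to sorted (orbit, drift) pairs."""
    index = {}
    for sub in sorted(p for p in plot_dir.iterdir() if p.is_dir()):
        pairs = []
        for png in sub.glob("*.png"):
            match = PLOT_NAME.fullmatch(png.name)
            if match:
                pairs.append((int(match.group(1)), int(match.group(2))))
        index[sub.name] = sorted(pairs)
    return index
```

Comparing the entries for test_true and test_pred, for example, shows whether every testing orbit was plotted in both forms.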

After crossing prediction is finished, its performance values and plots can be used for further analysis.

Project organization

├── README.md                  <- Top-level README for using this project
├── data
│   ├── drifts                 <- Orbit files with initially known drifts
│   ├── orbits                 <- Orbit files for drift detection
│   ├── drifts_set1.txt        <- Detected drifts for use during crossing prediction
│   ├── features_cnn.txt       <- Selected features for crossing prediction
│   └── features_gan.txt       <- Selected features for drift detection
│
├── logs                       <- Generated logs
│   ├── cnn                    <- Logs for crossing prediction
│   │   ├── plots              <- Plotted orbits with true and predicted crossings
│   │   └── log_cnn_set1.txt   <- Text log
│   │
│   └── gan                    <- Logs for drift detection
│       ├── drifts_all.txt     <- Catalogue of detected drifts for all available orbits
│       ├── drifts_set1.txt    <- Detected drift for each orbit
│       └── log_set1.txt       <- Text log
│
├── cnn.py                     <- Main script for crossing prediction
├── gan.py                     <- Main script for drift detection
├── util.py                    <- Helper functions for loading data
├── run-cnn.sh                 <- Script for performing crossing prediction with different arguments
└── run-gan.sh                 <- Script for performing drift detection with different arguments
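For a fresh checkout, the data and log directories from the tree can be created in one go. This is a minimal sketch with a hypothetical create_layout helper; it does not create features_gan.txt or features_cnn.txt, and gan.py and cnn.py may create their log directories themselves.

```python
from pathlib import Path

# Directories from the project tree above (files are not created).
LAYOUT = ["data/drifts", "data/orbits", "logs/cnn/plots", "logs/gan"]


def create_layout(root: Path = Path(".")) -> list[Path]:
    """Create the directories if missing and return their paths."""
    paths = []
    for sub in LAYOUT:
        path = root / sub
        path.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
        paths.append(path)
    return paths
```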

