Skip to content

Conversation

@ASalvail
Copy link
Member

Adds :

  • Datasets loaded lazily of in memory
  • Metadata to keep track of datasets
  • Configuration files to index datasets
  • Utilities to save/load/remove datasets
  • A first basic CSV importer to create datasets
  • Full docstring docs
  • Test suite

ASalvail added 30 commits March 10, 2014 15:51
The storage requirement will instead be controlled by the driver (hdf5).
__iter__() and __getitem__ are false methods. They actually are class methods, which makes their definition on the fly quite tricky. The old solution would reassing the correct definition in the __init__ method. However, since it is a class method, it also affect other object (that might have a need of the other iterator or getter). Thus, the best way to make it work is the naive way that checks each time is dataset.target is None.
.travis.yml Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Support for Python3.4 is not release yet for Travis. See travis-ci/travis-ci#1989

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the time you'll get around to fix CI, support should be available.

@MarcCote
Copy link
Member

Folder tools should be renamed scripts.

I know we discussed it, but are we planning on supporting both Python2.7.+ and Python3.+ ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line failed when $HOME/.mldataConfig does not already exist.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, will be fixed.

@MarcCote
Copy link
Member

Why is there a capitalized D in tests/test_Dataset.py ? Is it because of the class with the same name?

@ASalvail
Copy link
Member Author

There is no folder tools...
No, we stick to python 3.
And yes for the capital D.

@MarcCote
Copy link
Member

Oups, you're right about tools, I must have looked at my branch :P .

Regarding the uppercase D, I think it should be put in lowercase because it is refering to the file you are testing: that is dataset.py. Inside the test file, it is fine to use the name of the class.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: you mean .mldataConfig

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the dataset is not found in the config file, _load_from_file will fail on the os.join with None. Maybe we should display a better error message.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If lazy==False do we want data to be a ndarray? Right now it is a HDF5 dataset but still supports iteration and indexing as numpy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants