Skip to content

Conversation

hamzarabbani
Copy link

There's an error on line 243. Which is resolved with try catch.

@angeloskath
Copy link
Owner

Hi,

Thanks for the pull request! Could you perhaps provide a test that fails without the change? If you don't feel like providing one I will look into it more when I have next week.

Thanks again, it may take some time but I will probably merge it.

Angelos

@hamzarabbani
Copy link
Author

hamzarabbani commented May 13, 2017

Simple example to generate error:

use NlpTools\FeatureFactories\DataAsFeatures;
use NlpTools\Tokenizers\WhitespaceTokenizer;
use NlpTools\Documents\TokensDocument;
use NlpTools\Documents\TrainingSet;
use NlpTools\Models\Lda;

if (file_exists($argv[1])
    $docs = file($argv[1]);
else
    die(1);
$tok = new WhitespaceTokenizer();
$tset = new TrainingSet();
foreach ($docs as $f) {
    $tset->addDocument(
        '', // the class is not used by the lda model
        new TokensDocument(
            $tok->tokenize(
                file_get_contents($f)
            )
        )
    );
}
$lda = new Lda(
    new DataAsFeatures(), // a feature factory to transform the document data
    5, // the number of topics we want
    1, // the dirichlet prior assumed for the per document topic distribution
    1  // the dirichlet prior assumed for the per word topic distribution
);
// run the sampler 50 times
$lda->train($tset,50);
print_r(
    $lda->getDocumentsPerTopicsProbabilities(10)
);

It will generate error. Because at line 243 in Lda.php it is written.

$count_topics_docs = array();
         foreach ($this->count_docs_topics as $doc=>$topics) {
             foreach ($topics as $t=>$c)
                $count_topics_docs[$doc][$t]++;
         }

$count_topics_docs[$doc][$t] has not been assigned any value at this point so we just cannot add ++ to it.
I solved this problem by using try catch

$count_topics_docs = array();
         foreach ($this->count_docs_topics as $doc=>$topics) {
             foreach ($topics as $t=>$c){
                 try {
                     $count_topics_docs[$doc][$t]++;
                 } catch (\Exception $ex){
                     $count_topics_docs[$doc][$t] = 1;
                 }
             }

There's another issue as well. On line 250 & 252 it is using variable named $limit_words. Which is not defined. It should have been $limit_docs (as this variable is being passed into parameter).

if ($limit_words>0) {
 arsort($p);
 $p = array_slice($p,0,$limit_words,true); // true to preserve the keys
}

to

if ($limit_docs > 0) {
 arsort($p);
 $p = array_slice($p,0,$limit_docs,true); // true to preserve the keys
}

@angeloskath
Copy link
Owner

So, do you care about getting the merge or should I just go ahead and fix it?

I am thinking the following if you care to do it yourself for some reason

  • if (!isset(...)) {} would be more PHP-like
  • Let's fix the $limit_docs as well
  • Your example should be made into a test case

Tell me if you want to do this and thanks again for the issue and pull request!!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants