Prior to 1.0.24, the following code worked:

```python
from goose import Goose

g = Goose()
article_1 = g.extract(url=...)
article_2 = g.extract(url=...)
```
That is, `extract()` could be called multiple times on the same `Goose` instance.
But it seems that since the fix for #161, the code above no longer works: the second call to `extract()` throws an exception:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../package/lib/python2.7/site-packages/goose_extractor-1.0.24-py2.7.egg/goose/__init__.py", line 56, in extract
    return self.crawl(cc)
  File ".../package/lib/python2.7/site-packages/goose_extractor-1.0.24-py2.7.egg/goose/__init__.py", line 63, in crawl
    parsers.remove(self.config.parser_class)
ValueError: list.remove(x): x not in list
```
By default, the lxml parser is used. On the first call to `extract()`, `parsers` changes from `['lxml', 'soup']` to `['soup']`. On the second call, the error occurs because `crawl()` tries to remove `'lxml'` from `['soup']`. See https://github.com/grangier/python-goose/blob/develop/goose/__init__.py#L62
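A minimal, self-contained sketch of the failure mode described above, assuming that `crawl()` calls `remove()` on a list that aliases the configuration's own parser list rather than on a copy. `Config`, `fallback_parsers_buggy` and `fallback_parsers_fixed` are hypothetical names for illustration only, not goose's actual code. Copying the list before calling `remove()` leaves the configuration untouched, so repeated calls keep working.

```python
class Config(object):
    """Hypothetical stand-in for goose's configuration object."""

    def __init__(self):
        self.available_parsers = ['lxml', 'soup']
        self.parser_class = 'lxml'


def fallback_parsers_buggy(config):
    # Aliases config.available_parsers, so remove() mutates the shared list.
    parsers = config.available_parsers
    parsers.remove(config.parser_class)
    return parsers


def fallback_parsers_fixed(config):
    # Works on a copy, so config.available_parsers stays ['lxml', 'soup'].
    parsers = list(config.available_parsers)
    parsers.remove(config.parser_class)
    return parsers


if __name__ == '__main__':
    config = Config()
    fallback_parsers_buggy(config)      # config list is now ['soup']
    try:
        fallback_parsers_buggy(config)  # tries to remove 'lxml' from ['soup']
    except ValueError as exc:
        print('second call failed: %s' % exc)

    config = Config()
    print(fallback_parsers_fixed(config))  # safe to call repeatedly
    print(fallback_parsers_fixed(config))
```

The second call to the buggy variant raises the same `ValueError` as in the traceback, while the copying variant returns the fallback list on every call.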