How do I set the default parser, the one that is called when no explicit parser is available?

It's already set up for you in the default config. Here is what the ‘parse-default’ plugin does. In native Nutch, if a resource has a content type for which there is no parser (e.g. no image or audio parser is listed in plugin.includes in nutch-site.xml), it is passed to the html parser, which returns a failed parse for non-html types. The Nutch ParserFactory picks its default by looking at the plugin.xml of each parser: the first one it finds with an empty pathSuffix becomes the default. To change this behavior, we’ve set pathSuffix to ‘html’ in nutch/src/plugin/parse-html/plugin.xml for the html parse plugin that is part of NutchWAX, and have added our own default parser, parser-default, which has an empty pathSuffix in its plugin.xml, to plugin.includes in nutch-site.xml.
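
To make the selection rule concrete, here is a small toy model of the behavior described above. It is not Nutch's ParserFactory code: the ParserPlugin type, the pick_parser function, and the content types listed are hypothetical names chosen for illustration. It only assumes the rule as stated, that a parser registered for the content type wins, and otherwise the first parser with an empty pathSuffix is used as the default.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class ParserPlugin:
    """Hypothetical stand-in for what a parser's plugin.xml declares."""
    plugin_id: str
    content_types: FrozenSet[str]
    path_suffix: str  # "" models an empty pathSuffix


def pick_parser(plugins: Sequence[ParserPlugin],
                content_type: str) -> Optional[ParserPlugin]:
    """Return the explicit parser for content_type, else the default one."""
    # 1. An explicit parser registered for this content type wins.
    for plugin in plugins:
        if content_type in plugin.content_types:
            return plugin
    # 2. Otherwise fall back to the first plugin with an empty pathSuffix.
    for plugin in plugins:
        if plugin.path_suffix == "":
            return plugin
    return None


# Native setup as described: the html parser has an empty pathSuffix,
# so it ends up as the default for unknown types.
native = [
    ParserPlugin("parse-html", frozenset({"text/html"}), ""),
]

# Modified setup as described: parse-html gets pathSuffix 'html' and a
# separate default parser carries the empty pathSuffix instead.
wax = [
    ParserPlugin("parse-html", frozenset({"text/html"}), "html"),
    ParserPlugin("parse-default", frozenset(), ""),
]

for name, plugins in (("native", native), ("wax", wax)):
    chosen = pick_parser(plugins, "image/png")
    print(name, "->", chosen.plugin_id if chosen else None)
```

In this sketch an image/png resource goes to parse-html in the native list and to parse-default in the modified one, while text/html goes to parse-html in both, which mirrors the change the answer describes.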
