Garbage in, garbage out: An analysis of HTML text extractors and their impact on NLP performance | IEEE Conference Publication | IEEE Xplore