Abstract
The Language Application (LAPPS) Grid project has among its goals promoting the reusability and interoperability of Natural Language Processing (NLP) tools. LAPPS Grid processing services are NLP tools, such as tokenizers and part-of-speech taggers, that are packaged as web services for use on the LAPPS Grid. The processing services can be used to develop and evaluate different sequences of tools. Interoperability of the tools simplifies development and makes experimentation with a large variety of tools more accessible.

According to the International Data Corporation’s 2014 report, the size of the digital universe is doubling every two years. As the quantity of available data grows, distributed tools for big data processing are increasingly necessary. Hadoop has proven to be a useful tool for large-scale processing in NLP. This paper describes the development of a system for using LAPPS Grid services with Hadoop. The system offers some of the benefits of the LAPPS Grid, namely interoperability and reusability, in an environment that can be scaled to process large data sets.