Teanga is a linked-data-based platform for natural language processing (NLP).
Teanga enables the use of many NLP services from a single interface, whether the need is for a single service or for multiple services combined in a pipeline. Teanga addresses the problem of NLP service interoperability by using linked data to define the types of each service's input and output. Teanga's strengths include being easy to install and run, being easy to use, running multiple NLP tasks from one interface, and helping users build a pipeline of tasks through a graphical user interface.
Natural language processing (NLP) tasks typically consist of many individual components that must be used together to solve real-world problems. However, these components are frequently developed independently, so integrating them is far from trivial. Installing these services can be a significant barrier to entry for NLP developers, and even once assembled, the resulting pipelines can be opaque and brittle. These issues are of course endemic to software development and until recently could only be solved by integrating all components within a single development model, for example by implementing all NLP tools in the Python language, as NLTK has done. An alternative model has arisen in the form of Web services, which integrate multiple components through clear, well-defined protocols such as REST. However, Web services have not been widely adopted by researchers or industry, in part because the remote nature of the computation can lead to issues with the availability of services (external services are often down) and with their speed (sending requests to remote servers creates significant bottlenecks).
Our new platform, Teanga, aims to achieve the best of both worlds. We use Web services combined by means of linked data standards, in particular JSON-LD, to provide interoperability between services without tying users to a particular programming language or framework. We also use containerization technology, in particular Docker, to ensure that these services, and the pipelines generated from them, are highly portable and can easily be used at scale. Furthermore, Teanga has an easy-to-use UI that allows users to visualise their pipelines as well as the progress (or failure) of each individual service.
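As a minimal sketch of the idea, the snippet below shows how a JSON-LD document might describe a service's input and output types, and how two service descriptions can be checked for compatibility when composing a pipeline. The vocabulary, URIs, and property names here are illustrative assumptions, not Teanga's actual schema.

```python
# Hypothetical JSON-LD description of an NLP service; the example.org
# URIs and the "input"/"output" properties are assumptions for
# illustration, not Teanga's real vocabulary.
pos_tagger = {
    "@context": {
        "dc": "http://purl.org/dc/terms/",
        "input": "http://example.org/teanga#input",
        "output": "http://example.org/teanga#output",
    },
    "@id": "http://example.org/services/pos-tagger",
    "dc:title": "Part-of-speech tagger",
    "input": {"@id": "http://example.org/types/TokenizedText"},
    "output": {"@id": "http://example.org/types/TaggedText"},
}

tokenizer = {
    "@id": "http://example.org/services/tokenizer",
    "input": {"@id": "http://example.org/types/PlainText"},
    "output": {"@id": "http://example.org/types/TokenizedText"},
}

def can_chain(producer, consumer):
    # Two services can be chained when the producer's output type is
    # the same linked-data resource as the consumer's input type --
    # the basis for type-safe pipeline composition.
    return producer["output"]["@id"] == consumer["input"]["@id"]

print(can_chain(tokenizer, pos_tagger))   # True: TokenizedText matches
print(can_chain(pos_tagger, tokenizer))   # False: TaggedText != PlainText
```

Because the types are identified by URIs rather than by language-specific classes, this compatibility check works regardless of which language or framework each service is implemented in.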
Housam works mainly on building software and demos for UNLP research, with a focus on data and knowledge visualisation in the context of several projects, and on designing and developing user interfaces for knowledge presentation, exploration and discovery.
John's research interests include the following: linked data and the Semantic Web, ontologies, collaborative development and publishing of language resources, machine translation and multilingualism, and under-resourced languages.
Paul's main research interests are in the development and use of natural language processing methods and solutions for semantic-based information access. He has been involved in a large number of nationally and internationally funded projects in this area.