Towards computation of novel ideas from corpora of scientific text

Liu, Haixia, Goulding, James and Brailsford, Tim (2015) Towards computation of novel ideas from corpora of scientific text. In: Machine Learning and Knowledge Discovery in Databases. Springer Verlag, Cham, Switzerland, pp. 541-556.
Abstract

In this work we present a method for computing novel 'ideas' from corpora of scientific text. The system first detects concept noun phrases in the titles and abstracts of publications using part-of-speech tagging, then classifies these into sets of problem and solution phrases via a target-word matching approach. An idea is defined as a co-occurring <problem,solution> pair. Known-idea triples are then constructed by additionally assigning each pair a relevance value, computed either from phrase co-occurrence or from an 'idea frequency-inverse document frequency' score. The resulting triples are fed into a collaborative filtering algorithm in which problem phrases play the role of users and solution phrases the role of the items to be recommended. The final output is a ranked list of novel idea candidates that researchers could potentially integrate into their hypothesis generation processes. The approach is evaluated on a subset of publications from the journal Science. Precision, recall and F-measure results for a variety of model parametrizations indicate that the system can generate useful novel ideas in an automated fashion.
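The triple-construction and recommendation stages described in the abstract can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes that noun-phrase extraction (part-of-speech tagging) and target-word classification have already produced lists of problem and solution phrases for each publication, and it starts from those lists. The function names (`build_cooccurrence`, `if_idf`, `cosine`, `recommend`) and the toy phrases are invented for illustration. The abstract does not give the exact idea frequency-inverse document frequency formula, so `if_idf` uses an assumed TF-IDF analogue that treats each problem phrase as a "document" and each solution phrase as a "term". Likewise, user-based collaborative filtering with cosine similarity is one common CF variant and may differ from the algorithm evaluated in the paper.

```python
import math
from collections import defaultdict

# Hypothetical sketch: names and the IF-IDF formula below are illustrative
# assumptions, not the paper's published method.


def build_cooccurrence(docs):
    """Count <problem, solution> co-occurrences.

    docs: iterable of (problem_phrases, solution_phrases) pairs, one per
    publication (title + abstract), assumed to be extracted upstream.
    Returns {problem: {solution: count}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for problems, solutions in docs:
        for p in set(problems):
            for s in set(solutions):
                counts[p][s] += 1
    return {p: dict(row) for p, row in counts.items()}


def if_idf(cooc):
    """Assumed 'idea frequency-inverse document frequency' relevance.

    if  = share of a problem's co-occurrences that involve the solution;
    idf = log(1 + P / number of problems paired with the solution),
    smoothed (an assumption) so that every score stays positive.
    """
    n_problems = len(cooc)
    df = defaultdict(int)
    for row in cooc.values():
        for s in row:
            df[s] += 1
    scores = {}
    for p, row in cooc.items():
        total = sum(row.values())
        scores[p] = {
            s: (c / total) * math.log(1 + n_problems / df[s])
            for s, c in row.items()
        }
    return scores


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors (dicts)."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings, problem, k=5):
    """User-based CF: problems act as users, solutions as items.

    Returns up to k (solution, predicted_score) pairs for solutions never
    observed with `problem`, i.e. candidate novel ideas.
    """
    known = ratings.get(problem, {})
    weighted = defaultdict(float)
    sim_sum = defaultdict(float)
    for other, row in ratings.items():
        if other == problem:
            continue
        sim = cosine(known, row)
        if sim <= 0:
            continue  # skip problems that share no solutions with this one
        for s, r in row.items():
            if s not in known:
                weighted[s] += sim * r
                sim_sum[s] += sim
    # Similarity-weighted average of the neighbours' relevance values.
    preds = {s: weighted[s] / sim_sum[s] for s in weighted}
    ranked = sorted(preds, key=lambda s: (-preds[s], s))
    return [(s, preds[s]) for s in ranked[:k]]


if __name__ == "__main__":
    toy_docs = [
        (["protein folding"], ["molecular dynamics", "neural network"]),
        (["protein folding"], ["molecular dynamics"]),
        (["drug discovery"], ["molecular dynamics", "graph kernel"]),
        (["climate modelling"], ["neural network", "graph kernel"]),
    ]
    cooc = build_cooccurrence(toy_docs)
    print(recommend(cooc, "protein folding"))          # co-occurrence relevance
    print(recommend(if_idf(cooc), "protein folding"))  # assumed IF-IDF relevance
```

Solutions already paired with a problem are excluded, so every returned item is a <problem, solution> pair not observed in the corpus. With the toy data, "graph kernel" is suggested for "protein folding" because both problems that share a solution with it are also paired with that method.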