Processing and clustering of ancient Chinese poems with the objective of finding similar sentences


To access the full text documents, please follow this link: http://hdl.handle.net/2099.1/19079

Title:	Processing and clustering of ancient Chinese poems with the objective of finding similar sentences
Author:	Soler Arasanz, Gonzalo
Other authors:	Dai, Liu
Abstract:	The objective of this project is to create a program that processes a set of ancient Chinese poems, reading them from a text file and storing them in data structures, so that they can be used to find sentences similar to a text entered by the user. To achieve this, the poems are broken into sentences, which are then clustered (always keeping track of which poem each sentence belongs to), using a tf-idf scoring system to establish the similarity between them. Sentences similar to the provided text are found by comparing the words they contain with the words of that text. The clusters are computed with a modification of hierarchical clustering that follows the same principles but limits each cluster to a maximum of four sentences. This way, a small set of similar sentences can be provided to the user instead of just one sentence similar to the input text. Four clusters are returned: those to which the most similar sentences belong.
Subject(s):	-UPC subject areas::Computer science::Programming -Translators (Computer programs) -Chinese poetry--Translations
Document type:	Bachelor Thesis
Published by:	Universitat Politècnica de Catalunya
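
The approach summarised in the abstract (tf-idf similarity between sentences, hierarchical clustering capped at four sentences, and returning the four clusters that hold the best matches) could be sketched roughly as follows. This is an illustration under stated assumptions, not the thesis implementation: the record does not say how the text is tokenised, which tf-idf variant or linkage is used, or when merging stops. The sketch assumes one token per Chinese character, a smoothed idf, cosine similarity, average linkage, and merging until no permitted pair shares any term. All function names and the toy corpus (two well-known Tang poems) are hypothetical.

```python
import math
from collections import Counter

MAX_CLUSTER_SIZE = 4  # the four-sentence cap stated in the abstract

# Toy corpus: (poem title, sentence) pairs, so each sentence keeps its poem.
CORPUS = [
    ("静夜思", "床前明月光"), ("静夜思", "疑是地上霜"),
    ("静夜思", "举头望明月"), ("静夜思", "低头思故乡"),
    ("登鹳雀楼", "白日依山尽"), ("登鹳雀楼", "黄河入海流"),
    ("登鹳雀楼", "欲穷千里目"), ("登鹳雀楼", "更上一层楼"),
]

def tokenize(text):
    # Assumption: every non-space character is one token.
    return [ch for ch in text if not ch.isspace()]

def build_tfidf(corpus):
    """Return one sparse tf-idf vector per sentence, plus the idf table."""
    counts = [Counter(tokenize(text)) for _, text in corpus]
    n = len(counts)
    df = Counter(tok for c in counts for tok in c)
    # Assumption: smoothed idf, so every weight is strictly positive.
    idf = {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}
    vectors = [{tok: tf * idf[tok] for tok, tf in c.items()} for c in counts]
    return vectors, idf

def cosine(a, b):
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def capped_clustering(vectors, max_size=MAX_CLUSTER_SIZE):
    """Agglomerative clustering that never lets a cluster exceed max_size."""
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while True:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                a, b = clusters[x], clusters[y]
                if len(a) + len(b) > max_size:
                    continue  # merging would break the size cap
                # Assumption: average linkage between the two clusters.
                score = sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))
                if score > 0 and (best is None or score > best[0]):
                    best = (score, x, y)
        if best is None:
            return clusters  # no permitted pair with any shared term is left
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

def find_similar(query, corpus, vectors, idf, clusters, top_k=4):
    """Return up to top_k clusters, ordered by their best-matching sentence."""
    q = {tok: tf * idf.get(tok, 0.0) for tok, tf in Counter(tokenize(query)).items()}
    ranked = sorted(range(len(corpus)), key=lambda i: cosine(q, vectors[i]), reverse=True)
    owner = {i: c for c, members in enumerate(clusters) for i in members}
    picked = []
    for i in ranked:
        if owner[i] not in picked:
            picked.append(owner[i])
        if len(picked) == top_k:
            break
    return [[corpus[i] for i in clusters[c]] for c in picked]

if __name__ == "__main__":
    vectors, idf = build_tfidf(CORPUS)
    clusters = capped_clustering(vectors)
    for group in find_similar("明月", CORPUS, vectors, idf, clusters):
        print(group)
```

Enforcing the cap at merge time, rather than cutting a full dendrogram afterwards, keeps every returned group small enough to show to the user whole, which appears to be the motivation given in the abstract.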