IJASER
IJASER publishes high-quality, original research papers, brief reports, and critical reviews in all theoretical, technological, and interdisciplinary studies that make up the fields of advanced science and engineering and their applications.
In recent years, the amount of stored information has grown enormously. Most of
it is unstructured, so it cannot be processed directly to extract useful
information. Extracting potentially useful information from the huge volume of
text produced by social networking services has therefore attracted much
attention. On microblogging services such as Twitter, users are bombarded with
large amounts of raw information that they cannot sort into the right
categories themselves, and automatic text classification offers a way to
address this problem. However, tweets have a limited number of features and
noisy text, which makes extracting valuable information from them, and hot
topic detection in particular, a challenging task. In this paper, a novel
four-stage framework is proposed to improve the performance of topic detection.
The first stage is data preprocessing. In the second, deep learning is used to
enrich the short text with information obtained through image understanding. In
the third, an improved latent Dirichlet allocation (LDA) model is used to
optimize the effective word pairs derived from the images, which improves the
accuracy of the extracted topic words. Finally, the short text and images are
integrated for topic detection, and the corresponding topics are mined by fuzzy
matching of topic words. Extensive experiments show that the proposed framework
significantly improves topic detection performance and outperforms the selected
baseline methods.
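The abstract does not say what the data preprocessing stage involves. As a minimal sketch, assuming typical tweet-cleaning steps (lowercasing, stripping URLs and @mentions, dropping the `#` symbol, removing non-letters and stop words), the first stage might look like the following. The function name `preprocess_tweet` and the small stop-word list are hypothetical, not taken from the paper.

```python
import re

# Hypothetical stop-word list for illustration; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "for", "at", "rt"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def preprocess_tweet(text: str) -> list[str]:
    """Normalize a raw tweet into a list of content tokens (assumed cleaning steps)."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # drop links
    text = MENTION_RE.sub(" ", text)    # drop @user mentions
    text = text.replace("#", " ")       # keep hashtag words, drop the '#' symbol
    text = NON_ALPHA_RE.sub(" ", text)  # keep only ASCII letters and whitespace
    return [tok for tok in text.split() if tok not in STOP_WORDS and len(tok) > 1]


print(preprocess_tweet("RT @bob: Flooding in #Houston right now! https://t.co/xyz"))
```

Here the retweet marker, mention, URL, punctuation, and the stop word "in" are removed, leaving `['flooding', 'houston', 'right', 'now']`.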
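The abstract does not define "effective word pairs" or describe how LDA was improved. One common way to give short texts more co-occurrence signal is to model unordered word pairs (biterms), as the biterm topic model does. The sketch below assumes that interpretation: it appends words from an image caption (assumed to come from some image-captioning model, not shown) to the tweet tokens, extracts word pairs, and keeps pairs that co-occur in at least `min_docs` documents as a stand-in for "effective" pairs. The function names and the threshold are hypothetical, and the result would be input to a topic model rather than a topic model itself.

```python
from collections import Counter
from itertools import combinations


def enrich_with_caption(text_tokens: list[str], caption_tokens: list[str]) -> list[str]:
    """Append caption words not already in the tweet (caption source is assumed)."""
    seen = set(text_tokens)
    # dict.fromkeys removes duplicate caption words while keeping their order
    extra = [w for w in dict.fromkeys(caption_tokens) if w not in seen]
    return text_tokens + extra


def word_pairs(tokens: list[str]) -> list[tuple[str, str]]:
    """All unordered pairs of distinct words co-occurring in one short document."""
    return list(combinations(sorted(set(tokens)), 2))


def effective_pairs(docs: list[list[str]], min_docs: int = 2) -> dict[tuple[str, str], int]:
    """Keep pairs that co-occur in at least `min_docs` documents (hypothetical filter)."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(word_pairs(doc))  # each pair is counted once per document
    return {pair: n for pair, n in counts.items() if n >= min_docs}


docs = [
    enrich_with_caption(["flooding", "houston"], ["street", "water", "flooding"]),
    enrich_with_caption(["water", "rising"], ["flooding", "street"]),
]
print(effective_pairs(docs))
```

Sorting the unique words before pairing makes `("flooding", "water")` and `("water", "flooding")` the same key, so pair counts are order-independent across documents.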
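The abstract says topics are mined by fuzzy matching of topic words but does not name a similarity measure. Assuming plain string similarity, a minimal sketch using Python's standard `difflib.SequenceMatcher` could score each candidate topic (given as a list of topic words, for example the top words of a topic model) by the fraction of its words that approximately match some token in a post, then assign the post to the best-scoring topic. The 0.8 threshold and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def fuzzy_score(topic_words: list[str], doc_tokens: list[str], threshold: float = 0.8) -> float:
    """Fraction of topic words that approximately match at least one document token."""
    if not topic_words:
        return 0.0
    hits = 0
    for tw in topic_words:
        # SequenceMatcher.ratio() is 2*M/T, where M is matched characters and T is total length
        if any(SequenceMatcher(None, tw, tok).ratio() >= threshold for tok in doc_tokens):
            hits += 1
    return hits / len(topic_words)


def assign_topic(topics: dict[str, list[str]], doc_tokens: list[str],
                 threshold: float = 0.8) -> tuple[str, float]:
    """Return the topic name with the highest fuzzy score, and that score."""
    scores = {name: fuzzy_score(words, doc_tokens, threshold) for name, words in topics.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


topics = {"flood": ["flood", "rain", "water"], "election": ["vote", "ballot", "poll"]}
print(assign_topic(topics, ["floods", "houston", "water"]))
```

With this threshold, `"flood"` matches `"floods"` (ratio about 0.91) but not `"flooding"` (about 0.77), so the threshold directly controls how aggressively inflected forms are merged.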