Published 12 issues per year
ISSN Print: 0040-2508
ISSN Online: 1943-6009
SEARCH ENGINE INTELLIGENT ALGORITHM FOR BIG DATA
ABSTRACT
Improving search engine efficiency in a big data environment has become an active research topic. This study addresses clustering-based search: keywords were extracted with Term Frequency-Inverse Document Frequency (TF-IDF), and the shortcomings of the K-means algorithm were addressed by combining it with the Canopy algorithm to obtain a Canopy-K-means (CKM) algorithm, whose retrieval performance was then tested. The results showed that, across searches for different keywords, the algorithm's performance improved as data volume increased: search time shortened, and recall and precision both rose. The CKM algorithm performed well in big data processing and outperformed both the LDA and K-means algorithms. A comparison of clustering performance with the K-means algorithm showed that the number of clusters produced by the CKM algorithm was closer to the actual number of clusters and that its clustering accuracy was higher, indicating that the CKM algorithm is effective for retrieval. These experimental results contribute to improving the efficiency of data retrieval and meeting users' needs, which supports the further development of search engines.
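The pipeline summarized above (TF-IDF weighting, Canopy pre-clustering to obtain the initial centers, then K-means refinement) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy documents, the smoothed IDF formula, and the thresholds `t1` and `t2` are all assumptions chosen for illustration, and the Canopy step picks points in order rather than at random so that the run is repeatable.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into TF-IDF vectors over a shared vocabulary."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    # Assumed smoothing: +1 keeps terms shared by every document from vanishing.
    idf = {term: math.log(n_docs / doc_freq[term]) + 1.0 for term in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[term] / len(doc) * idf[term] for term in vocab])
    return vocab, vectors


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t1, t2):
    """Canopy pre-clustering with loose threshold t1 and tight threshold t2."""
    if not t1 > t2 > 0:
        raise ValueError("thresholds must satisfy t1 > t2 > 0")
    remaining = list(range(len(points)))
    centers = []
    while remaining:
        seed = points[remaining[0]]  # deterministic pick (assumption)
        members = [points[i] for i in remaining if dist(seed, points[i]) < t1]
        # The canopy center is the mean of all points within t1 of the seed.
        centers.append([sum(col) / len(members) for col in zip(*members)])
        # Points within t2 of the seed cannot start another canopy.
        remaining = [i for i in remaining if dist(seed, points[i]) >= t2]
    return centers


def kmeans(points, centers, max_iter=100):
    """Standard K-means, started from the supplied centers."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's center where it was
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Toy corpus (hypothetical): two documents per topic.
docs = [
    "search engine query index".split(),
    "search engine ranking index".split(),
    "cluster centroid distance".split(),
    "cluster centroid canopy".split(),
]
vocab, vectors = tfidf_vectors(docs)
initial = canopy_centers(vectors, t1=1.3, t2=1.2)  # thresholds are assumptions
labels, centers = kmeans(vectors, initial)
print(len(initial), labels)
```

In this sketch the Canopy step supplies both the number of clusters and their starting centers, so K-means needs neither a preset K nor random initialization. The thresholds still have to be tuned for real data.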