Some improvements of fuzzy clustering algorithms using picture fuzzy sets and applications for geographic data clustering

This paper summarizes the major findings of the research project under the code name QG.14.60. The research aims to enhance some fuzzy clustering methods by means of more generalized fuzzy sets. The main results are: (1) Improving a distributed fuzzy clustering method for big data using picture fuzzy sets: a novel method called DPFCM is designed to reduce communication cost by using the facilitator model (instead of the peer-to-peer model) together with picture fuzzy sets. The experimental evaluations show that the clustering quality of DPFCM is better than that of the original algorithm while keeping the computational time reasonable. (2) Applying picture fuzzy clustering to weather nowcasting in a novel method called PFC-STAR, which integrates the STAR technique with picture fuzzy clustering to enhance forecast accuracy. Experimental results on satellite image sequences show that the proposed method is better than related works, especially in rain prediction. (3) Developing a GIS plug-in that implements some of the improved fuzzy clustering algorithms. The tool supports access to spatial databases and visualization of clustering results in thematic map layers.

to the constraints:

$$u_{kj},\ \eta_{kj},\ \xi_{kj} \in [0, 1], \qquad (2)$$

$$u_{kj} + \eta_{kj} + \xi_{kj} \le 1, \qquad (3)$$

$$\sum_{j=1}^{C} u_{kj}\,(2 - \xi_{kj}) = 1, \qquad (4)$$

$$\sum_{j=1}^{C} \left( \eta_{kj} + \frac{\xi_{kj}}{C} \right) = 1, \quad k = \overline{1, N},\ j = \overline{1, C}. \qquad (5)$$

The steps of the algorithm are as follows:

- Initial step: t = 0; randomly initialize the variables $u_{kj}^{(t)}, \eta_{kj}^{(t)}, \xi_{kj}^{(t)}$ ($k = \overline{1, N}$, $j = \overline{1, C}$) so that conditions (2)-(3) are satisfied;

- Step 1: t = t + 1; calculate the cluster centers $V_j$ using formula (6):

$$V_j = \frac{\sum_{k=1}^{N} \left( u_{kj}\,(2 - \xi_{kj}) \right)^{m} X_k}{\sum_{k=1}^{N} \left( u_{kj}\,(2 - \xi_{kj}) \right)^{m}}, \quad j = \overline{1, C}; \qquad (6)$$

- Step 2: Update $u_{kj}$, $\eta_{kj}$, $\xi_{kj}$ by formulas (7)-(9):

$$u_{kj} = \frac{1}{\sum_{i=1}^{C} (2 - \xi_{kj}) \left( \frac{\lVert X_k - V_j \rVert}{\lVert X_k - V_i \rVert} \right)^{\frac{2}{m-1}}}, \quad k = \overline{1, N},\ j = \overline{1, C}, \qquad (7)$$

$$\eta_{kj} = \frac{e^{-\xi_{kj}}}{\sum_{i=1}^{C} e^{-\xi_{ki}}} \left( 1 - \frac{1}{C} \sum_{i=1}^{C} \xi_{ki} \right), \qquad (8)$$

The iteration stops when $\lVert u^{(t)} - u^{(t-1)} \rVert + \lVert \eta^{(t)} - \eta^{(t-1)} \rVert + \lVert \xi^{(t)} - \xi^{(t-1)} \rVert$ becomes sufficiently small or the step counter is greater than maxSteps; otherwise, return to Step 1.

2.2. DPFCM - Distributed fuzzy clustering using picture fuzzy sets

In [17], the authors proposed a fuzzy clustering algorithm, CDFCM, for distributed computing environments with the peer-to-peer (P2P) communication model. In this algorithm, the cluster centers and the fuzzy memberships of data points are calculated at every peer site and then updated in each iteration using only the results of the neighboring peers. This process is repeated until a stopping criterion is satisfied. CDFCM is considered one of the most effective fuzzy clustering algorithms for distributed computing environments.

A detailed analysis shows that the communication cost of each CDFCM iteration is high, approximately $p \cdot n_{loc}$, where p is the number of peers and $n_{loc}$ is the average number of neighbors of one peer. Moreover, because the algorithm uses only nearby local results for the update in each iteration, the final clustering result may not be of the highest quality.

Our idea for improving CDFCM is to reduce communication costs and improve the quality of clustering results by using picture fuzzy clustering and the facilitator model instead of the peer-to-peer communication model. The proposed method is called DPFCM (distributed picture fuzzy clustering method).

- At the local level, each peer site performs picture fuzzy clustering in each iteration;
 N.D. Hoa et al. / VNU Journal of Science: Comp. Science & Com. Eng., Vol. 32, No. 3 (2016) 32-38 35 
- At the global level, all the peer sites transfer their results to a unique master site, which plays the role of a facilitator in the communication process. Thus, in one updating step at the global level, the communication cost is of order p. Moreover, the global information helps to improve the quality of clustering.

The experimental evaluation was conducted on benchmark datasets from the UCI Machine Learning Repository, namely IRIS, GLASS, IONOSPHERE, HABERMAN and HEART. The speed of convergence and the cluster validity measures were evaluated. The average number of iterations (AIN) is better when smaller, whereas the average classification rate (ACR) and the average normalized mutual information (ANMI) [6] are better when larger. Table 1 compares the clustering quality of DPFCM with that of several other algorithms.
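For illustration, the local step that each peer site repeats (one picture fuzzy clustering iteration following formulas (6)-(8)) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the implementation evaluated in the experiments: the function names, the toy data, the initial values and the small `eps` guard are hypothetical. Formula (9) is not shown in this text, so `update_xi` uses a Yager-type form with exponent `alpha` that is commonly given in the picture fuzzy clustering literature; it should be read as an assumption.

```python
import math

def update_centers(X, u, xi, m=2.0):
    """Formula (6): centers weighted by (u_kj * (2 - xi_kj)) ** m."""
    n, dim, C = len(X), len(X[0]), len(u[0])
    centers = []
    for j in range(C):
        w = [(u[k][j] * (2.0 - xi[k][j])) ** m for k in range(n)]
        total = sum(w)
        centers.append([sum(w[k] * X[k][d] for k in range(n)) / total
                        for d in range(dim)])
    return centers

def update_u_eta(X, V, xi, m=2.0, eps=1e-12):
    """Formulas (7) and (8), both evaluated with the current xi."""
    C, p = len(V), 2.0 / (m - 1.0)
    u, eta = [], []
    for k, x in enumerate(X):
        # eps is a hypothetical guard against division by zero
        # when a point coincides with a center.
        d = [math.dist(x, v) + eps for v in V]
        u.append([1.0 / ((2.0 - xi[k][j]) * sum((d[j] / d[i]) ** p for i in range(C)))
                  for j in range(C)])
        e = [math.exp(-xi[k][i]) for i in range(C)]
        scale = 1.0 - sum(xi[k]) / C
        eta.append([e[j] / sum(e) * scale for j in range(C)])
    return u, eta

def update_xi(u, eta, alpha=0.5):
    """ASSUMED form of formula (9); the text above does not show it."""
    xi = []
    for u_k, eta_k in zip(u, eta):
        row = []
        for a, b in zip(u_k, eta_k):
            s = min(1.0, a + b)  # clamp so the root below stays real
            row.append(max(0.0, 1.0 - s - (1.0 - s ** alpha) ** (1.0 / alpha)))
        xi.append(row)
    return xi

# Toy data (hypothetical): two well-separated groups of 2-D points.
X = [[0.0, 0.0], [0.1, 0.2], [4.0, 4.1], [4.2, 3.9]]
u = [[0.4, 0.1], [0.3, 0.2], [0.1, 0.4], [0.2, 0.3]]
xi = [[0.1, 0.1] for _ in X]

for _ in range(20):                      # fixed step count instead of a tolerance test
    V = update_centers(X, u, xi)         # Step 1, formula (6)
    u, eta = update_u_eta(X, V, xi)      # Step 2, formulas (7)-(8)
    xi = update_xi(u, eta)               # Step 2, assumed formula (9)

# Assign each point to the cluster with the largest u_kj.
labels = [max(range(len(V)), key=lambda j: row[j]) for row in u]
print(labels)
```

Because (7) and (8) are evaluated with the ξ values of the previous step, constraints (4) and (5) hold exactly for the returned `u` and `eta` with respect to that `xi`.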
Table 1. Clustering quality of algorithms [10]
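The normalized mutual information behind the ANMI measure can be illustrated with a short Python sketch. The normalization by the geometric mean of the two entropies is one common convention and is an assumption here, as are the function name `nmi` and the toy label vectors. ANMI is then presumably the average of such values over repeated runs.

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings of the same points."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label sequences must be non-empty and of equal length")
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # Mutual information from the contingency counts.
    mutual_info = sum(
        (n_ab / n) * math.log(n * n_ab / (count_a[a] * count_b[b]))
        for (a, b), n_ab in joint.items()
    )
    entropy_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    entropy_b = -sum((c / n) * math.log(c / n) for c in count_b.values())
    if entropy_a == 0.0 or entropy_b == 0.0:
        # A single-cluster labeling carries no information. By convention,
        # two such labelings count as identical, anything else as unrelated.
        return 1.0 if entropy_a == entropy_b else 0.0
    # Assumed normalization: geometric mean of the two entropies.
    return mutual_info / math.sqrt(entropy_a * entropy_b)

truth = [0, 0, 0, 1, 1, 1]   # hypothetical reference classes
found = [1, 1, 0, 0, 0, 0]   # hypothetical clustering output
print(round(nmi(truth, found), 3))
```

The value does not depend on how the clusters are numbered, so a clustering that matches the reference classes up to a relabeling scores 1.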
The results presented in Table 1 show that the clustering quality of DPFCM is mostly better than that of three distributed clustering algorithms, namely CDFCM, Soft-DKM and PFCM. It is also better than the traditional centralized clustering algorithm FCM, and slightly worse than the centralized weighted clustering algorithm WEFCM. In some cases, for example on the IONOSPHERE and HEART datasets, DPFCM yields clustering quality comparable to or slightly worse than that of CDFCM.

Regarding the speed of convergence, the comparison of AIN shows that DPFCM is at a disadvantage, as expected, but the differences in AIN are small.

The above results were published in the international scientific journal "Expert Systems with Applications" [10].

3. Application of picture fuzzy clustering in analysis of meteorological images for weather nowcasting

One method of weather prediction, called weather nowcasting, is based on the analysis of satellite image sequences, combining the spatio-temporal autoregressive (STAR) model with fuzzy clustering. There are a number of publications in this research domain. Recently, Shukla and colleagues [14] proposed a number of technical improvements to raise the accuracy.
However, because classical fuzzy sets are used, image areas that are ambiguous or unclear have a negative impact on the prediction result. Picture fuzzy clustering [15], which uses a more advanced fuzzy concept, has been shown to be better than traditional fuzzy clustering. Our idea is to advance the research of Shukla et al. by combining the primary STAR techniques with picture fuzzy clustering to create a new weather prediction method, called Picture Fuzzy Clustering - Spatiotemporal Autoregressive (PFC-STAR). We expect this combination to improve the quality of the prediction results. The proposed PFC-STAR method involves three steps:

- The pixels of the satellite images (training samples) are divided into groups using the picture fuzzy clustering algorithm proposed in [15].
- All the elements of these clusters in the training samples are then labeled and filtered using the Discrete Fourier Transform to filter out non-predictable scales and so extend the time range of predictability.
- Finally, the next images in the sequence are predicted by the spatio-temporal autoregression method, which allows weather forecasting for the chosen geographic area a short time ahead.

The experimental evaluation of the proposed method was conducted on a personal computer with 2 GB RAM and a 2.13 GHz Core 2 Duo processor, on data sets consisting of sequences of satellite images of the Southeast Asia region. Each data set includes 5 satellite images of 100 x 100 pixels taken over the period from 9:30 to 13:30. Comparison of the results showed that the proposed method is better than the relevant weather nowcasting methods, especially in the precision of the rain-rate regression. In Table 2, PFC-STAR has a lower RMSE on the Malaysia and Jakarta data sets and a slightly higher one on the Luzon data set, with a computational time a few seconds longer in all three cases.

Table 2. Comparison of RMSE and computational time of PFC-STAR and the method of Shukla et al. [12]

Data                   RMSE (%)                           Computational time (sec)
                       PFC-STAR    Shukla et al. (2014)   PFC-STAR    Shukla et al. (2014)
Malaysia               26.77       27.11                  362.745     359.88
Luzon – Philippines    33.61       33.45                  345.672     343.43
Jakarta – Indonesia    30.12       32.04                  342.76      339.97

The above results were presented and published in the Proceedings of the International Symposium on Geo-informatics for Spatial Infrastructure Development in Earth and Allied Sciences (GIS-IDEAS) [12].

4. Developing data clustering tool as a plug-in for GIS

For the convenience of users mining geographical data, a data clustering engine should be developed and integrated into GIS to support direct access to the spatial database for reading input data and displaying the results on map layers.

MapWindow is an open source GIS software that Windows users are familiar with; it is under active development, with new versions released continuously. MapWindow supports plug-ins in the form of dynamic link libraries (*.dll), and a development environment such as Visual Studio Community Edition is available for free download. This tool supports the C# language and the .NET Framework. Our implementation of the proposed algorithms for the experimental evaluation was written in C/C++; therefore, the Visual Studio development environment is the most suitable choice for integrating our source code.

The plug-in, named SpatialClust, is a clustering tool module for geographical data, which implements several fuzzy clustering algorithms with the improvements that our team has proposed, as presented above. Restrictions on the computational resources of a plug-in do not allow implementing the distributed algorithms or processing large data sets. Hence,
only some appropriate algorithms are included in the tool, namely: FCM, NE, FGWC, CFGWC, IPFGWC, MIPFGWC. The plug-in supports direct access to the spatial database for reading attribute values and displaying the resulting clusters in different colors on the map.

Input: the data file format is *.csv (comma-separated values). GIS software is expected to support importing and exporting the data of one map layer from the *.shp format to the *.csv format.

Picture 1. Dialog box for choosing input data and algorithm.

Output: there are two types:
1. Output as a text file (*.txt or plain text), which provides enough detail for analysis and evaluation of the algorithms or for subsequent processing, if any.
2. Visual display on the map: in parallel with writing the results to a text file, the tool writes the cluster labels directly to the cluster column of the underlying database, and users can then visualize the clusters on maps using GIS functionalities. For this purpose, the attribute table of the map layer must have a last column named CLUSTER.

5. Summary and conclusions

The research carried out in this project has contributed to improving fuzzy clustering algorithms and distributed fuzzy clustering for processing large data sets, with application to geographical data clustering. The results contribute to better addressing real-world problems met in many application areas.

The distributed fuzzy clustering algorithm DPFCM, which handles large data sets using picture fuzzy sets, has improved overall clustering quality in comparison with the algorithm of Zhou and colleagues [17]. The clustering quality of DPFCM is better than that of some clustering algorithms of the same type, while the computational time does not increase much. The new weather nowcasting method PFC-STAR, which uses picture fuzzy sets instead of classical fuzzy sets, has raised the quality of predictions in comparison with the method of Shukla et al. [14], especially in predicting rain rate. We can conclude that the use of picture fuzzy clustering had a positive impact on the quality of the clustering results for problems involving inherently fuzzy concepts.

The software tool for data clustering, integrated into MapWindow as a plug-in that performs typical fuzzy clustering algorithms and the improvements proposed in our research, will help to promote practical applications of geographic data mining in various domains.

Acknowledgements

The authors would like to thank colleagues for their comments in discussions at the scientific seminars, which helped to correct errors and complete the results achieved. We also express our sincere thanks to VNU Hanoi for funding the research project under the code name QG.14.60 and for other support in conducting the research.

References

[1] Atanassov, K. T. (1986). Intuitionistic fuzzy sets. Fuzzy Sets and Systems, 20, 87-96.
[2] Bezdek, J. C., Ehrlich, R., et al. (1984). FCM: the fuzzy c-means clustering algorithm. Computers and Geosciences, 10, 191-203.
[3] Brinkhoff, T., Kriegel, H.-P. (1994). The Impact of Global Clustering on Spatial Database Systems. Proceedings of the 20th VLDB Conference, Santiago, Chile, 168-179.
[4] Bui Cong Cuong, Vladik Kreinovich (2013). Picture Fuzzy Sets - a new concept for computational intelligence problems. Proceedings of the 2013 Third World Congress on Information and Communication Technologies (WICT 2013), 1-6.
[5] Deepti Joshi (2011). Polygonal Spatial Clustering. Ph.D. Dissertation, University of Nebraska.
[6] Huang, H. C., Chuang, Y. Y., & Chen, C. S. (2012). Multiple kernel fuzzy clustering. IEEE Transactions on Fuzzy Systems, 20(1), 120-134.
[7] Le Hoang Son, Bui Cong Cuong, Pier Luca Lanzi, Hoang Anh Hung (2011). Data Mining in GIS: A Novel Context-Based Fuzzy Geographically Weighted Clustering Algorithm. International Journal of Machine Learning and Computing.
[8] Le Hoang Son, Nguyen Dinh Hoa, Pier Luca Lanzi, Bui Thi Huong Lan (2011). A Combination of Clustering Techniques and Fuzzy Control in 2D Polygon Determination for the Terrain Splitting and Mapping Problem. International Journal of Computer and Electrical Engineering, 3(5), 682-689.
[9] Le Hoang Son, Bui Cong Cuong, Pier Luca Lanzi, Nguyen Tho Thong (2012). A Novel Intuitionistic Fuzzy Clustering Method for Geo-Demographic Analysis. Expert Systems with Applications.
[10] Le Hoang Son (2015). DPFCM: A novel distributed picture fuzzy clustering method on picture fuzzy sets. Expert Systems with Applications, 42, 51-66.
[11] Neethu C V, Subu Surendran (2013). Review of Spatial Clustering Methods. International Journal of Information Technology Infrastructure, 2(3), May-June 2013.
[12] Nguyen Dinh Hoa, Pham Huy Thong, Le Hoang Son (2014). Weather Nowcasting from Satellite Image Sequences Using Picture Fuzzy Clustering and Spatial-temporal Regression. International Symposium on Geoinformatics for Spatial Infrastructure Development in Earth and Allied Sciences (GIS-IDEAS), Danang, Vietnam, December 7-9, 2014, 137-142.
[13] Perumal, M., Velumani, B., Sadhasivam, A., Ramaswamy, K. (2015). Spatial Data Mining Approaches for GIS - A Brief Review. Conference paper, January 2015, Springer International Publishing Switzerland.
[14] Shukla, B. P., Kishtawal, C. M., & Pal, P. K. (2014). Prediction of Satellite Image Sequence for Weather Nowcasting Using Cluster-Based Spatiotemporal Regression. IEEE Transactions on Geoscience and Remote Sensing, 52(7), 4155-4160.
[15] Thong, P. H., Son, L. H. (2014). A new approach to multi-variables fuzzy forecasting using picture fuzzy clustering and picture fuzzy rules interpolation method. Proceedings of the 6th International Conference on Knowledge and Systems Engineering (KSE 2014), October 9-11, 2014, Hanoi, Vietnam, 679-690.
[16] Visalakshi, N. K., Thangavel, K., & Parvathi, R. (2010). An intuitionistic fuzzy approach to distributed fuzzy clustering. International Journal of Computer Theory and Engineering, 2(2), 1793-8201.
[17] Zhou, J., Chen, C., Chen, L., & Li, H. (2013). A collaborative fuzzy clustering algorithm in distributed network environments. IEEE Transactions on Fuzzy Systems.
