Privacy preserving spatio-temporal databases based on k-anonymity
ABSTRACT
The development of location-based services and mobile devices has led to an increase in location data. Through data mining, valuable information can be discovered from
location data. However, an attacker may also use data mining to extract private (sensitive) information
about a user, which threatens the user's privacy. For
example, the attacker can mine a user's location data to determine the user's home address. Thus,
location privacy protection has become an important requirement for the successful development
of location-based services. In this paper, we propose a grid-based approach and an algorithm
to guarantee k-anonymity, a well-known privacy protection approach, in a location database. To do
this, we assume that the service server provides services only within a defined area and that the grid
covers the area in which the service server operates. The user's location is then hidden in an
anonymization area. The anonymization area is composed of cells that form a rectangular area
containing at least k distinct users. Moreover, in practice, the location of a user is usually
accompanied by temporal data, and the combination of spatial
and temporal data may disclose further sensitive information about the user. Thus, the paper
also proposes an approach for guaranteeing k-anonymity for a combined spatio-temporal database. The proposed approach considers only the information that is significant for
the data mining process and ignores unrelated information. Finally, the experimental results
show the effectiveness of the proposed approach in comparison with approaches from the literature.
Science & Technology Development Journal – Engineering and Technology, 3(SI1):SI82-SI94

_anonymization_area in which the number of distinct users is maximal. We call this area the maximal_anonymization_area. Finally, we anonymize the location and time data of all tuples that belong to this maximal_anonymization_area to the corresponding values: the maximal_anonymization_area for the location data and an interval for the time data. The approaches for anonymizing location data and time data were discussed earlier.

Name: k-anonymization Algorithm()
Input: k, threshold tx, threshold ty, threshold t_time, spatio-temporal table T
Output: k-anonymization location table T'
Method:
    Create a grid G which covers the space where the server provides services.
    Anonymize all location data of tuples in T to grid cells.
    X = Ø
    While (there exists a tuple which has not been marked) {
        For each tuple in T which has not been marked {
            tuple_maximal_anonymization_area = find_safe_max_anoymization();
            X = X U tuple_maximal_anonymization_area;
        }
        Maximal_anonymization_area = the tuple_maximal_anonymization_area in X with the maximum number of distinct users.
        If (the number of distinct users in this maximal anonymization area < k) {
            Anonymize all location and time data of tuples in this area to the closest anonymization area which is safe.
            Mark the corresponding tuples in the table T.
        } Else {
            Anonymize the location attribute of the users who belong to this maximal anonymization area to this area, and generalize all time data of these tuples to an interval.
            Mark the corresponding tuples in the table T.
        }
    }
    Return anonymized table.

When the number of distinct users in the maximal anonymization area is smaller than k, we consider this area not significant to the data mining process. Therefore, we anonymize all location and time data of tuples in this area to the "closest" safe anonymization area. We discussed this idea in section 3.2.
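The greedy loop of the algorithm above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `Record`, `to_cell`, `k_anonymize` and the injected `find_candidate` callable are hypothetical names, and a group with fewer than k distinct users is only flagged as unsafe here instead of being merged into the closest safe area.

```python
from typing import Callable, List, NamedTuple, Set, Tuple


class Record(NamedTuple):
    """One row of the spatio-temporal table (hypothetical layout)."""
    user: str
    x: float
    y: float
    time: float


def to_cell(x: float, y: float, cell_size: float) -> Tuple[int, int]:
    """Generalize a raw location to the grid cell that contains it."""
    return (int(x // cell_size), int(y // cell_size))


def k_anonymize(
    records: List[Record],
    k: int,
    find_candidate: Callable[[int, Set[int]], Tuple[object, List[int]]],
) -> List[Tuple[object, List[int], bool]]:
    """Greedy loop: repeatedly pick the candidate area with most distinct users.

    find_candidate(i, unmarked) is assumed to return (area, member_indices)
    for record i. The result is a list of (area, member_indices, is_safe),
    where is_safe is True when the area holds at least k distinct users.
    """
    unmarked = set(range(len(records)))
    groups = []
    while unmarked:
        best_area, best_members, best_users = None, set(), -1
        for i in sorted(unmarked):
            area, members = find_candidate(i, unmarked)
            # Keep only unmarked members; always include i so the loop ends.
            members = {m for m in members if m in unmarked} | {i}
            n_users = len({records[m].user for m in members})
            if n_users > best_users:
                best_area, best_members, best_users = area, members, n_users
        # Simplification: unsafe groups are flagged, not relocated.
        groups.append((best_area, sorted(best_members), best_users >= k))
        unmarked -= best_members  # mark the tuples of the chosen area
    return groups
```

Passing the candidate search in as a callable keeps the sketch independent of how candidate areas are enumerated; any function with the assumed signature can be plugged in.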
The find_safe_max_anoymization() function chooses, for the time and location data of each tuple, the safe area in which the number of distinct users is maximal. In the first step, the function finds all safe areas for the time and location values given as input parameters. Among them, it chooses the safe area with the largest number of distinct users.

Name: find_safe_max_anoymization()
Input: a tuple t containing time and location data, threshold tx, threshold ty, threshold t_time, spatio-temporal table T, grid G
Output: tuple_maximal_anonymization_area for tuple t and a set O containing the tuples that belong to tuple_maximal_anonymization_area
Method:
    Sort T by time so that consecutive tuples t1, t2, t3 in T satisfy t1.time < t2.time < t3.time
    Y = all anonymization areas which contain t.location and satisfy both thresholds tx and ty
    Variable count_max = 0
    tuple_maximal_anonymization_area = null
    For each y in Y {
        Variable count_y = 0
        R = all tuples i in T such that i.location belongs to y
        For each t' in R with index of t' in R <= the index of t {
            If time-distance(t'.time, t.time) <= t_time {
                tuple tp
                For each tuple tr in R with index of tr in R >= the index of t {
                    tp = tr
                    If time-distance(tp.time, t'.time) > t_time Exit For
                }
                If index_of_tp_in_R - index_of_t'_in_R - 1 > count_y {
                    count_y = index_of_tp_in_R - index_of_t'_in_R - 1
                    O = all tuples in R from (index_of_t'_in_R) to (index_of_tp_in_R - 1)
                }
            }
        }
        If (count_y > count_max) {
            count_max = count_y
            tuple_maximal_anonymization_area = y
            Remember set O
        }
    }
    Return tuple_maximal_anonymization_area, O

This function returns the tuple_maximal_anonymization_area together with a set O of the tuples which satisfy the location and time constraints. A tuple_maximal_anonymization_area is always accompanied by its set O. Therefore, when this area is chosen as the maximal_anonymization_area, all tuples in O are anonymized.
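The search described above can be sketched in Python as a brute-force enumeration. This is an illustrative sketch under several assumptions rather than the paper's procedure: locations are already generalized to integer cell coordinates, the thresholds tx and ty are expressed as a maximum width and height in cells (`max_w`, `max_h`), candidates are ranked by the number of distinct users as in the prose description, and the names `best_time_window` and `find_safe_max` are hypothetical.

```python
from typing import List, Set, Tuple

Row = Tuple[str, int, int, float]   # (user, cell_x, cell_y, time)
Area = Tuple[int, int, int, int]    # (x0, y0, width, height) in cells


def best_time_window(rows: List[Tuple[float, str]], target_time: float,
                     t_time: float) -> Tuple[float, float, Set[str]]:
    """Among windows of length <= t_time that contain target_time,
    return (start, end, users) for the one with most distinct users."""
    best_start, best_end, best_users = target_time, target_time, set()
    for start, _ in rows:
        # A window starting here must still reach the target time.
        if not (target_time - t_time <= start <= target_time):
            continue
        inside = [(t, u) for t, u in rows if start <= t <= start + t_time]
        users = {u for _, u in inside}
        if len(users) > len(best_users):
            best_start = start
            best_end = max(t for t, _ in inside)
            best_users = users
    return best_start, best_end, best_users


def find_safe_max(records: List[Row], target: Row, max_w: int, max_h: int,
                  t_time: float) -> Tuple[Area, float, float, Set[str]]:
    """Try every cell rectangle up to max_w x max_h that contains the
    target's cell and keep the rectangle and time window with most users."""
    _, cx, cy, t0 = target
    best = None
    for w in range(1, max_w + 1):
        for h in range(1, max_h + 1):
            for x0 in range(cx - w + 1, cx + 1):
                for y0 in range(cy - h + 1, cy + 1):
                    # Tuples whose cell lies inside this candidate rectangle.
                    rows = [(t, u) for (u, x, y, t) in records
                            if x0 <= x < x0 + w and y0 <= y < y0 + h]
                    start, end, users = best_time_window(rows, t0, t_time)
                    if best is None or len(users) > len(best[3]):
                        best = ((x0, y0, w, h), start, end, users)
    return best
```

The enumeration is quadratic in the number of tuples per rectangle, which is acceptable for a sketch; a sorted table with a sliding window, as in the pseudocode, would avoid rescanning the rows for every window start.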
EXPERIMENTS
We present an experiment that evaluates the effectiveness of the proposed approach. In our tests, the data mining process looks for the time interval in which users use the service most frequently. The data mining process is run on the original table and on the k-anonymous version generated by our algorithm. We compare the two results by taking the overlapped interval between them. Clearly, the proposed approach is effective if this overlapped interval is large. We use a ratio to describe this effectiveness:

\[ R_{time} = \frac{overlapped\_t}{original\_result\_t} \]

Here original_result_t is the result that the data mining process obtains from the original table, and overlapped_t is the overlapped interval between the results on the original table and on our k-anonymous version. As discussed, the larger the ratio, the more effective the approach.

We evaluate the approach on a spatio-temporal table with more than 2000 records and more than 50 distinct users. The grid cell size and k are changed in each test case. In each case, we also change the maximum area and the maximum time interval that the data miner wants the data mining process to handle. The maximum area is the max area column and the maximum time interval is the max interval column in Table 8. In Table 8, k is 5, 10, and 20 for each test, and the ratio value is the average of the three tests with k = 5, 10, and 20. The results show that in most cases the ratio is larger than 80%. This means that our approach generates a k-anonymous version of the original table in which the data mining process can find nearly the same significant information as when working on the original table.

CONCLUSIONS AND DISCUSSIONS
In this paper, we propose a technique for anonymizing spatio-temporal databases. With this technique, the location and time data can be anonymized easily.
We also take the result of the data mining process into account to develop an algorithm that trades off data privacy against data quality. In the future, we will focus on improving the algorithm so that it guarantees k-anonymity in large spatio-temporal databases more efficiently.

ACKNOWLEDGMENT
This research is funded by Ho Chi Minh City University of Technology, Vietnam National University Ho Chi Minh City, under grant number T-KHMT-2018-90.

CONFLICT OF INTEREST
We claim that there is no conflict of interest in this article.

AUTHOR CONTRIBUTION
Anh Truong is the only author of this article.

Table 8: Results of experiment

tx*ty (m*m)   t_time (days)   k         Cell size (m)   max area (m*m)   max interval (days)   Ratio Rtime (%)
90*90         30              5,10,20   30              200*200          120                   88.97
90*90         30              5,10,20   30              400*400          150                   87.53
90*90         60              5,10,20   30              200*200          120                   88.31
90*90         60              5,10,20   30              400*400          150                   86.42
150*150       30              5,10,20   50              300*300          120                   84.62
150*150       30              5,10,20   50              500*500          150                   85.02
150*150       60              5,10,20   50              300*300          120                   84.19
150*150       60              5,10,20   50              500*500          150                   83.97
300*300       30              5,10,20   100             500*500          120                   81.74
300*300       30              5,10,20   100             800*800          150                   80.36
300*300       60              5,10,20   100             500*500          120                   79.92
300*300       60              5,10,20   100             800*800          150                   80.09

Open Access Full Text Article
Research Article

Privacy protection for spatio-temporal databases based on k-anonymity
Trương Tuấn Anh*

Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology, VNU-HCM, Vietnam
Correspondence: Trương Tuấn Anh, Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology, VNU-HCM, Vietnam
Email: anhtt@hcmut.edu.vn
History: Received: 29-7-2019. Accepted: 25-8-2019. Published: 04-12-2020
DOI: 10.32508/stdjet.v3iSI1.517
Copyright © VNU-HCM. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.

ABSTRACT
The development of location-based services and mobile devices has led to the generation of location data. Through the data mining process, useful information can be extracted from this location data. This also means that an attacker can exploit the data to extract private information about users. For example, an attacker can examine a user's location information to determine their home address. Protecting location information has therefore become an important requirement. In this paper, we introduce an adaptive grid approach together with an algorithm to guarantee k-anonymity for location databases. To do this, we assume that location services provide service within a predefined spatial region and that an adaptive grid is created over this region. The user's location is then anonymized within an anonymization area. These anonymization areas are chosen so that each contains at least k users. We also propose an approach to guarantee k-anonymity for data that combines both space and time. The proposed approach considers only the information that is significant for the data mining process and ignores other unrelated information. Finally, the experimental results show the effectiveness of the proposed solution in comparison with other solutions.

Keywords: location privacy, privacy protection, data mining, k-anonymity, spatio-temporal databases.

Cite this article: Anh T T. Privacy preserving spatio-temporal databases based on k-anonymity. Sci. Tech. Dev. J. - Eng. Tech.; 3(SI1):SI82-SI94.