Privacy preserving spatio-temporal databases based on k-anonymity

ABSTRACT

The development of location-based services and mobile devices has led to an increase in location data. Data mining can discover valuable information from location data. On the other hand, an attacker may use the same data mining process to extract private (sensitive) information about a user, which threatens the user's privacy. For example, an attacker can mine a user's location data to determine the user's home address. Location privacy protection is therefore an important requirement for the successful development of location-based services. In this paper, we propose a grid-based approach and an algorithm to guarantee k-anonymity, a well-known privacy protection approach, in a location database. We assume that the service server provides services only within a defined area and that the grid covers this area. The user's location is then hidden in an anonymization area, which is built from cells that form a rectangle containing at least k distinct users. Moreover, in practice, a user's location is usually accompanied by temporal data, and the combination of spatial and temporal data may disclose further sensitive information about the user. The paper therefore also proposes an approach for guaranteeing k-anonymity for combined spatial and temporal data. The proposed approach considers only the information that is significant for the data mining process and ignores unrelated information. Finally, the experimental results show the effectiveness of the proposed approach in comparison with approaches from the literature.


_anonymization_area in which the number of distinct users is maximal. We call this area the maximal_anonymization_area. Finally, we anonymize the location data and time data of all tuples that belong to this maximal_anonymization_area to the corresponding values, namely the maximal_anonymization_area for the location data and an interval for the time data. Approaches for anonymizing the location data and time data were discussed earlier.
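As a minimal Python sketch of this final generalization step (not the paper's implementation), assume each tuple is a dict with the hypothetical keys `user`, `cell` and `time`, and that an anonymization area is a rectangle of grid cells written as `(col_min, row_min, col_max, row_max)`. Every tuple in the area then receives the same area and the same time interval:

```python
from typing import Dict, List, Tuple

# Hypothetical representation: inclusive rectangle of grid cells.
Area = Tuple[int, int, int, int]  # (col_min, row_min, col_max, row_max)


def generalize(tuples: List[Dict], area: Area) -> List[Dict]:
    """Replace each tuple's cell by `area` and its time by a shared interval."""
    if not tuples:
        return []
    start = min(t["time"] for t in tuples)
    end = max(t["time"] for t in tuples)
    return [
        {"user": t["user"], "area": area, "interval": (start, end)}
        for t in tuples
    ]


# Illustrative data only; the values are not taken from the paper.
rows = [
    {"user": "u1", "cell": (2, 3), "time": 10},
    {"user": "u2", "cell": (3, 3), "time": 14},
    {"user": "u3", "cell": (2, 4), "time": 12},
]
anonymized = generalize(rows, area=(2, 3, 3, 4))
```

After this call the three tuples are indistinguishable by location and time, since they all carry the area `(2, 3, 3, 4)` and the interval `(10, 14)`.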
Science & Technology Development Journal – Engineering and Technology, 3(SI1):SI82-SI94
Name: k-anonymization Algorithm()
Input: k, threshold tx, threshold ty, threshold t_time, spatio-temporal table T
Output: k-anonymous table T’
Method:
Create a grid G which covers the space where the server provides services.
Anonymize all location data of tuples in T to grid cells.
X = Ø
While (there exists a tuple in T which has not been marked)
{
    For each tuple in T which has not been marked
    {
        tuple_maximal_anonymization_area = find_safe_max_anoymization();
        X = X U tuple_maximal_anonymization_area;
    }
    maximal_anonymization_area = the tuple_maximal_anonymization_area in X which has the maximum number of distinct users.
    If (the number of distinct users in this maximal_anonymization_area < k)
    {
        Anonymize all location and time data of tuples in this area to the closest anonymization area which is safe.
        Mark the corresponding tuples in the table T.
    }
    Else
    {
        Anonymize the location attribute of the tuples which belong to this maximal_anonymization_area to this area, and generalize all time data of these tuples to an interval.
        Mark the corresponding tuples in the table T.
    }
}
Return anonymized table.
When the number of distinct users in the maximal anonymization area is smaller than k, we consider this area not significant to the data mining process. Therefore, we anonymize all location and time data of tuples in this area to the “closest” safe anonymization area. We discussed this idea in Section 3.2.
The find_safe_max_anoymization() function chooses, for the time and location data of each tuple, the safe area that contains the maximum number of distinct users. First, the function finds all safe areas for the time and location values given as input parameters. Among them, it chooses the safe area with the largest number of distinct users.
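The first step, collecting the candidate areas around a tuple's location, might look like the following Python sketch. It rests on assumptions that the text does not confirm: tx and ty are read as the maximum width and height of an area in metres, an area is a rectangle of whole grid cells, and the grid has `n_cols` by `n_rows` cells. The function name `candidate_areas` is hypothetical, and the "safe" condition from Section 3.2 is not checked here.

```python
from typing import Iterator, Tuple

# Hypothetical representation: inclusive rectangle of grid cells.
Area = Tuple[int, int, int, int]  # (col_min, row_min, col_max, row_max)


def candidate_areas(cell: Tuple[int, int], tx: float, ty: float,
                    cell_size: float, n_cols: int, n_rows: int) -> Iterator[Area]:
    """Yield every rectangle of grid cells that contains `cell` and whose
    width and height (in metres) do not exceed tx and ty."""
    col, row = cell
    max_w = int(tx // cell_size)  # maximum width in cells
    max_h = int(ty // cell_size)  # maximum height in cells
    for w in range(1, max_w + 1):
        for h in range(1, max_h + 1):
            # Slide the w-by-h rectangle so that it still covers `cell`
            # and stays inside the grid.
            for c0 in range(max(0, col - w + 1), min(col, n_cols - w) + 1):
                for r0 in range(max(0, row - h + 1), min(row, n_rows - h) + 1):
                    yield (c0, r0, c0 + w - 1, r0 + h - 1)


# With tx = ty = 90 m and 30 m cells, an area spans at most 3 x 3 cells.
areas = list(candidate_areas((5, 5), tx=90, ty=90, cell_size=30,
                             n_cols=20, n_rows=20))
```

Each candidate would then be scored by the number of distinct users it contains within the time threshold, and the best one kept.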
Name: find_safe_max_anoymization()
Input: a tuple t which contains time and location data, threshold tx, threshold ty, threshold t_time, spatio-temporal table T, grid G
Output: tuple_maximal_anonymization_area for tuple t and a set O which contains the tuples that belong to tuple_maximal_anonymization_area
Method:
Sort T by time so that consecutive tuples t1, t2, t3, ... in T satisfy t1.time < t2.time < t3.time
Y = all anonymization areas which contain t.location and satisfy both thresholds tx and ty
Variable count_max = 0
tuple_maximal_anonymization_area = null;
For each y in Y
{
    Variable count_y = 0
    R = all tuples i in T such that i.location belongs to y
    For each t’ in R such that the index of t’ in R <= the index of t in R
    {
        if time-distance(t’.time, t.time) <= t_time
        {
            tuple tp
            For each tuple tr in R such that the index of tr in R >= the index of t in R
            {
                tp = tr;
                if time-distance(tp.time, t’.time) > t_time
                    Exit For
            }
            if index_of_tp_in_R – index_of_t’_in_R – 1 > count_y
            {
                count_y = index_of_tp_in_R – index_of_t’_in_R – 1
                O_y = all tuples in R from (index_of_t’_in_R) to (index_of_tp_in_R – 1)
            }
        }
    }
    If (count_y > count_max)
    {
        count_max = count_y
        tuple_maximal_anonymization_area = y
        O = O_y
    }
}
Return tuple_maximal_anonymization_area, O
This function returns the tuple_maximal_anonymization_area and a set O which contains the tuples that satisfy the location and time constraints. A tuple_maximal_anonymization_area is always accompanied by its set O. Therefore, when this area is chosen as the maximal_anonymization_area, all tuples in O are anonymized.
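The time-window search inside one candidate area can be sketched in Python as follows. This is a simplified illustration, not the paper's code: records are assumed to be `(time, user)` pairs already sorted by time, the names `best_time_window` and `visits` are hypothetical, and the window is scored by distinct users (as the prose describes) rather than by the index difference used in the pseudocode.

```python
from typing import List, Tuple

Record = Tuple[float, str]  # (time, user), assumed sorted by time


def best_time_window(records: List[Record], target: int,
                     t_time: float) -> Tuple[int, int, int]:
    """Return (first, last, distinct_users) for the window of `records`
    that contains index `target`, spans at most `t_time`, and covers
    the most distinct users."""
    best = (target, target, 1)
    for first in range(target + 1):
        # Skip start points too far in the past to share a window with target.
        if records[target][0] - records[first][0] > t_time:
            continue
        last = target
        # Extend the window to the right while it still fits in t_time.
        while (last + 1 < len(records)
               and records[last + 1][0] - records[first][0] <= t_time):
            last += 1
        users = len({user for _, user in records[first:last + 1]})
        if users > best[2]:
            best = (first, last, users)
    return best


# Illustrative data only; the values are not taken from the paper.
visits = [(1, "a"), (2, "b"), (3, "a"), (9, "c"), (10, "d"), (11, "e")]
window = best_time_window(visits, target=3, t_time=3)
```

Here the tuple at index 3 (time 9) ends up in the window covering indices 3 to 5, which holds three distinct users. The records in the returned index range play the role of the set O.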
EXPERIMENTS
We present an experiment that evaluates the effectiveness of the proposed approach. In our tests, the data mining process aims to find the time interval in which users use the service most frequently. The data mining process is run on the original table and on the k-anonymous version of the table generated by our algorithm. We compare the two results by taking the overlapped interval between them. Clearly, the proposed approach is effective if this overlapped interval is large. We use a ratio to describe this effectiveness:
\[ R_{time} = \frac{\text{overlapped\_t}}{\text{original\_result\_t}} \]
Here, original_result_t is the result that the data mining process obtains from the original table, and overlapped_t is the overlapped interval between the result from the original table and the result from our k-anonymous version. As discussed, the larger the ratio, the more effective the approach.
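As a small Python illustration of this ratio, assume each mining result is a single interval `(start, end)`. The function name `r_time` and the example values are hypothetical and only show how the overlap and the ratio would be computed:

```python
from typing import Tuple

Interval = Tuple[float, float]  # (start, end) with start <= end


def r_time(original: Interval, anonymized: Interval) -> float:
    """Ratio of the overlap between the two intervals to the original length."""
    overlap = min(original[1], anonymized[1]) - max(original[0], anonymized[0])
    overlap = max(0.0, overlap)  # disjoint intervals overlap by zero
    length = original[1] - original[0]
    if length <= 0:
        raise ValueError("original interval must have positive length")
    return overlap / length


# Original result: days 10 to 20; result on the anonymized table: days 12 to 25.
ratio = r_time(original=(10.0, 20.0), anonymized=(12.0, 25.0))
```

In this example the overlap is 8 days out of an original 10, so the ratio is 0.8, or 80%.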
We evaluate the approach on a spatio-temporal table with more than 2000 records and more than 50 distinct users. The grid cell size and k change in each test case. In each case, we also change the maximum area and the maximum time interval that the data miner wants the data mining process to handle. These are the max area and max interval columns of Table 8.
In Table 8, k is 5, 10, and 20 for each test, and the ratio value is the average of the three tests with k = 5, 10, and 20. The results show that in most cases the ratio is larger than 80%. This means that our approach generates a k-anonymous version of the original table from which the data mining process can find significant information comparable to what it finds in the original table.
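The statement that most ratios exceed 80% can be checked directly against the Ratio Rtime column of Table 8. The short Python snippet below copies the twelve reported values and counts how many lie above the threshold; the variable names are illustrative.

```python
# Ratio Rtime (%) column of Table 8, in row order.
ratios = [88.97, 87.53, 88.31, 86.42, 84.62, 85.02,
          84.19, 83.97, 81.74, 80.36, 79.92, 80.09]

above_80 = [r for r in ratios if r > 80.0]
lowest = min(ratios)
print(f"{len(above_80)} of {len(ratios)} rows exceed 80%; lowest = {lowest}")
```

Eleven of the twelve rows exceed 80%, and the single exception (79.92%) occurs with the largest tx*ty setting, 300*300.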
CONCLUSIONS AND DISCUSSIONS
In this paper, we propose a technique for anonymizing a spatio-temporal database. With this technique, the location and time data can be anonymized easily. We also take the result of the data mining process into account to develop an algorithm that trades off data privacy against data quality.
In the future, we will focus on improving the algorithm so that it guarantees k-anonymity in a large spatio-temporal database more efficiently.
ACKNOWLEDGMENT
This research is funded by Ho Chi Minh City University of Technology, Vietnam National University Ho Chi Minh City, under grant number T-KHMT-2018-90.
CONFLICT OF INTEREST
We claim that there is no conflict of interest in this
article.
AUTHOR CONTRIBUTION
Anh Truong is the only author of this article.
Table 8: Results of experiment
tx*ty (m*m)   t_time (days)   k         Cell size (m)   max area (m*m)   max interval (days)   Ratio Rtime (%)
90*90         30              5,10,20   30              200*200          120                   88.97
90*90         30              5,10,20   30              400*400          150                   87.53
90*90         60              5,10,20   30              200*200          120                   88.31
90*90         60              5,10,20   30              400*400          150                   86.42
150*150       30              5,10,20   50              300*300          120                   84.62
150*150       30              5,10,20   50              500*500          150                   85.02
150*150       60              5,10,20   50              300*300          120                   84.19
150*150       60              5,10,20   50              500*500          150                   83.97
300*300       30              5,10,20   100             500*500          120                   81.74
300*300       30              5,10,20   100             800*800          150                   80.36
300*300       60              5,10,20   100             500*500          120                   79.92
300*300       60              5,10,20   100             800*800          150                   80.09
Science & Technology Development Journal – Engineering and Technology, 3(SI1):SI82-SI94
Open Access Full Text Article. Research Article
Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology, VNU-HCM, Vietnam
Correspondence
Trương Tuấn Anh, Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology, VNU-HCM, Vietnam
Email: anhtt@hcmut.edu.vn
History
Received: 29-7-2019
Accepted: 25-8-2019
Published: 04-12-2020
DOI : 10.32508/stdjet.v3iSI1.517
Copyright
© VNU-HCM. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.
Privacy preservation for spatio-temporal databases based on k-anonymity
Trương Tuấn Anh*
ABSTRACT
The development of location-based services and mobile devices has led to the generation of location data. Through data mining, useful information can be extracted from this location data. This also means that an attacker can exploit the data to extract private information about users. For example, an attacker can examine a user's location information to determine their home address. Protecting location information therefore becomes an important requirement. In this paper, we introduce an adaptive-grid approach together with an algorithm to guarantee k-anonymity for location databases. To do this, we assume that location services provide service within a predefined spatial region and that an adaptive grid is created over this region. The user's location is then anonymized within an anonymization area. Anonymization areas are selected so that each contains at least k users. We also propose an approach to guarantee k-anonymity for data that combines both space and time. The proposed approach considers only the information that is meaningful to the data mining process and ignores other unrelated information. Finally, the experimental results show the effectiveness of the proposed solution in comparison with other solutions.
Keywords: Location privacy, privacy protection, data mining, k-anonymity, spatio-temporal databases.
Cite this article: Anh T T. Privacy preservation for spatio-temporal databases based on k-anonymity. Sci. Tech. Dev. J. - Eng. Tech.; 3(SI1):SI82-SI94.