Representation model of requests to web resources, based on a vector space model and attributes of requests for HTTP protocol

In recent years, the number of incidents related to Web applications has tended to increase, owing to the growing number of mobile device users, the development of the Internet, and the expansion of its many services. This in turn increases the likelihood of attacks on users' mobile devices as well as on computer systems. Malicious code is often used to collect information about users and sensitive personal data, to gain access to Web resources, or to sabotage those resources. The purpose of this research is to improve the accuracy of detecting computer attacks on Web applications. The paper presents a model for representing Web requests, based on the vector space model and on attributes of those requests under the HTTP protocol. Comparison with previous studies allows us to estimate a detection accuracy of approximately 96% for Web applications when the KDD 99 dataset is used for training and attack detection, together with query representation based on the vector space and classification based on a decision tree model.

Journal of Science and Technology on Information Security (Nghiên cứu Khoa học và Công nghệ trong lĩnh vực An toàn thông tin), No 2.CS (10) 2019

... the object belongs to the class of its single nearest neighbor.

In [17], the authors used a combined approach, a combination of the genetic algorithm [18] and the k-nearest neighbor classifier, to detect denial of service attacks. The goal of the genetic algorithm is to find the optimal weight vector, in which $w_i$ represents the weight of feature $i$, $1 \le i \le n$. For two feature vectors $X = \{x_1, x_2, \ldots, x_n\}$ and $Y = \{y_1, y_2, \ldots, y_n\}$, the distance between them is calculated as follows: ...

... files used to analyze network packets, an average of 40.3% in comparison with the standard module.

In [21], a comparative analysis of the capabilities of an artificial neural network and the decision tree method for solving problems of detecting computer attacks is carried out. The researchers concluded that the artificial neural network is effective for generalization but not suitable for detecting new attacks, while decision trees are effective for both tasks.

5. Support vector machine

The initial data in the support vector machine method is a set of elements located in space. The dimension of the space corresponds to the number of classifying features, and their values determine the position of the elements (points) in the space.

The support vector machine method belongs to the linear classification methods. Two sets of points belonging to two different classes are separated by a hyperplane in the space. The hyperplane is constructed in such a way that the distances from it to the nearest instances of both classes (the support vectors) are maximal, which ensures the greatest accuracy of classification.

The support vector machine method allows [22; 23]:
• obtaining a classification function with a minimum upper estimate of the expected risk (level of classification error);
• using a linear classifier to work with data that are not linearly separable.

III. MODEL FOR PRESENTING REQUESTS TO WEB RESOURCES, BASED ON THE VECTOR SPACE MODEL AND ATTRIBUTES OF REQUESTS VIA HTTP

The anomaly detection approach is based on the analysis of HTTP requests processed by the most common Web servers (for example, Apache or nginx) and is intended to be built into a Web Application Firewall (WAF). The WAF analyzes all requests coming to the Web server and makes decisions about their execution on the server (Fig. 1).

Fig. 1. WAF in Web Application Security System

A. Formation of feature space for our model

To define the model for presenting requests to Web resources, the author formed a corresponding feature space, which made it possible to evaluate the adequacy of the model for solving the problem of detecting computer attacks on Web applications.

Fig. 3 shows the main stages of analyzing an HTTP request received at the Web server input. We divided the dataset into two parts: requests with information about attacks and normal requests. In the learning process, we calculate all the necessary values, such as the expected value and the variance of normal queries; these values are then stored in a MySQL database for the attack detection process. The analysis is performed on the appropriate fields of the protocol so that they can later be represented in the vector space model. A number of attributes selected by the author are also analyzed and calculated. Thus, the proposed query representation model allows moving from the text representation to the set of features of the vector space model for the corresponding protocol fields and query attributes.

The basic steps to form a model for each query are the following:
• Extracting and analyzing data: all the incoming requests from the Web browser are analyzed.
• Transformation into a vector space model: text data are transformed into a vector representation using the TF-IDF algorithm [24], which allows estimating the weight of features for the entire text data array.
• Calculation of attribute values: the values of the 8 attributes used in the model are calculated (5 from [25] and 3 proposed by the author).

1. Extracting and analyzing data

Requests via HTTP are received at the input of the Web server. An example of the contents of a GET request is shown in Fig. 2.
Fig. 2. Example of the content fields of an HTTP request (GET method)
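A request like the one in Fig. 2 has to be split into its fields before the later steps can work on them. The following is a minimal sketch of that extraction step, not the author's implementation: the function name `parse_http_request`, the returned field names and the example request are all assumptions made for illustration, and real traffic would need a more robust parser.

```python
from urllib.parse import urlsplit, parse_qsl


def parse_http_request(raw: str) -> dict:
    """Split a raw HTTP/1.x request into fields (hypothetical helper)."""
    # The header section ends at the first empty line; the rest is the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # Request line: METHOD SP request-target SP HTTP-version
    method, target, version = lines[0].split(" ", 2)

    # Header lines: "Name: value"; names are case-insensitive, so lower-case them.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    url = urlsplit(target)
    return {
        "method": method,
        "path": url.path,
        "query": parse_qsl(url.query, keep_blank_values=True),
        "version": version,
        "headers": headers,
        "body": body,
    }


# Made-up example request, used only to exercise the helper.
example = (
    "GET /index.php?id=10&lang=en HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "User-Agent: Mozilla/5.0\r\n"
    "\r\n"
)
fields = parse_http_request(example)
```

Keeping the query parameters as an ordered list of pairs, rather than a dictionary, preserves their order and any repeated names, which attributes such as A5 (attribute order) would need.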
Fig. 3. Analysis of incoming requests for Web applications within the framework of the proposed model

2. Conversion to a Vector Space Model

To convert strings into a vector form that allows the further application of machine learning methods, an approach based on the TF-IDF method was chosen [24].

TF-IDF is a statistical measure used to assess the importance of a word in the context of a document that is part of a document collection or corpus. The weight of a word is proportional to the number of times the word is used in the document and inversely proportional to the frequency of its use in the other documents of the collection. The TF-IDF approach is applied to each request.

For each word $t$ of a query $d$ in the set of all queries $D$, the value tfidf is calculated according to the following expression:

$$\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t) \tag{2}$$

The values of tf and idf are calculated in accordance with expressions (3) and (4) respectively, where $v$ ranges over the words of the query $d$:

$$\mathrm{tf}(t, d) = \frac{\mathrm{count}(t, d)}{\sum_{v \in d} \mathrm{count}(v, d)} \tag{3}$$

$$\mathrm{idf}(t) = \log \frac{|D|}{|\{d \in D : t \in d\}|} \tag{4}$$

Thus, after the query $d \in D$ is converted into a vector representation of dimension $|T|$, it is described by the set of weights $\{w_t\}_{t \in T}$, one for each term $t$ from the dictionary $T$.

3. Calculation of attribute values

In [25], 5 basic attributes were proposed for building a system for detecting computer attacks on web applications:
• The length of the request fields sent from the browser (A1).
• The distribution of characters in the request (A2).
• Structural inference (A3).
• Token finder (A4).
• Attribute order (A5).

The author proposes to introduce 3 additional attributes to improve the accuracy of attack detection.

The length of the request sent from the browser (A6)

From the analysis of legitimate requests via the HTTP protocol, it was found that their length varies only slightly. However, in the event of an attack, the length of the data field may change significantly (for example, in the case of SQL injection or cross-site scripting).

Therefore, to estimate the limiting thresholds for the change in request length, two parameters are evaluated: the expected value $\mu$ and the variance $\sigma^2$ for the training set of legitimate data. Using Chebyshev's inequality, we can estimate the probability that a random variable will take a value far from its mean (expression (5)):

$$P(|x - \mu| > \tau) \le \frac{\sigma^2}{\tau^2}, \tag{5}$$

where $x$ is a random variable and $\tau$ is the threshold value of its change.

Accordingly, for any probability distribution with mean $\mu$ and variance $\sigma^2$, it is necessary to choose a value $\tau$ such that a deviation of $x$ from the mean $\mu$ beyond the threshold results in blocking the query with the lowest level of errors of the first and second kind (false positives and false negatives).

The attribute value is equal to the probability value from expression (5):

$$A_6 = P(|x - \mu| > \tau). \tag{6}$$
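One possible way to turn expressions (5) and (6) into a number is sketched below. It rests on two assumptions that the text does not state: the threshold is taken as the observed deviation, $\tau = |l - \mu|$ for a request of length $l$, and the Chebyshev upper bound $\sigma^2 / \tau^2$ (capped at 1) is used in place of the exact probability. The names `fit_length_model` and `a6` and the sample lengths are hypothetical.

```python
from statistics import fmean, pvariance


def fit_length_model(lengths):
    """Estimate (mu, sigma^2) from the lengths of legitimate requests.

    Assumption: the population variance is used; the text does not say
    which variance estimator the author applied.
    """
    return fmean(lengths), pvariance(lengths)


def a6(length, mu, sigma2):
    """Chebyshev bound on P(|x - mu| > tau), assuming tau = |length - mu|."""
    tau = abs(length - mu)
    if tau == 0:
        return 1.0  # no deviation from the mean: the bound is uninformative
    return min(1.0, sigma2 / tau ** 2)  # a probability cannot exceed 1


# Made-up training lengths of legitimate requests.
mu, sigma2 = fit_length_model([20, 22, 18, 20])

typical = a6(21, mu, sigma2)      # close to the mean, so the value stays at 1.0
suspicious = a6(120, mu, sigma2)  # far from the mean, so the value is near 0
```

Under these assumptions a value near 1 means the length is unremarkable, while a value near 0 marks a length that legitimate traffic would rarely produce.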
Appearance of new characters (A7)

From the training sample of legitimate requests, we select the non-repeating characters (including various encodings) to compose the set of symbols of the alphabet $A$. Then, whenever a symbol $b \notin A$ appears in the query, the counter $p_b$ for this attribute is increased by one. The value of the attribute is calculated as the ratio of the counter value to the cardinality of the alphabet set:

$$A_7 = \frac{p_b}{|A|} \tag{7}$$

The emergence of new keywords (A8)

From the training sample of legitimate queries, we select the non-repeating terms (words) $t$ to compose the set of terms of the dictionary $T$. Then, whenever a word $t \notin T$ appears in the query, the counter $p_t$ for this attribute is increased by one. The value of the attribute is calculated as the ratio of the counter value to the cardinality of the set of terms of the dictionary:

$$A_8 = \frac{p_t}{|T|} \tag{8}$$

IV. CONCLUSION

For testing the operation of machine learning methods, a data set drawn from several data sources of system protection tools will be used, such as log files of the intrusion detection and prevention system, HTTP requests (GET and POST methods) of the web application firewall, etc.

Fig. 4. An example of a complete dangerous HTTP request with the POST method

When analyzing a full HTTP request, the author focuses on the data in the red frame (Fig. 4). After the extraction process, the data are saved in the appropriate files (good_request.txt and bad_request.txt). The structure of these files is shown in Fig. 5.

Fig. 5. File of dangerous HTTP requests

A preliminary study allowed us to obtain an estimate of 96% for the accuracy of detecting attacks on Web applications for the data set [15], using the introduced query attributes, the query vector representation model and a classifier based on decision trees. This allows us to conclude that it is possible to build an algorithm for detecting computer attacks on Web applications based on the proposed model for presenting requests to Web resources, which is based on the vector space model and is distinguished by its set of attributes of requests via HTTP.
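As a rough illustration of the representation summarized above, the sketch below computes TF-IDF weights following expressions (2)-(4) and the ratios of expressions (7) and (8). It is not the author's code. The tokenizer (a simple regular expression), the use of the natural logarithm, the helper names `tokenize`, `tfidf_vectors` and `novelty_ratio`, and the sample queries are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(query):
    """Lower-case a request string and split it into terms (assumed tokenizer)."""
    return re.findall(r"[a-z0-9_]+", query.lower())


def tfidf_vectors(queries):
    """Return the dictionary T and one {term: weight} mapping per query."""
    docs = [Counter(tokenize(q)) for q in queries]
    # df[t] = number of queries that contain term t (denominator of expression (4)).
    df = Counter(term for doc in docs for term in doc)
    n_docs = len(docs)
    vectors = []
    for doc in docs:
        total = sum(doc.values())  # sum of count(v, d) over all words v in d
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in doc.items()}
        )
    return set(df), vectors


def novelty_ratio(items, known):
    """Occurrences of items absent from `known`, divided by |known| (A7/A8 style)."""
    unseen = sum(1 for item in items if item not in known)
    return unseen / len(known)


# Made-up legitimate queries and one suspicious query.
legit = ["id=10&lang=en", "id=25&lang=vi"]
attack = "id=10 union select password"

T, vectors = tfidf_vectors(legit)
alphabet = set("".join(legit))

a7 = novelty_ratio(attack, alphabet)      # new characters, expression (7)
a8 = novelty_ratio(tokenize(attack), T)   # new keywords, expression (8)
```

Because the counters grow with every occurrence of an unseen symbol or word, both ratios can exceed 1 for long malicious payloads. In this sketch a term that occurs in every training query (such as `id`) gets weight 0, which is the usual behaviour of the unsmoothed idf in expression (4).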
REFERENCES

[1]. Kaspersky Lab. Security report. - 2019. - (accessed: 15.04.2019). http://www.securelist.com/en/analysis/204792244/The-geography-of-cybercrime-Western-Europe-and-North-America.
[2]. A survey of intrusion detection techniques in cloud / C. Modi [et al.] // Journal of Network and Computer Applications. - Vol. 36, no. 1. - P. 42-57, 2013.
[3]. Khamphakdee N., Benjamas N., Saiyod S. Improving intrusion detection system based on snort rules for network probe attack detection // Information and Communication Technology (ICoICT), 2014 2nd International Conference On. - IEEE. - P. 69-74, 2014.
[4]. A stateful intrusion detection system for world-wide web servers / G. Vigna [et al.] // Computer Security Applications Conference, 2003. Proceedings. 19th Annual. - IEEE. - P. 34-43, 2003.
[5]. Sekar R. An Efficient Black-box Technique for Defeating Web Application Attacks // NDSS. - 2009.
[6]. Mutz D., Vigna G., Kemmerer R. An experience developing an IDS stimulator for the blackbox testing of network intrusion detection systems // Computer Security Applications Conference, 2003. Proceedings. 19th Annual. - IEEE. - P. 374-383, 2003.
[7]. Li X., Xue Y. BLOCK: a black-box approach for detection of state violation attacks towards web applications // Proceedings of the 27th Annual Computer Security Applications Conference. - ACM. - P. 247-256, 2011.
[8]. Saxena P., Sekar R., Puranik V. Efficient fine-grained binary instrumentation with applications to taint-tracking // Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization. - ACM. - P. 74-83, 2008.
[9]. Браницкий А. А., Котенко И. В. Анализ и классификация методов обнаружения сетевых атак // Труды СПИИРАН. - Т. 2, № 45. - С. 207-244, 2016.
[10]. Heckerman D. A tutorial on learning with Bayesian networks // Innovations in Bayesian networks. - Springer. - P. 33-82, 2008.
[11]. Friedman N., Geiger D., Goldszmidt M. Bayesian network classifiers // Machine learning. - Vol. 29, no. 2-3. - P. 131-163, 1997.
[12]. Goldszmidt M. Bayesian network classifiers // Wiley Encyclopedia of Operations Research and Management Science. - 2010.
[13]. Barbara D., Wu N., Jajodia S. Detecting novel network intrusions using bayes estimators // Proceedings of the 2001 SIAM International Conference on Data Mining. - SIAM. - P. 1-17, 2001.
[14]. Нейросетевая технология обнаружения сетевых атак на информационные ресурсы / Ю. Г. Емельянова [и др.] // Программные системы: теория и приложения. - Т. 2, № 3. - С. 3-15, 2011.
[15]. A Detailed Analysis of the KDD CUP 99 Data Set / M. Tavallaee [et al.] // Proceedings of the Second IEEE International Conference on Computational Intelligence for Security and Defense Applications. - Ottawa, Ontario, Canada: IEEE Press. - P. 53-58. - (CISDA'09). - URL: 1736481.1736489, 2009.
[16]. Васильев В.И., Шарабыров И.В. Интеллектуальная система обнаружения атак в локальных беспроводных сетях // Вестник Уфимского государственного авиационного технического университета. - 2015. - Т. 19, 4 (70).
[17]. Su M.-Y. Real-time anomaly detection systems for Denial-of-Service attacks by weighted k-nearest-neighbor classifiers // Expert Systems with Applications. - Vol. 38, no. 4. - P. 3492-3498, 2011.
[18]. Lee C. H., Chung J. W., Shin S. W. Network intrusion detection through genetic feature selection // Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, 2006. SNPD 2006. Seventh ACIS International Conference on. - IEEE. - P. 109-114, 2006.
[19]. Intrusion detection with genetic algorithms and fuzzy logic / E. Ireland [et al.] // UMM CSci senior seminar conference. - P. 1-6, 2013.
[20]. Kruegel C., Toth T. Using decision trees to improve signature-based intrusion detection // Recent Advances in Intrusion Detection. - Springer. - P. 173-191, 2003.
[21]. Bouzida Y., Cuppens F. Neural networks vs.

ABOUT THE AUTHORS

Manh Thang Nguyen
Workplace: Information Technology Faculty, Academy of cryptography techniques.
Email: chieumatxcova@gmail.com
Training process:
2005-2007: Student at the Military Technical Academy.
2007-2013: Student at the Applied Mathematics and Informatics Faculty, Lipetsk State Pedagogical University, Russian Federation.
2017-present: Post-graduate student at the Military Academy of the Federal Guard Service of the Russian Federation.
Research today: Computer network, network security, machine learning and data mining.

D.S. Alexander Kozachok
Workplace: The Academy of Federal Guard Service of the Russian Federation.
Email: alex.totrin@gmail.com
The education process: received a PhD degree in Engineering Sciences from the Academy of Federal Guard Service of the Russian Federation in Dec. 2012.
Research today: Information security; Unauthorized access protection; Mathematical cryptography; theoretical problems of computer.
