PAWS                                                    Jianfeng Guan
Internet-Draft                                             Neng Zhang
Intended status: Informational                           Changqiao Xu
Expires: June 12, 2014                                Mingchuan Zhang
                                                         Hongke Zhang
                                                                 BUPT
                                                    December 12, 2013

                          PAWS Smart Database
                  draft-guan-paws-smart-database-00

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

This Internet-Draft will expire on June 12, 2014.

Copyright Notice

Copyright (c) 2013 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Expires June 12, 2014                                           [Page 1]

Internet-Draft             PAWS Smart Database             December 2013

Abstract

This document describes a Smart Database operation mechanism for the Protocol to Access White Space (PAWS). With this mechanism, the master device obtains the optimized white space on which it should communicate in the regulatory domain.
The mechanism extends the protocol to access the spectrum database with user behavior analysis and machine learning.

Table of Contents

   1. Introduction
   2. Conventions used in this document
   3. Procedure Overview
      3.1. Problem Description
      3.2. Multi-Dimensional Aggregation Policy
      3.3. Data Preprocessing
   4. Specification
      4.1. Feature Abstraction
      4.2. Dataset Training by Machine Learning Methods
         4.2.1. User Behavior Clustering
         4.2.2. Binary Prediction
         4.2.3. Spectrum Service Recommendation
      4.3. Prediction Results
   5. Working flow
      5.1. Spectrum prediction scenario
      5.2. WSDB Recommendation Procedure
   6. Security Considerations
   7. IANA Considerations
   8. Conclusions
   9. References
      9.1. Normative References
   10. Acknowledgments
   Authors' Addresses

1. Introduction

Dynamic spectrum access technology has made white space allocation and utilization achievable in practice. A growing number of spectrum allocation algorithms and industrial solutions have been proposed and moved from the lab into deployment, and the standards produced by the IETF PAWS working group are gradually being accepted.
In the PAWS protocol, the Database is responsible for allocating spectrum to the master device. However, a problem has emerged: users differ in their spectrum usage behavior, while the Database can only distribute the same spectra stored in the server to all users. Such imbalanced spectrum usage is another kind of waste.

From another perspective, although white space can provide diverse spectrum usage through dynamic random access, fair allocation and security considerations make some manual intervention and administrative control necessary to coordinate spectrum resources centrally. Likewise, the heavy load caused by competition for one spectrum could be balanced across multiple white spaces.

Some studies have been undertaken to optimize spectrum allocation, but few concentrate on these diversified access motives and management issues. The European FP7 FARAMIR project focuses on spectrum measurement and performance characterization to increase radio environment and spectral awareness in dynamic spectrum access scenarios. International communications companies are carrying out traffic management research and projects that use cloud computing to utilize spectrum efficiently according to user demand. However, no relevant standards have appeared yet, and so far every user accesses static spectra regardless of the services it uses. Clearly, one format does not fit all.

Based on these observations, we propose a Smart Database analysis and operation mechanism for PAWS. Unlike previous work, our approach can flexibly characterize spectrum usage behavior for different purposes. The Smart Database is initially proposed to enable user behavior recognition and demand-driven spectrum distribution.
With this mechanism, master and slave devices can obtain optimized white space spectrum from the WSDB and communicate with better Quality of Experience (QoE) in the regulatory domain. Our protocol extends the existing PAWS protocol to support advanced network functions and to improve spectrum usage efficiency.

2. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

The terminology from the PAWS problem statement, use cases and requirements [I-D.ietf-paws-problem-stmt-usecases-rqmts] is applicable to this document.

White Space Database Analysis Server (WSDB AS):

   A specific smart WSDB with cognitive ability and new functions, such as selecting historical data as training samples and learning user behavior. The server acts as a smart analysis center responsible for learning, control and coordination of white space, benefiting both the WSDB and its clients. It has three functions: user and service clustering, service prediction by learning from data, and recommendation analysis by collaborative filtering. Its primary goal is to provide the proper white space spectrum in response to users' access requests. Depending on the regulatory domain scope and performance requirements, the server can be integrated with a normal WSDB or act as a standalone administrator with other auxiliary management functions.

This draft is in scope because it provides a set of formatted information for querying the Database in a smart way. The device receives a list of white space frequencies available under the specified conditions, each with a probability. The device can then select a spectrum and send an acknowledgment to the Database.
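The device-side part of this exchange (a probability-annotated list, a choice, an acknowledgment) can be illustrated with a minimal Python sketch. It assumes the device simply takes the candidate with the highest probability. The class `SpectrumCandidate`, the functions `select_spectrum` and `build_acknowledgment`, and the acknowledgment field names are hypothetical and are not PAWS message names; the frequencies and probabilities are made-up example values.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SpectrumCandidate:
    """Hypothetical entry of a probability-annotated spectrum list from the Database."""
    start_hz: float
    stop_hz: float
    probability: float  # Database's predicted likelihood that this spectrum suits the device


def select_spectrum(candidates: Sequence[SpectrumCandidate]) -> SpectrumCandidate:
    """Device-side choice (assumed policy): take the most probable candidate."""
    if not candidates:
        raise ValueError("the Database returned no available spectrum")
    return max(candidates, key=lambda c: c.probability)


def build_acknowledgment(device_id: str, chosen: SpectrumCandidate) -> dict:
    """Build a hypothetical acknowledgment telling the Database which spectrum was selected."""
    return {"deviceId": device_id, "startHz": chosen.start_hz, "stopHz": chosen.stop_hz}


if __name__ == "__main__":
    response = [
        SpectrumCandidate(470e6, 476e6, 0.15),
        SpectrumCandidate(512e6, 518e6, 0.60),
        SpectrumCandidate(602e6, 608e6, 0.25),
    ]
    print(build_acknowledgment("wsd-001", select_spectrum(response)))
```

Run as a script, it acknowledges the 512 to 518 MHz candidate, the one with probability 0.60. Taking the maximum is only one possible policy; the draft does not specify how the device chooses among the listed candidates.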
By extending its functions to learn the user's conditions at query time, the Database can, to some extent, become more cognitive.

3. Procedure Overview

3.1. Problem Description

As mentioned above, a typical case in the current PAWS protocol is that many users are allocated the same spectrum resource by the Database at the same time. A user with light voice traffic then leaves bandwidth unused, while others delivering video may suffer severe QoS degradation due to interference or limited bandwidth. Our goal is to allocate spectrum based on its attributes and on user behavior. For instance, some low-frequency bands with long wavelengths suit coverage, while other bands suit capacity, for example large video transmissions. Relying on user behavior analysis, a smart Database can recognize and match users and decide which spectrum is appropriate for them. Such automatic configuration also fits the especially important trend toward software-defined networking and software-defined radio.

To realize these functions, we employ machine learning methods to capture user behavior patterns, for two reasons. First, mobile communication services show diverse and subtle characteristics that are hard to analyze with a single simple intelligent algorithm. Second, the requirements of the big data era let machine learning methods, to some extent, outperform other approaches such as reinforcement learning or correlation analysis.

We divide the general procedure into sensing, deciding and recognition. We first describe feature (label) selection, then discuss the data mining methods, and then present the prediction results. Finally, after describing the analysis performed by the WSDB AS, we show the protocol interaction with users, along with the newly added optimized parameters.

3.2. Multi-Dimensional Aggregation Policy

For management purposes, the AS can be deployed on different platforms and send its results uniformly to the Master Device or directly to the Slave Device. In particular, the AS can be located on the Master Device, at a telecommunications operator, or at an Internet enterprise. To obtain multi-dimensional user data samples, a series of packet inspection or traffic monitoring tools can be combined with our smart analysis function to probe users' potential bandwidth and service demand in depth. Furthermore, for community benefit, we collect and aggregate data flows according to five common deployment policies:

(1) Data flows for users that share a master device. The Database can be deployed on a base station for real-time analysis and computation. This is the most basic way to manage spectrum occupancy and redistribution.

(2) Data flows for users that go to the same master device. For security reasons related to traffic volume, a particular kind of white space, such as a trusted channel, can be allocated to these users.

(3) Data flows that pass through a backbone network or a telecommunications operator's network. This is another common approach, which supports promoting the commercial value of spectrum as well as traffic and bandwidth planning.

(4) Data flows at an Internet enterprise such as YouTube or Facebook. Taking YouTube as an example, users who request the same video can be aggregated into one behavior group and allocated a relatively large bandwidth.

(5) Data flows for users of the same application, such as WeChat. Spectrum and traffic demand vary from one service to another, and even a single application may contain a series of services such as video, audio and text. The spectrum features can be used to package these services into a bundle of functions.
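The five policies differ mainly in which attribute of a flow serves as the aggregation key. A minimal Python sketch, assuming a hypothetical `FlowRecord` with one field per key, groups flows under a chosen policy and totals traffic per user in each group. The field names, the `POLICY_KEYS` table and the `aggregate` function are illustrative only; keying both policies (1) and (2) on the master device is a simplification made here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical flow record produced by packet inspection or traffic monitoring."""
    user_id: str
    master_device: str  # master device serving the user, policies (1) and (2)
    backbone: str       # backbone or operator network carrying the flow, policy (3)
    content_id: str     # content requested at an Internet enterprise, policy (4)
    application: str    # application generating the flow, policy (5)
    bytes_sent: int


# Hypothetical mapping from a deployment policy to the attribute used as its grouping key.
POLICY_KEYS = {
    "master_device": lambda f: f.master_device,
    "backbone": lambda f: f.backbone,
    "content": lambda f: f.content_id,
    "application": lambda f: f.application,
}


def aggregate(flows, policy):
    """Group flows by the chosen policy's key and total the bytes sent per user in each group."""
    key_of = POLICY_KEYS[policy]
    groups = defaultdict(lambda: defaultdict(int))
    for flow in flows:
        groups[key_of(flow)][flow.user_id] += flow.bytes_sent
    # Convert the nested defaultdicts to plain dicts for a simple, printable result.
    return {key: dict(users) for key, users in groups.items()}


if __name__ == "__main__":
    flows = [
        FlowRecord("u1", "md-A", "bb-1", "video-42", "WeChat", 1200),
        FlowRecord("u2", "md-A", "bb-2", "video-42", "WeChat", 800),
        FlowRecord("u1", "md-B", "bb-1", "video-7", "Browser", 300),
    ]
    print(aggregate(flows, "content"))
    # {'video-42': {'u1': 1200, 'u2': 800}, 'video-7': {'u1': 300}}
```

Switching the policy argument to "backbone" or "application" regroups the same records without changing the collection step, which is the point of treating the policies as interchangeable keys.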
Although data can easily be collected under these policies, each company is limited to its own business area, so the potential value of the data cannot be fully realized. Data mining can therefore be exploited more thoroughly through cooperation among these entities.

3.3. Data Preprocessing

In the coming ubiquitous network era, personal traffic may come from all kinds of information sources, including sound, images, video, fingerprints, product information, biological information and even brain waves. This data will traverse countless user equipments, which makes it harder to organize. In our model, we adopt machine learning algorithms to abstract user behavior features and predict spectrum usage. The goal of this preprocessing step is to normalize the messy data into a training dataset. First, general cloud technologies such as HDFS and MapReduce are used to store the metadata in segments. The raw data is then aggregated into a dataset according to one of the policies above. After cloud processing and data cleaning, the different types of data can be normalized into structured data. When the data set is small, the few available samples can be used directly for training; otherwise, random sampling is required to reduce the complexity of the big data.

4. Specification

This section describes how the Database trains on the datasets and predicts a suitable white space to assign. The general procedure is to abstract features, train on the datasets, and predict results for new data. Scaling and parallelizing the machine learning algorithms on big data inside the cloud is highly beneficial for this procedure.

4.1. Feature Abstraction

The selection of user features and parameters follows a general unsupervised modeling process. Common feature selection and feature extraction methods, such as filter methods, Principal Component Analysis (PCA) and Singular Value Decomposition (SVD), can be used to find significant feature subsets for training.
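The filter approach mentioned above can be illustrated with a minimal, dependency-free Python sketch; it is not PCA or SVD. It assumes the preprocessed samples are already numeric vectors, rescales each feature to [0, 1] with min-max normalization (an assumption, because variance depends on scale), and then keeps only features whose variance exceeds a threshold, which is one simple unsupervised filter criterion. The function names, feature names, sample values and threshold are hypothetical.

```python
from statistics import pvariance


def min_max_normalize(rows):
    """Scale every feature column to [0, 1] so variances are comparable across features."""
    columns = list(zip(*rows))
    scaled_columns = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column carries no information; map it to all zeros.
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_columns)]


def variance_filter(rows, feature_names, threshold):
    """Filter-style, unsupervised selection: keep features whose variance exceeds threshold."""
    if not rows:
        return [], []
    columns = list(zip(*rows))  # transpose: one tuple of values per feature
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    kept_names = [feature_names[i] for i in keep]
    reduced_rows = [[row[i] for i in keep] for row in rows]
    return kept_names, reduced_rows


if __name__ == "__main__":
    # Hypothetical samples: quantized location cell, hour-of-month, service class, roaming flag.
    names = ["geo_cell", "hour_of_month", "service_class", "roaming"]
    samples = [[3, 10, 1, 0], [3, 700, 4, 0], [3, 350, 9, 0], [3, 20, 2, 0]]
    kept, reduced = variance_filter(min_max_normalize(samples), names, threshold=0.01)
    print(kept)  # ['hour_of_month', 'service_class']
```

Here the location cell and roaming flag are constant across the samples, so the filter drops them. PCA or SVD, also mentioned above, would instead project the features onto new combined axes rather than discarding individual columns.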
Unlike in traditional wireless resource distribution, the features in white space access are more complicated. Below we describe several typical features that represent user behavior.

1. Geolocation: Available spectrum is often sensed within a limited area, so the topographic information of the slave device affects white space quality and selection. Specific geographic information, such as latitude, longitude and antenna height, can be quantized into a value used as a feature for data learning.

2. Time label: This feature is composed of two variables. First, time scales at different levels affect user behavior patterns. For example, at the beginning of a month users still have enough of their monthly mobile data plan and are not driven to seek other resources intensively, so there is less frequency hopping at the beginning of the month and, correspondingly, more white space requests toward its end. The spectrum requirement also varies within a day. Second, spectrum occupancy behavior is influenced by the usage time interval. Based on the timestamp, the usage interval can be quantized to the minute, and the time scale can be quantized to each hour within a month.

3. Service types: Among numerous applications such as streaming video, Voice over IP (VoIP), e-commerce and Enterprise Resource Planning (ERP), we differentiate services in order to provide better QoE than best-effort service alone. Different applications have different demands for delay, jitter, bandwidth, packet loss and availability. Following the definitions in RFC 4594 [RFC4594] and RFC 5127 [RFC5127], and based on tolerance to packet loss, delay and jitter, we classify customer services into four types and ten classes with priority values. Service types can also be classified more flexibly by following the standards of operators and other entities.
4. Roaming state: This is also a two-dimensional feature, consisting of the device's current roaming state and its handoff frequency out of and into its resident area.

As more mobile apps and services emerge, more features, such as biological data, are likely to be introduced into the training sets, redefining the feature abstraction criteria for machine learning.

4.2. Dataset Training by Machine Learning Methods

This step trains models on the established datasets and validates the test results. The primary goal is to predict the most suitable white space for the user's behavior. Other suitable services can also be predicted and recommended to meet the user's potential requirements.

For user behavior analytics in traditional wireless networks or with a small number of users, common clustering methods can meet the classification or prediction requirements. With the information explosion and the growth of data volume, and given the different application purposes, more scalable and parallel machine learning tools and methods aimed at big data are needed. For the relevant big data and cloud analytics technologies, general industry standards can be consulted. The user data can also be partitioned locally by neighborhood similarity so that machine learning on big data can be processed in parallel. Likewise, the Database can be locally distributed at some scale to carry out dataset training. The specific methods fall into the following three analytic models.

4.2.1. User Behavior Clustering

Clustering techniques aggregate items by likelihood and similarity. In our protocol, these methods can be used to group users with similar behavior. The same actions, such as uniform spectrum distribution, are then applied to each cluster of users. This is a basic

4.2.2. Binary Prediction

This learning model is mainly used to make decisions and to predict a spectrum together with a confidence or probability. Multi-ruleset data mining tools, such as sparse Bayesian methods and kernel-based methods, can be used preferentially to give better prediction results.

4.2.3. Spectrum Service Recommendation

The goal of this model is to predict and recommend a service for multiple users with similar behavior. Information filtering technologies and similarity-based recommender systems can use a scoring mechanism to match users with the spectrum and services they are most likely to be interested in. Multi-ruleset collaborative filtering can be implemented to compute these preferences and to recommend spectrum or other user-oriented services such as data traffic plans. Moreover, such a correlation and filtering mechanism can monitor mass spectrum usage activity to prevent cooperative attacks by malicious users.

4.3. Prediction Results

When a new spectrum request arrives, the Database abstracts the user features described above and predicts the spectrum using the trained model. Since the predicted spectrum is the most suitable or most frequently used one, the Database can respond directly with the best candidate spectrum, or with a spectrum list annotated with probabilities, instead of a randomly selected list of available spectrum. For security, this also ensures access to a stable and trusted Database. Manual operation can also be permitted and pre-built into the Database. Other recommended results can likewise be pushed in the spectrum response. Predicted results can be added to the training datasets to improve prediction accuracy and to adjust the false alarm rate automatically as the model fit adapts. In an alternative recommendation method, when the requested spectrum period expires, the master device releases the spectrum and sends spectrum feedback to the WSDB.
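The direct response described in this subsection can be sketched in Python. The draft does not fix the trained model, so this sketch substitutes plain per-user usage frequency (the "most frequently used" criterion) for the sparse Bayesian, kernel-based or collaborative filtering models named in Section 4.2; it only shows the shape of a probability-annotated list ordered best candidate first. The function name `predict_spectrum_list`, the integer channel identifiers and the add-one (Laplace) smoothing are illustrative assumptions.

```python
from collections import Counter


def predict_spectrum_list(history, available, top_k=3):
    """Rank currently available channels by how often the requesting user used them before.

    history: channel identifiers previously used by the requesting user.
    available: channel identifiers the WSDB currently lists as available.
    Returns up to top_k (channel, probability) pairs, most probable first.
    """
    # Drop duplicate identifiers while keeping the WSDB's listing order.
    available = list(dict.fromkeys(available))
    if not available:
        return []
    allowed = set(available)
    # Only past usage of channels that are available now contributes to the ranking.
    counts = Counter(ch for ch in history if ch in allowed)
    # Add-one (Laplace) smoothing: never-used channels keep a small non-zero
    # probability, and the probabilities over all available channels sum to 1.
    total = sum(counts.values()) + len(available)
    scored = [(ch, (counts[ch] + 1) / total) for ch in available]
    # Stable sort: ties keep the order in which the WSDB listed the channels.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    past_channels = [21, 21, 21, 27, 30]  # channel 30 is no longer available
    for channel, probability in predict_spectrum_list(past_channels, [21, 27, 33]):
        print(channel, round(probability, 3))
```

With these example inputs, channel 21 is returned first with probability 4/7, followed by channel 27 (2/7) and channel 33 (1/7); channel 30 is ignored because it is no longer available. The smoothing keeps an available channel the user has never used in the list instead of excluding it outright.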
The spectrum feedback sent by the master device carries an evaluation score describing its satisfaction with this white space access. If a spectrum frequently receives high scores, it is ranked higher and allocated to other users with priority the next time.

5. Working flow

This section introduces the system implementation architecture. Our Database should be locally distributed to address mobility and scalability. Because node mobility management involves registration and termination procedures, localization helps provide low-latency queries and eases scalability. Localization also allows the big data to be learned in portions and then integrated, supporting analysis at different granularities by transforming between lower- and higher-dimensional data spaces.

5.1. Spectrum prediction scenario

+-----------+                 +-----------+               +-----------+
|           |                 |           |               |   WSDB    |
|    WSD    |                 |   WSDB    |               | Analysis  |
|           |                 |           |               |  Server   |
+-----------+                 +-----------+               +-----------+
      |                             |     all users history     |
      |                             |    feature abstraction    |
      |                             |---------------------------|
      |                             |                           |
      |                             |dataset training & modeling|
      |                             |---------------------------|
      |                             |                           |
      |    AVAIL_SPEC_BATCH_REQ     |                           |
      |---------------------------->|                           |
      |                             |    feature abstraction    |
      |                             |   & spectrum prediction   |
      |                             |<------------------------->|
      | AVAIL_SPEC_BATCH_RESP with  |                           |
      |     predicted spectrum      |                           |
      |<----------------------------|                           |
      |                             |                           |

   Figure 1: Procedure by which a WSD obtains predicted spectrum from the WSDB

As Figure 1 shows, the Database does not need to check the currently available spectrum for every white space device. It can even trace user activity and preset the spectrum likely to be used by a series of users. This shortens the query delay and lowers the resource lookup cost, while still returning an optimized spectrum.

5.2. WSDB Recommendation Procedure

6. Security Considerations

According to the security assumptions in the PAWS use cases and requirements, the Master Device and the Database may face six types of threats. Because it adds no message exchanges, our protocol introduces no new interception risks. Moreover, a group of cooperating malicious attackers could be identified easily, since they would exhibit similar behavior.

7. IANA Considerations

This document makes no request of IANA.

8. Conclusions

This memo discusses Smart Database functions for white space database access and describes some usage scenarios.

9. References

9.1. Normative References

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3339]  Klyne, G., Ed. and C. Newman, "Date and Time on the Internet: Timestamps", RFC 3339, July 2002.

[RFC4594]  Babiarz, J., Ed. and K. Chan, "Configuration Guidelines for DiffServ Service Classes", RFC 4594, August 2006.

[RFC5127]  Chan, K., Ed. and F. Baker, "Aggregation of Diffserv Service Classes", RFC 5127, February 2008.

[I-D.ietf-paws-protocol]  Chen, V., Das, S., Zhu, L., Malyar, J., and P. McCann, "Protocol to Access Spectrum Database", draft-ietf-paws-protocol-03 (work in progress), February 2013.

[I-D.das-paws-protocol]  Das, S., Malyar, J., and D. Joslyn, "Device to Database Protocol for White Space", draft-das-paws-protocol-02 (work in progress), July 2012.

[I-D.ietf-paws-problem-stmt-usecases-rqmts]  Mancuso, A. and B. Patil, "Protocol to Access White Space (PAWS) Database: Use Cases and Requirements", draft-ietf-paws-problem-stmt-usecases-rqmts-12 (work in progress), January 2013.

[I-D.wei-paws-framework]  Wei, X., Zhu, L., and P. McCann, "PAWS Framework", draft-wei-paws-framework-00 (work in progress), July 2012.

10. Acknowledgments

Thanks to our colleagues for their sincere contributions and comments while drafting this document.
Authors' Addresses

Jianfeng Guan
State Key Laboratory of Networking and Switching Technology
Beijing University of Posts and Telecommunications, Beijing, 100876, P.R.China

EMail: jfguan@bupt.edu.cn

Neng Zhang
State Key Laboratory of Networking and Switching Technology
Beijing University of Posts and Telecommunications, Beijing, 100876, P.R.China

EMail: zn@bupt.edu.cn

Changqiao Xu
State Key Laboratory of Networking and Switching Technology
Beijing University of Posts and Telecommunications, Beijing, 100876, P.R.China

EMail: cqxu@bupt.edu.cn

Hongke Zhang
State Key Laboratory of Networking and Switching Technology
Beijing University of Posts and Telecommunications, Beijing, 100876, P.R.China

EMail: hkzhang@bupt.edu.cn