I. Introduction
Network protocol recognition is the task of determining which protocol or application generated a given piece of network traffic. It matters to Internet Service Providers and network administrators, who need to know what types of traffic traverse their network backbones. Protocol identification makes the source of monitored traffic visible and has many applications, such as Quality of Service (QoS) provisioning, network security monitoring (IDS/IPS), traffic visualization, network forensics, and the analysis of trends and changes in network applications. Protocol identification through Deep Packet Inspection (DPI) is the most widely applied technique in industry and has become the de facto standard, although it is considered extremely expensive in terms of processing cost on high-speed networks. Fortunately, this cost can be mitigated by exploiting new high-performance techniques [10] and optimization strategies [4]. The core of DPI is matching the content of the traffic payload against pre-constructed fingerprints, also called signatures, which are typically expressed as regular expressions. However, inferring accurate and efficient fingerprints for the wide range of application protocols faces several challenges. (i) Traditionally, it is a time-consuming and difficult task that requires extensive manual analysis of protocol specifications and packet traces by network protocol experts. (ii) Although standard RFCs exist for public-domain protocols, the majority of proprietary protocols lack publicly available documentation. (iii) Even when a protocol fingerprint can be derived from an open specification, it may not cover all variants, because the same protocol may have several different implementations, and some of these implementations do not comply with the publicly available specification.
(iv) The labour-intensive manual signature extraction process must be repeated periodically to keep the signature repository up to date.
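To make the signature-matching step of DPI concrete, the following is a minimal sketch of payload classification with regular-expression fingerprints. The three signatures, the `SIGNATURES` table, and the `identify_protocol` function are hypothetical and deliberately simplified for illustration; they are not the fingerprints studied in this paper, and production rule sets are considerably more detailed.

```python
import re

# Hypothetical, simplified signatures for illustration only.
# Each pattern is anchored at the start of the payload and operates on bytes.
SIGNATURES = {
    # HTTP/1.x request line: method, target, version.
    "HTTP": re.compile(rb"^(GET|POST|HEAD|PUT|DELETE|OPTIONS) [^ ]+ HTTP/1\.[01]\r\n"),
    # SSH identification string, e.g. "SSH-2.0-OpenSSH_8.9".
    "SSH": re.compile(rb"^SSH-[12]\.[0-9]+-"),
    # SMTP server greeting: reply code 220 followed by a banner naming SMTP/ESMTP.
    "SMTP": re.compile(rb"^220[ -][^\r\n]*SMTP", re.IGNORECASE),
}


def identify_protocol(payload: bytes) -> str:
    """Return the name of the first signature that matches the payload."""
    for name, pattern in SIGNATURES.items():
        if pattern.match(payload):
            return name
    return "unknown"


if __name__ == "__main__":
    samples = [
        b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n",
        b"SSH-2.0-OpenSSH_8.9\r\n",
        b"220 mail.example.com ESMTP Postfix\r\n",
        b"\x00\x01\x02\x03",
    ]
    for sample in samples:
        print(identify_protocol(sample), sample[:24])
```

Even in this toy form, the sketch hints at challenges (iii) and (iv): an implementation that deviates from the expected first bytes falls through to "unknown", and each new variant requires the table to be revised by hand.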