Some application protocols, such as FTP, H.323 and SIP, split a transaction into two flows carried over separate connections: a control connection first, followed by a data connection. In passive-mode FTP, the first connection, to port 21 of the file server, is the control connection. After the user has logged in and runs an FTP command such as 'ls', a data connection is opened to a port somewhere in the very large range 1024 - 65535. The firewall has to allow these two connections, but no more than that; it should not open the whole port range. The server and client first negotiate the destination port number on which the server will listen. This information belongs to the application protocol, however, and is not seen by normal connection tracking, which only inspects protocol headers up to layer 4. Analyzing the data behind the TCP header requires a protocol-specific module called a helper. The helper reads the destination port number and creates a so-called expectation: the data connection is expected, and when it arrives it becomes 'related' to the control connection, which acts as the master connection. We then only need to write a 'ct state related' rule, without knowing the specific destination port.
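To make the idea of "reading the destination port from the application data" concrete, here is a minimal user-space sketch that extracts the address and port from a passive-mode reply of the form "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)" defined in RFC 959. The function name parse_pasv_reply() is hypothetical and this is not the kernel helper's code, which is more defensive and handles several command variants.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical function (not the kernel's): extract the data-connection
 * endpoint from a "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)" reply.
 * Returns 0 on success, -1 if the line is not a well-formed 227 reply. */
int parse_pasv_reply(const char *line, uint32_t *ip, uint16_t *port)
{
    unsigned int v[6];
    const char *p;
    int i;

    if (strncmp(line, "227 ", 4) != 0)
        return -1;
    p = strchr(line, '(');
    if (p == NULL)
        return -1;
    if (sscanf(p, "(%u,%u,%u,%u,%u,%u", &v[0], &v[1], &v[2],
               &v[3], &v[4], &v[5]) != 6)
        return -1;
    for (i = 0; i < 6; i++)
        if (v[i] > 255)
            return -1;

    /* Four address bytes, then the port as high byte and low byte. */
    *ip = ((uint32_t)v[0] << 24) | ((uint32_t)v[1] << 16) |
          ((uint32_t)v[2] << 8) | (uint32_t)v[3];
    *port = (uint16_t)(v[4] * 256 + v[5]);
    return 0;
}
```

For the reply "227 Entering Passive Mode (192,168,1,10,195,80)" the port works out to 195 * 256 + 80 = 50000, which is the destination port an expectation would be created for.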
A helper is registered with the nf_conntrack_helper_register() function, which takes a pointer to a nf_conntrack_helper structure. This structure represents the helper and includes the following fields, explained here for the FTP case:
• Helper's name, "ftp".
• tuple is a nf_conntrack_tuple structure. It only needs to hold the layer 3 protocol number, the layer 4 protocol number and the port of the master connection. For FTP the layer 3 protocol is IPv4 or IPv6, the layer 4 protocol is TCP and the port is 21.
• help is a function pointer that handles the application protocol. This is the main job of the helper.
• hnode is a hlist_node structure used to insert the helper at the head of a bucket list in the helper hash table.
• from_nlattr is a function pointer called when the master conntrack is injected from user space, typically when a firewall takes over the connections after a failover. The function is meant to handle netlink attributes, but the FTP helper only sets the NF_CT_FTP_SEQ_PICKUP flag so that sequence number checking of the data is skipped, because the former backup firewall does not know the sequence numbers (only the active firewall can update them).
• Flags, expectation policy and some other things.
The registration function computes an index from the helper's tuple, uses it to select a bucket in the hash table array, and inserts the helper at the head of the list in that bucket. Once the node is in the hash table, the helper can be retrieved from it, because the helper is the structure that contains the node.
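The registration and lookup described above can be sketched in user space as follows. This is only an illustration under stated assumptions: the structure names, the bucket count and the toy hash function are made up for the example, and struct node stands in for the kernel's hlist_node. The point is the pattern: hash the three-field tuple, insert at the head of the bucket, and step back from the node to its containing helper.

```c
#include <stddef.h>
#include <stdint.h>

#define HELPER_HSIZE 8   /* illustrative bucket count, not the kernel's */

/* Step back from a member pointer to the structure that contains it. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct node { struct node *next; };        /* stand-in for hlist_node */

struct helper_tuple {                      /* the three lookup keys */
    uint16_t l3num;     /* layer 3 protocol, e.g. 2 (AF_INET) */
    uint8_t  protonum;  /* layer 4 protocol, e.g. 6 (TCP) */
    uint16_t port;      /* port of the master connection, e.g. 21 */
};

struct helper {
    const char *name;
    struct helper_tuple tuple;
    struct node hnode;
};

static struct node *helper_hash[HELPER_HSIZE];

/* Toy hash for the example; the kernel uses its own function. */
unsigned int helper_bucket(const struct helper_tuple *t)
{
    return (unsigned int)(t->l3num ^ t->protonum ^ t->port) % HELPER_HSIZE;
}

/* Insert the helper at the head of its bucket list. */
void helper_register(struct helper *h)
{
    unsigned int b = helper_bucket(&h->tuple);

    h->hnode.next = helper_hash[b];
    helper_hash[b] = &h->hnode;
}

/* Walk the bucket and recover each helper from its embedded node. */
struct helper *helper_find(const struct helper_tuple *t)
{
    struct node *n;

    for (n = helper_hash[helper_bucket(t)]; n != NULL; n = n->next) {
        struct helper *h = container_of(n, struct helper, hnode);

        if (h->tuple.l3num == t->l3num &&
            h->tuple.protonum == t->protonum &&
            h->tuple.port == t->port)
            return h;
    }
    return NULL;
}
```

Registering a helper with the tuple {2, 6, 21} and then calling helper_find() with the same three values returns the registered structure; a different port yields NULL.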
When the first packet of the master connection arrives, the connection tracking system (hereafter called the conntrack system) creates an entry called a conntrack, as it does for any normal connection. A conntrack is a nf_conn structure that holds connection information, including:
• The status field holds the connection state bits: whether packets have been seen in both directions, whether the packet has left the box (the conntrack is confirmed and has been inserted into the official hash table at the last hook, postrouting), whether this is an expected connection, whether the connection is new or dying, and so on.
• tuplehash is an array of two nf_conntrack_tuple_hash structures, one for the original direction and one for the reply direction. Each structure has two fields: a tuple holding the connection information and a hnnode. Each structure is inserted into the hash table through its hnnode. When the system needs to access a conntrack that has a node in the hash table, it computes the index of the bucket containing the node and then searches the bucket for the node. Once the node is found, the pointer is moved back by the node's offset to the beginning of the nf_conn structure, which yields the conntrack. The two tuples are the most important members of a conntrack. Unlike the helper tuple, which needs only three pieces of information to access its hash table, a conntrack's tuples contain the full information: source address, destination address, source port, destination port, layer 3 protocol, layer 4 protocol and direction. One tuple describes the packet's original direction and the other the reply direction.
• master: for a data connection, this pointer refers to the conntrack of the control connection.
• timeout defines the life time of a conntrack. When it expires, the conntrack is destroyed.
• ct_general is a nf_conntrack structure. This structure has only one member, use, which holds the reference count of the conntrack object. When the reference count drops to 0, it is safe to release the object. In practice, the nf_ct_put() function decrements the reference count by 1 and releases the object if the count reaches zero. The nf_ct_expect_put() function does the same for the expectation object.
• The ext pointer points to the nf_ct_ext structure. This structure has a data field which holds extension structures that are added as needed. The offset field is an array, indexed by extension id, containing the offset of each extension structure from the beginning of the container structure. offset[NF_CT_EXT_HELPER] is the offset of the nf_conn_help structure. The nf_conn_help structure helps the master conntrack manage its expectations and communicate with the helper. The nf_conn_help structure has four members:
◦ The helper pointer points to the helper.
◦ expectations is a structure hlist_head. It is the head of the list of expectations of the master conntrack. A newly created expectation will be inserted at the head of this list. This is how the master conntrack manages its expectations independently of the general management of expectations in the expectation hash table.
◦ expecting is an array of integers, holding the number of current expectations by class. Currently FTP only uses one class, 0.
◦ data is a 32 byte field for helper-specific information. FTP uses this data for the nf_ct_ftp_master structure which holds the NF_CT_FTP_SEQ_PICKUP flag and sequence number information as described above.
Another extension structure is nf_conntrack_ecache. This structure has a cache field that holds reporting events such as IPCT_NEW, IPCT_DESTROY, IPCT_HELPER.
• Some other things.
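The way a conntrack is recovered from either of its two hash nodes can be illustrated with a small user-space sketch. The structure and function names below are simplified stand-ins chosen for the example, not the kernel definitions, and the reply tuple is built by simply swapping the endpoints, which assumes no NAT. The direction stored in the tuple tells us which array element the node belongs to, and therefore how far to step back.

```c
#include <stddef.h>
#include <stdint.h>

enum { DIR_ORIGINAL = 0, DIR_REPLY = 1, DIR_MAX = 2 };

struct tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  l4proto;
    uint8_t  dir;              /* which direction this tuple describes */
};

struct tuple_hash {
    struct tuple_hash *next;   /* stand-in for the hnnode list linkage */
    struct tuple tuple;
};

struct conn {                  /* simplified stand-in for nf_conn */
    unsigned long status;
    struct tuple_hash tuplehash[DIR_MAX];
};

/* Fill both directions; the reply tuple swaps the endpoints (no NAT). */
void conn_init(struct conn *ct, const struct tuple *orig)
{
    ct->status = 0;
    ct->tuplehash[DIR_ORIGINAL].next = NULL;
    ct->tuplehash[DIR_ORIGINAL].tuple = *orig;
    ct->tuplehash[DIR_ORIGINAL].tuple.dir = DIR_ORIGINAL;

    ct->tuplehash[DIR_REPLY].next = NULL;
    ct->tuplehash[DIR_REPLY].tuple = (struct tuple){
        .src_ip = orig->dst_ip,     .dst_ip = orig->src_ip,
        .src_port = orig->dst_port, .dst_port = orig->src_port,
        .l4proto = orig->l4proto,   .dir = DIR_REPLY,
    };
}

/* Move the pointer back from a hash node to the enclosing conn. */
struct conn *tuplehash_to_conn(struct tuple_hash *h)
{
    size_t off = offsetof(struct conn, tuplehash) +
                 (size_t)h->tuple.dir * sizeof(struct tuple_hash);

    return (struct conn *)((char *)h - off);
}
```

Whichever of the two nodes a lookup finds, tuplehash_to_conn() returns the same conn, which is why packets in both directions end up at a single entry.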
During conntrack initialization, a helper is assigned if automatic helper assignment is configured. This involves finding the helper, adding the nf_conn_help structure and pointing its helper pointer at the helper. At the last hook, postrouting, before the packet leaves the box, the conntrack is confirmed by the nf_conntrack_confirm() function. If the packet is accepted, this function inserts the conntrack into the hash table. It then checks for a helper with the nfct_help() function, which returns a pointer to the nf_conn_help structure. Because the helper was assigned, the check succeeds, so the function records the event with the nf_conntrack_event_cache() function, setting bit 1 << IPCT_HELPER. Finally, the nf_conntrack_confirm() function delivers the events with the nf_ct_deliver_cached_events() function (in the nf_conntrack_core.h source file). This function first looks for cached events using the nf_ct_ecache_find() function. Since the event has been set, it is found together with the other events. The cached events are thus delivered only once, and they are cleared at the moment they are fetched for delivery (this is done by the statement: events = xchg(&e->cache, 0);).
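The "set a bit, then fetch and clear in one step" behaviour of the event cache can be sketched with C11 atomics. The names below are invented for the example, and the event numbering is an illustrative subset, so treat both as assumptions; atomic_exchange() plays the role that xchg() plays in the kernel statement quoted above.

```c
#include <stdatomic.h>

/* Illustrative subset of event numbers, for the example only. */
enum demo_event {
    DEMO_IPCT_NEW,
    DEMO_IPCT_RELATED,
    DEMO_IPCT_DESTROY,
    DEMO_IPCT_REPLY,
    DEMO_IPCT_ASSURED,
    DEMO_IPCT_PROTOINFO,
    DEMO_IPCT_HELPER
};

struct event_cache {
    atomic_ulong cache;   /* one bit per pending event */
};

/* Record an event: set bit (1 << event) in the cache. */
void event_cache_set(struct event_cache *e, enum demo_event ev)
{
    atomic_fetch_or(&e->cache, 1UL << ev);
}

/* Fetch all pending events and clear them in a single atomic step,
 * so that each event is delivered at most once. */
unsigned long event_cache_take(struct event_cache *e)
{
    return atomic_exchange(&e->cache, 0UL);
}
```

After setting the NEW and HELPER bits, the first call to event_cache_take() returns both bits and a second call returns 0, mirroring the deliver-once behaviour.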
In the nf_ct_deliver_cached_events() function there is a notify pointer to a nf_ct_event_notifier structure. This structure has a fcn field, which is a function pointer used to handle event messages. Meanwhile, the source file nf_conntrack_netlink.c declares a struct nf_ct_event_notifier whose fcn pointer is initialized to the ctnetlink_conntrack_event() function. Now let's briefly look at per-network-namespace operations to understand how the event is handed over.
The net structure represents a network namespace. It has a gen pointer field that points to a net_generic structure (generic.h source file), whose ptr field is an array indexed by the ids of the operations registered for the net. Each set of operations is registered with the register_pernet_subsys() function, which takes a pointer to a pernet_operations structure (the net_namespace.c source file). The pernet_operations structure represents the operations. It has a list field used to insert it into the list of operations, an id pointer, an init function pointer, and several other fields. The register_pernet_subsys() function calls the register_pernet_operations() function. This function calls the ida_alloc_min() function to generate an id for the operations, and then calls the __register_pernet_operations() function to do the actual registration. In turn, the __register_pernet_operations() function adds the operations to the list and calls the ops_init() function to initialize them for an already initialized net structure named init_net. This is the default net, and also the only net in the system if we do not create additional network namespaces.
The ops_init() function calls the net_assign_generic() function to assign the new net_generic structure (if necessary) to the net's gen pointer, and then calls the init function of the pernet_operations structure to initialize the operations. This completes the registration.
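The registration flow above (allocate an id, then call the init function for the default namespace, which stores its private data in the slot for that id) can be sketched as follows. Everything here is a deliberately simplified assumption: the names are invented, a plain counter replaces ida_alloc_min(), a fixed-size array replaces the net_generic structure, and the id is stored directly in the operations structure rather than through a pointer.

```c
#include <stddef.h>

#define MAX_OPS 8   /* illustrative limit for the example */

struct net {
    void *gen_ptr[MAX_OPS];   /* stand-in for net->gen->ptr[], indexed by id */
};

struct pernet_ops {
    int id;                                 /* filled in at registration */
    int (*init)(struct net *net, int id);   /* per-namespace initializer */
};

static struct net init_net;   /* the default (and here the only) namespace */
static int next_id;           /* toy replacement for an id allocator */

/* Allocate an id for the operations, then initialize them for init_net. */
int register_ops(struct pernet_ops *ops)
{
    if (next_id >= MAX_OPS)
        return -1;
    ops->id = next_id++;
    return ops->init != NULL ? ops->init(&init_net, ops->id) : 0;
}

/* Example initializer: publish private state in the slot for this id. */
static int demo_state = 42;

int demo_init(struct net *net, int id)
{
    net->gen_ptr[id] = &demo_state;
    return 0;
}
```

Two registrations receive different ids, and after each one the private data can be reached through the namespace using that id as the array index.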
We now return to event delivery in ctnetlink. The ctnetlink subsystem registers its operations with the register_pernet_subsys() function, passing the ctnetlink_net_ops structure. The init function of ctnetlink_net_ops is ctnetlink_net_init(), so registration leads to a call to ctnetlink_net_init(). The ctnetlink_net_init() function in turn calls the nf_conntrack_register_notifier() function with a pointer to the nf_ct_event_notifier structure mentioned above.
The net structure also has a ct field, a netns_ct structure that manages conntracks. The netns_ct structure has a field nf_conntrack_event_cb, a pointer to a nf_ct_event_notifier structure that holds the event notification callback. The nf_conntrack_register_notifier() function therefore assigns the address of the nf_ct_event_notifier structure above to the net's nf_conntrack_event_cb pointer.
Back in the nf_ct_deliver_cached_events() function, rcu_dereference() is used to obtain net->ct.nf_conntrack_event_cb, which is assigned to the notify pointer. The function then calls notify->fcn, that is, ctnetlink_conntrack_event(), with the events and the address of a nf_ct_event structure that contains the conntrack pointer.
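The hand-over through a registered callback pointer can be sketched as below. This is a user-space illustration with invented names; there is no RCU here, and the rule that a second registration is refused is an assumption made for the example.

```c
#include <stddef.h>

struct conn { int id; };                    /* stand-in for nf_conn */
struct event_info { struct conn *ct; };     /* stand-in for nf_ct_event */

struct event_notifier {
    int (*fcn)(unsigned int events, struct event_info *item);
};

struct net_ct {
    struct event_notifier *event_cb;        /* per-namespace callback slot */
};

/* Store the notifier in the namespace; refuse a second one (assumption). */
int register_notifier(struct net_ct *n, struct event_notifier *nb)
{
    if (n->event_cb != NULL)
        return -1;
    n->event_cb = nb;
    return 0;
}

/* Fetch the notifier and pass it the events plus the conntrack. */
int deliver_events(struct net_ct *n, unsigned int events, struct conn *ct)
{
    struct event_notifier *notify = n->event_cb;
    struct event_info item = { .ct = ct };

    if (notify == NULL || events == 0)
        return 0;
    return notify->fcn(events, &item);
}

/* Example handler that just records what it was given. */
static unsigned int last_events;
static int last_id;

int record_event(unsigned int events, struct event_info *item)
{
    last_events = events;
    last_id = item->ct->id;
    return 0;
}
```

Before a notifier is registered, deliver_events() does nothing; afterwards the handler receives the event bitmask and can reach the conntrack through the item pointer.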
nf_ct_deliver_cached_events() only delivers the events, while the ctnetlink_conntrack_event() function actually broadcasts the events.
What we are interested in here is IPCT_HELPER, i.e. the event signalling that a helper was created for the master conntrack.
The ctnetlink_conntrack_event() function generates an event message including the header and payload in a socket buffer and fills the information based on the events and conntrack.
There are many types of events, but messages fall into three categories:
1. DESTROY: if the event is IPCT_DESTROY, the message type is DESTROY (IPCTNL_MSG_CT_DELETE).
2. NEW: if the event is IPCT_NEW or IPCT_RELATED, the message type is NEW (IPCTNL_MSG_CT_NEW, with the NLM_F_CREATE | NLM_F_EXCL flags set).
3. UPDATE: in all remaining cases, the message type is UPDATE (IPCTNL_MSG_CT_NEW again, but without those flags). An UPDATE message means that the conntrack was created by earlier events and this is its updated status.
The helper is created on a new connection, so the IPCT_HELPER event is delivered together with the IPCT_NEW event and the message type is NEW.
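The three-way classification above fits in a few lines. The sketch below uses invented enum names and an illustrative event numbering, so it only demonstrates the order of the checks: DESTROY first, then NEW or RELATED, and everything else is an UPDATE.

```c
/* Illustrative subset of event numbers, for the example only. */
enum demo_event {
    DEMO_IPCT_NEW,
    DEMO_IPCT_RELATED,
    DEMO_IPCT_DESTROY,
    DEMO_IPCT_REPLY,
    DEMO_IPCT_ASSURED,
    DEMO_IPCT_PROTOINFO,
    DEMO_IPCT_HELPER
};

enum demo_msg { DEMO_MSG_NEW, DEMO_MSG_UPDATE, DEMO_MSG_DESTROY };

/* Map a bitmask of events to one of the three message categories. */
enum demo_msg classify_events(unsigned int events)
{
    if (events & (1u << DEMO_IPCT_DESTROY))
        return DEMO_MSG_DESTROY;
    if (events & ((1u << DEMO_IPCT_NEW) | (1u << DEMO_IPCT_RELATED)))
        return DEMO_MSG_NEW;
    return DEMO_MSG_UPDATE;
}

/* Human-readable label for a category. */
const char *msg_label(enum demo_msg m)
{
    switch (m) {
    case DEMO_MSG_NEW:     return "NEW";
    case DEMO_MSG_DESTROY: return "DESTROY";
    default:               return "UPDATE";
    }
}
```

A bitmask containing both the NEW and HELPER bits is classified as NEW, which matches the case discussed in the text.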
There are two message subsystems: the message subsystem for conntrack is NFNL_SUBSYS_CTNETLINK and the message subsystem for expectation is NFNL_SUBSYS_CTNETLINK_EXP. In this case it is conntrack so the function puts the type NFNL_SUBSYS_CTNETLINK in the message header.
For the IPCT_HELPER event it calls the ctnetlink_dump_helpinfo() function. The ctnetlink_dump_helpinfo() function requires that the helper pointer of the conntrack's nf_conn_help structure can be obtained; otherwise the helper is considered non-existent. This happens when the firewall recovers a connection: it injects a conntrack that originally had a helper into the kernel table, but the helper is lost shortly afterwards (we'll fix this soon).
The ctnetlink_dump_helpinfo() function then adds to the message the netlink attribute CTA_HELP and, nested inside it, the attribute CTA_HELP_NAME carrying the helper's name. Finally, the nfnetlink_send() function sends the message to the listening netlink sockets.
On the userspace side, the conntrack tool (command conntrack -E) opens a nfnetlink socket. Whenever events are delivered, it collects the message and outputs the information.
As we can see, the message is NEW and the helper is created at the very first step in the three-way TCP connection establishment.
The conntrack -L command queries the conntrack hash table and receives conntrack messages. The messages are then fed to a callback function named __callback (the callback.c source file of the libnetfilter_conntrack library). This function handles both conntrack and expectation messages. It examines the nlmsghdr message header structure and finds that the subsystem encoded in the nlmsg_type field is NFNL_SUBSYS_CTNETLINK, so it knows this is a conntrack message. The function therefore creates an nf_conntrack structure, pointed to by the ct pointer, which holds the user-space conntrack information, and uses the nfct_nlmsg_parse() function (parse_mnl.c source file) to parse the message and populate the conntrack. This function in turn calls the nfct_payload_parse() function to parse the payload. The nfct_payload_parse() function declares an array of pointers to nlattr structures (netlink attribute structures) named tb. The nlattr structure has two fields: nla_len is the length of the attribute, covering the header and the payload, and nla_type is the attribute type.
<----- MNL_ATTR_HDRLEN ----> <-- MNL_ALIGN(payload)-->
+---------------------+- - -+- - - - - - - - - -+- - -+-------------- - -
|        Header       | Pad |      Payload      | Pad |     Header
|   (struct nlattr)   | ing |                   | ing |  (struct nla
+---------------------+- - -+- - - - - - - - - -+- - -+-------------- - -
<-------------- nlattr->nla_len -------------->
Padding is added so that the header and the payload each extend to a multiple of 4 bytes.
nla_type consists of 16 bits: the low 14 bits are the actual attribute type; if the highest bit is 1, the attribute has nested attributes inside its payload; if the second-highest bit is 1, the payload is stored in network byte order, otherwise in host byte order. However, the event messaging subsystem does not use the second bit, because the byte order of each netlink attribute's payload is known in advance and therefore this bit need not be checked.
nla_type (16 bits)
| N | O | Attribute Type |
N := Carries nested attributes
O := Payload stored in network byte order
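The bit layout of nla_type and the 4-byte alignment rule can be expressed with a few masks. The macro and function names below are local to this sketch; the numeric values follow the layout just described (highest bit for nested, second-highest for network byte order, low 14 bits for the type) and a 4-byte attribute header.

```c
#include <stdint.h>

#define ATTR_F_NESTED         0x8000u   /* N: carries nested attributes */
#define ATTR_F_NET_BYTEORDER  0x4000u   /* O: payload in network byte order */
#define ATTR_TYPE_MASK        0x3fffu   /* low 14 bits: attribute type */

#define ATTR_ALIGNTO          4u
#define ATTR_ALIGN(len)       (((len) + ATTR_ALIGNTO - 1u) & ~(ATTR_ALIGNTO - 1u))
#define ATTR_HDRLEN           ATTR_ALIGN(4u)   /* two 16-bit header fields */

/* Actual attribute type with the two flag bits stripped. */
unsigned int attr_type(uint16_t nla_type)
{
    return nla_type & ATTR_TYPE_MASK;
}

int attr_is_nested(uint16_t nla_type)
{
    return (nla_type & ATTR_F_NESTED) != 0;
}

int attr_is_net_byteorder(uint16_t nla_type)
{
    return (nla_type & ATTR_F_NET_BYTEORDER) != 0;
}

/* Space one attribute occupies in the message: header plus payload,
 * rounded up to the next multiple of 4 bytes. */
unsigned int attr_total_size(unsigned int payload_len)
{
    return ATTR_ALIGN(ATTR_HDRLEN + payload_len);
}
```

For example, a 4-byte payload such as the string "ftp" with its terminating NUL occupies 8 bytes in total, while a 5-byte payload occupies 12 bytes because of the padding.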
Next, the nfct_payload_parse() function calls the mnl_attr_parse_payload() function to populate the tb array; among its arguments is nfct_parse_conntrack_attr_cb, the callback function that processes the data. The mnl_attr_parse_payload() function (attr.c source file of the libmnl library) uses the mnl_attr_for_each_payload macro to iterate over the attributes in the message payload. For each attribute, the callback function nfct_parse_conntrack_attr_cb() is called, and it stores the attribute pointer in tb at the index given by the attribute type. For the helper event, the attribute pointer is tb[CTA_HELP].
Back in the nfct_payload_parse() function, the tb[CTA_HELP] pointer is now set, so it calls the nfct_parse_helper() function. The nfct_parse_helper() function parses the CTA_HELP attribute in turn to get the nested attributes in its payload. It uses its own tb array and obtains the pointer tb[CTA_HELP_NAME], which points to the nested attribute CTA_HELP_NAME.
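The two-level parse (outer attributes into one tb array, then the payload of the help attribute into a second tb array) can be sketched without any library. This is a self-contained illustration, not libmnl or libnetfilter_conntrack code: the DEMO_ constants, the array bounds and the function names are assumptions made for the example, and put_attr() exists only to build a test message.

```c
#include <stdint.h>
#include <string.h>

struct attr { uint16_t len; uint16_t type; };   /* same shape as struct nlattr */

#define ALIGN4(n)   (((size_t)(n) + 3) & ~(size_t)3)
#define HDRLEN      ALIGN4(sizeof(struct attr))
#define TYPE_MASK   0x3fffu
#define F_NESTED    0x8000u

/* Illustrative attribute numbers and bounds, assumptions for this sketch. */
enum { DEMO_CTA_HELP = 5, DEMO_CTA_MAX = 31 };
enum { DEMO_CTA_HELP_NAME = 1, DEMO_CTA_HELP_MAX = 2 };

/* Append one attribute at offset off; returns the new (aligned) offset.
 * The buffer must be zeroed beforehand so that padding bytes are zero. */
size_t put_attr(unsigned char *buf, size_t off, uint16_t type,
                const void *data, uint16_t dlen)
{
    struct attr a = { (uint16_t)(HDRLEN + dlen), type };

    memcpy(buf + off, &a, sizeof(a));
    if (dlen > 0)
        memcpy(buf + off + HDRLEN, data, dlen);
    return off + ALIGN4(a.len);
}

/* Walk the attributes in [buf, buf + len) and store a pointer to each
 * one in tb[], indexed by its type. */
void parse_attrs(const unsigned char *buf, size_t len,
                 const unsigned char **tb, unsigned int max_type)
{
    size_t off = 0;

    while (off + HDRLEN <= len) {
        struct attr a;
        unsigned int t;

        memcpy(&a, buf + off, sizeof(a));
        if (a.len < HDRLEN || off + a.len > len)
            break;                       /* malformed attribute: stop */
        t = a.type & TYPE_MASK;
        if (t <= max_type)
            tb[t] = buf + off;
        off += ALIGN4(a.len);
    }
}

/* Outer parse, then nested parse; the name is the nested payload. */
const char *helper_name_from_msg(const unsigned char *msg, size_t len)
{
    const unsigned char *tb[DEMO_CTA_MAX + 1] = { NULL };
    const unsigned char *nested[DEMO_CTA_HELP_MAX + 1] = { NULL };
    struct attr a;

    parse_attrs(msg, len, tb, DEMO_CTA_MAX);
    if (tb[DEMO_CTA_HELP] == NULL)
        return NULL;
    memcpy(&a, tb[DEMO_CTA_HELP], sizeof(a));
    parse_attrs(tb[DEMO_CTA_HELP] + HDRLEN, a.len - HDRLEN,
                nested, DEMO_CTA_HELP_MAX);
    if (nested[DEMO_CTA_HELP_NAME] == NULL)
        return NULL;
    return (const char *)(nested[DEMO_CTA_HELP_NAME] + HDRLEN);
}
```

Building a message with put_attr() that nests a name attribute holding "ftp" inside a help attribute, and passing it to helper_name_from_msg(), returns a pointer to the string "ftp": the nested attribute pointer advanced by the header length.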
At this point, we can easily get the character string that is the name of the helper by taking the payload of CTA_HELP_NAME, that is, adding to the pointer tb[CTA_HELP_NAME] an amount of MNL_ATTR_HDRLEN. The nfct_parse_helper() function copies the name of the helper to the helper_name field of conntrack via the ct pointer. It then sets the ATTR_HELPER_NAME bit for ct->head.set. The statement in the source code is as follows:
Once the message has been parsed, the __callback function passes the work to another callback function, the one that was registered with nfct_callback_register(). From here on it is the job of the conntrack tool to take the conntrack handed over by the library and render the information. That callback function is dump_cb(). After some filtering according to the options given on the command line, it calls the nfct_snprintf_labels() function to output the information. The nfct_snprintf_labels() function goes through a chain of intermediate processing functions, and the final output function is __snprintf_conntrack_default() (snprintf_default.c source file of the libnetfilter_conntrack library). The __snprintf_conntrack_default() function checks the ATTR_HELPER_NAME bit of ct->head.set, finds it set, and therefore outputs the name of the helper, here "ftp".
If the master conntrack fails to get a helper or loses it, then conntrack -L and conntrack -E cannot display the name of the helper. Otherwise, once they see the ATTR_HELPER_NAME bit they display helper=ftp.
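The "copy the name, set a bit, print only if the bit is set" logic can be sketched as follows. The structure, the bit index 29, the 16-byte name buffer and the function names are all assumptions for this example and are not taken from libnetfilter_conntrack.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DEMO_ATTR_HELPER_NAME 29   /* hypothetical bit index */
#define DEMO_ATTR_WORDS       3    /* hypothetical bitmask size in words */

struct user_ct {
    uint32_t set[DEMO_ATTR_WORDS];  /* one bit per attribute that is present */
    char helper_name[16];
};

void attr_set_bit(unsigned int nr, uint32_t *map)
{
    map[nr >> 5] |= UINT32_C(1) << (nr & 31);
}

int attr_test_bit(unsigned int nr, const uint32_t *map)
{
    return (int)((map[nr >> 5] >> (nr & 31)) & 1u);
}

/* Copy the helper name into the object and mark the attribute as set. */
void store_helper_name(struct user_ct *ct, const char *name)
{
    strncpy(ct->helper_name, name, sizeof(ct->helper_name) - 1);
    ct->helper_name[sizeof(ct->helper_name) - 1] = '\0';
    attr_set_bit(DEMO_ATTR_HELPER_NAME, ct->set);
}

/* Print "helper=<name>" only when the attribute bit is set.
 * Returns the number of characters written (0 if nothing to print). */
int print_helper(char *buf, size_t size, const struct user_ct *ct)
{
    if (!attr_test_bit(DEMO_ATTR_HELPER_NAME, ct->set)) {
        if (size > 0)
            buf[0] = '\0';
        return 0;
    }
    return snprintf(buf, size, "helper=%s", ct->helper_name);
}
```

With a zeroed structure print_helper() produces an empty string; after store_helper_name(&ct, "ftp") it produces "helper=ftp", which corresponds to the two outcomes described above.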
conntrack -E and conntrack -L both handle conntrack messages, so why doesn't conntrack -L present the message type?
There are some notable points about the output of conntrack -L:
• The dump_cb() function does not handle the type of the message, so it does not display the message type.
• The kernel doesn't set the appropriate flag in the message header, so the rendering cannot show the correct message type.
• conntrack -L only lists the conntracks currently in the kernel's conntrack hash table (hereafter called the conntrack table), so obviously there is no DESTROY type message: such a conntrack has already been removed from the list. Expired conntracks are withdrawn from the conntrack table and placed on the dying table, but usually they are withdrawn from the dying table immediately afterwards and freed from memory. When the system is working properly, the dying table is always empty.
• The hash of a conntrack depends only on the net and the tuple. During the three-way TCP connection establishment, the net is the same (and is the only net when network namespaces are not used). The layer 3 and layer 4 protocol parameters of the three steps are the same, so they have the same tuple (the second step, SYN_RECV, uses the reply tuple). The conntrack is therefore created only once, in the first step (SYN_SENT). The following steps produce the same hash and reach the same conntrack bucket, and by matching the tuple and the net (and likewise the zone) they find the previously created conntrack. Thus, all three steps of the TCP connection establishment share a single conntrack in the table, which is updated up to the last step (ESTABLISHED), and conntrack -L only outputs that single entry (conntrack -E shows all three entries).
• The helper information is filled in the message using the ctnetlink_dump_helpinfo() function as for the event.
• Each conntrack has two table entries, one for the original tuple and one for the reply tuple. But dumping the table requires only one entry, of the original tuple.
• When NAT is not in use, a new conntrack normally has three status bits set by the time it is inserted into the table: IPS_DST_NAT_DONE, set after the packet leaves the nf_conntrack_in() function, when destination NAT setup is considered complete (whether or not NAT is used) before routing; IPS_SRC_NAT_DONE, set when source NAT setup is considered complete, before the conntrack is confirmed by nf_conntrack_confirm(); and IPS_CONFIRMED, set when the conntrack is confirmed and inserted into the hash table, after which nothing more needs to be done with the packet and it leaves the box. The IPS_EXPECTED bit is added if the conntrack matches an expectation.
If you want conntrack -L to display the full message type, you can modify the dump_cb() function and the kernel. The modified kernel code that sets the message flag is as follows:
if (ct->status == (IPS_CONFIRMED | IPS_NAT_DONE_MASK))
        flags |= NLM_F_CREATE;
/* fill message */
...
flags &= ~NLM_F_CREATE;
Where IPS_NAT_DONE_MASK := (IPS_DST_NAT_DONE | IPS_SRC_NAT_DONE).
A better way is to use the IPS_SEEN_REPLY bit to distinguish a new conntrack:
if (!test_bit(IPS_SEEN_REPLY_BIT, &ct->status))
        flags |= NLM_F_CREATE;
/* fill message */
...
flags &= ~NLM_F_CREATE;