Zero-copy BPF buffer implementation
-----------------------------------

Reduce the number of system calls, copies, and even context switches in BPF
by adding shared memory buffers between userspace and the kernel.  A process
selects zero-copy buffer mode and "donates" two buffers to the kernel, which
uses them in place of the two kernel memory buffers in BPF.  The process uses
a shared memory interface to check for new data and acknowledge buffers, but
can also use an ioctl to force early rotation of a buffer before it is full
(for example, on a timeout), and select()/poll()/kevent() to wait for a
buffer to fill.  This API allows the number of system calls used to access
BPF data to approach zero as the load increases.
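
The shared memory check/acknowledge step described above can be pictured with
a small sketch.  It assumes a per-buffer header holding a kernel generation
number, a user generation number, and a valid length; the struct layout and
every zbuf_* name below are hypothetical and chosen for illustration only,
not taken from the patch.  The kernel side is simulated in the same process.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header at the start of each shared buffer (illustration only). */
struct zbuf_header {
	_Atomic uint32_t kernel_gen;	/* bumped when the buffer is handed to userspace */
	_Atomic uint32_t kernel_len;	/* bytes of packet data valid for this generation */
	_Atomic uint32_t user_gen;	/* written by userspace to acknowledge */
};

static void zbuf_init(struct zbuf_header *h)
{
	assert(h != NULL);
	atomic_init(&h->kernel_gen, 0);
	atomic_init(&h->kernel_len, 0);
	atomic_init(&h->user_gen, 0);
}

/* Kernel side (simulated): publish len bytes, then hand the buffer over. */
static void zbuf_kernel_assign(struct zbuf_header *h, uint32_t len)
{
	atomic_store_explicit(&h->kernel_len, len, memory_order_relaxed);
	/* Release ordering makes the length visible before the new generation. */
	atomic_fetch_add_explicit(&h->kernel_gen, 1, memory_order_release);
}

/* Kernel side (simulated): equal generations mean the kernel owns the buffer. */
static int zbuf_kernel_owns(struct zbuf_header *h)
{
	return atomic_load_explicit(&h->user_gen, memory_order_acquire) ==
	    atomic_load_explicit(&h->kernel_gen, memory_order_relaxed);
}

/* User side: bytes ready to read, or 0 if the kernel still owns the buffer. */
static uint32_t zbuf_user_check(struct zbuf_header *h)
{
	uint32_t kgen = atomic_load_explicit(&h->kernel_gen, memory_order_acquire);

	if (kgen == atomic_load_explicit(&h->user_gen, memory_order_relaxed))
		return 0;
	return atomic_load_explicit(&h->kernel_len, memory_order_relaxed);
}

/* User side: acknowledge with a plain memory write; no system call is made. */
static void zbuf_user_ack(struct zbuf_header *h)
{
	uint32_t kgen = atomic_load_explicit(&h->kernel_gen, memory_order_acquire);

	atomic_store_explicit(&h->user_gen, kgen, memory_order_release);
}
```

In this sketch a reader would poll zbuf_user_check() on each buffer, consume
the returned number of bytes, and then call zbuf_user_ack(); it would only
fall back to select()/poll()/kevent() when neither buffer has data.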

This implementation was created by Robert N. M. Watson under contract to
Seccuris Inc., in collaboration with Christian S. J. Peron of Seccuris Inc.,
and has been released under a two-clause BSD license.

Building
--------

Untar the tarball and drop the new src/ files into your src/ tree; this
should consist of two new .c files and two new .h files in src/sys/net/.

Apply the patch, which should modify a number of files in the kernel,
especially src/sys/net and src/sys/conf, as well as in contrib/libpcap in
order to teach the pcap library how to use zero-copy buffers.

Build a fresh kernel and install it; build and install a fresh libpcap.

A new sysctl, net.bpf.zerocopy_enable, will be present.  When it is set to 1,
all new BPF sessions created by libpcap will use zero-copy support.  If set
to 0, new sessions will use buffered reads.  The BPF_ZEROCOPY kernel option
and BPF_ZERO_COPY/BPF_ZEROCOPY environment variables used in earlier
prototypes have been removed in favour of this run-time configuration
model.

Notes
-----

You can learn more about this implementation by looking at the slides from
BSDCan 2007:

  http://www.watson.org/~robert/freebsd/2007bsdcan/20070517-devsummit-zerocopybpf.pdf

src/libpcap/pcap-bpf.c has a relative include that needs to be changed to an
absolute include before it can be committed.

The necessary autoconf/automake work has not yet been done to submit these
patches back to tcpdump.org.

If doing an incremental rebuild of libpcap, make sure to "make clean" before
"make" and "make install", as the dependencies appear not always to pick up
changes in the size of pcap_t, which can cause libpcap to become confused.

Issues
------

The zero-copy buffer implementation follows the same hold/store/free buffer
model as buffered reads.  We currently assign ownership of a buffer to
userspace using the user/kernel generation number scheme only for buffers in
the hold position.  As the user acknowledgement to the kernel is
asynchronous (via a memory write rather than a system call), it's possible to
get into a state where a rotation won't occur until another packet is
delivered to BPF, even if the buffer in the store spot is full.  This means
that, if no new packets arrive, the user process may wait indefinitely for
the store buffer to be moved to the hold position and assigned to userspace.
This should probably be fixed by allowing a full buffer in the store spot to
be assigned to userspace, allowing the buffer to be fully read without
waiting for a new packet to arrive in order to trigger the state machine.

This can be worked around in userspace by performing a rotate ioctl before
sleeping to wait for new data, which causes the recently acknowledged buffer
to be moved into the free position (etc); libpcap currently doesn't do this.
The problem should, however, be fixed properly by modifying the buffer model.
That fix is slightly undesirable, as it offers different buffer semantics
than buffered reads and will require the BPF code to know that sometimes it
isn't allowed to write packet data to the store buffer even if there is room
(a larger previous packet may have caused ownership to be assigned to
userspace, meaning that the kernel can no longer write to the buffer).
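
The stall and the rotate workaround described in this section can be
reproduced in a toy model of the buffer positions.  This is a userspace
simulation under simplifying assumptions (two buffers, one acknowledgement
flag standing in for the generation numbers), not kernel code; every sim_*
name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_NONE (-1)

/* Toy model of the store/hold/free positions (illustration only). */
struct bpf_sim {
	int	store, hold, free;	/* buffer index in each position, or SIM_NONE */
	size_t	size;			/* capacity of each buffer in bytes */
	size_t	used[2];		/* bytes currently held in each buffer */
	bool	user_acked;		/* user has acknowledged the hold buffer */
};

static void sim_init(struct bpf_sim *s, size_t size)
{
	s->store = 0; s->hold = SIM_NONE; s->free = 1;
	s->size = size;
	s->used[0] = s->used[1] = 0;
	s->user_acked = false;
}

/* Kernel: notice a pending acknowledgement and move hold -> free. */
static void sim_reclaim(struct bpf_sim *s)
{
	if (s->hold != SIM_NONE && s->user_acked) {
		s->used[s->hold] = 0;
		s->free = s->hold;
		s->hold = SIM_NONE;
		s->user_acked = false;
	}
}

/* Kernel: store -> hold, free -> store.  Callers check the preconditions. */
static void sim_rotate(struct bpf_sim *s)
{
	assert(s->hold == SIM_NONE && s->free != SIM_NONE);
	s->hold = s->store;
	s->store = s->free;
	s->free = SIM_NONE;
}

/* Kernel: deliver one packet; returns false if it had to be dropped. */
static bool sim_packet(struct bpf_sim *s, size_t len)
{
	if (len > s->size)
		return false;
	sim_reclaim(s);		/* acknowledgements are only noticed here... */
	if (s->used[s->store] + len > s->size) {
		if (s->free == SIM_NONE)
			return false;
		sim_rotate(s);
	}
	s->used[s->store] += len;
	return true;
}

/* User: acknowledge with a plain memory write; no kernel code runs. */
static void sim_user_ack(struct bpf_sim *s)
{
	s->user_acked = true;
}

/* User: is there an unacknowledged buffer in the hold position? */
static bool sim_user_ready(const struct bpf_sim *s)
{
	return s->hold != SIM_NONE && !s->user_acked;
}

/* User: rotate ioctl (simulated); ...or here, when userspace asks. */
static void sim_rotate_ioctl(struct bpf_sim *s)
{
	sim_reclaim(s);
	if (s->hold == SIM_NONE && s->used[s->store] > 0)
		sim_rotate(s);
}
```

With 100-byte buffers, delivering packets of 60, 60 and 40 bytes leaves one
buffer in the hold position and a full buffer in the store position.  After
sim_user_ack(), sim_user_ready() stays false until either another packet
arrives or sim_rotate_ioctl() is called, which is the stall and the
workaround in miniature.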
