Submission
Introduction
Submitting an I/O request is a sequence that generally looks like this:
/* Get an SQE */
struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
/* Setup a readv operation */
io_uring_prep_readv(sqe, file_fd, fi->iovecs, blocks, 0);
/* Set user data */
io_uring_sqe_set_data(sqe, fi);
/* Finally, submit the request */
io_uring_submit(ring);
This snippet is from the example cat with liburing.
You call io_uring_get_sqe() to get a submission queue entry or SQE, use one of the submission helpers for the type of I/O you’re trying to get done, like io_uring_prep_readv() or io_uring_prep_accept(), call io_uring_sqe_set_data() to attach a pointer to a data structure that uniquely identifies this request (you get this same user data back on the completion side), and finally call io_uring_submit() to submit the request.
You can also set up polling so as to avoid the system call that io_uring_submit() makes. TODO: link polling example here.
-
struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring)
This function returns a submission queue entry that can be used to submit an I/O operation. You can call this function multiple times to queue up I/O requests before calling io_uring_submit() to tell the kernel to process your queued requests.
Parameters
ring: io_uring structure as set up by io_uring_queue_init().
Return value: a pointer to io_uring_sqe that represents a vacant SQE. NULL is returned if the submission queue is full.
Please see the submission introduction code snippet for example usage.
-
void io_uring_sqe_set_data(struct io_uring_sqe *sqe, void *data)
This is an inline convenience function that sets the user data field of the SQE instance passed in.
Parameters
sqe: the SQE instance for which you want to set the user data.
data: a pointer to the user data.
-
void io_uring_sqe_set_flags(struct io_uring_sqe *sqe, unsigned flags)
This is an inline convenience function that sets the flags field of the SQE instance passed in.
Parameters
sqe: the SQE instance for which you want to set the flags.
flags: the flags you want to set. This is a bitmap field. Please see the io_uring_enter reference page for various SQE flags and what they mean.
-
int io_uring_submit(struct io_uring *ring)
Submits the SQEs acquired via io_uring_get_sqe() to the kernel. You can call this once after you have called io_uring_get_sqe() multiple times to set up multiple I/O requests.
Parameters
ring: io_uring structure as set up by io_uring_queue_init().
Return value: returns the number of SQEs submitted, or a negative error code on failure.
-
int io_uring_submit_and_wait(struct io_uring *ring, unsigned wait_nr)
Same as io_uring_submit(), but takes an additional parameter wait_nr that lets you specify how many completions to wait for. This call will block until wait_nr submitted requests are processed by the kernel and their details placed in the completion queue.
Parameters
ring: io_uring structure as set up by io_uring_queue_init().
wait_nr: The number of completions to wait for.
Return value: returns the number of SQEs submitted.
Submission helpers
Submission helpers are convenience functions that make it easy to specify the I/O operation you want to request via an SQE. There is one function per supported I/O type.
Please see the submission introduction code snippet for example usage of the io_uring_prep_readv() function.
-
void io_uring_prep_nop(struct io_uring_sqe *sqe)
This function sets up the submission queue entry pointed to by sqe with an IORING_OP_NOP operation, which is a no-op. This kind of operation exists for testing purposes and serves to test the speed and efficiency of the io_uring interface.
Parameters
sqe: pointer to an SQE as generally returned by io_uring_get_sqe().
-
void io_uring_prep_read(struct io_uring_sqe *sqe, int fd, void *buf, unsigned nbytes, off_t offset)
This function sets up the submission queue entry pointed to by sqe with a read operation.
Parameters
sqe: pointer to an SQE as generally returned by io_uring_get_sqe().
fd: the file descriptor to read from.
buf: the buffer to copy the read data into.
nbytes: number of bytes to read.
offset: absolute offset of the file to read from.
-
void io_uring_prep_write(struct io_uring_sqe *sqe, int fd, const void *buf, unsigned nbytes, off_t offset)
This function sets up the submission queue entry pointed to by sqe with a write operation.
Parameters
sqe: pointer to an SQE as generally returned by io_uring_get_sqe().
fd: the file descriptor to write to.
buf: the buffer to write data from.
nbytes: number of bytes to write.
offset: absolute offset of the file to write to.
-
void io_uring_prep_readv(struct io_uring_sqe *sqe, int fd, const struct iovec *iovecs, unsigned nr_vecs, off_t offset)
This function sets up the submission queue entry pointed to by sqe with a “scatter” read operation, much like readv(2) or preadv(2), which are part of Linux’s scatter/gather I/O family of system calls.
Parameters
sqe: pointer to an SQE as generally returned by io_uring_get_sqe().
fd: the file descriptor to read from.
iovecs: pointer to an array of iovec structures.
nr_vecs: number of iovec instances in the array pointed to by the iovecs argument.
offset: absolute offset of the file to read from.
See also
cat utility example with liburing which uses this function
-
void io_uring_prep_read_fixed(struct io_uring_sqe *sqe, int fd, void *buf, unsigned nbytes, off_t offset, int buf_index)
Much like io_uring_prep_read(), this function sets up the submission queue entry pointed to by sqe with a read operation. The main difference is that this function is designed to work with a fixed set of pre-allocated buffers registered via io_uring_register().
Parameters
sqe: pointer to an SQE as generally returned by io_uring_get_sqe().
fd: the file descriptor to read from.
buf: the buffer to copy the read data into.
nbytes: number of bytes to read.
offset: absolute offset of the file to read from.
buf_index: index, within the set of pre-allocated buffers, of the buffer to use.
-
void io_uring_prep_writev(struct io_uring_sqe *sqe, int fd, const struct iovec *iovecs, unsigned nr_vecs, off_t offset)
This function sets up the submission queue entry pointed to by sqe with a “gather” write operation, much like writev(2) or pwritev(2), which are part of Linux’s scatter/gather I/O family of system calls.
Parameters
sqe: pointer to an SQE as generally returned by io_uring_get_sqe().
fd: the file descriptor to write to.
iovecs: pointer to an array of iovec structures.
nr_vecs: number of iovec instances in the array pointed to by the iovecs argument.
offset: absolute offset of the file to write to.
See also
cp utility example with liburing which uses this function
-
void io_uring_prep_write_fixed(struct io_uring_sqe *sqe, int fd, const void *buf, unsigned nbytes, off_t offset, int buf_index)
TODO: fixed buffers example to be added.
Much like io_uring_prep_write(), this function sets up the submission queue entry pointed to by sqe with a write operation. The main difference is that this function is designed to work with a fixed set of pre-allocated buffers registered via io_uring_register().
Parameters
sqe: pointer to an SQE as generally returned by io_uring_get_sqe().
fd: the file descriptor to write to.
buf: the buffer to write data from.
nbytes: number of bytes to write.
offset: absolute offset of the file to write to.
buf_index: index, within the set of pre-allocated buffers, of the buffer to use.
-
void io_uring_prep_fsync(struct io_uring_sqe *sqe, int fd, unsigned fsync_flags)
This function sets up the submission queue entry pointed to by sqe with an fsync(2)-like operation. This causes any “dirty” buffers of the file’s data and metadata in the disk cache to be synced to disk.
Note
Queuing up this operation does not guarantee that write operations queued before it will have their data synced to disk. This is because operations from the submission queue can be picked up and executed by the kernel in parallel, so this sync operation could finish well before write operations that were queued in front of it. What it does guarantee is that any of the file’s “dirty” buffers that already exist at the time this operation is executed are synced to disk.
Parameters
sqe: pointer to an SQE as generally returned by io_uring_get_sqe().
fd: the file descriptor of the file to sync.
fsync_flags: This can either be 0 or IORING_FSYNC_DATASYNC, which makes it act like fdatasync(2).
-
void io_uring_prep_close(struct io_uring_sqe *sqe, int fd)
This function sets up the submission queue entry pointed to by sqe with a close(2)-like operation. This causes the file descriptor fd to be closed.
Parameters
sqe: pointer to an SQE as generally returned by io_uring_get_sqe().
fd: the file descriptor to close.
-
void io_uring_prep_openat(struct io_uring_sqe *sqe, int dfd, const char *path, int flags, mode_t mode)
This function sets up the submission queue entry pointed to by sqe with an openat(2)-like operation. This causes the file pointed to by path to be opened in a path relative to the directory represented by the dfd directory file descriptor.
Parameters
sqe: pointer to an SQE as generally returned by io_uring_get_sqe().
dfd: the directory file descriptor representing a directory relative to which the file is to be opened.
path: path name of the file to be opened.
flags: These are access mode flags. The same as in open(2).
mode: File permission bits applied when creating a new file. The same as in open(2).
-
void io_uring_prep_openat2(struct io_uring_sqe *sqe, int dfd, const char *path, struct open_how *how)
This function sets up the submission queue entry pointed to by sqe with an openat2(2)-like operation. This causes the file pointed to by path to be opened in a path relative to the directory represented by the dfd directory file descriptor.
Parameters
sqe: pointer to an SQE as generally returned by io_uring_get_sqe().
dfd: the directory file descriptor representing a directory relative to which the file is to be opened.
path: path name of the file to be opened.
how: a pointer to an open_how structure which lets you control how exactly you want to open the file. See openat2(2) for more details.
-
void io_uring_prep_fallocate(struct io_uring_sqe *sqe, int fd, int mode, off_t offset, off_t len)
This function sets up the submission queue entry pointed to by sqe with an fallocate(2)-like operation. The fallocate(2) system call is used to allocate, deallocate, collapse, zero or increase file space for the file represented by the file descriptor fd. See fallocate(2) for more details.
Parameters
sqe: pointer to an SQE as generally returned by io_uring_get_sqe().
fd: the file descriptor of the file to conduct the operation on.
mode: describes the operation to conduct on the file. See fallocate(2) for details.
offset: The offset of the file at which to begin the operation.
len: operation length.
-
void io_uring_prep_statx(struct io_uring_sqe *sqe, int dfd, const char *path, int flags, unsigned mask, struct statx *statxbuf)
This function sets up the submission queue entry pointed to by sqe with a statx(2)-like operation. The statx(2) system call gets meta information on the file pointed to by path, which is filled into a statx structure pointed to by statxbuf. See statx(2) for more details.
Parameters
sqe: pointer to an SQE as generally returned by io_uring_get_sqe().
dfd: depending on the value of this and path, the file pointed to by path can be interpreted as absolute, relative to the process’s current working directory, or relative to the directory referred to by the directory descriptor in dfd, among other interpretations. See statx(2) for details.
path: file path. Interpreted in combination with the value in dfd. See statx(2) for details.
flags: this is used to influence how the path name is looked up. It can also influence what sort of synchronization the kernel will do when querying a file on a remote filesystem. See statx(2) for details.
mask: tells the kernel which fields the caller is interested in. See statx(2) for details.
statxbuf: pointer to the statx structure that is filled in with the file’s meta information.
-
void io_uring_prep_fadvise(struct io_uring_sqe *sqe, int fd, off_t offset, off_t len, int advice)
This function sets up the submission queue entry pointed to by sqe with a posix_fadvise(2)-like operation. The posix_fadvise(2) system call lets the application advise the operating system how it plans to access data in the file represented by the file descriptor fd: sequentially, randomly or otherwise. This is with the intention to better the performance of the application. See posix_fadvise(2) for more details.
Parameters
sqe: pointer to an SQE as generally returned by io_uring_get_sqe().
fd: the file descriptor of the file to give advice about.
offset: the offset of the file starting with which the advice applies.
len: the length until which the advice applies.
advice: the advice to give. See posix_fadvise(2) for the possible values.
-
void io_uring_prep_madvise(struct io_uring_sqe *sqe, void *addr, off_t length, int advice)
This function sets up the submission queue entry pointed to by sqe with a madvise(2)-like operation. The madvise(2) system call lets the application advise the operating system on memory pointed to by addr up to length bytes. The advice could be on how the application plans to access that range of memory (sequentially, randomly or otherwise) or if the operating system should not share it when the process forks children, among other things. This is with the intention to better the performance of the application. See madvise(2) for more details.
Parameters
sqe: pointer to an SQE as generally returned by io_uring_get_sqe().
addr: starting address of the memory range to which the advice applies.
length: the length until which the advice applies.
advice: the advice to give. See madvise(2) for the possible values.
-
void io_uring_prep_splice(struct io_uring_sqe *sqe, int fd_in, loff_t off_in, int fd_out, loff_t off_out, unsigned int nbytes, unsigned int splice_flags)
This function sets up the submission queue entry pointed to by sqe with a splice(2)-like operation. The splice(2) system call copies data between two file descriptors (fd_in and fd_out) without copying data between kernel address space and user address space. However, one of the file descriptors must represent a pipe. See splice(2) for more details.
Parameters
sqe: pointer to an SQE as generally returned by io_uring_get_sqe().
fd_in: the file descriptor to read from.
off_in: Unlike splice(2), which takes a pointer here, this helper takes the offset by value. It has to be -1 if fd_in refers to a pipe. If fd_in is not a pipe and:
off_in is -1, then data from fd_in is read from its file offset and the file offset is adjusted accordingly.
off_in is not -1, then off_in specifies the starting offset from which bytes will be read from fd_in. In this case, the file offset of fd_in is left unchanged.
fd_out and off_out: Analogous statements to those for fd_in and off_in apply for these arguments.
nbytes: number of bytes to copy.
splice_flags: a bit mask that influences the copy. See splice(2) for details.
-
void io_uring_prep_recvmsg(struct io_uring_sqe *sqe, int fd, struct msghdr *msg, unsigned flags)
This function sets up the submission queue entry pointed to by sqe with a recvmsg(2)-like operation. The recvmsg(2) system call is used to read data from a socket. It uses a msghdr structure to reduce the number of arguments it takes. This call works with both connection-oriented (like TCP) and connectionless (like UDP) sockets. See recvmsg(2) for more details.
Parameters
sqe: pointer to an SQE as generally returned by io_uring_get_sqe().
fd: the socket to read from.
msg: pointer to a msghdr structure.
flags: a bit mask that influences the read. See recvmsg(2) for details.
-
void io_uring_prep_sendmsg(struct io_uring_sqe *sqe, int fd, const struct msghdr *msg, unsigned flags)
The same as io_uring_prep_recvmsg(), but for writing to a socket.
-
void io_uring_prep_recv(struct io_uring_sqe *sqe, int sockfd, void *buf, size_t len, int flags)
This function sets up the submission queue entry pointed to by sqe with a recv(2)-like operation. The recv(2) system call is used to read data from a socket. This call works with both connection-oriented (like TCP) and connectionless (like UDP) sockets. With the flags argument set to 0, it is generally equivalent to read(2), apart from a small difference in how zero-length datagrams are handled. See recv(2) for more details.
Parameters
sqe: pointer to an SQE as generally returned by io_uring_get_sqe().
sockfd: the socket to read from.
buf: pointer to a buffer to read data into.
len: count of bytes to read.
flags: a bit mask that influences the read. See recv(2) for details.
-
void io_uring_prep_send(struct io_uring_sqe *sqe, int sockfd, const void *buf, size_t len, int flags)
The same as io_uring_prep_recv(), but for writing to a socket.
-
void io_uring_prep_accept(struct io_uring_sqe *sqe, int fd, struct sockaddr *addr, socklen_t *addrlen, int flags)
This function sets up the submission queue entry pointed to by sqe with an accept4(2)-like operation. The accept4(2) system call is used with connection-oriented socket types (SOCK_STREAM, SOCK_SEQPACKET). It extracts the first connection request on the queue of pending connections for the listening socket fd. With the flags argument set to 0, accept4(2) is the exact equivalent of accept(2). See accept4(2) for more details.
Parameters
sqe: pointer to an SQE as generally returned by io_uring_get_sqe().
fd: the listening socket.
addr: pointer to a sockaddr structure. This will be filled with the address of the peer.
addrlen: pointer to socklen_t. A value-result argument that must be filled in with the size of the sockaddr structure for the call and which will be set to the size of the peer address.
flags: a bit mask that influences the system call. See accept4(2) for details.
-
void io_uring_prep_connect(struct io_uring_sqe *sqe, int fd, struct sockaddr *addr, socklen_t addrlen)
This function sets up the submission queue entry pointed to by sqe with a connect(2)-like operation. The connect(2) system call is used to connect the socket referred to by fd to the address specified in addr. See connect(2) for more details.
Parameters
sqe: pointer to an SQE as generally returned by io_uring_get_sqe().
fd: the socket to connect.
addr: pointer to a sockaddr structure containing the address of the peer.
addrlen: the size of the sockaddr structure pointed to by addr.
-
void io_uring_prep_epoll_ctl(struct io_uring_sqe *sqe, int epfd, int fd, int op, struct epoll_event *ev)
This function sets up the submission queue entry pointed to by sqe with an epoll_ctl(2)-like operation. The epoll_ctl(2) system call is used to add, remove or modify entries in the interest list of the epoll(7) instance referred to by epfd. The add, remove or modify operation specified by op is applied on the file descriptor fd. See epoll_ctl(2) for more details.
Parameters
sqe: pointer to an SQE as generally returned by io_uring_get_sqe().
epfd: file descriptor representing an epoll instance.
fd: the file descriptor to add, delete or modify.
op: the epoll operation to perform (EPOLL_CTL_ADD, EPOLL_CTL_DEL or EPOLL_CTL_MOD).
ev: pointer to an epoll_event structure.
-
void io_uring_prep_poll_add(struct io_uring_sqe *sqe, int fd, short poll_mask)
This function sets up the submission queue entry pointed to by sqe with a poll(2)-like operation to add a file descriptor to poll’s interest list and to listen to events specified in poll_mask. Unlike poll or epoll without EPOLLONESHOT, this interface always works in one-shot mode. That is, once the poll operation is completed, it will have to be resubmitted. See poll(2) for more details.
Parameters
sqe: pointer to an SQE as generally returned by io_uring_get_sqe().
fd: file descriptor to poll for events.
poll_mask: bit mask containing events to listen for.
-
void io_uring_prep_poll_remove(struct io_uring_sqe *sqe, void *user_data)
Removes a request from monitoring by poll(2).
Parameters
sqe: pointer to an SQE as generally returned by io_uring_get_sqe().
user_data: pointer to user data. The request associated with this user data is removed from further monitoring.
See also