head 1.1; access; symbols libdrm-1_0_4:1.1; locks; strict; comment @# @; 1.1 date 2003.06.25.14.40.08; author jrfonseca; state Exp; branches; next ; desc @@ 1.1 log @Switch the source format to DocBook XML. @ text @

Implementation

This section gives information on the implementation details of a DRI driver. The issues presented here loosely follow the order in which information flows when an application is using a DRI driver, i.e., it mimics the graphics pipeline.

The DRI driver initialization process?

This is a description of the DRI driver initialization process, extracted and edited from a series of emails between Ian Romanick and Brian Paul.

The whole process begins when an application calls glXCreateContext (xc/lib/GL/glx/glxcmds.c). glXCreateContext is just a stub that calls CreateContext. The real work begins when CreateContext calls __glXInitialize (xc/lib/GL/glx/glxext.c).

The driver specific initialization process starts with __driCreateScreen. Once the driver is loaded (via dlopen), dlsym is used to get a pointer to this function. The function pointer for each driver is stored in the createScreen array in the __DRIdisplay structure. This initialization is done in driCreateDisplay (xc/lib/GL/dri/dri_glx.c), which is called by __glXInitialize.

Note that __driCreateScreen really is the bootstrap of a DRI driver. It's the only function in a DRI driver that libGL directly knows about. (That's not really true: there's also the __driRegisterExtensions function that libGL uses to implement glXGetProcAddress. That's another long story.) All the other DRI functions are accessed via the __DRIdisplayRec, __DRIscreenRec, __DRIcontextRec and __DRIdrawableRec structs defined in xc/lib/GL/glx/glxclient.h. Those structures are pretty well documented in the file.

After performing the __glXInitialize step, CreateContext calls the createContext function for the requested screen. Here the driver creates two data structures.
The first, GLcontext (extras/Mesa/src/mtypes.h), contains all of the device independent state, device dependent constants (e.g., texture size limits, light limits, etc.), and device dependent function tables. The driver also allocates a structure that contains all of the device dependent state. The GLcontext structure links to the device dependent structure via the DriverCtx pointer. The device dependent structure also has a pointer back to the GLcontext structure.

The device dependent structure is where the driver will store context specific hardware state (register settings, etc.) for when context (in terms of OpenGL / X context) switches occur. This structure is analogous to the buffers where the OS stores CPU state when a program context switch occurs.

The texture images really are stored within Mesa's data structures. Mesa supports about a dozen texture formats which happen to satisfy what all the DRI drivers need. So, the texture format/packing is dependent on the hardware, but Mesa understands all the common formats. See Mesa/src/texformat.h. Gareth and Brian spent a lot of time on that.

createScreen (i.e., the driver specific initialization function) is called for each screen from AllocAndFetchScreenConfigs (xc/lib/GL/glx/glxext.c). This is also called from __glXInitialize.

For all of the existing drivers, the __driCreateScreen function is just a wrapper that calls __driUtilCreateScreen (xc/lib/GL/dri/dri_util.c) with a pointer to the driver's API function table (of type __DriverAPIRec). This creates a __DRIscreenPrivate structure for the display and fills it in (mostly) with the supplied parameters (i.e., screen number, display information, etc.). It also opens and initializes the connection to DRM. This includes opening the DRM device, mapping the frame buffer (note: the DRM documentation says that the function used for this is called drmAddMap, but it is actually called drmMap), and mapping the SAREA.
The final step is to call the driver initialization function for the driver (from the InitDriver field in the __DriverAPIRec, which is the DriverAPI field of the __DRIscreenPrivate).

The InitDriver function does (at least in the Radeon and i810 drivers) two broad things. It first verifies the version of the services (XFree86, DDX, and DRM) that it will use. The driver then creates an internal representation of the screen and stores it (the pointer to the structure) in the private field of the __DRIscreenPrivate structure. The driver-private data may include things such as mappings of MMIO registers, mappings of display and texture memory, information about the layout of video memory, chipset version specific data (feature availability for the specific chip revision, etc.), and other similar data. This is the handle that identifies the specific graphics card to the driver (in case there is more than one card in the system that will use the same driver).

After performing the __glXInitialize step, CreateContext calls the createContext function for the requested screen. This is where it gets pretty complicated. I have only looked at the Radeon driver. radeonCreateContext (xc/lib/GL/mesa/src/drv/radeon/radeon_context.c) allocates a GLcontext structure (actually struct __GLcontextRec from extras/Mesa/src/mtypes.h). Here it fills in function tables for virtually every OpenGL call. Additionally, the __GLcontextRec has pointers to buffers where the driver will store context specific hardware state (textures, register settings, etc.) for when context (in terms of OpenGL / X context) switches occur.

The __GLcontextRec (i.e. GLcontext in Mesa) doesn't have any buffers of hardware-specific data (except texture image data if you want to be picky). All Radeon-specific, per-context data should be hanging off of the struct radeon_context.
All the DRI drivers define a hardware-specific context structure (such as structure radeon_context, typedef'd to be radeonContextRec, or structure mga_context_t, typedef'd to be mgaContext). radeonContextRec has a pointer back to the Mesa __GLcontextRec, and Mesa's __GLcontextRec->DriverCtx pointer points back to the radeonContextRec.

If we were writing all this in C++ (don't laugh) we'd treat Mesa's __GLcontextRec as a base class and create driver-specific derived classes from it. Inheritance like this is actually pretty common in the DRI code, even though it's sometimes hard to spot.

These buffers are analogous to the buffers where the OS stores CPU state when a program context switch occurs. Note that we don't do any fancy hardware context switching in our drivers. When we make-current a new context, we basically update all the hardware state with that new context's values.

When each of the function tables is initialized (see radeonInitSpanFuncs for an example), an internal Mesa function is called. This function (e.g., _swrast_GetDeviceDriverReference) both allocates the buffer and fills in the function pointers with the software fallbacks. If a driver were to just call these allocation functions and not replace any of the function pointers, it would be the same as the software renderer.

The next part seems to start when the createDrawable function in the __DRIscreenRec is called, but I don't see where this happens. createDrawable should be called via glXMakeCurrent since that's the first time we're given an X drawable handle. Somewhere during glXMakeCurrent we use a DRI hash lookup to translate the X Drawable handle into a pointer to a __DRIdrawable. If we get a NULL pointer that means we've never seen that handle before and now have to allocate the __DRIdrawable and initialize it (and put it in the hash table).

Mesa internals

How does one write a new Mesa driver?

There are two basic aspects to writing a new driver.
First, define the public OpenGL / window system API. In the case of GLX, these are the glX*() functions. For OSMesa these are the OSMesa*() functions seen in include/GL/osmesa.h. You'll basically need functions for specifying frame buffer formats (bits per RGB, bits for Z, bits for stencil, etc.), functions for creating/destroying contexts, binding contexts to windows, etc.

Second, implement the internal functions needed by the "DD" interface. Look at the osmesa.c file and grep for assignments of the form ctx->Driver.<name> = <function>. This is where the driver hooks itself into the core of Mesa. In many cases we hook in fall-back functions (like _swrast_DrawPixels).

This isn't simple (or even as straightforward as it used to be) but the system is designed for efficiency, flexibility and modularity. If the device driver interface were made for simplicity above all else there would probably only be two driver functions: ReadPixel() and WritePixel().

The OSMesa driver is pretty simple. The only complexity comes from supporting all the different frame buffer formats like RGB, RGBA, BGRA, ABGR, etc. I think the Windows driver is in pretty good shape too. The XMesa driver (upon which Mesa's GLX is layered) is rather large because of lots of frame buffer formats and optimized point/line/triangle rendering functions.

Old Mesa 3.4.x Implementation Notes

This document is an overview of the internal structure of Mesa and is meant for those who are interested in modifying or enhancing Mesa, or just curious. Based on the original Mesa Implementation Notes by Brian Paul.

Library State and Contexts

OpenGL uses the notion of a state machine. Mesa encapsulates the state in one large structure: gl_context, as seen in types.h. The gl_context structure actually contains a number of sub structures which exactly correspond to OpenGL's attribute groups. This organization made glPushAttrib and glPopAttrib trivial to implement and proved to be a good way of organizing the state variables.
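
To illustrate why grouping state by attribute group makes push and pop nearly trivial, here is a minimal sketch in C. It is not Mesa's code: every name (demo_context, demo_push_attrib, the two reduced attribute groups, the stack depth) is hypothetical and chosen only for illustration. The point it shows is that saving or restoring a group becomes a single structure assignment selected by a mask bit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced attribute groups. The real context
 * holds many more groups and fields; these names are illustrative. */
struct demo_depth_attrib { int test_enabled; float clear_value; };
struct demo_fog_attrib   { int enabled; float density; };

#define DEMO_DEPTH_BIT   0x1u
#define DEMO_FOG_BIT     0x2u
#define DEMO_STACK_DEPTH 16

/* One saved stack entry: the mask says which groups are valid. */
struct demo_saved_attribs {
    unsigned mask;
    struct demo_depth_attrib depth;
    struct demo_fog_attrib fog;
};

struct demo_context {
    struct demo_depth_attrib depth;   /* current state, one struct per group */
    struct demo_fog_attrib fog;
    struct demo_saved_attribs stack[DEMO_STACK_DEPTH];
    size_t stack_top;
};

/* Save the groups selected by mask. Returns 0, or -1 on stack overflow. */
int demo_push_attrib(struct demo_context *ctx, unsigned mask)
{
    struct demo_saved_attribs *save;

    assert(ctx != NULL);
    if (ctx->stack_top == DEMO_STACK_DEPTH)
        return -1;
    save = &ctx->stack[ctx->stack_top++];
    save->mask = mask;
    if (mask & DEMO_DEPTH_BIT)
        save->depth = ctx->depth;     /* whole group copied at once */
    if (mask & DEMO_FOG_BIT)
        save->fog = ctx->fog;
    return 0;
}

/* Restore the groups saved by the matching push. Returns 0, or -1 on underflow. */
int demo_pop_attrib(struct demo_context *ctx)
{
    const struct demo_saved_attribs *save;

    assert(ctx != NULL);
    if (ctx->stack_top == 0)
        return -1;
    save = &ctx->stack[--ctx->stack_top];
    if (save->mask & DEMO_DEPTH_BIT)
        ctx->depth = save->depth;
    if (save->mask & DEMO_FOG_BIT)
        ctx->fog = save->fog;
    return 0;
}
```

Groups that were not named in the mask are left untouched by the pop, which mirrors how only the pushed attribute groups are restored.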
Vertex buffer

The vertices between glBegin and glEnd are accumulated in the vertex buffer (see vb.h and vb.c). When either the vertex buffer becomes filled or a state change outside the glBegin/glEnd is made, we must flush the buffer. That is, we apply the vertex transformations, compute lighting, fog, texture coordinates, etc. Then, we can render the vertices as points, lines or polygons by calling the gl_render_vb() function in render.c. When we're outside of a glBegin/glEnd pair the information in this structure is retained pending either of the flushing events described above.

Originally, Mesa didn't accumulate vertices in this way. Instead, glVertex transformed and lit then buffered each vertex as it was received. When enough vertices to draw the primitive (1 for points, 2 for lines, >2 for polygons) were accumulated the primitive was drawn and the buffer cleared.

The new approach of buffering many vertices and then transforming, lighting and clip testing is faster because it's done in a vectorized manner. See gl_transform_points in xform.c for an example. Also, vertices shared between primitives (e.g. GL_LINE_STRIP) are only transformed once.

The only complication is clipping. If no vertices in the vertex buffer have their clip flag set, the rasterization functions can be applied directly to the vertex buffer. Otherwise, a clipping function is called before rasterizing each primitive. If clipping introduces new vertices they will be stored at the end of the vertex buffer.

For best performance Mesa clients should try to maximize the number of vertices between glBegin/glEnd pairs and use connected primitives when possible.

Rasterization

The point, line and polygon rasterizers are called via the PointsFunc, LineFunc, and TriangleFunc function pointers in the dd_function_table driver function pointer table. Whenever the library state is changed in a significant way, the NewState context flag is raised. When glBegin is called NewState is checked.
If the flag is set we re-evaluate the state to determine what rasterizers to use. Special purpose rasterizers are selected according to the status of certain state variables such as flat vs. smooth shading, depth-buffered vs. non-depth-buffered, etc. The gl_set_point|line|polygon_function functions do this analysis. They in turn query the device driver for accelerated rasterizers. More on that later.

In general, typical states (depth-buffered & smooth-shading) result in optimized rasterizers being selected. Non-typical states (stenciling, blending, stippling) result in slower, general purpose rasterizers being selected.

Pixel (fragment) buffer

The general purpose point, line and bitmap rasterizers accumulate fragments (pixels plus color, depth, texture coordinates) in the PB (Pixel Buffer) structure seen in pb.h and pb.c. When the pixel buffer is full or glEnd is called the pixel buffer is flushed. This includes clipping the fragments against the window, depth testing, stenciling, blending, stippling, etc. Finally, the pixel buffer's pixels are drawn to the display buffer by calling one of the device driver functions. The goal is to maximize the number of pixels processed inside loops and to minimize the number of function calls.

Pixel spans

The polygon, glDrawPixels, and glCopyPixels functions generate horizontal runs of pixels called spans. Spans are processed in span.c. Processing includes window clipping, depth testing, stenciling, texturing, etc. After processing the span is written to the frame buffer by calling a device driver function.

Device Driver

There are three Mesa data types which are meant to be used by device drivers:

GLcontext: this contains the Mesa rendering state.
GLvisual: this describes the color buffer (rgb vs. ci), whether or not there's a depth buffer, stencil buffer, etc.
GLframebuffer: contains pointers to the depth buffer, stencil buffer, accum buffer and alpha buffers.
These types should be encapsulated by corresponding device driver data types. See xmesa.h and xmesaP.h for an example. In OOP terms, GLcontext, GLvisual, and GLframebuffer are base classes which the device driver must derive from.

The structure dd_function_table, seen in dd.h, defines the device driver functions. By using a table of pointers, the device driver can be changed dynamically at runtime. For example, the X/Mesa and OS/Mesa (Off-Screen rendering) device drivers can co-exist in one library and be selected at runtime.

In addition to the device driver table functions, each Mesa driver has its own set of unique interface functions. For example, the X/Mesa driver has the XMesaCreateContext, XMesaBindWindow, and XMesaSwapBuffers functions while the Windows/Mesa interface has WMesaCreateContext, WMesaPaletteChange and WMesaSwapBuffers. New Mesa drivers need to both implement the dd_function_table functions and define a set of unique window system or operating system-specific interface functions.

The device driver functions can roughly be divided into four groups:

pixel span functions, which read or write horizontal runs of RGB or color-index pixels. Each function takes an array of mask flags which indicate whether or not to plot each pixel in the span.

pixel array functions, which are very similar to the pixel span functions except that they're used to read or write arrays of pixels at random locations rather than horizontal runs.

miscellaneous functions for window clearing, setting the current drawing color, enabling/disabling dithering, returning the current frame buffer size, specifying the window clear color, synchronization, etc. Most of these functions directly correspond to higher level OpenGL functions.

if your graphics hardware or operating system provides accelerated point, line and polygon rendering operations, they can be utilized through the PointsFunc, LineFunc, and TriangleFunc functions.
Mesa will call these functions to ask the device driver for accelerated functions through the UpdateState function. If the device driver can provide an appropriate renderer, given the current Mesa state, then a pointer to that function can be returned. Otherwise the PointsFunc, LineFunc, and TriangleFunc function pointers can just be set to NULL.

Even if hardware accelerated renderers aren't available, the device driver may implement tuned, special purpose code for common kinds of points, lines or polygons. The X/Mesa device driver does this for a number of lines and polygons. See the xmesa3.c file.

Overall Organization

The overall relation of the core Mesa library, X device driver/interface, toolkits and application programs is shown in this diagram:

+-----------------------------------------------------------+
|                                                           |
|                   Application Programs                    |
|                                                           |
|          +- glu.h -+------ glut.h -------+                |
|          |         |                     |                |
|          |   GLU   |        GLUT         |                |
|          |         |      toolkits       |                |
|          |         |                     |                |
+---------- gl.h ------------+-------- glx.h ----+          |
|                            |                   |          |
|         Mesa core          |   GLX functions   |          |
|                            |                   |          |
+---------- dd.h ------------+------------- xmesa.h --------+
|                                                           |
|            XMesa* and device driver functions             |
|                                                           |
+-----------------------------------------------------------+
|                 Hardware/OS/Window System                 |
+-----------------------------------------------------------+

Mesa 4.x Implementation Notes

(The big changes in Mesa were made between Mesa 3.4.x and Mesa 3.5. That's when Keith re-modularized the source code into separate modules for T&L, s/w rasterization, etc.)

This document is an overview of the internal structure of Mesa and is meant for those who are interested in modifying or enhancing Mesa, or just curious. Based on the original Mesa Implementation Notes and corrections by Brian Paul.

Library State and Contexts

OpenGL uses the notion of a state machine. Almost all OpenGL state is contained in one large structure: __GLcontextRec (typedef'd to GLcontext), as seen in mtypes.h.
This is the central context data structure for Mesa. The __GLcontextRec structure actually contains a number of sub structures which exactly correspond to OpenGL's attribute groups. This organization made glPushAttrib and glPopAttrib trivial to implement and proved to be a good way of organizing the state variables.

Vertex buffer

The immediate structure represents everything that can take place between glBegin and glEnd, and is able to represent multiple glBegin/glEnd pairs. It can be used to losslessly encode this information in display lists. See t_context.h and t_imm_api.c.

When either the vertex buffer becomes filled or a state change outside the glBegin/glEnd is made, we must flush the buffer. That is, we apply the vertex transformations, compute lighting, fog, texture coordinates, etc. The various vertex transformations are implemented as software pipeline stages by the tnl/t_pipeline.c and tnl/t_vb_*.c files. When we're outside of a glBegin/glEnd pair the information in this structure is retained pending either of the flushing events described above.

Originally, Mesa didn't accumulate vertices in this way. Instead, glVertex transformed and lit then buffered each vertex as it was received. When enough vertices to draw the primitive (1 for points, 2 for lines, >2 for polygons) were accumulated the primitive was drawn and the buffer cleared.

The new approach of buffering many vertices and then transforming, lighting and clip testing is faster because it's done in a vectorized manner. See gl_transform_points in math/m_xform.c for an example.

For best performance Mesa clients should try to maximize the number of vertices between glBegin/glEnd pairs and use connected primitives when possible.

Rasterization

The point, line and polygon rasterizers are called via the Point, Line, and Triangle function pointers in the SWcontext structure in swrast/s_context.h. Whenever the library state is changed in a significant way, the NewState context flag is raised.
When glBegin is called NewState is checked. If the flag is set we re-evaluate the state to determine what rasterizers to use. Special purpose rasterizers are selected according to the status of certain state variables such as flat vs. smooth shading, depth-buffered vs. non-depth-buffered, etc. The _swrast_choose_* functions do this analysis. It's up to the device driver to choose optimized or accelerated rasterization functions to replace those in the general software rasterizer.

In general, typical states (depth-buffered & smooth-shading) result in optimized rasterizers being selected. Non-typical states (stenciling, blending, stippling) result in slower, general purpose rasterizers being selected.

Pixel spans

Point, Line, Triangle, glDrawPixels, glCopyPixels and glBitmap all use the sw_span structure and the functions in swrast/s_span.c to generate horizontal runs of pixels called spans. Processing includes window clipping, depth testing, stenciling, texturing, etc. After processing the span is written to the frame buffer by calling a device driver function. The goal is to maximize the number of pixels processed inside loops and to minimize the number of function calls.

Pixel buffers are no longer present in the latest Mesa code (4.1). All fragment (pixels plus color, depth, texture coordinates) processing is done via the span functions in swrast/s_span.c.

Device Driver

There are three Mesa data types which are meant to be used by device drivers:

GLcontext: this contains the Mesa rendering state.
GLvisual: this describes the color buffer (rgb vs. ci), whether or not there's a depth buffer, stencil buffer, etc.
GLframebuffer: contains pointers to the depth buffer, stencil buffer, accum buffer and alpha buffers.

These types should be encapsulated by corresponding device driver data types. See xmesa.h and xmesaP.h for an example. In OOP terms, GLcontext, GLvisual, and GLframebuffer are base classes which the device driver must derive from.
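
The "derive from a base class" idea can be sketched in C as two structures that point at each other, in the spirit of the DriverCtx arrangement described for the Radeon driver earlier. This is an illustrative sketch only: demo_gl_context, demo_drv_context, demo_bind and the hw_scissor field are hypothetical stand-ins, not the real Mesa or driver types.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the core context: only the two fields the sketch needs. */
struct demo_gl_context {
    void    *DriverCtx;   /* opaque link to the driver's own context */
    unsigned NewState;    /* dirty flags raised on state changes */
};

/* Hypothetical driver context that wraps the core context. */
struct demo_drv_context {
    struct demo_gl_context *glCtx;  /* pointer back to the core context */
    unsigned hw_scissor;            /* example of device dependent state */
};

/* Link the two structures in both directions at context creation. */
void demo_bind(struct demo_gl_context *gl, struct demo_drv_context *drv)
{
    assert(gl != NULL && drv != NULL);
    gl->DriverCtx = drv;
    drv->glCtx = gl;
}

/* The "downcast": recover the driver context from the core context. */
struct demo_drv_context *demo_drv_from_gl(struct demo_gl_context *gl)
{
    assert(gl != NULL && gl->DriverCtx != NULL);
    return (struct demo_drv_context *)gl->DriverCtx;
}

/* A driver hook receives the core context and updates its private state. */
void demo_set_scissor(struct demo_gl_context *gl, unsigned packed_rect)
{
    struct demo_drv_context *drv = demo_drv_from_gl(gl);

    drv->hw_scissor = packed_rect;
    gl->NewState |= 0x1u;   /* flag value is arbitrary in this sketch */
}
```

Because the core only ever sees the opaque DriverCtx pointer, each driver is free to define its private structure however its hardware requires.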
The structure dd_function_table, seen in dd.h, defines the device driver functions. (Many of the functions which used to be in the dd_function_table are now moved into the tnl or swrast modules.) By using a table of pointers, the device driver can be changed dynamically at runtime. For example, the X/Mesa and OS/Mesa (Off-Screen rendering) device drivers can co-exist in one library and be selected at runtime.

In addition to the device driver table functions, each Mesa driver has its own set of unique interface functions. For example, the X/Mesa driver has the XMesaCreateContext, XMesaBindWindow, and XMesaSwapBuffers functions while the Windows/Mesa interface has WMesaCreateContext, WMesaPaletteChange and WMesaSwapBuffers. New Mesa drivers need to both implement the dd_function_table functions and define a set of unique window system or operating system-specific interface functions.

The device driver functions can roughly be divided into three groups:

pixel span functions, which read or write horizontal runs of RGB or color-index pixels. Each function takes an array of mask flags which indicate whether or not to plot each pixel in the span.

miscellaneous functions for window clearing, setting the current drawing color, enabling/disabling dithering, returning the current frame buffer size, specifying the window clear color, synchronization, etc. Most of these functions directly correspond to higher level OpenGL functions.

if your graphics hardware or operating system provides accelerated point, line and polygon rendering operations, they can be utilized through the PointsFunc, LineFunc, and TriangleFunc functions. Mesa will call these functions to ask the device driver for accelerated functions through the UpdateState function. If the device driver can provide an appropriate renderer, given the current Mesa state, then a pointer to that function can be returned. Otherwise the PointsFunc, LineFunc, and TriangleFunc function pointers can just be set to NULL.
Even if hardware accelerated renderers aren't available, the device driver may implement tuned, special purpose code for common kinds of points, lines or polygons. The X/Mesa device driver does this for a number of lines and polygons. See the X/xm_line.c and X/xm_tri.c files.

Overall Organization

The overall relation of the core Mesa library, X device driver/interface, toolkits and application programs is shown in this diagram:

+-----------------------------------------------------------+
|                                                           |
|                   Application Programs                    |
|                                                           |
|          +- glu.h -+------ glut.h -------+                |
|          |         |                     |                |
|          |   GLU   |        GLUT         |                |
|          |         |      toolkits       |                |
|          |         |                     |                |
+---------- gl.h ------------+-------- glx.h ----+          |
|                            |                   |          |
|         Mesa core          |   GLX functions   |          |
|                            |                   |          |
+---------- dd.h ------------+------------- xmesa.h --------+
|                                                           |
|            XMesa* and device driver functions             |
|                                                           |
+-----------------------------------------------------------+
|                 Hardware/OS/Window System                 |
+-----------------------------------------------------------+

Mesa's pipeline

The work starts in t_pipeline.c, where a driver-configurable pipeline is run in response to either the vertex buffer filling up or a state change. The pipeline stages operate on context variables (such as vertex coordinates, colors, normals, texture coordinates, etc.), applying the necessary operations in an OpenGL pipeline (such as coordinate transformation, lighting, etc.).

The last stage, rendering, calls *BuildVertices in *_vb.c, which applies the viewport transformation, perspective divide and data type conversion, and packs the vertex data in the context (in the arrays tnl->vb->*Ptr->data) into a driver dependent buffer with just the information relevant for the current OpenGL state (e.g., with/without texture, fog, etc.). The template t_dd_vbtmp.h does this into a D3D-like vertex structure format.
For instance, if we needed to premultiply the texture coordinates, as is done in the tdfx and mach64 drivers, we would need to make a customized version of t_dd_vbtmp.h for that purpose, or change it and supply a configuration parameter to control that behavior. This buffer is then used to render the primitives in *_tris.c.

This vertex data is intended to be copied almost verbatim into DMA buffers, with a header command, in most chips with DMA. But in the case of Mach64, where the commands are interleaved with each of the vertex data elements, it will be necessary to use a different structure of *Vertex to do the same, and probably to come up with a rather different implementation of t_dd_vbtmp.h as well.

Indeed, if the chip expects something quite different from the D3D vertices, one will certainly want to look at this. In the meantime, it may be simplest to go with a normal-looking *_vb.c and do some extra stuff in the triangle/line/point functions. The ffb and glint drivers are a bit like this, I think.

All this mechanism is controlled with function pointers in the context which are rechosen whenever the OpenGL state changes enough. These function pointers can also be overwritten with those in the sw_* modules to fall back to software rendering.

How about the main X drawing surface? Are 2 extra "window sized" buffers allocated for primary and secondary buffers in a page-flipping configuration?

Right now, we don't do page flipping at all. Everything is a blit from back to front. The biggest problem with page flipping is detecting when you're in full screen mode, since OpenGL doesn't really have a concept of full screen mode. We want a solution that works for existing games. So we've been designing a solution for it. It should get implemented fairly soon since we need it for antialiasing on the V5.

In the current implementation the X front buffer is the 3D front buffer. When we do page flipping we'll continue to do the same thing.
Since you have an X window that covers the screen it is safe for us to use the X surface's memory. Then we'll do page flipping. The only issue will be falling back to blitting if the window is ever moved from covering the whole screen.

Clipping

This section introduces the several concepts associated with clipping. Contributed by Leif Delgass.

Scissors

The scissors are register settings that determine a hardware clipping rect in window coords. Any part of a primitive or other drawing operation that extends beyond the scissors is not drawn. The scissors can be set through GL commands. This has nothing to do with perspective clipping in the pipeline, just the final window coordinates.

Cliprects

Cliprects are used to determine what parts of the context/window should be redrawn to handle overlapping windows. The more overlapping windows, the more cliprects you have. These need to be passed to the DRM. It does a clear or swap for each cliprect. Again these are for 2D clipping after rasterization and not part of the pipeline. Things get a bit complicated by the fact that there can be separate cliprects for the front and back buffers. The cliprects are stored in device-independent structures, hence the code is abstracted out of the individual drivers.

Viewport

The viewport array holds values to determine how to translate transformed, clipped, and projected vertex coordinates into window coordinates. This is the last stage of the pipeline. The values are based on the size and position of the drawable, also known as the drawing area of the window for the context.

Texture management

What follows is a description made by Ian Romanick of the texture management system in the DRI. This is all based on the Radeon driver in the 11-Feb-2002 CVS of the mesa-4-0-branch. While it is based on the Radeon code, all drivers except gamma and tdfx seem to use the same scheme (and virtually identical code).
Just FYI: the tdfx texture memory management code is different because:

It was originally developed before the scheme Keith implemented for the i810 and mga drivers (and later used for the R128 and radeon).

There are some idiosyncrasies with the Voodoo3 such as two separate banks of TRAM and needing to store alternate mipmap levels in alternate banks.

We did everything through the Glide interface, rather than working directly with the hardware.

Excluding the texture backing store, which is managed by Mesa, texture data is tracked in two places. The per-screen (card) SAREA divides each type of texturable memory (on-card, AGP, etc.) into an array of fixed sized chunks (RADEONSAREAPriv.texList in programs/Xserver/hw/xfree86/drivers/ati/radeon_sarea.h). The number of these chunks is a compile-time constant, and it cannot be changed without destroying the universe. That is, any changes here will present major compatibility issues. Currently, this constant is 64. So, for the new 128MB Radeon 8500 cards, each block of memory will likely be 1MB or more. This is not as bad as it may first seem, see below.

The usage of each type of memory is also tracked per-context. The per-context memory tracking is done using a memHeap_t. Allocations from the memHeap_t (see lib/GL/mesa/src/drv/common/mm.c) are done at byte granularity. When a context needs a block of texture memory, it is allocated from the memHeap_t. This results in very little memory fragmentation (within a context).

After the allocation is made, the map of allocated memory in the SAREA is updated (radeonUpdateTexLRU in lib/GL/mesa/src/drv/radeon/radeon_texmem.c). Basically, each block of memory in texList that corresponds to an allocated region (in the per-context memHeap_t) is marked as allocated.

The texList isn't just an array of blocks. It's also a priority queue (linked list). As each texture is accessed, the blocks that it occupies are moved to the head of the queue.
In the Radeon code, each time a texture is uploaded or bound to a texture unit, the blocks of memory (in AGP space or on-card) are moved to the head of the texList queue.

If an allocation (via the memHeap_t) fails when texture space is allocated (radeonUploadTexImages in lib/GL/mesa/src/drv/radeon/radeon_texmem.c), blocks at the end of the texList queue are freed until the allocation can succeed. This may be an area where the algorithm could be improved. For example, it might be better to find the largest free block (in the memHeap_t) and release memory around that block in LRU or least-often-used fashion until the allocation can succeed. This may be too difficult to get right or too slow when done right. Someone would have to try it and see.

Each time a direct-client detects that another client has held the per-screen lock, radeonGetLock is called. This synchronizes the per-context vision of the hardware state. Part of this synchronization is synchronizing the view of texture memory.

In addition to the texList, the SAREA holds a texAge array. This array stores the generation number of each of the texture heaps. If a client detects that the generation number of a heap has changed in radeonGetLock, it calls radeonAgeTextures for that heap. radeonAgeTextures runs through the texList looking for blocks with a more recent generation number. Each block that has a newer generation is passed to radeonTexturesGone.

radeonTexturesGone searches the per-context memHeap_t for an allocated region matching the block with the updated generation. When a matching region is found, it is freed, and if the region was for a texture in the local context, the local state of that texture is updated. If the updated block (from the global context) is in-use (i.e., some other context has stolen that block from the current context), the block is re-allocated and marked as in-use by another context.
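
The "array that is also a priority queue" idea can be sketched as a fixed array of blocks threaded into a doubly linked list by index. This is a simplified illustration under assumptions, not the Radeon driver's code: the names (demo_tex_list, demo_touch, demo_evict_lru), the block count of 4, and the use of an extra sentinel slot are all hypothetical. Touching a block moves it to the head and stamps it with a new generation number; eviction candidates are taken from the tail.

```c
#include <assert.h>

#define DEMO_NR_BLOCKS 4                 /* illustrative; the text cites 64 */
#define DEMO_SENTINEL  DEMO_NR_BLOCKS    /* extra slot acting as list anchor */

struct demo_block {
    int      prev, next;   /* indices into the same array */
    int      in_use;
    unsigned age;          /* generation stamp of the last access */
};

struct demo_tex_list {
    struct demo_block blk[DEMO_NR_BLOCKS + 1];
    unsigned age;          /* global generation counter */
};

/* Build a circular list through the sentinel: head 0, 1, ..., tail N-1. */
void demo_list_init(struct demo_tex_list *l)
{
    int i;

    for (i = 0; i <= DEMO_NR_BLOCKS; i++) {
        l->blk[i].prev = (i + DEMO_NR_BLOCKS) % (DEMO_NR_BLOCKS + 1);
        l->blk[i].next = (i + 1) % (DEMO_NR_BLOCKS + 1);
        l->blk[i].in_use = 0;
        l->blk[i].age = 0;
    }
    l->age = 0;
}

/* Mark block i as just used: unlink it and reinsert it at the head. */
void demo_touch(struct demo_tex_list *l, int i)
{
    struct demo_block *b;

    assert(i >= 0 && i < DEMO_NR_BLOCKS);
    b = &l->blk[i];
    l->blk[b->prev].next = b->next;          /* unlink */
    l->blk[b->next].prev = b->prev;
    b->next = l->blk[DEMO_SENTINEL].next;    /* relink after the sentinel */
    b->prev = DEMO_SENTINEL;
    l->blk[b->next].prev = i;
    l->blk[DEMO_SENTINEL].next = i;
    b->in_use = 1;
    b->age = ++l->age;
}

/* Free the least recently used block (the tail) and return its index. */
int demo_evict_lru(struct demo_tex_list *l)
{
    int i = l->blk[DEMO_SENTINEL].prev;

    l->blk[i].in_use = 0;
    return i;
}
```

A caller that fails an allocation would call demo_evict_lru repeatedly, reusing each returned block (and touching it) until the request fits. The per-block age field is what lets another client notice, by comparing generation numbers, that a block it thought it owned has been reused.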
It seems that about 2 years ago a few people (going by the CVS log) took a stab at factoring the common code out and putting it in shared_texture_lru.[ch] in lib/GL/mesa/src/drv/common, but it isn't used and hasn't been touched (at least not in CVS) since 4-April-2000. How often are checks done to see if things need to be clipped/redrawn/redisplayed? The locking system is designed to be highly efficient. It is based on a two-tiered lock. Basically it works like this: The client wants the lock. It uses CAS (I was corrected that the instruction is compare and swap; I knew that was the functionality, but I got the name wrong). If the client was the last application to hold the lock, it is done and moves on. If it wasn't the last one, then we use an ioctl to the kernel to arbitrate the lock. In this case some or all of the state on the card may have changed. The shared memory carries a stamp number for the X server. When the X server does a window operation it increments the stamp. If the client sees that the stamp has changed, it uses a DRI X protocol request to get the new window location and clip rects. This only happens on a window move. Assuming your clip rects/window position haven't changed, the redisplay happens entirely in the client. The client may have other state to restore as well. In the case of the tdfx driver we have three more flags: command FIFO invalid, 3D state invalid, and textures invalid. If those are set, the corresponding state is restored. So, if the X server wakes up to process input, it currently grabs the lock but doesn't invalidate any state. I'm actually fixing this now so that it doesn't grab the lock for input processing. If the X server draws, it grabs the lock and invalidates the command FIFO. If the X server moves a window, it grabs the lock, updates the stamp, and invalidates the command FIFO. If another 3D app runs, it grabs the lock, invalidates the command FIFO, invalidates the 3D state and possibly invalidates the texture state.
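The two-tiered lock can be sketched in user space roughly as below. This is a minimal illustration using C11 atomics, not the DRM code: struct shared_area, struct client, LOCK_HELD and kernel_take_lock are hypothetical, and the "kernel" path is a stub that simply takes the lock where a real ioctl would arbitrate between contenders.

```c
#include <assert.h>
#include <stdatomic.h>

#define LOCK_HELD 0x80000000u   /* hypothetical "held" bit; low bits carry a context id */

struct shared_area {
    atomic_uint lock;   /* id of the last holder, plus LOCK_HELD while it is held */
    unsigned stamp;     /* incremented by the X server on window operations */
};

struct client {
    unsigned ctx;                /* this client's context id */
    unsigned last_stamp;         /* stamp seen the last time we looked */
    int slow_path_taken;         /* how often the kernel had to arbitrate */
    int cliprects_refreshed;     /* how often window info had to be re-fetched */
};

/* Stand-in for the ioctl; a real kernel would wait until the lock is free. */
static void kernel_take_lock(struct shared_area *s, struct client *c)
{
    atomic_store(&s->lock, c->ctx | LOCK_HELD);
    c->slow_path_taken++;
}

static void client_lock(struct shared_area *s, struct client *c)
{
    /* Fast path: the lock is free and we were the last holder. */
    unsigned expected = c->ctx;
    if (atomic_compare_exchange_strong(&s->lock, &expected, c->ctx | LOCK_HELD))
        return;

    /* Slow path: someone else held the lock, so state may have changed. */
    kernel_take_lock(s, c);
    if (s->stamp != c->last_stamp) {
        c->last_stamp = s->stamp;
        c->cliprects_refreshed++;   /* here a client would ask the X server */
    }
}

static void client_unlock(struct shared_area *s, struct client *c)
{
    atomic_store(&s->lock, c->ctx);   /* drop the held bit, remember the holder */
}
```

The point of the design is that a client rendering on its own never leaves user space to take the lock; only an intervening holder forces the slow path and the stamp comparison.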
What is templated DRM code? It was first discussed in an email about what Gareth had done to bring up the mach64 kernel module. Not wanting to simply copy-and-paste another version of _drv.[ch], _context.c, _bufs.c and so on, Gareth did some refactoring along the lines of what he and Rik Faith had discussed a long time ago. This is very much along the lines of a lot of Mesa code, where there exists a template header file that can be customized with a few defines. At the time, this was done for _drv.c and _context.c, creating driver_tmp.h and context_tmp.h that could be used to build up the core module. An inspection of mach64_drv.c on the mach64-0-0-1-branch reveals the following code:
#define DRIVER_AUTHOR "Gareth Hughes"
#define DRIVER_NAME "mach64"
#define DRIVER_DESC "DRM module for the ATI Rage Pro"
#define DRIVER_DATE "20001203"
#define DRIVER_MAJOR 1
#define DRIVER_MINOR 0
#define DRIVER_PATCHLEVEL 0
static drm_ioctl_desc_t mach64_ioctls[] = {
[DRM_IOCTL_NR(DRM_IOCTL_VERSION)] = { mach64_version, 0, 0 },
[DRM_IOCTL_NR(DRM_IOCTL_GET_UNIQUE)] = { drm_getunique, 0, 0 },
[DRM_IOCTL_NR(DRM_IOCTL_GET_MAGIC)] = { drm_getmagic, 0, 0 },
[DRM_IOCTL_NR(DRM_IOCTL_IRQ_BUSID)] = { drm_irq_busid, 0, 1 },
[DRM_IOCTL_NR(DRM_IOCTL_SET_UNIQUE)] = { drm_setunique, 1, 1 },
[DRM_IOCTL_NR(DRM_IOCTL_BLOCK)] = { drm_block, 1, 1 },
[DRM_IOCTL_NR(DRM_IOCTL_UNBLOCK)] = { drm_unblock, 1, 1 },
[DRM_IOCTL_NR(DRM_IOCTL_AUTH_MAGIC)] = { drm_authmagic, 1, 1 },
[DRM_IOCTL_NR(DRM_IOCTL_ADD_MAP)] = { drm_addmap, 1, 1 },
[DRM_IOCTL_NR(DRM_IOCTL_ADD_BUFS)] = { drm_addbufs, 1, 1 },
[DRM_IOCTL_NR(DRM_IOCTL_MARK_BUFS)] = { drm_markbufs, 1, 1 },
[DRM_IOCTL_NR(DRM_IOCTL_INFO_BUFS)] = { drm_infobufs, 1, 0 },
[DRM_IOCTL_NR(DRM_IOCTL_MAP_BUFS)] = { drm_mapbufs, 1, 0 },
[DRM_IOCTL_NR(DRM_IOCTL_FREE_BUFS)] = { drm_freebufs, 1, 0 },
[DRM_IOCTL_NR(DRM_IOCTL_ADD_CTX)] = { mach64_addctx, 1, 1 },
[DRM_IOCTL_NR(DRM_IOCTL_RM_CTX)] = { mach64_rmctx, 1, 1 },
[DRM_IOCTL_NR(DRM_IOCTL_MOD_CTX)] = { mach64_modctx, 1, 1 },
[DRM_IOCTL_NR(DRM_IOCTL_GET_CTX)] = { mach64_getctx, 1, 0 },
[DRM_IOCTL_NR(DRM_IOCTL_SWITCH_CTX)] = { mach64_switchctx, 1, 1 },
[DRM_IOCTL_NR(DRM_IOCTL_NEW_CTX)] = { mach64_newctx, 1, 1 },
[DRM_IOCTL_NR(DRM_IOCTL_RES_CTX)] = { mach64_resctx, 1, 0 },
[DRM_IOCTL_NR(DRM_IOCTL_ADD_DRAW)] = { drm_adddraw, 1, 1 },
[DRM_IOCTL_NR(DRM_IOCTL_RM_DRAW)] = { drm_rmdraw, 1, 1 },
[DRM_IOCTL_NR(DRM_IOCTL_LOCK)] = { mach64_lock, 1, 0 },
[DRM_IOCTL_NR(DRM_IOCTL_UNLOCK)] = { mach64_unlock, 1, 0 },
[DRM_IOCTL_NR(DRM_IOCTL_FINISH)] = { drm_finish, 1, 0 },
#if defined(CONFIG_AGP) || defined(CONFIG_AGP_MODULE)
[DRM_IOCTL_NR(DRM_IOCTL_AGP_ACQUIRE)] = { drm_agp_acquire, 1, 1 },
[DRM_IOCTL_NR(DRM_IOCTL_AGP_RELEASE)] = { drm_agp_release, 1, 1 },
[DRM_IOCTL_NR(DRM_IOCTL_AGP_ENABLE)] = { drm_agp_enable, 1, 1 },
[DRM_IOCTL_NR(DRM_IOCTL_AGP_INFO)] = { drm_agp_info, 1, 0 },
[DRM_IOCTL_NR(DRM_IOCTL_AGP_ALLOC)] = { drm_agp_alloc, 1, 1 },
[DRM_IOCTL_NR(DRM_IOCTL_AGP_FREE)] = { drm_agp_free, 1, 1 },
[DRM_IOCTL_NR(DRM_IOCTL_AGP_BIND)] = { drm_agp_bind, 1, 1 },
[DRM_IOCTL_NR(DRM_IOCTL_AGP_UNBIND)] = { drm_agp_unbind, 1, 1 },
#endif
[DRM_IOCTL_NR(DRM_IOCTL_MACH64_INIT)] = { mach64_dma_init, 1, 1 },
[DRM_IOCTL_NR(DRM_IOCTL_MACH64_CLEAR)] = { mach64_dma_clear, 1, 0 },
[DRM_IOCTL_NR(DRM_IOCTL_MACH64_SWAP)] = { mach64_dma_swap, 1, 0 },
[DRM_IOCTL_NR(DRM_IOCTL_MACH64_IDLE)] = { mach64_dma_idle, 1, 0 },
};
#define DRIVER_IOCTL_COUNT DRM_ARRAY_SIZE( mach64_ioctls )
#define HAVE_CTX_BITMAP 1
#define TAG(x) mach64_##x
#include "driver_tmp.h"
And that's all you need. A trivial amount of code is needed for the context handling:
#define __NO_VERSION__
#include "drmP.h"
#include "mach64_drv.h"
#define TAG(x) mach64_##x
#include "context_tmp.h"
And as far as I can tell, the only thing that's keeping this out of mach64_drv.c is the __NO_VERSION__, which is a 2.2 thing and is not used in 2.4 (right?).
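The effect of the TAG define in the listings above can be illustrated with a small stand-alone sketch. It deliberately swaps the technique: because a single self-contained file cannot include a separate template header, the "template" here is a function-like macro instead of an #include of context_tmp.h. The names DEFINE_CONTEXT_FUNCS, ctx_count, addctx and rmctx as written below are hypothetical and their bodies are placeholders, not the real DRM context code.

```c
#include <assert.h>

/* Stand-in for the body of a template header such as context_tmp.h.
 * In the scheme described above this text would live in its own file and be
 * pulled in with #include after TAG is defined; a macro is used here only so
 * that the example fits in one file. */
#define DEFINE_CONTEXT_FUNCS(TAG)                                   \
    static int TAG(ctx_count) = 0;                                  \
    static int TAG(addctx)(void)                                    \
    {                                                               \
        return ++TAG(ctx_count);                                    \
    }                                                               \
    static int TAG(rmctx)(void)                                     \
    {                                                               \
        return TAG(ctx_count) > 0 ? --TAG(ctx_count) : 0;           \
    }

/* Each driver supplies only its own name prefix. */
#define MACH64_TAG(x) mach64_##x
#define TDFX_TAG(x)   tdfx_##x

/* Two "drivers" built from the same template text: this produces
 * mach64_addctx/mach64_rmctx and tdfx_addctx/tdfx_rmctx, each with
 * its own private counter. */
DEFINE_CONTEXT_FUNCS(MACH64_TAG)
DEFINE_CONTEXT_FUNCS(TDFX_TAG)
```

Calling mach64_addctx() twice and tdfx_addctx() once leaves the two counters at 2 and 1, which shows that the instances share source text but no state.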
To enable all the context bitmap code, we see the #define HAVE_CTX_BITMAP 1. To enable things like AGP, MTRRs and DMA management, the author simply needs to define the correct symbols. With less than five minutes of mach64-specific coding, I had a full kernel module that would do everything a basic driver requires, enough to bring up a software-fallback driver. The above code is all that is needed for the tdfx driver, with appropriate name changes. Indeed, any card that doesn't do kernel-based DMA can have a fully functional DRM module with the above code. DMA-based drivers will need more, of course. The plan is to extend this to basic DMA setup and buffer management, so that the creation of PCI or AGP DMA buffers, installation of IRQs and so on is as trivial as this. What will then be left is the hardware-specific parts of the DRM module that deal with actually programming the card to do things, such as setting state for rendering or kicking off DMA buffers. That is, the interesting stuff. A couple of points: Why was it done like this, and not with C++ features like virtual functions (i.e. why don't I do it in C++)? Because it's the Linux kernel, dammit! No offense to any C++ fan who may be reading this :-) Besides, a lot of the initialization is order-dependent, so inserting or removing blocks of code with #defines is a nice way to achieve the desired result, at least in this situation. Much of the core DRM code (like bufs.c, context.c and dma.c) will essentially move into these template headers. I feel that this is a better way to handle the common code. Take context.c as a trivial example: the i810, mga, tdfx, r128 and mach64 drivers have exactly the same code, with name changes. Take bufs.c as a slightly more interesting example: some drivers map only AGP buffers, some do both AGP and PCI, some map differently depending on their DMA queue management and so on.
Again, rather than cutting and pasting the code from drm_addbufs into my driver, removing the sections I don't need and leaving it at that, I think keeping the core functionality in bufs_tmp.h and allowing this to be customized at compile time is a cleaner and more maintainable solution. It also has the potential to make keeping the code for other OSs up to date a lot easier. The current mach64 branch is only using one template in the driver. Check out the r128 driver from the trunk for a good example. Notice there are files in there such as r128_tritmp.h. This is a template that gets included in r128_tris.c. What it does, basically, is consolidate code that is largely reproduced over several functions, so that you only set a few macros. For example:
#define IND (R128_TWOSIDE_BIT)
#define TAG(x) x##_twoside
followed by
#include "r128_tritmp.h"
Notice that the name of the inline function defined in r128_tritmp.h is the result of the TAG macro, and the function's content depends on which IND value is defined. So essentially the inline function is a template for various functions that have a bit in common. That way you consolidate common code and keep things consistent. Look at e.g. xc/programs/Xserver/hw/xfree86/os-support/linux/drm/kernel/r128.h though. That's the template architecture at its most beautiful. Most of the code is shared between the drivers, customized with a few defines. Compare that to the duplication and inconsistency before. @