head 1.1; access; symbols libdrm-1_0_4:1.1; locks; strict; comment @# @; 1.1 date 2003.06.25.14.40.07; author jrfonseca; state Exp; branches; next ; desc @@ 1.1 log @Switch the source format to DocBook XML. @ text @
Hardware Specific

AGP

What is the AGP?

AGP is a dedicated high-speed bus that allows the graphics controller to move large amounts of data directly from system memory. It uses a Graphics Address Re-Mapping Table (GART) to provide a physically-contiguous view of scattered pages in system memory for DMA transfers. With AGP, main memory is specifically used for advanced three-dimensional features, such as textures, alpha buffers, and z-buffers.

There are two primary AGP usage models for 3D rendering, which differ in how data are partitioned and accessed and in the resultant interface data flow characteristics [AGPIS].

DMA

In the DMA model, the primary graphics memory is the local memory associated with the accelerator, referred to as the local frame buffer. 3D structures are stored in system memory, but are not used (or executed) directly from this memory; rather, they are copied to primary (local) memory (the DMA operation), to which the rendering engine's address generator makes its references. This implies that the traffic on the AGP tends to be long, sequential transfers, serving the purpose of bulk data transport from system memory to primary graphics (local) memory. This sort of access model is amenable to a linked list of physical addresses provided by software (similar to the operation of a disk or network I/O device), and is generally not sensitive to a non-contiguous view of the memory space.

execute

In the execute model, the accelerator uses both the local memory and the system memory as primary graphics memory. From the accelerator's perspective, the two memory systems are logically equivalent; any data structure may be allocated in either memory, with performance optimization as the only criterion for selection.
In general, structures in system memory space are not copied into the local memory prior to use by the accelerator, but are executed in place. This implies that the traffic on the AGP tends to be short, random accesses, which are not amenable to an access model based on software-resolved lists of physical addresses. Because the accelerator generates direct references into system memory, a contiguous view of that space is essential; however, since system memory is dynamically allocated in random 4K pages, it is necessary in the execute model to provide an address mapping mechanism that maps random 4K pages into a single contiguous, physical address space.

The AGP supports both the DMA and execute models. However, since a primary motivation of the AGP is to reduce growth pressure on local memory, the execute model is the design center.

AGP also allows several access requests to be issued in a pipelined fashion while waiting for the data transfers to occur. Pipelining access requests results in several read and/or write requests being outstanding in the corelogic's request queue at any point in time.

What is the GART?

The execute model interface specification requires a physical-to-physical address remapping mechanism which ensures that the graphics accelerator (an AGP master) will have a contiguous view of graphics data structures dynamically allocated in system memory.

This address remapping is accomplished via a memory-based table called the Graphics Address Remapping Table (GART), which is used (walked) by the corelogic to perform the remapping. In order to avoid compatibility issues and allow future implementation flexibility, this mechanism is specified at a software (API) level. In other words, the actual GART table format is not specified; rather, it is abstracted to the API by a HAL or miniport driver that must be provided with the corelogic.

This remapping function should not be confused in any way with the system address translation table mechanism.
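The remapping idea can be illustrated with a minimal sketch. It does not reproduce any real GART format, since the actual table layout is left to the corelogic's driver. It assumes a hypothetical table with one entry per 4K aperture page, each holding the physical page frame that backs it. The names gart_table, gart_translate, and GART_UNMAPPED are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                      /* 4K pages, as in the text */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define GART_UNMAPPED UINT64_MAX            /* hypothetical "no page bound" marker */

/* Hypothetical table: entry i holds the physical frame number that backs
 * aperture page i. Real GART formats are chipset specific. */
struct gart_table {
    uint64_t *frames;       /* one entry per aperture page */
    size_t    num_pages;    /* aperture size in pages */
};

/* Translate an offset inside the contiguous aperture into the physical
 * address of the scattered system-memory page behind it. Returns
 * GART_UNMAPPED when the offset is out of range or nothing is bound. */
static uint64_t gart_translate(const struct gart_table *gart, uint64_t offset)
{
    assert(gart != NULL && gart->frames != NULL);
    uint64_t page = offset >> PAGE_SHIFT;
    if (page >= gart->num_pages || gart->frames[page] == GART_UNMAPPED)
        return GART_UNMAPPED;
    return (gart->frames[page] << PAGE_SHIFT) | (offset & (PAGE_SIZE - 1));
}
```

With a table whose first two entries point at unrelated frames, consecutive aperture offsets resolve to scattered physical addresses, which is the contiguous view the accelerator relies on in the execute model.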
While some of the concepts are similar, GART remapping and system address translation are completely separate mechanisms which operate independently, under control of the operating system.

Where can I get more info about AGP?

Check the AGP Implementors Forum Q&A for more frequently asked questions about AGP. There is also a section about AGP in [Barron01]. Intel provides the Accelerated Graphics Port Interface Specification [AGPIS].

Why not use the existing XFree86 AGP manipulation calls?

You have to understand that the DRI functions have a different purpose than the ones in XFree86. The DRM has to know about AGP, so it talks to the AGP kernel module itself. It has to be able to protect certain regions of AGP memory from the client-side 3D drivers, yet it has to export some regions of it as well. While most of this functionality (most, not all) can be accomplished with the /dev/agpgart interface, it makes sense to use the DRM's current authentication mechanism. This means that there is less complexity on the client side. If we used /dev/agpgart, the client would have to open two devices, authenticate to both of them, and make half a dozen calls to agpgart, and afterwards only care about the DRM device.

As a side note, the XFree86 calls were written after the DRM functions.

Also, to answer a previous question about not using XFree86 calls for memory mapping: under most OSs (probably Solaris as well), XFree86's functions will only work for root-privileged processes. The whole point of the DRI is to allow processes that can connect to the X server to do some form of direct-to-hardware rendering. If we limited ourselves to using XFree86's functionality, we would not be able to do this. We don't want everyone to be root.

How do I use AGP?

You can also use this test program as a bit more documentation as to how agpgart is used.

How do I allocate AGP memory?
Generally programs do the following:

  open /dev/agpgart
  ioctl(ACQUIRE)
  ioctl(INFO) to determine the amount of memory for AGP
  mmap the device
  ioctl(SETUP) to set the AGP mode
  ioctl(ALLOCATE) a chunk of memory, specifying its offset in the aperture
  ioctl(BIND) that same chunk of memory

Every time you update the GATT, you have to flush the cache and/or TLBs. This is expensive. Therefore, you allocate and bind the pages you'll use, and mmap() just returns the right pages when needed.

Then you need to have a remap of the AGP aperture in the kernel which you can access. Use ioremap to do that. After that you have access to the AGP memory. You probably want to make sure that there is a write-combining MTRR over the aperture. There is code in mga_drv.c in our kernel directory that shows you how to do that.

If one has to insert pages, one needs to check for -EBUSY errors and loop through the entire GTT. Wouldn't it be better if the driver filled in pg_start of the agp_bind structure instead of the user filling it in?

All this allocation should be done by only one process. If you need memory in the GTT, you should be asking the X server for it (or whatever your controlling process is). Things are implemented this way so that the controlling process can know intimate details of how memory is laid out. This is very important for the I810, since you want to set tiled memory on certain regions of the aperture. If you made the kernel do the layout, then you would have to create device-specific code in the kernel to make sure that the backbuffer/dcache are aligned for tiled memory. This adds complexity to the kernel that doesn't need to be there, and imposes restrictions on what you can do with AGP memory.

Also, the current X server implementation (4.0) actually locks out other applications from adding to the GTT. While the X server is active, the X server is the only one who can add memory.
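To make the pg_start discussion concrete, here is a minimal sketch. It assumes a hypothetical in-memory model of the GTT rather than the real agpgart ioctl interface. Because the caller picks pg_start, a process that already knows the layout never needs to probe, and an overlapping request simply fails with -EBUSY. The names gtt_layout, gtt_bind, and GTT_PAGES, as well as the -EINVAL range check, are assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define GTT_PAGES 16                 /* hypothetical aperture size in pages */

/* Hypothetical bookkeeping kept by the single controlling process. */
struct gtt_layout {
    bool bound[GTT_PAGES];
};

/* Bind page_count pages starting at pg_start. The caller chooses pg_start;
 * an overlapping request fails with -EBUSY and changes nothing, and an
 * out-of-range request fails with -EINVAL (an assumption of this sketch). */
static int gtt_bind(struct gtt_layout *gtt, size_t pg_start, size_t page_count)
{
    assert(gtt != NULL);
    if (page_count == 0 || pg_start >= GTT_PAGES ||
        page_count > GTT_PAGES - pg_start)
        return -EINVAL;
    /* First pass: refuse the whole request if any page is already bound. */
    for (size_t i = 0; i < page_count; i++)
        if (gtt->bound[pg_start + i])
            return -EBUSY;
    /* Second pass: commit the binding. */
    for (size_t i = 0; i < page_count; i++)
        gtt->bound[pg_start + i] = true;
    return 0;
}
```

A controlling process that tracks its own layout would call gtt_bind with offsets it has already chosen, for example to keep tiled regions aligned, so the -EBUSY path should only be reached on a programming error.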
Only the controlling process may add things to the GTT, and while a controlling process is active, no other application can be the controlling process.

Microsoft's VGART does things like you are describing, I believe. I think it's bad design. It enforces a policy on whoever uses it, and is not flexible. When you are designing low-level system routines, I think it is very important to make sure your design has the minimum of policy. Otherwise, when you want to do something different, you have to change the interface, or create custom drivers for each application that needs to do things differently.

How does the DMA transfer mechanism work?

Here's a proposal for a zero-ioctl (best case) DMA transfer mechanism. Let's call it 'kernel ringbuffers'. The premise is to replace the calls to the 'fire-vertex-buffer' ioctl with code to write to a client-private mapping shared by the kernel (like the current SAREA, but for each client).

Starting from the beginning:

  Each client has a private piece of AGP memory, into which it will put secure commands (typically vertices and texture data). The client may expand or shrink this region according to load.
  Each client has a shared user/kernel region of cached memory (a per-context SAREA). This is managed like a ring, with head and tail pointers.
  The client emits vertices to AGP memory (as it currently does with DMA buffers).
  When a state change, clear, swap, flush, or other event occurs, the client:
    Grabs the hardware lock.
    Re-emits any invalidated state to the head of the ring.
    Emits a command to fire the portion of AGP space as vertices.
    Updates the head pointer in the ring.
    Releases the lock.
  The kernel is responsible for processing all of the rings. Several events might cause the kernel to examine active rings for commands to be dispatched:
    A flush ioctl. (Called by impatient clients)
    A periodic timer. (If this is low overhead?)
    An interrupt previously emitted by the kernel.
(If timers don't work)

Additionally, those who've been paying attention will notice that some of the assumptions we currently use to manage hardware state between multiple active contexts are broken if client commands to hardware aren't executed serially in an order which is knowable to the clients. If they aren't, a client that grabs the heavyweight lock doesn't know what state has been invalidated or which textures have been swapped out by other clients.

This could be solved by keeping per-context state in the kernel and implementing a proper texture manager. That's something we need to do anyway, but it's not a requirement for this mechanism to work.

Instead, force the kernel to fire all outstanding commands on client ringbuffers whenever the heavyweight lock changes hands. This provides the same serialized semantics as the current mechanism, and also simplifies the kernel's task, as it knows that only a single context has an active ring buffer (the one last to hold the lock).

An additional mechanism is required to allow clients to know which pieces of their AGP buffer are pending execution by the hardware, and which pieces of the buffer are available to be reused. This is also exactly what NV_vertex_array_range requires.

ATI Cards

How do I obtain specifications for the ATI cards?

Please read this section before you consider applying.

Is there support for the video and TV playback features of cards made by ATI?

Check the GATOS project for that.

Mach64 based cards

What's the status of the mach64 branch?

Leif Delgass has a page describing the current status of the mach64 branch here. He also has a page with links to the results of OpenGL conformance and performance tests on the mach64 branch.

How do I build the mach64 branch?

Follow the steps described in Leif Delgass' Compiling the mach64 branch of DRI mini-HOWTO.

Where can I get documentation about the mach64 chipset?
Take a look at the code, the list archives, and the DRI documentation on its homepage (it's a little stale, but a good starting point). We are also using the driver from the Utah-GLX project as a guide, so you might want to check that out. Many have documentation from ATI as well; you can apply to their developer program for documentation.

3DFX

How do I obtain specifications for the 3DFX cards?

You can get them here.

What's the relationship between Glide and DRI?

Right now the picture looks like this:

  Client -> OpenGL/GLX -> Glide as HAL (DRI) -> hw

In this layout the Glide(DRI) is really a hardware abstraction layer. The only API exposed is OpenGL, and Glide(DRI) only works with OpenGL. It isn't useful by itself.

There are a few Glide-only games. 3dfx would like to see those work. The current solution, shown above, doesn't work for them, since the Glide API isn't available. Instead we need:

  Client -> Glide as API (DRI) -> hw

Right now Mesa does a bunch of the DRI work, and then hands that data down to Glide. Mesa also does all the locking of the hardware. If we're going to remove Mesa, then Glide has to do the DRI work, and we have to do something about the locking.

The solution is actually a bit more complicated. Glide wants to use all the memory as well. We don't want the X server to draw at all. Glide will turn off drawing in the X server, grab the lock, and never let it go. That way no other 3D client can start up, and the X server can still process keyboard events and such for you. When the Glide app goes away, we just force a big refresh event for the whole screen.

I hope that explains it. We're really not trying to encourage people to use the Glide API; it is just to allow those existing games to run. We really want people to use OpenGL directly.

Another interesting project that a few people have discussed is removing Glide from the picture altogether. Just let Mesa send the actual commands to the hardware. That's the way most of our drivers were written.
It would simplify the install process (you wouldn't need Glide separately) and it might improve performance a bit, and since we're only doing this for one type of hardware (Voodoo3+), Glide isn't doing that much as a hardware abstraction layer. It's some work. There are about 50 calls from Glide that we use, and those aren't simple, but it might be a good project for a few people to tackle.

S3

Are there plans to enable the S3TC extension on any of the cards that currently support it?

There's not a lot we can do with S3TC because of S3's patent/license restrictions.

Normally, OpenGL implementations would do software compression of textures and then send them to the board. The patent seems to prevent that, so we're staying away from it.

If an application has compressed textures (it compressed them itself, or they were compressed offline), we can download the compressed textures to the board. Unfortunately, that's of little use, since most applications don't work that way.

Savage

Are there any plans to support the Savage chips?

Yes. You can read the original announcement.

What's the status of the Savage driver?

At this point work is still being done to make the 2D driver DRI-aware. See the Savage DDX driver on CVS and the preliminary DDX chapter of the DRI Driver HOWTO.

savage_dri.c still needs several functions. I've basically been copying and pasting from the other drivers (especially Radeon, which is one of the most recent) to fill in the void.

How can I help?

Making the DDX DRI-aware doesn't require very deep knowledge; it is mostly a matter of copying and pasting from other drivers.

Bootstrapping a DRM kernel module is also easy; see, for example, the 3DFX one.

Both these things could be done by a newbie, allowing the developers to concentrate on the more demanding Mesa driver.
@