4358 – Assorted renderer changes patch (VBO, GLSL, "modern" techniques etc.)

Status:	RESOLVED FIXED

Alias:	None

Product:	ioquake3
Classification:	Unclassified
Component:	Video
Version:	GIT MASTER
Hardware:	All All

Importance:	P3 enhancement
Assignee:	James Canete
QA Contact:	ioquake3 bugzilla mailing list

URL:

Duplicates (1):	4527
Depends on:
Blocks:

Reported:	2009-11-27 07:27 EST by tinkah
Modified:	2022-01-10 03:31:30 EST
CC List:	15 users

See Also:

Attachments
VBO code cobbled together from xreal w/ changes (133.07 KB, patch) 2010-11-11 03:41 EST, James Canete
Newer version of VBO code from xreal w/ changes (121.07 KB, patch) 2010-11-12 04:33 EST, James Canete
version 3 of VBO code from xreal w/ changes (142.81 KB, patch) 2010-11-15 22:55 EST, James Canete
VBO and GLSL code from xreal w/ changes (245.89 KB, patch) 2010-11-22 05:17 EST, James Canete
generic vertex program for vbo/glsl patch (8.97 KB, text/plain) 2010-11-22 05:19 EST, James Canete
generic fragment program for vbo/glsl patch (1.81 KB, text/plain) 2010-11-22 05:20 EST, James Canete
Version 3 of VBO and GLSL code from xreal, w/ changes (280.20 KB, patch) 2010-11-24 07:15 EST, James Canete
Version 4 of VBO and GLSL code from xreal, w/ changes (288.94 KB, patch) 2010-11-25 22:15 EST, James Canete
VBO and GLSL code from xreal, w/ changes & occlusion code (298.83 KB, patch) 2010-12-06 08:07 EST, James Canete
Version 5 of VBO and GLSL code from xreal, w/ changes (308.18 KB, patch) 2010-12-07 07:14 EST, James Canete
Version 6 of VBO and GLSL code from xreal, w/ changes (315.66 KB, patch) 2010-12-08 06:44 EST, James Canete
Version 7 of VBO and GLSL code from xreal, w/ changes (319.44 KB, patch) 2010-12-13 00:03 EST, James Canete
Version 8 of VBO and GLSL code from xreal, w/ changes (349.14 KB, patch) 2010-12-16 01:33 EST, James Canete
Version 9 of VBO and GLSL code from xreal, w/ changes (359.80 KB, patch) 2010-12-21 04:24 EST, James Canete
Version 10 of VBO and GLSL code from xreal, w/ changes (355.89 KB, patch) 2011-01-05 07:49 EST, James Canete
Version 11 of VBO and GLSL code from xreal, w/ changes (355.03 KB, patch) 2011-02-08 23:26 EST, James Canete
Version 12 of VBO and GLSL code from xreal, w/ changes (362.83 KB, patch) 2011-03-30 10:17 EDT, James Canete
Version 13 of VBO and GLSL code from xreal, w/ changes (404.83 KB, patch) 2011-04-11 09:52 EDT, James Canete
Version 14 of VBO and GLSL code from xreal, w/ changes (427.05 KB, patch) 2011-04-18 11:05 EDT, James Canete
V15 of VBO/GLSL patch (472.09 KB, patch) 2011-05-10 06:17 EDT, James Canete
V16 of VBO/GLSL patch (474.58 KB, patch) 2011-05-12 06:43 EDT, James Canete
V17 of VBO/GLSL patch (477.96 KB, patch) 2011-06-14 08:55 EDT, James Canete
V18 of VBO/GLSL patch (504.21 KB, patch) 2011-07-19 02:57 EDT, James Canete
V19 of VBO/GLSL patch (503.63 KB, patch) 2011-08-02 21:04 EDT, James Canete
V20 of VBO/GLSL patch, gzipped (474.23 KB, application/octet-stream) 2011-08-09 04:34 EDT, James Canete
V20a of VBO/GLSL patch (243.48 KB, patch) 2011-08-09 04:54 EDT, James Canete
V21 of VBO/GLSL patch (977.89 KB, patch) 2011-08-09 20:25 EDT, James Canete
ioquake3 vanilla renderer code wit VBO Patch + FBO post process testing stuff (219.86 KB, application/x-gzip) 2011-09-04 13:06 EDT, Adrian Fuhrmann
V22 of VBO/GLSL patch (211.86 KB, application/x-gzip) 2011-09-06 07:13 EDT, James Canete
netradiant q3map2 patch for hdr support (11.05 KB, patch) 2011-09-06 07:17 EDT, James Canete
v23 of renderer patch (217.96 KB, application/x-gzip) 2011-09-21 05:42 EDT, James Canete
GTKRadiant q3map2 hdr patch (10.07 KB, patch) 2011-09-21 05:46 EDT, James Canete
v24 of renderer patch (218.11 KB, application/x-gzip) 2011-10-04 22:17 EDT, James Canete
v25 of renderer patch (226.31 KB, application/x-gzip) 2011-10-26 00:07 EDT, James Canete
v26 of renderer patch (230.51 KB, application/x-gzip) 2012-01-10 07:01 EST, James Canete
v27 of renderer patch (228.66 KB, application/x-gzip) 2012-01-12 02:34 EST, James Canete
v28 of renderer patch (230.47 KB, application/x-gzip) 2012-02-24 21:07 EST, James Canete
v29 of renderer patch (233.34 KB, application/x-gzip) 2012-03-21 03:08 EDT, James Canete
v30 of renderer patch (242.24 KB, application/x-gzip) 2012-06-18 23:08 EDT, James Canete
v31 of renderer patch (246.53 KB, application/x-gzip) 2012-10-17 06:20 EDT, James Canete
v31a of renderer patch (246.53 KB, application/x-gzip) 2012-10-18 06:29 EDT, James Canete
README for Rend2, v31a (22.21 KB, text/plain) 2012-10-18 09:37 EDT, James Canete
Rend2, v32. (253.74 KB, application/x-gzip) 2012-10-23 23:33 EDT, James Canete
Rend2, v32a. (253.77 KB, application/x-gzip) 2012-10-24 14:27 EDT, James Canete
Rend2 the FINAL (269.15 KB, application/x-gzip) 2012-10-25 21:14 EDT, James Canete

Description tinkah 2009-11-27 07:27:37 EST

Vertex Buffer Object rendering (http://en.wikipedia.org/wiki/Vertex_Buffer_Object) is the modern way of rendering in OpenGL. Code for using it in ioquake3 is available here:

http://forums.urbanterror.net/topic/11414-bumpmapping-modifications-enthusiastic-mappers-required/page__view__findpost__p__187394

The author is the lead coder of Urban Terror, who is developing it for the next release of the game.

Main thread here:
http://forums.urbanterror.net/topic/11414-bumpmapping-modifications-enthusiastic-mappers-required/

Comment 1 Zachary J. Slater 2010-02-08 19:28:29 EST

*** Bug 4527 has been marked as a duplicate of this bug. ***

Comment 2 James Canete 2010-11-11 03:41:08 EST

Created attachment 2475 [details]
VBO code cobbled together from xreal w/ changes

So I decided to try my hand at implementing VBOs.  This involved taking the vbo code from xreal, retrofitting it into ioquake3's renderer, and writing some shader adaptation code without using vertex programs.

It's a little messy, and crashes/freezes at odd times, but it seems to work.  There's a whole laundry list of caveats, though.

1. A lot of shader options are not supported, or half-supported.
2. The shader compatibility that's there has been done in a fundamentally backward way, using matrix stuff that's deprecated in OpenGL 3.0.
3. Only world surfaces use VBOs, everything else is still using the old vertex array code.
4. Recompiling the index buffers at every vis transition is slow.  On my (old) system, running tvy-bench.pk3 took around 300-600 ms on every transition, which is an awful hitch every few moments, and in exchange I only get around 20 fps more than before.

Still, I suppose it might be useful as a base or something to someone.

Comment 3 James Canete 2010-11-12 04:33:11 EST

Created attachment 2477 [details]
Newer version of VBO code from xreal w/ changes

I've fixed a whole lot of things in the previous patch.

This one crashes less, and performs a lot better on vis cluster changes, with the hitch reduced to under 10 ms on tvy-bench.  As well, doors, portals, and mirrors all render correctly now.

The problem of the unsupported shaders remains, though.

Comment 4 Styx 2010-11-14 21:10:38 EST

(In reply to comment #3)
> Created an attachment (id=2477) [details]
> Newer version of VBO code from xreal w/ changes
> 
> I've fixed a whole lot of things in the previous patch.
> 
> This one crashes less, and performs a lot better on vis cluster changes, with
> the hitch reduced to under 10 ms on tvy-bench.  As well, doors, portals, and
> mirrors all render correctly now.
> 
> The problem of the unsupported shaders remains, though.

This is an epic patch, I've wanted this for ages (but didn't have the skill or the patience to make it).  The second version didn't include tr_vbo.c, though.  With a bit of work, I got this working in Tremulous.  It will be great when support for dynamic models is added.

Comment 5 James Canete 2010-11-15 22:55:05 EST

Created attachment 2479 [details]
version 3 of VBO code from xreal w/ changes 

> The second version didn't include tr_vbo.c, though.

Oops.  Here's another version of the patch.  This also fixes a nasty memory leak in the grid mesh code.

Comment 6 tinkah 2010-11-16 06:02:48 EST

Note that the other VBO code, from UrT, is now available in an Urban Terror client based on the latest ioquake3 SVN code, for all OSes, here: http://www.www0.org/w/Optimized_executable;_builds_of_ioq3_engine_for_urt

It was initially ported by 'undead' and I did some minor modifications (mainly to fix a crash in case of an invalid shader with a stage that lacked texture map specification).

It appears compatible with all the feature set of quake3 arena, or at least everything q3ut4 uses.

Comment 7 James Canete 2010-11-22 05:17:28 EST

Created attachment 2485 [details]
VBO and GLSL code from xreal w/ changes

I couldn't think of a way to get all of the Quake 3 shaders working with the fixed-function pipeline, so I copied the glsl code from xreal and wrote/modified a pair of vertex and fragment programs.

This patch should support all Quake 3 shaders, but doesn't support dynamic lights.  The vertex and fragment programs have been put into long strings in tr_glsl.c, but they can be overridden with files named glsl/generic_vp.glsl and glsl/generic_fp.glsl.  I'll upload those to Bugzilla as well, as the long string versions are more or less incomprehensible.

There are a few problems left, however.  Fog works, but it's not 100% accurate to Quake 3's version.  Dynamic lights are completely ignored.  The old vertex array code and vbo with fixed function shaders code is still in, with no easy way to reenable it.  There's a crash bug that I haven't tracked down yet that happens occasionally when quitting to the main menu or changing level.  Cinematics are broken.  There's still a bunch of fixed-function code left that, when touched, kills the framerate.  Also the whole thing is very messy, and is begging for a cleanup.

Comment 8 James Canete 2010-11-22 05:19:24 EST

Created attachment 2486 [details]
generic vertex program for vbo/glsl patch

Comment 9 James Canete 2010-11-22 05:20:01 EST

Created attachment 2487 [details]
generic fragment program for vbo/glsl patch

Comment 10 James Canete 2010-11-24 07:15:02 EST

Created attachment 2489 [details]
Version 3 of VBO and GLSL code from xreal, w/ changes

I managed to squeeze a few more fps out of this by adding some tweaks and fixing some bugs.

Unfortunately, fog is very broken on sky boxes, and I don't think I've tracked down the crash bug, though it crashes less than before.

There are three cvars for changing the render pipe now, r_arb_vertex_buffer_object, r_arb_shader_objects, and r_mergeClusterSurfaces.  r_arb_vertex_buffer_object controls the use of vertex buffers, but not yet their creation.  They will be created regardless.  r_arb_shader_objects is the same with the GLSL shaders, and doesn't do anything if vertex buffers are disabled.  r_mergeClusterSurfaces merges similar surfaces in a cluster into a smaller set of vertex buffers for faster rendering, using code from xreal, but doesn't activate if vertex buffers are disabled.

As well, I started testing with the History map from http://www.zfight.com/, and I've found it actually performs better without the surface merging than with.  It's likely because the surfaces in this level are more easily culled separately than merged into cluster surfaces, but I haven't actually checked if this is the case.

Comment 11 Robert Beckebans 2010-11-25 05:57:17 EST

(In reply to comment #10)

I recommend merging the VBO code from the ET-XreaL Git repository instead of the old Subversion code.
It contains a lot of optimizations, although it is not finished yet.
In particular, the VBO code was optimized, and you no longer need r_mergeClusterSurfaces 1 in that version because it can batch the small BSP surfaces without VBOs using glMultiDrawElements. That fixed some horrible performance problems with the ATI driver, and you now render only the content inside the camera view and not everything in the PVS, which always worked fine for Nvidia cards.

Comment 12 James Canete 2010-11-25 22:15:32 EST

Created attachment 2491 [details]
Version 4 of VBO and GLSL code from xreal, w/ changes

I took your advice and took a look at ET-Xreal's GIT repository, but I'm sort of wary about directly integrating changes, as ET's source code release was covered by GPLv3, whereas Quake 3's was covered by GPLv2.

Still, I wrote up my own version of the glMultiDrawElements part, and added a simple optimization (r_mergeMultidraws) that netted me 2fps on tvy-bench.  I also copied out ET-Xreal's version of R_CullLocalBox(), as the file header for tr_main.c stated GPLv2, so I'm thinking that's still covered by that.

I somehow broke merging cluster surfaces in this version of the patch though.  I'm looking into it.
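
The thread does not say what r_mergeMultidraws actually does. One plausible form of such an optimization, assumed here purely for illustration, is to coalesce index ranges that sit next to each other in the shared index buffer, so that fewer ranges are handed to glMultiDrawElements. The type and function names below are hypothetical and not taken from the patch:

```c
#include <assert.h>
#include <stddef.h>

/* One draw range: 'count' indices starting at element offset 'first'
   inside a shared index buffer. Hypothetical type, not from the patch. */
typedef struct {
    size_t first;
    size_t count;
} drawRange_t;

/* Coalesce ranges that touch (a.first + a.count == b.first), in place.
   The input must already be sorted by 'first'.
   Returns the number of ranges left after merging. */
size_t MergeDrawRanges(drawRange_t *ranges, size_t numRanges)
{
    size_t in, out = 0;

    assert(ranges != NULL || numRanges == 0);
    if (numRanges == 0)
        return 0;

    for (in = 1; in < numRanges; in++) {
        if (ranges[out].first + ranges[out].count == ranges[in].first) {
            ranges[out].count += ranges[in].count;  /* extend the current run */
        } else {
            ranges[++out] = ranges[in];             /* start a new run */
        }
    }
    return out + 1;
}
```

The merged first/count pairs would then be converted to the count and byte-offset arrays that glMultiDrawElements expects; whether this pays off depends on how often neighbouring surfaces really are contiguous in the index buffer.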

Comment 13 James Canete 2010-12-06 08:07:42 EST

Created attachment 2511 [details]
VBO and GLSL code from xreal, w/ changes & occlusion code

I've strayed a bit from the aim of the original patch, and added in a couple of features.  They really don't belong here, though, so I'll try to separate them out into their own patches later.

First, I implemented what I call lazy frustums.  Basically, the BSP tree isn't traversed unless the current view frustum is partially outside the culling frustum used during the last BSP traversal.  This is activated by a cvar, r_lazyFrustums.  A setting of 1 enables the default, which performs no worse than before.  A setting of 2 expands the fov of the culling frustum slightly, resulting in more overdraw, but fewer BSP traversals since the changing view frustum fits better in the reused culling frustum.
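
A rough sketch of the lazy-frustum test, reduced to yaw only so that it stays short. This is a simplification and not the patch's code: a real version would test the full view frustum, and would also have to account for the camera moving. The types, function names, and the 'slack' parameter (standing in for the widened fov of setting 2) are all hypothetical:

```c
#include <assert.h>

/* Yaw-only stand-in for a frustum: centre direction and half field of
   view, both in degrees. Hypothetical type for illustration. */
typedef struct {
    float yaw;
    float halfFov;
} yawFrustum_t;

/* Wrap an angle difference into [-180, 180]. */
static float WrapDegrees(float d)
{
    while (d > 180.0f)  d -= 360.0f;
    while (d < -180.0f) d += 360.0f;
    return d;
}

/* Returns 1 if 'view' lies entirely inside 'cull'. */
int FrustumContains(const yawFrustum_t *cull, const yawFrustum_t *view)
{
    float d = WrapDegrees(view->yaw - cull->yaw);
    if (d < 0.0f)
        d = -d;
    return d + view->halfFov <= cull->halfFov;
}

/* Returns 1 if the BSP tree must be traversed again. In that case the
   cached culling frustum is rebuilt around the view, widened by 'slack'
   degrees on each side (0 mimics setting 1, > 0 mimics setting 2). */
int NeedTraversal(yawFrustum_t *cached, const yawFrustum_t *view, float slack)
{
    assert(slack >= 0.0f);
    if (FrustumContains(cached, view))
        return 0;                 /* reuse the last traversal's results */
    cached->yaw = view->yaw;
    cached->halfFov = view->halfFov + slack;
    return 1;
}
```

With a slack of 0 the cached frustum is reused only while the view does not turn at all; a larger slack trades extra overdraw for fewer traversals, which matches the trade-off described above.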

Second, I implemented very simple occlusion culling.  It isn't coherent, and it isn't hierarchical, but it culls. :) Whenever the above code traverses the tree, the occlusion code draws an invisible white cube around every vis'd leaf, querying the number of samples drawn for each leaf.  After that, it polls every frame to check for results, and culls out leaves which had zero samples drawn.

The problems with this version of occlusion culling are that it's expensive whenever it's used, causing a noticeable drop in framerate, and it often culls too much, resulting in visual errors.  For now, the cvar r_arb_occlusion_query defaults to 0.

There are also some fixes to the VBO and GLSL code in there as well, such as fixing the texture coordinate calculations in the shaders and fixing bugs where some unvis'd maps did not render, and doors opened into nodraw.

Comment 14 James Canete 2010-12-07 07:14:42 EST

Created attachment 2512 [details]
Version 5 of VBO and GLSL code from xreal, w/ changes

I've changed a few things again.

This version of the patch removes the occlusion stuff I had in the previous patch, and adds GPU vertex skinning from ET-XreaL, as well as optimized vertex/fragment programs and some fixes here and there.

It's actually reasonably fast now, with s_useOpenAL set to 0 it's faster than base quake3.exe.  It's still slower than ioquake3 1.36, though.

With "timedemo 1;demo four" I get:

quake3.exe:
1260 frames, 4.3 seconds: 296.2 fps

ioquake3.exe SVN with VBO/GLSL patch:
1260 frames 3.7 seconds 342.9 fps 1.0/2.9/9.0/1.0 ms

ioquake3.exe 1.36 from website:
1260 frames 3.4 seconds 373.4 fps 1.0/2.7/6.0/0.9 ms

So there's some room for improvement. :)

Comment 15 Robert Beckebans 2010-12-07 10:02:58 EST

(In reply to comment #14)
> Created an attachment (id=2512) [details]
> Version 5 of VBO and GLSL code from xreal, w/ changes
> 
> I've changed a few things again.
> 
> This version of the patch removes the occlusion stuff I had in the previous
> patch, and adds GPU vertex skinning from ET-XreaL, as well as optimized
> vertex/fragment programs and some fixes here and there.
> 
> It's actually reasonably fast now, with s_useOpenAL set to 0 it's faster than
> base quake3.exe.  It's still slower than ioquake3 1.36, though.
> 
> With "timedemo 1;demo four" I get:
> 
> quake3.exe:
> 1260 frames, 4.3 seconds: 296.2 fps
> 
> ioquake3.exe SVN with VBO/GLSL patch:
> 1260 frames 3.7 seconds 342.9 fps 1.0/2.9/9.0/1.0 ms
> 
> ioquake3.exe 1.36 from website:
> 1260 frames 3.4 seconds 373.4 fps 1.0/2.7/6.0/0.9 ms
> 
> So there's some room for improvement. :)

Make sure that you are not using the old xreal GLSL shaders with dynamic branching on the GPU. I was told that those were no problem for modern GPUs but I proved it to be wrong. I could speed up the performance in ET-XreaL by moving the if statements in the GLSL shaders to preprocessor macros and then I used different shaders for different combinations.
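
As a sketch of the permutation approach described here: the C side can build a block of "#define" lines from a feature mask and put it in front of the shader source, so that each combination compiles to its own program with the branches resolved by "#ifdef". The feature bits and macro names below are invented for illustration and are not the ones ET-XreaL uses:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical feature bits; the names are illustrative only. */
enum {
    FEAT_DEFORM_VERTEXES = 1 << 0,
    FEAT_TCGEN           = 1 << 1,
    FEAT_FOG             = 1 << 2
};

/* Write one "#define" line per enabled feature into 'out'.
   Returns the number of characters written (excluding the NUL),
   or -1 if the buffer is too small. */
int BuildShaderDefines(unsigned features, char *out, size_t outSize)
{
    static const struct { unsigned bit; const char *name; } table[] = {
        { FEAT_DEFORM_VERTEXES, "USE_DEFORM_VERTEXES" },
        { FEAT_TCGEN,           "USE_TCGEN" },
        { FEAT_FOG,             "USE_FOG" }
    };
    size_t i, used = 0;

    assert(out != NULL && outSize > 0);
    out[0] = '\0';

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        int n;

        if (!(features & table[i].bit))
            continue;
        n = snprintf(out + used, outSize - used, "#define %s\n", table[i].name);
        if (n < 0 || (size_t)n >= outSize - used)
            return -1;            /* output would have been truncated */
        used += (size_t)n;
    }
    return (int)used;
}
```

The resulting string can be passed as an extra entry in the string array given to glShaderSource, ahead of the shader body. If the shader uses a "#version" directive, that line has to stay first.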

Comment 16 James Canete 2010-12-08 06:44:35 EST

Created attachment 2515 [details]
Version 6 of VBO and GLSL code from xreal, w/ changes

>Make sure that you are not using the old xreal GLSL shaders with dynamic
>branching on the GPU. I was told that those were no problem for modern GPUs but
>I proved it to be wrong. I could speed up the performance in ET-XreaL by moving
>the if statements in the GLSL shaders to preprocessor macros and then I used
>different shaders for different combinations.

I tried implementing this, but it seems for what base Q3 is rendering, the cost of switching contexts is higher than what I save from skipped portions of the shader programs.  I do get a small boost from removing the portal plane cull, except when a portal enters the view, and another from disabling the vertex animation when drawing world surfaces, so it is worth it.  It'd probably be better if the surfaces were sorted by their need to use texture coordinate generation and vertex deforms, but I can try that later.

So here's another patch.  This one also includes dlights, so it now supports nearly all of Quake 3's effects, except the buggy fog and TMOD_TURBULENT, neither of which I've thought up an adequate fix for.

Performance has dropped slightly from the previous patch due to the dlights, but it's still within .5 seconds with "timedemo 1;demo four" on my system.

As well, I got it compiling in MinGW, and included the changed Makefile, so maybe someone with Linux can fix it up and see if it works there.

Comment 17 James Canete 2010-12-13 00:03:52 EST

Created attachment 2523 [details]
Version 7 of VBO and GLSL code from xreal, w/ changes

Version 7.

This version adds the unified lightmap from xreal, merges all the submodel verts into the world vbo, and removes as many state changes as I could manage.

It's faster than base ioquake3, in this one level with this one demo I made. :)
Level was industrial.pk3 from zfight.com, results were:

quake3.exe:
4435 frames, 31.2 seconds: 142.0 fps

ioquake3.exe, dated Aug 30, 2010, from website
4435 frames, 28.3 seconds 156.7 fps 1.0/6.4/251.0/2.0 ms

ioquake3.exe, SVN w/ vbo/glsl patch 7, compiled with MinGW gcc 4.5.0
4435 frames, 18.1 seconds 244.6 fps 1.0/4.1/416.0/1.2 ms

I'm thinking most of that boost is because of the unified lightmap, though.

Comment 18 James Canete 2010-12-16 01:33:48 EST

Created attachment 2530 [details]
Version 8 of VBO and GLSL code from xreal, w/ changes

Version 8.

This version takes an idea from ioUrT (no actual code, though. :)) and merges surfaces based on common leafs.  It also fixes a nasty crash/freeze with vertex interpolation, as well as a bug where r_ignoreFastPath would cause some surfaces to not be rendered.

Performance is good on nvidia hardware, specifically my 8800GT.  Testing on tvy-bench with a short demo:

quake3.exe:
4490 frames, 90.1 seconds: 49.9 fps

ioquake3.exe Aug 30, 2010 from website:
4490 frames 81.0 seconds 55.5 fps 5.0/18.0/29.0/3.9 ms

ioquake3 SVN with VBOGLSL 8, compiled with MinGW GCC 4.5.0:
4490 frames 20.1 seconds 223.9 fps 2.0/4.5/27.0/0.9 ms

From what I've heard from others, though, ati performance is lacking, just barely beating ioquake3.exe.  As well, the surface merging is slow, and adds a lot to load time.

Comment 19 gimhael 2010-12-17 05:21:51 EST

(In reply to comment #16)
> I do get a small boost from removing the portal plane cull,
> except when a portal enters the view, [...]

Doing the portal culling in the fragment shader is pretty costly; a much more elegant solution is to pick a projection matrix that makes the near plane identical to the portal clipping plane. (Eric Lengyel describes this method in his "Projection Matrix Tricks" paper.)

That way there is zero additional cost for portal plane culling (you probably have to change the projection matrix anyway when the portal is rendered), the geometry is clipped/culled before it hits the rasterizer, and portal culling is independent of the GLSL shaders you use.
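
For reference, a sketch of that near-plane replacement as it is usually written for an OpenGL-style, column-major perspective matrix. It assumes the clip plane is given in camera space with the camera on its negative side; the function name is made up here, and this is not code from any of the attached patches:

```c
#include <assert.h>

static float Sign(float v)
{
    if (v > 0.0f) return 1.0f;
    if (v < 0.0f) return -1.0f;
    return 0.0f;
}

/* proj:  column-major 4x4 perspective projection, modified in place.
   plane: clip plane (a, b, c, d) in camera space, camera on the
          negative side (d < 0). */
void ObliqueNearPlane(float proj[16], const float plane[4])
{
    float q[4], dot, scale;

    assert(plane[3] < 0.0f);

    /* Clip-space corner opposite the plane, taken back to camera space. */
    q[0] = (Sign(plane[0]) + proj[8]) / proj[0];
    q[1] = (Sign(plane[1]) + proj[9]) / proj[5];
    q[2] = -1.0f;
    q[3] = (1.0f + proj[10]) / proj[14];

    /* Scale the plane so that the far plane stays as close as possible. */
    dot = plane[0] * q[0] + plane[1] * q[1] + plane[2] * q[2] + plane[3] * q[3];
    scale = 2.0f / dot;

    /* Replace the third row of the matrix with the scaled plane. */
    proj[2]  = plane[0] * scale;
    proj[6]  = plane[1] * scale;
    proj[10] = plane[2] * scale + 1.0f;
    proj[14] = plane[3] * scale;
}
```

After this, any camera-space point lying on the plane projects to a normalized depth of -1, so the hardware's near-plane clipping does the portal clip. The price is a distorted depth range, which matters if later passes depend on depth precision.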

Comment 20 James Canete 2010-12-21 04:24:41 EST

Created attachment 2543 [details]
Version 9 of VBO and GLSL code from xreal, w/ changes

Version 9.

This version has some big changes.

First, marks (blob shadows, bullet marks) now work with merged surfaces.  This is done by storing the merged surfaces in a different array, letting the marks code use the old mark surfaces, and adding a new array, viewsurfaces, for rendering.  This is a lot better than the last version, which simply overwrote the first surface in a merge and changed the marksurfaces array entries to point at it.

Second, the surface merging algorithm has been changed to something faster.  It's still a bit slow on larger maps, and it isn't as efficient at merging surfaces as the old routine, but it's fast enough to be usable.

Third, fog works, as in really works.  More or less everything is fogged correctly now, though the fogging code isn't exactly the same as Quake 3's.

As well, I implemented gimhael's suggestion and use the near plane for clipping when rendering a portal.  It's much cleaner than the vertex/fragment combination I was using before, though I didn't notice any change in speed.

The only other change I remember was fixing the state when r_arb_vertex_buffer_object is 0 but r_arb_shader_objects is 1.  This caused certain surfaces to render incorrectly.

Comment 21 tinkah 2010-12-23 04:40:05 EST

(In reply to comment #18)
> the surface merging is slow, and adds a lot to load time.

That is at loading stage, before gameplay, correct? I wonder if it's easily doable to spawn a thread for them, while leaving the main thread for other processing during loading stage.

SDL has threading, mutexing and semaphores without requiring extra libraries so one wouldn't even need to include pthreads or another multiplatform solution.

(OpenMP could theoretically be used on the individual functions involved but I suspect it would be slowed down by variable protections it may need to the point of making it pointless, but that's still speculation, at least for your code.)

Comment 22 James Canete 2011-01-05 07:49:49 EST

Created attachment 2551 [details]
Version 10 of VBO and GLSL code from xreal, w/ changes

Version 10.

This one handles TMOD_TURBULENT, sort of.  It's a bit of a hack.  I hid the values needed for calculating it into the texture matrix, and extract and calculate in the vertex program.  Due to this, it's not a perfect recreation of the way quake3 does it, but it's pretty much the same as long as the turbulent pass was the last texture mod in the shader.

Additionally, I tweaked the surface merging again, and it's fast enough to do tvy-bench in 0.05 seconds on my system now.  YMMV.

As well, I pushed shaders with multiple vertex deforms back on the CPU, slowing them somewhat but making those all render correctly again.

Now that everything is supported though, it's somewhat slower than the peak performance I got with version 8.  tvy-bench is around 120fps with the timedemo I made before.  The easiest way to get the speed back up would be to remove the various features or re-implement them differently, specifically the mark surfaces, the dlights, and the fog.

So the only things left really are bug-fixing, cleanup/documentation, and maybe some more optimization if I can think of anything else to optimize.

Comment 23 James Canete 2011-02-08 23:26:34 EST

Created attachment 2586 [details]
Version 11 of VBO and GLSL code from xreal, w/ changes

Just a small update to keep up with the recent commits.

This patch updates the fat lightmap code to handle up to 1024 lightmaps, though it should probably be rewritten to use multiple 1024x1024 or 2048x2048 textures instead of a single 4096x4096 texture when going over 256 lightmaps.
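
The numbers fit together as follows: Quake 3 lightmaps are 128x128 texels, so a 4096x4096 texture holds 32x32 = 1024 of them and a 2048x2048 texture holds 256. A sketch of the coordinate remapping, assuming a simple row-major tile layout (the patch's actual layout is not described in this thread, and the function name is made up):

```c
#include <assert.h>

#define LIGHTMAP_SIZE 128   /* Quake 3 lightmaps are 128x128 texels */

/* Remap (u, v) in [0, 1] within lightmap 'index' to coordinates in a
   square atlas whose edge is 'atlasSize' texels. Tiles are assumed to
   be laid out row by row, which is an assumption of this sketch. */
void AtlasLightmapCoords(int index, int atlasSize, float u, float v,
                         float *outU, float *outV)
{
    int cols = atlasSize / LIGHTMAP_SIZE;
    int col, row;

    assert(cols > 0 && index >= 0 && index < cols * cols);

    col = index % cols;
    row = index / cols;
    *outU = ((float)col + u) / (float)cols;
    *outV = ((float)row + v) / (float)cols;
}
```

Note that with tiles packed edge to edge like this, bilinear filtering can pick up texels from a neighbouring lightmap at the tile borders.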

As well, it fixes a bug in sdl_glimp.c which should fix compiling for non-win32 platforms, though I haven't actually tested it myself.

Comment 24 Fabio 2011-02-21 05:09:32 EST

What are the known advantages/disadvantages of the latest patch compared to the "bumpy" renderer of Urban Terror (and possibly other implementations)?
http://www.www0.org/w/Optimized_executable;_builds_of_ioq3_engine_for_urt

Comment 25 James Canete 2011-03-30 10:17:42 EDT

Created attachment 2647 [details]
Version 12 of VBO and GLSL code from xreal, w/ changes

Version 12.

No big speed increases in this version, but a lot of under-the-hood changes.  All the warnings I could see have been silenced, and the glsl shader code has been generalized, so that more shaders can be added besides the default generic one.  This patch includes four such shaders for specific drawing tasks: lightmapped, fog, dlight, and textureonly.  The lightmapped shader is activated with r_ignoreFastPath 0, as it implements part of that path in the old renderer.  As well, r_speeds 7 has been set to show the number of times each shader is used, as well as the number of glsl shader binds per frame.

Also, to respond to Fabio, the only real advantage to the bumpy renderer that I can think of is the fact that this patch doesn't require any more work to apply to current ioquake3.  I haven't messed with the bumpy renderer, so I've no idea how this patch compares.  Any comparisons will probably be irrelevant once a proper modular rendering system patch is agreed upon, though.

Comment 26 James Canete 2011-04-11 09:52:53 EDT

Created attachment 2658 [details]
Version 13 of VBO and GLSL code from xreal, w/ changes

v13.

I fell under the spell of feature creep again, grabbed a bit more code from xreal, and implemented deluxemaps.

They're pretty easy to use, just compile your map with -deluxe, and add "stage normalmap" and "stage specularmap" stages in the right shaders, like what you would do for xreal, and everything should just work.

Comment 27 James Canete 2011-04-18 11:05:17 EDT

Created attachment 2663 [details]
Version 14 of VBO and GLSL code from xreal, w/ changes

v14.

Little bit of speedup, but the main parts of this patch are normal mapping on md3 models, per-pixel dlights (on certain shaders), and cvars to turn off normal/specular/diffuse.

Comment 28 Styx 2011-04-18 20:13:20 EDT

I'd like to see the option to disable the GLSL shaders.  They were my main problem with XreaL; XreaL requires a good video card to even run, so it won't run on a large number of computers.  I'd particularly like to see it independent of the VBO rendering.  VBOs have better support than current shaders.

Comment 29 Matt Turner 2011-05-06 19:44:51 EDT

(In reply to comment #27)
> Created attachment 2663 [details]
> Version 14 of VBO and GLSL code from xreal, w/ changes
> 
> v14.
> 
> Little bit of speedup, but the main parts of this patch are normal mapping on
> md3 models, per-pixel dlights (on certain shaders), and cvars to turn off
> normal/specular/diffuse.

Since the renderer is being rewritten, now is a perfect time to repack the structs to avoid unaligned accesses. Right now, they're fucking terrible. 

Consider:

typedef struct mdvModel_s
{
    int             numFrames;
    mdvFrame_t     *frames;

    int             numTags;
    mdvTag_t       *tags;
    mdvTagName_t   *tagNames;

    int             numSurfaces;
    mdvSurface_t   *surfaces;

    int             numVBOSurfaces;
    srfVBOMDVMesh_t  *vboSurfaces;

    int             numSkins;
} mdvModel_t;

on 64-bit platforms, *frames and *surfaces are unaligned. This causes some performance loss on platforms that don't care too much (amd64), to terrible performance loss on platforms that do care (alpha), and causes very strict platforms to SIGBUS (sparc).
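
One way to check what a given compiler does with such a struct is to print member offsets and sizes. Unless a packing pragma or attribute is in effect, C compilers normally pad each member up to its natural alignment, so on a typical LP64 target the visible cost of interleaving int and pointer members is usually wasted padding after each int. The sketch below uses void* in place of the patch's pointer types, and the reordered variant is only an illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Same member order as mdvModel_t above, with void* stand-ins. */
typedef struct {
    int   numFrames;
    void *frames;
    int   numTags;
    void *tags;
    void *tagNames;
    int   numSurfaces;
    void *surfaces;
    int   numVBOSurfaces;
    void *vboSurfaces;
    int   numSkins;
} modelInterleaved_t;

/* Pointers first, then ints: no padding is needed between members. */
typedef struct {
    void *frames;
    void *tags;
    void *tagNames;
    void *surfaces;
    void *vboSurfaces;
    int   numFrames;
    int   numTags;
    int   numSurfaces;
    int   numVBOSurfaces;
    int   numSkins;
} modelReordered_t;

/* Print where the first pointer lands and how large each layout is. */
void PrintModelLayout(void)
{
    /* Holds on common ABIs; reordering should never make it larger. */
    assert(sizeof(modelReordered_t) <= sizeof(modelInterleaved_t));

    printf("interleaved: size %lu, frames at offset %lu\n",
           (unsigned long)sizeof(modelInterleaved_t),
           (unsigned long)offsetof(modelInterleaved_t, frames));
    printf("reordered:   size %lu, frames at offset %lu\n",
           (unsigned long)sizeof(modelReordered_t),
           (unsigned long)offsetof(modelReordered_t, frames));
}
```

Truly misaligned loads are more likely where the renderer casts into raw file data (for example model lumps read from disk) than in structs the compiler lays out itself, so both places are worth checking.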

Is there a place you hang out that we could work? IRC? Is this just a straight port of the XReal code? If so, I guess it'd be best to fix the unaligned accesses upstream.

Comment 30 James Canete 2011-05-10 06:17:22 EDT

Created attachment 2702 [details]
V15 of VBO/GLSL patch

v15.

Big new performance-killing feature is shadows.  I've added two types.

First is dlight shadows, activated with "r_dlightShadows 1". This (expensively) renders a cubemap for every dlight, and uses it to cast proper shadows when rendering that dlight.  As this requires 6 full view renders for every light, it's a hideously expensive operation.

The second is projected entity shadows, activated with "cg_shadows 4". This (expensively) renders a shadowmap for each of the closest 16 entities, and renders alpha-blended black on nearby surfaces.  These look similar to Source-engine shadows, complete with shadows projecting through objects and overlapping with each other.  These shadows aren't as expensive as the dlight ones, but the surface culling still needs work, as the shadows are rendered on far too many surfaces.

As well, I've added a fresnel component to the normal map-based specular lighting, using a lighting equation much like the one described in http://renderwonk.com/publications/s2010-shading-course/gotanda/course_note_practical_implementation_at_triace.pdf .  It's more expensive, though, and as of yet I haven't added a way to turn it off.

Along with those three there are a couple of optimizations and crash fixes, but nothing serious.

>I'd like to see the option to disable the GLSL shaders.

There is one, "r_arb_shader_objects 0".  A lot of features depend on the glsl shaders though, such as on-GPU vertex animation, and so I haven't put a lot of work into this path.

>Since the renderer is being rewritten, now is a perfect time to repack the
>structs to avoid unaligned accesses. Right now, they're fucking terrible. 

I haven't been writing the patch to fit 64-bit alignments, mostly because I haven't been dev'ing or testing on a 64-bit OS. :) Still, I'm open to any changes or suggestions that don't kill performance on 32-bit.

>Is there a place you hang out that we could work? IRC? Is this just a straight
>port of the XReal code?

I'm usually in #ioquake3 on freenode, but usually not paying attention.  Try yelling "SmileTheory" in channel and see if I respond. :)

And most of this code isn't a straight port, I've mostly been using XReal as a reference after cribbing the VBO parts, though the mdv code is largely a cut and paste job.

Comment 31 Matt Turner 2011-05-10 15:59:08 EDT

Lots of trailing whitespace in the patch. Also, a lot of Windows newlines.

A few other problems I see:
- You can't #define preprocessor identifiers that start with GL_, since they're reserved. (See GL_{MODULATE,ADD,REPLACE}).

- You're trying to take the dot product of a vec3 and a vec4; texture2D() returns a vec4. See http://www.opengl.org/wiki/GLSL_:_common_mistakes
float dist = dot(dist3, vec3(1.0 / (256.0 * 256.0), 1.0 / 256.0, 1.0)) * u_LightRadius;

Also, when I disabled r_arb_*, maps fail to load with 'ERROR: Hunk_Alloc failed on 3179808' or similar.

Comment 32 James Canete 2011-05-12 06:43:00 EDT

Created attachment 2705 [details]
V16 of VBO/GLSL patch

v16.

This is just a few bugfixes really, the stuff mattst88 mentioned, as well as a fix for skyboxes not always being rendered, and improvements to selecting and culling projected shadows.

The only thing I didn't track down was the increased hunk usage with r_arb_* off.  mattst88, could you narrow it down to a/some specific Hunk_Alloc() call(s)?  If not, could you try bumping up com_HunkMegs?  The default of 64 seems small, and as far as I can see, hunk usage should go up with r_arb_* enabled, not down.

Comment 33 Matt Turner 2011-05-13 15:43:23 EDT

(In reply to comment #32)
> The only thing I didn't track down was the increased hunk usage with r_arb_*
> off.  mattst88, could you narrow it down to a/some specific Hunk_Alloc()
> call(s)?  If not, could you try bumping up com_HunkMegs?  The default of 64
> seems small, and as far as I can see, hunk usage should go up with r_arb_*
> enabled, not down.

Not really sure how to track it down, but changing com_hunkmegs to 128 (from 64) did work around the problem.

With v16, quake3 seems to mess up the resolution upon exit, changing it to something very low. I use r_mode -1, and custom resolution in Quake3 of 1920x1200. I didn't have this problem with v15 I don't believe.

Comment 34 Matt Turner 2011-05-13 22:17:46 EDT

(In reply to comment #33)
> With v16, quake3 seems to mess up the resolution upon exit, changing it to
> something very low. I use r_mode -1, and custom resolution in Quake3 of
> 1920x1200. I didn't have this problem with v15 I don't believe.

Ignore this part. Seems to have been a problem caused by a GPU hang and reinit. Restarting the computer fully made this go away.

I tested the patch on Alpha, and expected it to make a sizable performance improvement, since the amount of bus traffic per frame should be reduced greatly.

As it turns out, with my 800MHz Alpha and a Radeon 9800 Pro using r300g, performance in `demo four` drops from 30.4 FPS to 21.7 FPS. Any ideas what's going on here? Is the GLSL not optimized or perhaps uses lots of loops and things that older cards like the 9800 aren't totally capable of?

Comment 35 Matt Turner 2011-05-16 19:43:52 EDT

On my 800 MHz Alpha using r300g in demo four:

Radeon      vbo  shader  FPS
9800 Pro    0    0       30.4
9800 Pro    1    0       22.8 (totally corrupted)
9800 Pro    1    1       21.7
X1550       0    0       14.6
X1550       1    0       20.7 (totally corrupted)
X1550       1    1       12.1

So using VBOs without shader objects yields a totally corrupted image.

Disabling shaders while enabling VBOs provides a sizable speed-up for the X1550 but not for the 9800 -- no idea why.

Comment 36 James Canete 2011-06-14 08:55:27 EDT

Created attachment 2781 [details]
V17 of VBO/GLSL patch

v17.

Just a maintenance update really, to make sure the patch applies cleanly still.

Still, there are some useful features and fixes:

- close shadow casters are merged into the same shadow, which should have some performance benefits.
- a bug where shadows would show up on the wrong side of a surface was fixed.
- a nasty bug with rendering portals was fixed, where the zFar value wasn't reset.
- the lightmap value is modified in the glsl shader when a deluxe map is used, since lightmaps are diffuse shaded.
- the shader keyword "specularReflectance" has been added, for use in the "stage specular" stage.  This is basically a crossfade between diffuse+fresnel lighting at 0, and purely specular lighting at 1.  If not specified, the patch defaults to 0.04.
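
As a rough sketch of the crossfade described for "specularReflectance", assuming it behaves like a plain linear blend (the patch's actual shader math may differ, and the function name is hypothetical):

```c
/* Linear crossfade between two lighting results.
   reflectance = 0 gives the diffuse+fresnel result,
   reflectance = 1 gives the purely specular result.
   The default mentioned in the comment above is 0.04. */
static float CrossfadeLighting(float diffuseFresnel, float specularOnly,
                               float reflectance)
{
    /* Clamp so that out-of-range shader values cannot extrapolate. */
    if (reflectance < 0.0f) reflectance = 0.0f;
    if (reflectance > 1.0f) reflectance = 1.0f;
    return diffuseFresnel + (specularOnly - diffuseFresnel) * reflectance;
}
```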

Comment 37 James Canete 2011-07-19 02:57:36 EDT

Created attachment 2854 [details]
V18 of VBO/GLSL patch

v18.

Lots of changes in this one.

- Optimized alphagen/colorgen in the glsl shaders.  The fast lighting path now supports all original alphagen/colorgen settings besides CGEN_LIGHTING_DIFFUSE, AGEN_LIGHTING_SPECULAR, and AGEN_PORTAL.
- changed r_dlightShadows to r_dlightMode.  0 is default, with quake3-style dlights.  1 is per-pixel dlights, and 2 is per-pixel dlights with cubemap shadows.
- added code to automatically load normal and specular textures, by adding "_normal" and "_specular" before the file extension.  Haven't tested this, though.
- updated the fat lightmap code to upload the fat lightmap one normal lightmap-sized chunk at a time, to avoid allocating a huge chunk of memory during load.
- added an experimental diffuse and specular ambient light to the lightall shader.  This is tied to r_normalAmbient, where any value above zero will split the lightmap values into that much ambient and the rest directed light.  Set to 0 by default, deactivating the whole thing.
- fixed a bug where shaders associated with vertex animated models weren't always vertex animated, leaving odd trails.
- fixed a division-by-zero in the specular portion of the lightall shader, which caused weird highlights and darkness.
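
The automatic "_normal" / "_specular" lookup amounts to inserting a suffix before the file extension. A minimal sketch of that string handling, assuming a hypothetical helper name and path (this is not the patch's code):

```c
#include <stdio.h>
#include <string.h>

/* Builds "<stem><suffix><ext>" from an image name, e.g.
   "textures/foo/wall.tga" + "_normal" -> "textures/foo/wall_normal.tga".
   Returns 1 on success, 0 if the result did not fit in out. */
static int BuildCompanionName(const char *name, const char *suffix,
                              char *out, size_t outSize)
{
    const char *dot = strrchr(name, '.');
    const char *slash = strrchr(name, '/');
    int written;

    /* A dot before the last '/' belongs to a directory name, not to an
       extension, so treat the name as having no extension. */
    if (dot == NULL || (slash != NULL && dot < slash)) {
        dot = name + strlen(name);
    }

    written = snprintf(out, outSize, "%.*s%s%s",
                       (int)(dot - name), name, suffix, dot);
    return written >= 0 && (size_t)written < outSize;
}
```

A loader could call this once per suffix and simply skip the companion texture when the resulting file does not exist.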

And probably some other stuff I can't recall.

Comment 38 Thilo Schulz 2011-07-29 18:18:38 EDT

use.less: have you spent a few thoughts on how this vbo renderer path can be integrated with the modular rendering?
Is there reusable code between the two renderers? How do we have them share the code between the two paths? Should we create a separate directory for the vbo renderer and the old rasterizer?

Comment 39 Tim Angus 2011-07-29 19:22:48 EDT

This does much much more than just use VBOs...

Comment 40 Thilo Schulz 2011-07-30 01:58:01 EDT

(In reply to comment #38)
> use.less: have you spent a few thoughts on how this vbo renderer path can be
> integrated with the modular rendering?
> Is there reusable code between the two renderers? How do we have them share the
> code between the two paths? Should we create a separate directory for the vbo
> renderer and the old rasterizer?

I have given some more thoughts to this. What would be desirable to have:
Leave the old renderer and its code untouched in code/renderer/

Add a new directory named like:
code/renderergl2
You can choose another name if that is more suitable.

If you find portions of code from code/renderer/ that are reusable, reference that code from code/renderergl2 instead of rewriting it completely.

There are two reasons why this is desirable:
1. Spin-off projects based on ioq3 that have changed the renderer will find it easier to integrate with versions of ioq3 that have your modern renderer
2. Changes to the renderer in the reusable part don't have to be made twice.

If this route is viable, I would guess we could add this to ioquake3. But you'd have to be committed to maintaining your renderer path if there are changes to the standard rasterizer. I could imagine giving you svn write access for that, but that is not for me to decide.

Comment 41 James Canete 2011-07-31 19:57:50 EDT

> have you spent a few thoughts on how this vbo renderer path can be
integrated with the modular rendering?

I have tested it with #4790, and compiling dlls with and without this patch and switching between them with the same executable does work, on win32 anyway.  The only part I'm iffy about is dealing with the situation where the renderer dll doesn't work for whatever reason, be it missing opengl extensions or simply crashing.  I fiddled with this once before, and I didn't come up with a good way to automatically revert to the default dll.

>I have given some more thoughts to this. What would be desirable to have:
>Leave the old renderer and its code untouched in code/renderer/
>
>Add a new directory named like:
>code/renderergl2
>You can choose another name if that is more suitable.
>
>If you find portions of code from code/renderer/ that are reusable, reference
>that code from code/renderergl2 instead of rewriting it completely.

I'm good with this idea, but it might take a while to do.  Most of the new renderer still uses the original code, just with a few modifications here and there.  Reusing code in the old renderer folder would either require adding a ton of #define's there or renaming and refactoring a lot of code in the new renderer folder.

Right now I'd be in favor of simply putting in the modular renderer patch, perhaps after figuring out some fix to the issue I mentioned above, and worrying about this patch once I finish separating it out into another folder and refactoring it.  It doesn't need to live in the ioquake3 svn just yet.

Comment 42 Thilo Schulz 2011-08-01 11:52:48 EDT

(In reply to comment #41)
> I'm good with this idea, but it might take a while to do.  Most of the new
> renderer still uses the original code, just with a few modifications here and
> there.  Reusing code in the old renderer folder would either require adding a
> ton of #define's there or renaming and refactoring a lot of code in the new
> renderer folder.

Yeah, I'm not entirely sure how viable this route is at all, because I don't really know the code behind this.
Maybe it would make more sense to have three directories:

renderer_common  <-- code in use by both renderergl1 and renderergl2
renderer <-- code exclusively in use for old renderer
renderergl2 <-- your code

> Right now I'd be in favor of simply putting in the modular renderer patch,
> perhaps after figuring out some fix to the issue I mentioned above, and
> worrying about this patch once I finish separating it out into another folder
> and refactoring it.  It doesn't need to live in the ioquake3 svn just yet.

Agreed, the modular renderer has been implemented. As to your problem: At present I think it's sufficient to just bail out with an error message if the new generation renderer does not work. The user would have to make a conscious effort to switch the renderer by setting cl_renderer and can reset it if it doesn't work out. Yes, it would be nice to implement some kind of fallback behaviour, but I don't see that as hugely problematic, it can always be added later.

I am pretty excited about this and judging from some first impressions you did a great job. I'd really like to see this in ioq3

Comment 43 Thilo Schulz 2011-08-01 12:07:02 EDT

[17:54:59] <Thilo> there two routes we can go
[17:55:17] <Thilo> one is separate the VBO renderer via ifdeffery and keep the renderer/ path
[17:55:29] <Timbo> VBO is such a silly term :P
[17:55:33] <Thilo> or separate the renderer out in separate directories
[17:55:47] <Timbo> that's a tiny part of the patch
[17:55:47] <Thilo> both have its advantages/disadvantages
[17:56:09] <Timbo> I would tend to go with ifdeffery
[17:56:32] <Timbo> since both paths are compiled
[17:57:00] <Timbo> my problem with excessive ifdefs is that it encourages dead code that rarely gets compiled
[17:57:11] <Timbo> so you might break it but you never know about it
[17:57:15] <Thilo> yes
[17:57:19] <Thilo> that's ugly indeed
[17:57:33] <Timbo> and by the time someone figures that out it's had fuckloads of surrounding changes and doesn't even make sense any more
[17:58:35] <Timbo> it's an awkward thing to deal with really
[17:58:42] <Timbo> this huge renderer patch I mean
[17:58:44] <Thilo> yeah.

Alright. I'll leave it up to you, whether you want to separate the two renderer paths via #ifdef or two directories.
Via #ifdef would probably be much easier and quicker for you. It may be more difficult to merge #ifdefs into existing renderers, but once that move is done, it would probably also be easier to maintain the two paths at the same time if you change something in the renderer.

Comment 44 Kuehnhammer Tobias 2011-08-01 16:01:25 EDT

Hi,

I have no experience with rendering; I only know the idtech3 AI by heart, so maybe my question is a bit silly! I have applied every patch by "use.less01", from version 1 through 18, since the very first one. The work "use.less01" did is so fantastic that it even outperforms "Xreal" on some maps (tvy_bench, mxl_school, ET and even "ET/tce" maps). The patches always applied without problems.
There are only a few small bugs (r_flares 1 while shooting...).
Now that I follow your (professional) discussion, I wonder what is wrong with one render path? I ask because I have always been interested in GPU programming but I don't have the skills!

However, after 10 years of completely reworking the botai, I realized that it was a good decision to always stay with one (AI) path!
What is the difference here?

So my question is:


Why does ioquake3 need a modular rendering system, separate DLLs, or more than one render path at all?

Who will ever need the default (current) render path? If the patch is applied 
there is only one path, with no errors and it compiles without problems!

So why is it not good to make this "use.less01" render path the default one?


"use.less01" this is the most impressive work I have ever seen! Keep on going!

I hope someone can give me an answer! Thanks!

Comment 45 James Canete 2011-08-02 21:04:47 EDT

Created attachment 2886 [details]
V19 of VBO/GLSL patch

v19.

This'll probably be the last version of the patch before I refactor it to work better with the modular renderer code that's in svn.  Patching the source and compiling it with this version should result in a renderer dll you can swap in with a default executable compile, though.

The big changes in this one are alternate shading algorithms.  Setting r_normalMapping to 2 turns on (expensive) Oren-Nayar diffuse, which is then adjustable by adding "diffuseRoughness <value>" to the diffuse stage in a shader.  Likewise, r_specularMapping has three settings.  1 is the standard, TriAce specular, 2 is the plain, fast Blinn, and 3 is the slow, expensive Cook-Torrance.

There are also a couple of bug fixes which I can't recall specifically.

Comment 46 James Canete 2011-08-07 22:44:14 EDT

So I spent a few days trying the #ifdef path with this patch, but in the meantime gimhael's renderer patch (#5160) was submitted, and that makes me think I shouldn't be messing with the original renderer directory at all, and that separate directories is the way to go.

It makes me think there are probably a lot of large renderer patches out there, and messing around with the original renderer defeats the purpose of modularizing the renderer in the first place.  There might be a lot of duplicated code, but I think it's better to keep the original renderer pure of any modifications that are specific to another, specific renderer.

Maintenance of the duplicate code might be a problem, but I'm thinking that development with the old renderer will likely slow anyway, with new features migrating out into new renderers.

As well, a clean break will be good for me, as I can completely rearrange the new renderer code as I see fit, without worrying too much about the patch footprint.

So I'm going to start rewriting the patch with these in mind.

Comment 47 Thilo Schulz 2011-08-08 06:57:25 EDT

(In reply to comment #46)
> So I'm going to start rewriting the patch with these in mind.

Alright. It really was not my call to make, since you did all of this stuff. I am not really worried about the maintenance aspect of this patch as long as you keep your stuff updated.

Comment 48 Tim Angus 2011-08-08 07:01:39 EDT

I don't really have any really strong feelings either way, but you're probably right. We haven't tended to touch the renderer code much anyway, so I suspect the benefits of sharing the code probably aren't that great in the first place.

I do have vague plans to do an OpenGL ES renderer at some point, which should be much more trivial a prospect and probably would be best done with ifdefs.

Comment 49 James Canete 2011-08-09 04:34:39 EDT

Created attachment 2927 [details]
V20 of VBO/GLSL patch, gzipped

v20.

Bugzilla didn't like the size of the patch, so I gzipped it first.  I'll have another look at it though, it seems oversized.

This duplicates the entire renderer/ directory into renderergl2/, and then applies the original patch to that.  As well, I moved sdl_glimp.c from sdl/ into renderer/ and renderergl2/, because that needed modifications in the new renderer.  sdl/sdl_gamma.c could probably be moved too, but for now it's a common file to both renderers.

The makefile has been modified as well, to build both renderer dlls.

Also, there was an annoying bit with deluxe mapping where dark parts of the deluxe map would cause odd, bright specular highlights, so I added a couple of lines to the shader to tweak the light direction so these were removed/less noticeable.

Comment 50 James Canete 2011-08-09 04:54:13 EDT

Created attachment 2928 [details]
V20a of VBO/GLSL patch

Ack, ignore that previous one, it had all the files duplicated.

Comment 51 Tim Angus 2011-08-09 06:04:19 EDT

(In reply to comment #49)
> As well, I moved sdl_glimp.c from sdl/
> into renderer/ and renderergl2/, because that needed modifications in the new
> renderer.

That certainly isn't acceptable. Looks like you add GL 3 context creation and extension loading. The context creation will eventually be available via SDL 1.3 so let's put that to one side for now. Perhaps the extension loading should be refactored out into its own file; tr_extensions.c or whatever?

Comment 52 James Canete 2011-08-09 20:25:00 EDT

Created attachment 2932 [details]
V21 of VBO/GLSL patch

v21.

I've restored sdl_glimp.c to its original place, reverted the changes to it, and added renderergl2/tr_extensions.c, which grabs the extensions that the new renderer requires.  I've done it this way because of gimhael's GLEW patch in 5170, so that will apply cleanly whenever it goes in.  I can then rewrite this patch to suit it.  No point in refactoring the extensions myself if there's another patch to do it. :)

As well, I reduced the size of this patch by modifying a few files in the old renderer, just replacing the #include tr_local.h, which they weren't really using anyway.

Comment 53 Adrian Fuhrmann 2011-09-04 03:55:02 EDT

Hi,

Applying the patch to a fresh ioquake3 works without any flaws. But I noticed that it breaks "old" enhancements, like the bloom and cel-shading extras (which are indeed outdated; the patch file is a little old). The old extras suddenly stopped working (like the bloom from Tremulous: http://patches.mercenariesguild.net/index.php?do=details&task_id=201&project=1&order=category&sort=desc ).

But instead of fixing them, it would be much nicer to go the same way this patch does. Using framebuffer objects with GLSL would let the engine do cel shading, rotoscope, bloom and blur in a faster and more comfortable way.

Does anyone have any experience or a working version available? I played around with the (older) FBO solution from GordAllott (https://bugzilla.icculus.org/show_bug.cgi?id=3422), which results in a pure black screen when applied as-is (with small adjustments for the modular rendering system).

Comment 54 Adrian Fuhrmann 2011-09-04 13:06:18 EDT

Created attachment 2961 [details]
ioquake3 vanilla renderer code with VBO Patch + FBO post process testing stuff

I've patched the ioquake renderer and added some of the changes from http://www.www0.org/w/Optimized_executable%3b_builds_of_ioq3_engine_for_urt regarding post processing and FBO. This is not intended to be a patch or a fully functional update; it is a proposal to extend this patch (4358) with framebuffer capabilities.

There are mainly two issues:

1. When running the post-process effects with r_postProcess 1 and at least one enabled effect (r_bloom, r_dof or r_fogdensity > 0), the 2D rendering in the HUD and console is broken. I'm no rendering expert, so any help is appreciated.

2. The framerates are pretty low. I think this can be fixed by integrating the FBO capabilities into the VBO logic. For now, it just runs alongside.

The main goal could be to provide a fully functional renderer beside the original one in its own branch of ioquake's svn source tree.
I noticed that there are a lot of attempts to improve the renderer (ioUT, trem12, XreaL etc.), but most of them result in a lack of platform compatibility, hardware issues, etc.

Comment 55 James Canete 2011-09-06 07:13:24 EDT

Created attachment 2963 [details]
V22 of VBO/GLSL patch

v22.

I added r_hdr.

Turning that to 0 turns off all the (visible) changes.

Well, it's not real HDR.  The skybox and entities aren't lit properly yet.  Only lightmapped brushes actually use HDR lightmap values, and any surfaces that use the fallback rendering path aren't HDR lit.  As well, I don't actually render to a float buffer or anything, I just do the tone mapping in the shader.

Also it's super oversaturated and looks sort of like being on uppers, but I can work that stuff out later.

As well, it requires some changes to q3map2, so I've made a patch for netradiant's version, based on code from etxmap.  I'll upload that after this patch.

wrt FBOs, this patch manages to do the adaptive tone mapping without them, using ugly hacks with the frame buffer.  I'll probably get around to implementing FBOs myself though, most likely based on xreal.  In this patch though, I've added a renderer command, RB_PostProcess(), which is the correct function for doing anything post world render but pre UI render.

Comment 56 James Canete 2011-09-06 07:17:23 EDT

Created attachment 2964 [details]
netradiant q3map2 patch for hdr support

One thing about this patch, -fast does work while lighting, but since it cuts off lights with a value smaller than 1, you get better quality at the expense of a lot of speed by turning off -fast.

Comment 57 Adrian Fuhrmann 2011-09-06 08:20:21 EDT

(In reply to comment #56)

It seems that dark corners get brighter, while looking into a light source darkens the display. That does not seem to be correct.

Comment 58 James Canete 2011-09-21 05:42:37 EDT

Created attachment 2973 [details]
v23 of renderer patch

v23.

This one includes FBO support, mostly taken from xreal, though modified.  Using this code, the renderer now renders to an fp16 frame buffer and does tone mapping as a post-process.

The cvar r_hdr has changed, it now has four settings:
0: no tone mapping/hdr
1: tone mapping, autoadjust if worldspawn has "adaptiveToneMapping 1" key/value pair, default
2: tone mapping, no autoadjust
3: tone mapping, autoadjust
This is to adapt to existing maps, so they don't have auto-adaptation when they weren't made for it.
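
The four r_hdr settings listed above can be read as a small decision table. A sketch of that mapping, assuming hypothetical type and function names and a boolean that says whether worldspawn carries the "adaptiveToneMapping 1" key (not taken from the patch):

```c
#include <stdbool.h>

/* Hypothetical result type: what the renderer should do for a given
   r_hdr value and map. */
typedef struct {
    bool toneMap;     /* apply tone mapping at all */
    bool autoAdjust;  /* adapt exposure automatically */
} hdrMode_demo_t;

static hdrMode_demo_t ResolveHdrMode(int r_hdr, bool mapRequestsAdaptive)
{
    hdrMode_demo_t mode = { false, false };

    switch (r_hdr) {
    case 1: /* default: follow the map's worldspawn key */
        mode.toneMap = true;
        mode.autoAdjust = mapRequestsAdaptive;
        break;
    case 2: /* tone mapping, never auto-adjust */
        mode.toneMap = true;
        break;
    case 3: /* tone mapping, always auto-adjust */
        mode.toneMap = true;
        mode.autoAdjust = true;
        break;
    default: /* 0 or anything else: no tone mapping/hdr */
        break;
    }
    return mode;
}
```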

As well, r_lightmap 3 now shows the deluxemap for debugging purposes, though this should not be used at the same time as hdr.

There are also some cvars for turning on/off the new GL extensions I use, namely r_ext_framebuffer_object, r_ext_texture_float, and r_arb_half_float_pixel.

Comment 59 James Canete 2011-09-21 05:46:42 EDT

Created attachment 2974 [details]
GTKRadiant q3map2 hdr patch

Additionally, here's a patch for gtkradiant's version of q3map2 to make it export hdr lightmaps.  I didn't mention this before, but like the previous patch, it needs -exporthdr on the command line to do so.

This is still a work in progress, so the lightmaps that come out probably won't be the best looking.

Comment 60 James Canete 2011-10-04 22:17:41 EDT

Created attachment 2990 [details]
v24 of renderer patch

v24.

I've added a key to worldspawn, and changed the old key adaptiveToneMapping to autoExposure.

- autoExposure 1 (previously adaptiveToneMapping) - enable automatic exposure adjustment.  Defaults to 0, can be overridden with the cvar r_hdr set to 3.
- autoExposureMinMax <min> <max> - set minimum and maximum for auto exposure.  Max values -10 to 10, recommended values are -5 to 5, equaling multipliers of 2^-5 to 2^5.  Defaults to -2 to 2, which gives an ok result for standard LDR lightmapped maps with the cvar r_hdr set to 3.

There's also a bug fix or two, and a small addition to the lightall shader to hide zeroed parts of the deluxe map, though they'll still appear with r_lightmap 3.  cg_shadows 4 now works, but I believe r_dlightMode 2 is still broken.

Comment 61 James Canete 2011-10-26 00:07:36 EDT

Created attachment 2998 [details]
v25 of renderer patch

v25.

New in this patch is texture upsampling, based on FCBI (www.andreagiachetti.it/icbi/).  I've added three cvars:

r_imageUpsample: control amount of upsampling: 0 is none, 1 is 2x width and height, 2 is 4x, 3 is 8x, etc.  0 is default.
r_imageUpsampleType: control type of upsampling.  2 is FCBI, which uses first and second order derivatives to fit edges.  1 is similar, but throws out the second order derivatives for speed.  1 is default.
r_imageUpsampleMaxSize: control maximum texture resolution.  Defaults to 1024.
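
Each r_imageUpsample step doubles width and height, so the setting is effectively a shift count limited by r_imageUpsampleMaxSize. A sketch of one plausible way to combine the two, assuming the limit simply stops further doubling (the function name and this exact policy are assumptions, not the patch's code):

```c
/* Returns how many 2x upsampling passes to run for an image.
   r_imageUpsample = 1 means 2x, 2 means 4x, 3 means 8x, and so on.
   Doubling stops once another pass would exceed maxSize in either
   dimension. */
static int UpsampleSteps(int width, int height, int r_imageUpsample,
                         int maxSize)
{
    int steps = 0;

    while (steps < r_imageUpsample &&
           width * 2 <= maxSize && height * 2 <= maxSize) {
        width *= 2;
        height *= 2;
        steps++;
    }
    return steps;
}
```

For example, a 256x256 texture with r_imageUpsample 3 and the default maximum of 1024 would get two passes, ending at 1024x1024.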

The downsides to upsampling are increased load time (400ms per texture on my machine for upscaling 256x256 to 2048x2048, type 2), increased memory requirements (using r_ext_compressed_textures is recommended), and alignment issues (FCBI rescales textures to 2*size-1, so I duplicate the bottom and right border).  I've done quite a bit of optimization to relieve the first issue, but I'm sure more could be done.

As well, to help out the Reaction team, I've included some of their renderer modifications in this patch, but behind "#ifdef REACTION" so they aren't in the way.  I've asked permission, and am releasing their changes under GPLv2.  There's some code there for blur and crepuscular rays, but I'm not too familiar with it, so I couldn't tell you how to get it working in your own maps.  Credit to Makro (Andrei Drexler) and JBravo (Richard Allen) for this.

Comment 62 JBravo 2011-10-26 12:47:37 EDT

I need to make a small correction here :)    All renderer improvements from Reaction are Makro's work and I (JBravo) am in awe of his work and understand nothing of it and certainly deserve no credit for it :)

Comment 63 James Canete 2012-01-10 07:01:05 EST

Created attachment 3067 [details]
v26 of renderer patch

been a while, but here's v26.

Bunch of changes in this one.

First off is fake deluxe maps.  I basically take the light direction from the light grid at every bsp model vertex, and use that if the deluxe map is missing.  This means that maps don't have to be compiled with -deluxe to have directional lighting, though of course they will look a lot better if they did. 

Second is hdr vertex colors.  These are stored in an external file, like hdr lightmaps, and require changes to q3map2.

Third is software overbright bits.  This allows for overbright bits to be used when in a window, but doesn't look exactly the same as before, as the obb are applied before gamma correction instead of after.

Fourth, I remapped a couple of the cvars:

r_hdr: render in hdr, 0: off, 1: on
r_postprocess: apply post-processing effects, 0: off, 1: on
r_tonemap: apply tonemapping, requires r_postprocess 1 and r_hdr 1. 0: off, 1: on with maps with auto-exposure, 2: always on.
r_autoexposure: dynamically change exposure levels depending on brightness, requires r_hdr 1, r_postprocess 1, and r_tonemap >= 1.  0: off, 1: on with maps with auto-exposure, 2: always on.
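
The dependencies between these four cvars can be sketched as two small predicates. This is only an illustration of the rules as listed, with hypothetical function names and a flag for whether the map enables auto-exposure:

```c
#include <stdbool.h>

/* Shared rule for the two cvars: 0 is off, 1 follows the map,
   2 is always on. */
static bool ModeEnabled(int mode, bool mapHasAutoExposure)
{
    if (mode >= 2) return true;
    if (mode == 1) return mapHasAutoExposure;
    return false;
}

/* r_tonemap requires r_postprocess 1 and r_hdr 1. */
static bool ToneMapActive(int r_hdr, int r_postprocess, int r_tonemap,
                          bool mapHasAutoExposure)
{
    if (!r_hdr || !r_postprocess) return false;
    return ModeEnabled(r_tonemap, mapHasAutoExposure);
}

/* r_autoexposure additionally requires r_tonemap >= 1. */
static bool AutoExposureActive(int r_hdr, int r_postprocess, int r_tonemap,
                               int r_autoexposure, bool mapHasAutoExposure)
{
    if (!r_hdr || !r_postprocess || r_tonemap < 1) return false;
    return ModeEnabled(r_autoexposure, mapHasAutoExposure);
}
```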

I think I may have broken anaglyph 3d as well, but I'll look at it later. :)

Comment 64 James Canete 2012-01-12 02:34:20 EST

Created attachment 3070 [details]
v27 of renderer patch

v27.

This is a quick bugfix patch for Reaction.

Fixes are:
1. Videos play again.
2. Added zbuffer to pre-screen vbo, fixing Reaction's UI issues
3. Added hack(s) to get Reaction's fog hull working
4. Removed assumption of vertex lighting when "rgbgen vertex" is used.  To use per-pixel lighting using interpolated vertex light values, use "rgbgen vertexlit" or "rgbgen exactvertexlit" in shaders.  This fixes Reaction's crepuscular rays.

Comment 65 James Canete 2012-02-24 21:07:31 EST

Created attachment 3088 [details]
v28 of renderer patch

v28.

Couple of changes here.
- Added multisample anti-aliasing (MSAA). To use, set r_ext_framebuffer_multisample to 2-32, depending on the amount desired.  I don't recommend anything higher than 8-16x though, since the memory requirements become ludicrous.
- Fixed a bug where moving brushes weren't lit correctly.
- Optimized cg_shadows 4 shader.
- Fixed weird stretched shadows with cg_shadows 4.
- Improved dark areas when promoting LDR lightmaps to HDR.
- Fixed anaglyph rendering.
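
To give a feel for why high MSAA sample counts get expensive, here is a back-of-the-envelope estimate of multisampled render target size. The byte sizes are assumptions for illustration (for instance 8 bytes per sample for an fp16 RGBA colour buffer and 4 bytes for depth/stencil), and the helper is hypothetical; real drivers may allocate more or less:

```c
/* Rough size in bytes of a multisampled colour + depth render target.
   samples < 1 is treated as 1 (no multisampling). */
static unsigned long long EstimateMsaaBytes(unsigned width, unsigned height,
                                            unsigned samples,
                                            unsigned colorBytesPerSample,
                                            unsigned depthBytesPerSample)
{
    unsigned long long pixels = (unsigned long long)width * height;

    if (samples < 1) samples = 1;
    return pixels * samples * (colorBytesPerSample + depthBytesPerSample);
}
```

Under these assumptions a 1920x1200 target at 8 samples comes to roughly 221 MB, and doubling the sample count doubles that figure.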

Comment 66 q3urt.undead 2012-03-10 14:38:54 EST

Hi James,

I think there is a problem with this patch and RAVENMD4. OA 0.8.8 defines RAVENMD4 so when you use this patch, it gives warnings/errors about data type mismatches.

I merged OpenArena 0.8.8 client/server with r2224 of ioquake3 and split the OA changes off into a separate renderer. I made a branch of that which includes your opengl2 renderer. Do you have any interest in helping get this working in OA? I think there is interest in the OA community with having another renderer. In the end, this will probably result in a third renderer which is a merge of yours and the OA since it has features not found in yours.

Thanks for your patch as it applies cleanly against r2224. I made a few changes though that I think upstream may want to consider. One of the problems with the modular renderer is that the files all reference common headers like tr_local.h. In practice, each renderer will have its own tr_local.h. It's difficult to reuse the common *.c files when they all reference their own tr_local.h

I noticed you made a copy of all the files except for the tr_image_*.c files which you modified to remove the tr_local.h references. I ran into the same thing with the OA renderer which changes even fewer files. Unless I modify the code/renderer files, I have to dump them into code/renderer_oa even though OA doesn't modify them.

The way I fixed this was to introduce a tr_config.h in each renderer and to modify all of the renderer files to use TR_CONFIG_H and TR_LOCAL_H etc. The Makefile then passes a compiler option that defines TR_CONFIG_H as the correct location, which in turn defines all of the other TR_*_H locations.
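
The mechanism behind this is C's macro-expanded #include. A minimal, compilable sketch of the idea follows; the paths in the comments are hypothetical examples, and a standard header stands in for tr_local.h so the snippet builds on its own:

```c
/* In the scheme described above, the build would pass something like
 *   -DTR_CONFIG_H='"../renderergl2/tr_config.h"'   (hypothetical path)
 * and that tr_config.h would in turn contain lines such as
 *   #define TR_LOCAL_H "../renderergl2/tr_local.h"  (hypothetical path)
 * Shared .c files then write #include TR_LOCAL_H instead of a fixed name.
 *
 * Stand-in so this sketch compiles without any renderer headers: */
#ifndef TR_LOCAL_H
#define TR_LOCAL_H <string.h>
#endif

/* The preprocessor expands the macro and includes whatever it names. */
#include TR_LOCAL_H

/* Uses a declaration that came from the macro-selected header. */
static size_t DemoUsesIncludedHeader(const char *s)
{
    return strlen(s);
}
```

The same shared source file can therefore be compiled once per renderer, each time picking up that renderer's own headers.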

As you can see from the OA renderer which touches fewer files, there are only a handful of files needed in the code/renderer_oa directory. I was able to remove a few files from your code/renderergl2 as well: tr_font.c, tr_image_*.c, tr_noise.c and tr_shadows.c. Long term, I think this will make renderers a lot easier to maintain. After I made those changes, your patch only touches Makefile, q_math.c and code/renderergl2/*.

The changes are at:

https://github.com/undeadzy/openarena_engine

and the branch is called opengl2_renderer.

Comment 67 James Canete 2012-03-13 21:59:48 EDT

I haven't tested with MD4, so it's likely a lot of that code doesn't work.  I'll have a look at it later though.

Wrt integrating OpenArena renderer changes, I'm open to integrating any useful features OA has, though it'll be a bit difficult at the moment.  I'm currently implementing a couple extra features (sRGB, automatic normal map generation) so any integration of other stuff will have to wait.

I'm in favor of keeping this renderer as one renderer instead of forking a separate, OA compatible version, though.  It's already got some #ifdef'ed changes so it fits with the Reaction codebase, and as long as the OA changes aren't too disruptive I'm thinking they could integrate the same way.

Comment 68 James Canete 2012-03-21 03:08:11 EDT

Created attachment 3098 [details]
v29 of renderer patch

v29.

Lots of changes again.

-Added Torrance-Sparrow specular, activated with r_specularMapping 4.  I read up on this (specifically http://www.gamedev.net/topic/594687-finally-nailing-the-torrance-sparrow-shader-once-and-for-all/ and the code linked), and after playing around with the equations, I realized that it's actually pretty close to the TriAce specular I already have, only the scaling factor was different and a couple of other factors were missing, so implementing this was trivial.
-Added automatic normal map generation, activated with r_genNormalMaps 1, defaults to 0.  This uses a Sobel filter to naively create a normal map for pretty much every texture that comes by, and uses them when appropriate.  It's currently pretty wasteful memory-wise, and really doesn't look very good.
-Added sRGB support, activated with r_srgb 1, defaults to 0.  This uses GL_EXT_texture_sRGB to load all diffuse and specular textures as if they're in the sRGB colorspace, and also uses GL_EXT_framebuffer_sRGB to set the pre-screen frame buffer to the sRGB colorspace, so everything displays properly.  This is most useful when dealing with HDR, so lighting calculations are done in proper linear space, but isn't really useful with LDR or assets that weren't created with it in mind, as it tends to wash out those.
-Changed the tone mapping to a more filmic version, as described in http://filmicgames.com/Downloads/GDC_2010/Uncharted2-Hdr-Lighting.pptx . 
-Various small changes and fixes.

Known bugs:
-r_dlightMode 2 is broken (again)
-MD4 is probably still broken (haven't looked at it yet)
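
For reference, the filmic curve from the presentation linked in the tone mapping item above can be sketched in C as follows. The constants are the ones commonly quoted from that presentation; whether the patch's tone map shader uses exactly these values is an assumption, and the function names are hypothetical.

```c
/* Filmic curve with the commonly quoted constants from the cited
   presentation (assumed here, not read from the patch). */
static float hable_curve(float x)
{
    const float A = 0.15f; /* shoulder strength */
    const float B = 0.50f; /* linear strength */
    const float C = 0.10f; /* linear angle */
    const float D = 0.20f; /* toe strength */
    const float E = 0.02f; /* toe numerator */
    const float F = 0.30f; /* toe denominator */

    return ((x * (A * x + C * B) + D * E) / (x * (A * x + B) + D * F)) - E / F;
}

/* Map an HDR value to [0, 1], normalising so that 'white_point' maps to 1. */
float filmic_tonemap(float hdr, float white_point)
{
    return hable_curve(hdr) / hable_curve(white_point);
}
```

With a white point of 11.2 (the value used in the presentation), an input of 0 maps to 0 and an input of 11.2 maps to 1, with a soft toe and shoulder in between.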

Comment 69 James Canete 2012-06-18 23:08:11 EDT

Created attachment 3228 [details]
v30 of renderer patch

After a long while, v30.

The biggest change in this one is cascaded shadow maps.  I haven't quite worked out a good way to hook them up for map makers yet, so currently there's only "r_testSunlight 1" to mess around with.  They work with pretty much any map with a skybox.  There may be a bug with sky portals, though; this requires more testing.

There's a bunch of other fixes, but I don't remember all of them, so here's the relevant part of the changelog from Reaction:
- Add r_depthPrepass.
- Improve parallax mapping
- Add AVG_MAP, BLACK_LEVEL, and WHITE_LEVEL defines to tonemap shader
- Fix a bug in fragment shader tangent space calculations
- Fix a bug in new curves code
- Change imagelist cmd to give more relevant information
- Remove DiffuseRoughness shader param, add SpecularExponent shader param
- Change R_SubdividePatchToGrid() to subdivide patches more evenly
- Calculate tangent space in fragment shader instead of storing per vertex
- Fix sun flare with sky portals.  Sun flare must be inside sky portal.
- Speed up tone mapping
- Add fast light shader path when r_normalMapping and r_specularMapping are 0
- Revise FBO blitting code (Still needs more work)
- Detect GLSL version
- Use GL_EXT_draw_range_elements
- Reserve FBOs before shaders, as recommended in nvidia docs
- Minor tweak in VBO allocation.
- Update tr_font.c to ioq3 latest (r2232)
- Minor image code cleanup.
- Added support for LATC (normal maps) and BPTC (everything else) image compression
- Use faster framebuffer blits whenever possible.
- Optimized lightall shader for older hardware.
- Fixed case in GLSL_PrintInfoLog when log is 0 length.
- Clear render buffer on allocate, fixes corrupt screen issue
- Use GL_RGB16F instead of GL_RGBA16F for hdr render buffer
- Don't reserve render buffers when textures are used for an FBO.  Fixes a crash when GPU memory is at a premium.

Comment 70 Evan Goers 2012-06-18 23:34:56 EDT

Is there currently anywhere that lists all the added and changed features? I would like to see all the new cvars and changes/additions in one big list rather than having to comb the bug comments.

Comment 71 James Canete 2012-10-17 06:20:17 EDT

Created attachment 3284 [details]
v31 of renderer patch

Finally, v31.

I know I usually type out a litany of changes in here, but this time I'm just going to copy & paste the commit log from Reaction, then start writing a readme outlining all the (currently working) changes, which I will upload later.

Changelog:
- Added .mtr file support.  .mtr files are just .shader files that are accessed first.
- Added support for q3gl2_sun shaderparm in sky shaders to control sun shadows
- Added r_shadowFilter 0/1/2 for cascaded shadow maps.
- Added sun shadow support cvars.
- Changed r_testSunlight to r_forceSun 0/1/2 and r_sunShadows 0/1.

Comment 72 James Canete 2012-10-18 06:29:04 EDT

Created attachment 3285 [details]
v31a of renderer patch

Oops, r2328 broke this, so here's a quick update.

Comment 73 James Canete 2012-10-18 09:37:51 EDT

Created attachment 3286 [details]
README for Rend2, v31a

And here's the documentation, currently a work in progress, like the patch itself.  I think I've covered most of the (working) features, though.

I've also settled on the name "Rend2" for now, though I'm sure I'll change my mind later.

Comment 74 q3urt.undead 2012-10-19 19:02:45 EDT

(In reply to comment #72)
> Created attachment 3285 [details]
> v31a of renderer patch
> 
> Oops, r2328 broke this, so here's a quick update.

There are two minor issues with this patch.

1) Your r2328 changes are not identical to upstream.  There is one location where you use MAX_REFENTITIES but upstream uses REFENTITYNUM_MASK in tr_main.c:

-       *entityNum = ( sort >> QSORT_REFENTITYNUM_SHIFT ) & MAX_REFENTITIES;
+       *entityNum = ( sort >> QSORT_REFENTITYNUM_SHIFT ) & REFENTITYNUM_MASK;

2) You are including an out of date tr_font.c in renderergl2.  It's missing some minor upstream changes.
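
To make issue 1 concrete, here is a minimal sketch of packing an entity number into a sort key and extracting it again. The bit width and shift below are assumed values for illustration only; the real definitions live in the ioquake3 renderer headers, and the helper function names are hypothetical.

```c
#include <stdint.h>

/* Assumed values for illustration; not copied from the ioquake3 headers. */
#define REFENTITYNUM_BITS        10
#define REFENTITYNUM_MASK        ((1u << REFENTITYNUM_BITS) - 1u)
#define QSORT_REFENTITYNUM_SHIFT 7

/* Hypothetical helper: store entityNum in its bit field of the sort key. */
uint32_t pack_entity(uint32_t sort, uint32_t entityNum)
{
    return sort | ((entityNum & REFENTITYNUM_MASK) << QSORT_REFENTITYNUM_SHIFT);
}

/* Hypothetical helper: shift the field down, then mask off the fields above it. */
uint32_t unpack_entity(uint32_t sort)
{
    return (sort >> QSORT_REFENTITYNUM_SHIFT) & REFENTITYNUM_MASK;
}
```

Extraction needs a mask of the form (1 << bits) - 1. Using the dedicated mask name rather than the entity limit keeps the decode correct regardless of how the limit itself is defined.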

Comment 75 James Canete 2012-10-23 23:33:05 EDT

Created attachment 3289 [details]
Rend2, v32.

v32.

This will probably be the last version of Rend2 before it goes into ioq3 proper.  I'm just posting it up here for people to look over in case I have some obvious errors and such before I commit.

The only notable new things are tri-Ace's Oren-Nayar method (r_normalMapping 3), and screen space ambient occlusion (r_ssao 1).

Comment 76 James Canete 2012-10-24 14:27:53 EDT

Created attachment 3290 [details]
Rend2, v32a.

Oops, v32a.

This one fixes a bunch of glitches that showed up in v32, such as the world not rendering with MSAA. :)

Comment 77 James Canete 2012-10-25 21:14:03 EDT

Created attachment 3292 [details]
Rend2 the FINAL

Final version before I merge.

Comment 78 James Canete 2012-10-25 21:25:55 EDT

Applied, r2329
