Xbox 360 vs. PS3 - A Processor Comparison



mterzich
11-28-06, 12:07 AM
I am trying to be impartial. I am not a gamer and have no interest in ever purchasing a game console, but from a programmer's perspective both the Xbox 360 and the PS3 intrigue me.

When I initially looked at the specifications of the Cell (PS3) and Xenon (Xbox 360) processors, it appeared that the Cell processor had a big advantage over the Xenon processor if both were able to harness the maximum amount of power. After looking in more detail, I have come to the conclusion that the Xenon processor will probably be able to perform better than the Cell processor under almost all conditions.

Both processors are stripped-down, modified versions of the IBM 970 PowerPC. Each core executes at less than half the speed of the IBM 970 at the same clock frequency, because the IBM 970 has multiple execution units and performs out-of-order execution (parallel processing), whereas the Cell and Xenon cores have only a single execution unit and perform in-order execution (sequential processing). The following link illustrates the performance of a PS3 at 3.2 GHz and a Power Mac G5 at 1.6 GHz using the Linux operating system.

http://www.geekpatrol.ca/2006/11/playstation-3-performance/

Linux runs on the Power Processor Element (PPE) of the Cell processor, so the results should be similar to one core of the Xenon processor, since all three Xenon cores are essentially the same as the PPE. Both processors are clocked at 3.2 GHz.
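The benchmark pairs a 3.2 GHz PS3 with a 1.6 GHz G5. To see why that pairing makes sense, throughput can be roughly modeled as clock speed x work done per clock. Below is a minimal Python sketch of that arithmetic. The 0.5 factor is an assumption taken from the "less than 1/2 the speed" estimate above. It is not a measured number.

```python
# Rough model: effective throughput = clock (GHz) x relative work per clock,
# normalized so that an IBM 970 (G5) core does 1.0 units of work per clock.
# The 0.5 factor is an assumed upper bound from the "less than 1/2" estimate.

def effective_ghz(clock_ghz: float, relative_work_per_clock: float) -> float:
    """Clock speed expressed in 'G5-equivalent' GHz."""
    return clock_ghz * relative_work_per_clock

ppe_upper_bound = effective_ghz(3.2, 0.5)   # in-order PPE core, at best
g5 = effective_ghz(1.6, 1.0)                # out-of-order G5 reference

print(f"PPE at 3.2 GHz ~ at most {ppe_upper_bound:.1f} G5-equivalent GHz")
print(f"G5 at 1.6 GHz  = {g5:.1f} G5-equivalent GHz")
```

Under this assumption, a 3.2 GHz in-order core lands at or below a 1.6 GHz G5.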

The similarities between the two processors end there. The Xenon processor has 3 identical PPE cores, whereas the Cell processor has only 1 PPE core and 7 SPE cores.

Cell Processor

One general-purpose PPE core that is used for the OS and the game application.
512 MB total memory on 2 buses, which can be accessed directly only by the PPE core: 256 MB of processor main memory and 256 MB of memory used by the GPU.
512 KB L2 cache for the PPE.
32 KB L1 instruction cache and 32 KB L1 data cache for the PPE.
7 specialized SPE cores. One is used for the OS, leaving 6 for the game application.
256 KB SRAM per SPE. There is no common memory between SPEs, and an SPE cannot access the PPE's main memory directly, but the PPE can access each SPE's memory directly.
Communication between SPE memories, or between an SPE and the PPE's memory, is performed over the Element Interconnect Bus (EIB), either by accessing ports or via DMA.
SPEs do not have branch prediction capability.

Xenon Processor

3 general-purpose PPE cores that are used for the OS and the game application.
512 MB main memory that is shared by all three cores and the GPU.
1 MB of L2 cache that is shared by the 3 cores (about 341 KB per core on average).
32 KB L1 instruction cache and 32 KB L1 data cache for each core.
2 Hardware threads per core.

Programming the 360

The OS does not use core 0 and uses only about 3% of the power of core 1 and 3% of the power of core 2. Therefore about 98% of the processing power of all three cores is available for the game application.
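The 98% figure is simple arithmetic over the three cores. Here is a quick Python check that uses the OS usage figures above. Those figures are themselves estimates.

```python
# Percent of each core's capacity used by the OS, per the estimates above.
os_usage_percent = {"core 0": 0.0, "core 1": 3.0, "core 2": 3.0}

total_capacity = 100.0 * len(os_usage_percent)       # 300% across 3 cores
available = total_capacity - sum(os_usage_percent.values())
available_fraction = available / total_capacity

print(f"{available_fraction:.0%} of the three cores is left for the game")
```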

Programming the 360 is fairly easy and straightforward. A large amount of shared main memory and a relatively large shared L2 cache are available. Information can also be passed quickly and easily between different threads (cores) of the application simply by passing pointers.
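Here is a rough illustration of what "just passing pointers" means. This is a minimal Python sketch: Python is used only as a neutral example language, and the class and function names are hypothetical. Python threads share one address space, so handing an object to another thread through a queue passes a reference, not a copy. Note that CPython threads do not run CPU-bound code in parallel, so this shows the hand-off pattern, not 360 performance.

```python
import queue
import threading

class FrameData:
    """Hypothetical per-frame game state with a large payload."""
    def __init__(self, number: int):
        self.number = number
        self.positions = [(0.0, 0.0)] * 100_000

handoff = queue.Queue()
received = []

def render_thread() -> None:
    frame = handoff.get()        # gets a reference to the producer's object
    received.append(frame)

consumer = threading.Thread(target=render_thread)
consumer.start()

original = FrameData(1)
handoff.put(original)            # only the reference travels through the queue
consumer.join()

print(received[0] is original)   # True: same object, nothing was copied
```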

Typically an application will initially be developed using only one hardware thread of one core. Once the application is developed, it can be segmented to use multiple cores and possibly multiple hardware threads per core. The easiest segmentation would be to place the game control plus AI code on one core and the graphics rendering code on another. As soon as the AI code completes its work for a frame, it queues the information for the graphics rendering core and immediately starts processing the next frame. The graphics rendering code will thus be working on the current frame while the AI works on the next frame simultaneously.
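The AI/render split described above is a classic two-stage producer/consumer pipeline. Here is a minimal sketch. It assumes a hypothetical per-frame state dictionary and uses Python's thread-safe queue.Queue as the hand-off.

```python
import queue
import threading

NUM_FRAMES = 5
frames = queue.Queue()          # AI -> renderer hand-off queue
rendered = []

def ai_and_game_control() -> None:
    for n in range(NUM_FRAMES):
        state = {"frame": n, "enemies": n * 2}   # hypothetical game state
        frames.put(state)        # queue it and move straight on to frame n+1
    frames.put(None)             # sentinel: no more frames

def renderer() -> None:
    while True:
        state = frames.get()
        if state is None:
            break
        rendered.append(state["frame"])   # stand-in for drawing the frame

threads = [threading.Thread(target=ai_and_game_control),
           threading.Thread(target=renderer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(rendered)  # [0, 1, 2, 3, 4]
```

There is one producer and one consumer on a FIFO queue, so frames are always rendered in the order the AI produced them.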

Segmenting a program beyond that becomes more difficult. The developer would first have to determine where the bottleneck is occurring. If it is in the AI code, he would then have to determine whether that code can be processed in parallel. For example, in a racing game, the main program might process the AI for 5 racing cars while another core processes the AI for the other 5 cars on the track at the same time. If the bottleneck is in the graphics rendering code, it may be possible for part of the rendering to be done in parallel on another core.
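For the racing example, the split only works if each car's AI step can be computed independently, for instance from the previous frame's state. Here is a minimal sketch under that assumption. It uses a hypothetical update_car_ai function and concurrent.futures to run the two halves on two worker threads.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-car AI step: here it just advances each car's position.
def update_car_ai(car: dict) -> dict:
    return {"id": car["id"], "position": car["position"] + car["speed"]}

def update_cars(cars: list) -> list:
    return [update_car_ai(c) for c in cars]

cars = [{"id": i, "position": 0.0, "speed": float(i)} for i in range(10)]

# Split the 10 cars into two groups of 5, one per worker ("core").
halves = [cars[:5], cars[5:]]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(update_cars, halves))

updated = results[0] + results[1]
print([c["position"] for c in updated])
```

pool.map returns results in input order, so the two halves can simply be concatenated back together.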

When a program is segmented among all three cores, one of the cores may be active 100% of the time while the other two are active only a small fraction of the time (10%, 20%, 50%, etc.). In this case, the work on the core that is busy 100% of the time may need further segmentation. A second hardware thread can then be used on one of the less active cores so that it handles two processes at once. Once all the available hardware threads are in use and still more segmentation is required, software threads can be added until that core approaches 100% usage, although they are not as efficient as hardware threads.

Once all three cores are executing near 100%, the maximum frame rate, sophistication, and detail will have been achieved. If the AI is issuing frames faster than the GPU can process them (maximum 60 fps at 720p or 30 fps at 1080i), more detail or sophistication can be added.

Programming the PS3

The PS3 is much more difficult to program than the 360. In a sense it is designed like the multiprocessor systems used by specialized customers such as NASA Ames Research Center. The concept is based on the principle that there is a very large amount of repetitive mathematical work that can be performed in a parallel or a segmented sequential fashion. For example, one core multiplies two arrays of 10,000 numbers and passes the output array to another core. That core performs divisions on individual elements and passes the array on to a third core, which performs some other operation on the data, and so on. After the first core finishes its operation, it acquires more data and performs the same operation again.
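The segmented sequential scheme in the example above is a software pipeline. Each stage does one operation and passes the array on, so all stages are busy on different batches at once. Here is a minimal Python sketch with threads standing in for cores and three hypothetical stages (multiply, halve, add one).

```python
import queue
import threading

def stage(fn, inbox, outbox):
    """One pipeline stage ("core"): apply fn to each batch, pass it on."""
    while True:
        batch = inbox.get()
        if batch is None:          # sentinel: shut down and tell the next stage
            outbox.put(None)
            return
        outbox.put(fn(batch))

def multiply(pair):
    a, b = pair
    return [x * y for x, y in zip(a, b)]

def halve(values):
    return [v / 2 for v in values]

def add_one(values):
    return [v + 1 for v in values]

q0, q1, q2, q3 = (queue.Queue() for _ in range(4))
workers = [threading.Thread(target=stage, args=(multiply, q0, q1)),
           threading.Thread(target=stage, args=(halve, q1, q2)),
           threading.Thread(target=stage, args=(add_one, q2, q3))]
for w in workers:
    w.start()

# Feed two batches; stage 1 can start on batch 2 while stage 2 works on batch 1.
q0.put(([1, 2, 3], [4, 5, 6]))
q0.put(([2, 2, 2], [3, 3, 3]))
q0.put(None)

results = []
while (out := q3.get()) is not None:
    results.append(out)
for w in workers:
    w.join()

print(results)  # [[3.0, 6.0, 10.0], [4.0, 4.0, 4.0]]
```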

As on the 360, the application would initially be developed using the PPE core. Next you would think that the PS3, just like the 360, could put the game control plus AI code on one core and the graphics rendering code on another. However, that is not possible! The total application code may be about 100 MB and an SPE has only 256 KB of memory, so only about 1/400 of the total code can fit in one SPE's memory. Also, since an SPE has no branch prediction, branching should be kept to a minimum. I believe the compiler can insert code to cause prefetches, so branching may not be a big issue.

Therefore the developer has to find code smaller than 256 KB (including the data space it needs) that can execute in parallel.
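The 1/400 figure and the 256 KB budget can be checked with a little arithmetic. The sketch below assumes 1 MB = 1024 KB and a hypothetical 64 KB of SPE code. It ignores stack and DMA buffer space, which a real SPE program would also have to fit in the local store.

```python
LOCAL_STORE_BYTES = 256 * 1024      # per-SPE local store

def fraction_of_program_that_fits(program_bytes: int) -> float:
    return LOCAL_STORE_BYTES / program_bytes

def max_elements_per_chunk(code_bytes: int, element_bytes: int) -> int:
    """How many data elements fit alongside the SPE code (hypothetical budget)."""
    free = LOCAL_STORE_BYTES - code_bytes
    if free <= 0:
        raise ValueError("code alone does not fit in the local store")
    return free // element_bytes

program = 100 * 1024 * 1024          # the ~100 MB application from the post
print(round(1 / fraction_of_program_that_fits(program)))  # 400

# E.g. 64 KB of code working on 16-byte elements (4 floats each):
print(max_elements_per_chunk(64 * 1024, 16))  # 12288
```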

Even if code can be found that can be segmented, data has to be passed back and forth between the PPE and the SPE via DMA. That is very slow compared to passing a pointer to the data, as on the 360.
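To make the contrast with the 360's pointer passing concrete, here is a toy Python model. It is not real Cell code, and the sizes and the spe_kernel function are hypothetical. The "SPE" can only work on its own small buffer, so every batch is explicitly copied in and the results are copied back out.

```python
# Toy model: main memory is one big list; the "SPE" only ever sees its own
# small local buffer, so every batch has to be copied in and copied out.
main_memory = list(range(16))
LOCAL_STORE_ELEMENTS = 4                      # tiny stand-in for 256 KB

def spe_kernel(local):                         # works only on local data
    return [x * x for x in local]

for start in range(0, len(main_memory), LOCAL_STORE_ELEMENTS):
    end = start + LOCAL_STORE_ELEMENTS
    local_store = main_memory[start:end]       # "DMA in": an explicit copy
    result = spe_kernel(local_store)
    main_memory[start:end] = result            # "DMA out": copy results back

print(main_memory[:5])  # [0, 1, 4, 9, 16]
```

Every one of those slice copies is work that the shared-memory version simply does not do.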

Assume that enough segmentable code was found to use all 6 SPE cores assigned to the game application. The developer would then try to balance the load among the cores. As on the 360, some or all of the cores may have very low utilization. Adding more hardware threads is not possible, since each SPE has only one hardware thread. Adding software threads probably will not work due to the memory constraint. So the only option is an overlay scheme, where the PPE transfers new code to the SPE via DMA when the last overlay finishes processing. This is very time consuming, and code has to be found that does not overlap in the same time frame.

Future Generation Consoles

In a few years both Microsoft and Sony may want to release their next-generation consoles. When they do, they usually want to maintain backward compatibility with their present console (including multi-core applications). You would think that they could get 3 times the processor power by using the same design but running the processor at 9.6 GHz. If that could be done, there wouldn't be any problem maintaining backward compatibility.

Degrading the internals of this generation's console processors was done purposely by both Microsoft and Sony to save cost. It was cheaper to increase the clock frequency and degrade the internals than it would have been to use a full-featured PowerPC at 40% of the current clock frequency.

However, over the last several years it has become more and more difficult to increase clock frequency. Performance has instead been improved primarily by redesigning the internals of the processor and by implementing dual-core and occasionally quad-core processors.

In the case of the 360, that should be a fairly easy and cost-effective upgrade in several years. Since the Xenon processor is a fairly standard design, Microsoft would probably be able to purchase an off-the-shelf 3.2 GHz IBM 970 quad-core processor for a reasonable price (only a dual-core part currently exists at that frequency). In that case, three of the cores should give about 2.5x the power of the current Xenon processor. The fourth core adds to that, for a total of over 3x the processor power. In the worst case, Microsoft could purchase an Intel-based quad-core processor and use emulation for all old game applications. Apple did the same (with single- and dual-core processors) for all its new systems; I expect that PowerPC prices got too high. The following link indicates that a single-core Intel processor performs about the same overall as a single-core PowerPC with the same clock frequency. That holds even though emulation is being performed by Rosetta, Apple's PowerPC translation layer. All new applications would then use native Intel compilers for better performance.

http://www.macworld.com/2006/03/firstlooks/minibenchmarks/index.php

If developers write game applications that use the SPEs, Sony will probably have a problem. If applications use the SPEs, it will be difficult to change the processor design. It may be possible to emulate the SPEs and keep a decent level of performance, but that would be extremely difficult. Upgrading all the SPEs and the PPE to be fully featured would probably not be cost effective. Sony may decide to keep the SPEs at their current speed and capabilities and add 4 fully featured PowerPC cores. New compilers then probably would not allow developers to use the SPEs for future development.

Conclusion

I have serious doubts that very many developers (except exclusive developers) will program the SPEs on the PS3. The complexity is enormous, development time is long, and the potential for bugs is great.

I suspect that Gears Of War is already using the multiprocessor capability but still only using about 50% of the total power available.

It would be hard to imagine that a PS3 game application could perform better than a 360 game application without a great deal of development time.

Important updates

Read the following link for important updates to this document.

http://www.avsforum.com/avs-vb/showthread.php?p=9027534#post9027534

References

http://dpad.gotfrag.com/portal/story/35372/?spage=1
http://www.hardcoreware.net/reviews/review-348-1.htm
http://en.wikipedia.org/wiki/Synergistic_Processor_Element
http://en.wikipedia.org/wiki/Xenos
http://arstechnica.com/cpu/03q1/ppc970/ppc970-2.html

BuGsArEtAsTy
11-28-06, 12:28 AM
Not a programmer but here are my 2¢.

The Xenon CPU seems like a good chip that fits in with Microsoft's programming preferences - a more PC-like design as it were, more suited for quick and slick creation of developer tools, and more suited to the way most programmers think of CPUs.

OTOH, I would guess that Cell may be more well suited for some streaming media type applications.

Also, I would guess that some x-platform games may target mainly the PPE core only, which means they would likely run better on the 360's Xenon CPU in most instances, despite the lower clock speed.

I suspect that Gears Of War is already using the multiprocessor capability but still only using about 50% of the total power available.
The HD DVD group claims that HD DVD playback is the toughest application out there for the Xbox 360, and tougher than every 360 game, except for Gears of War in some parts of the game.

mterzich
11-28-06, 12:42 AM
OTOH, I would guess that Cell may be more well suited for some streaming media type applications.
The PS3 should excel in areas such as decoding MPEG streams. The data is vast and repetitive, and the developer could possibly use either the parallel or the sequential concept.

Also, I would guess that some x-platform games may target mainly the PPE core only, which means it would likely run better on the 360's Xenon CPU.
I suspect that 90% or more of the current Xbox 360 applications are using a single core, although dividing the game control plus AI from the graphics rendering is not that complicated a task. However, when developers get a program working fairly well and bug free on a single core, they tend to leave the segmentation and new enhancements for the next release to reduce the possibility of bugs.

Single core applications should run about the same on both consoles.

The HD DVD group claims that HD DVD playback is the toughest application out there for the Xbox 360, and tougher than every 360 game, except for Gears of War in some parts of the game.
I don't think it is a tough application. It just requires a lot of CPU power, so they probably have to use all three cores, and MPEG streams are designed for multi-core processors.

calv1n
11-28-06, 12:49 AM
mterzich
Good post, some great (and technical) information there. It breaks down in more detail what some programmers have been saying for a while now.
I think graphically we are not going to see a big difference between the 2 consoles over their lifetime but the next round will be interesting.
Cheers

DLove23
11-28-06, 12:51 AM
I suspect that Gears Of War is already using the multiprocessor capability but still only using about 50% of the total power available.


Scary.

mterzich
11-28-06, 12:54 AM
mterzich
Good post, some great (and technical) information there. It breaks down in more detail what some programmers have been saying for a while now.
I think graphically we are not going to see a big difference between the 2 consoles over their lifetime but the next round will be interesting.
Cheers
Once applications are written using the multi-core capabilities of the 360, I would expect the 360 to pull further away from the PS3. From a programmer's perspective, the PS3 appears to be massively more complicated to program when using the SPEs.

BuGsArEtAsTy
11-28-06, 12:54 AM
I don't think it is a tough application. It just requires a lot of CPU power, so they probably have to use all three cores, and MPEG streams are designed for multi-core processors.
That's what I meant, i.e., toughest on the CPU.


I suspect that 90% or more of the current Xbox 360 applications are using a single core, although dividing the game control plus AI from the graphics rendering is not that complicated a task. However, when developers get a program working fairly well and bug free on a single core, they tend to leave the segmentation and new enhancements for the next release to reduce the possibility of bugs.
I must say when the specs of the 360 CPU came out, I was a little bit surprised that it was a triple-core design. I wondered why it wasn't just a dual-core design, for less money and higher yields.

Could it have something to do with AVC and VC-1 decoding? Or is it something else?

uzziah
11-28-06, 01:04 AM
Scary.


Shows you how important optimization is. The 360's hardware isn't THAT incredible; it's just that the devs have the ability to optimize so greatly, knowing exactly what their game is playing on. This is the difficulty with PCs, even though they may (and will) become much more powerful than our cool white box.

mterzich
11-28-06, 01:08 AM
I must say when the specs of the 360 CPU came out, I was a little bit surprised that it was a triple-core design. I wondered why it wasn't just a dual-core design, for less money and higher yields.

Could it have something to do with AVC and VC-1 decoding? Or is it something else?
I suspect it had more to do with marketing than a definite need, although they may need that much processor power for HD-DVD (not certain). Already many consider the PS3 better than the 360 since it has 8 cores, but I think the 360 is a much better and more powerful design. If Microsoft had delivered a dual-core console, many gamers might have perceived it as not much better than the Wii. I'm not knocking the Wii, but it is not in the same category as either the 360 or the PS3. Also, the third core will probably provide enough processor power for the next couple of years.

mterzich
11-28-06, 01:16 AM
Shows you how important optimization is. The 360's hardware isn't THAT incredible; it's just that the devs have the ability to optimize so greatly, knowing exactly what their game is playing on. This is the difficulty with PCs, even though they may (and will) become much more powerful than our cool white box.
You are correct. The 360 has 3 degraded cores, but together they produce about the same power as the highest-end desktop processors. Dual-core desktop processors are also becoming more common. However, a PC game cannot be designed only for the most powerful or dual-core PCs, because that would reduce sales. So developers design for the current average desktop, and the game will probably play OK on a lower-end desktop.

FiveMillionWays
11-28-06, 01:34 AM
I don't care which one is better than the other because I just get all the good consoles anyway. When I got my first XBOX I stopped playing my PS2. I never really take anything Kidtendo makes. This time though I may get one for the gimmicks. They should try to enable 720p for those of us with high-end gaming rigs, though. 480p sucks on HDTVs.

mterzich
11-28-06, 02:06 AM
I don't care which one is better than the other because I just get all the good consoles anyway. When I got my first XBOX I stopped playing my PS2. I never really take anything Kidtendo makes. This time though I may get one for the gimmicks. They should try to enable 720p for those of us with high-end gaming rigs, though. 480p sucks on HDTVs.
It wouldn't be that easy to enable 720p on the Nintendo. They are only using a 729 MHz single-core, full-featured PowerPC processor. This is about half the processor power of a single PPE core on the 360 or PS3. I suspect that would be enough processor power to produce 720p graphics. However, I suspect that Nintendo determined that they would rather have sophistication at 480p than previous-generation applications upgraded to 720p. I suspect it is one or the other, but not both.

Extra
11-28-06, 02:28 AM
It wouldn't be that easy to enable 720p on the Nintendo. They are only using a 729 MHz single-core, full-featured PowerPC processor. This is about half the processor power of a single PPE core on the 360 or PS3. I suspect that would be enough processor power to produce 720p graphics. However, I suspect that Nintendo determined that they would rather have sophistication at 480p than previous-generation applications upgraded to 720p. I suspect it is one or the other, but not both.

I would imagine the resolution would be more directly affected by the GPU than the CPU, no?

From what I understand, the CPU seriously takes a backseat to the GPU in the direct realm of graphics. CPUs might perform some geometry setup and whatnot, but I'd imagine the GPU matters a whole lot more than the CPU for pushing actual pixels (720p).

mterzich
11-28-06, 02:51 AM
I would imagine the resolution would be more directly affected by the GPU than the CPU, no?

From what I understand, the CPU seriously takes a backseat to the GPU in the direct realm of graphics. CPUs might perform some geometry setup and whatnot, but I'd imagine the GPU matters a whole lot more than the CPU for pushing actual pixels (720p).
I'm now not sure if the 243 MHz Hollywood GPU is powerful enough to render the graphics very well. You would think that at 243 MHz it should work OK, but probably not great. However, after searching the web, it appears that it is only about 2x the speed of the GameCube's GPU. Even that is speculation, since the vendor has not released specifications. In that case it probably wouldn't work too well.

Still, it takes quite a bit more CPU power when a lot of detail is included in the frames, so Nintendo may have chosen not to support 720p for both reasons.

Extra
11-28-06, 03:01 AM
Regarding the Hollywood GPU:

720p is 3x the pixel count of 480p, so it requires 3x more raw fillrate.
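The 3x figure assumes 640x480 for 480p. Here is a quick check in Python. With 720x480, which is also used for 480p, the ratio drops to about 2.67.

```python
def pixels(width: int, height: int) -> int:
    return width * height

p480 = pixels(640, 480)        # 4:3 480p
p480_wide = pixels(720, 480)   # 720x480 480p
p720 = pixels(1280, 720)       # 720p
p1080 = pixels(1920, 1080)     # 1080p/1080i frame

print(p720 / p480)                  # 3.0
print(round(p720 / p480_wide, 2))   # 2.67
print(p1080 / p720)                 # 2.25
```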

They could do 720p, just like the PS3 could do 1080p, but it would only work on certain games. I could see Mario being in 720p, for example; since the game has that "cartoon" look, they could get away with less detailed textures. The thing to keep in mind is that Soul Calibur 2 for Xbox 1 was 720p (but only in 4:3, not widescreen). So they could certainly do it if they chose to.

But as a norm...I don't think so.

JerryNY
11-28-06, 04:36 AM
You can make an argument either way about which approach is better as far as CPUs go. The 360 has 3 identical general-purpose cores, each hyper-threaded, for a total of six simultaneous threads. The PS3 has one general-purpose core plus 6 usable SPEs. But the GPU is where it's at in the current state of these consoles. Both consoles behave much the way multiple-CPU machines do on the desktop. Getting any code to utilize all that CPU hardware all the time is nearly impossible. Anyone who has ever used a multi-CPU PC/Mac can tell you how truly rare it is to see all the CPU resources eaten up. I have had dual-CPU Macs for years and now own a quad-core Xeon Mac Pro, which is a monster. Still, getting anything other than a professional renderer or video encoder to use all that horsepower is truly rare.

When it comes to GPUs, I think the 360 is a generation ahead of the PS3. It doesn't seem to make sense that a console GPU that debuted a year ago would be more advanced than one that is brand spanking new, but this appears to be the case. The 360's GPU is essentially ATI's next-gen chip (R600), which is a major break from what is out there. The PS3's GPU is a modified version of a 7800GTX, which is nice but would not even be considered leading edge in a PC today. The 360's GPU essentially eliminates memory bottlenecks and, because of this, allows things like 4xAA with almost no performance penalty. It is a very efficient design, and it is far easier to squeeze every ounce of resources out of it than out of the PS3's GPU. Shaders are the big thing in graphics today, and in real-world performance the 360's GPU is far more efficient and in the end more powerful.

-Jerry C.

Extra
11-28-06, 05:41 AM
^^ I agree.

I find it ironic and comical that nVidia criticized the Xenos' unified shader structure at first... and now the GeForce 8800 uses a unified architecture. Mmm, yeah. Nothing says "owned" more than following in your competitor's footsteps after trashing it.

Sony seems like they just didn't spend much effort on the GPU. The story goes that they wanted to produce their own GPU like they did with the PS2. Time constraints forced them to change their minds. Let's face it, nVidia/ATI are so far ahead of anyone else right now that there's no way Sony can compete with them. The problem is they weren't working with nVidia from the beginning like MS was with ATI, so their options were limited. So it doesn't surprise me at all that MS got a better GPU out of their deal.

I think from the PC gaming perspective, it's basically been established that GPU > CPU for games. A video card upgrade does a LOT more for a game than a CPU upgrade.

mterzich
11-28-06, 06:00 AM
I agree that the GPU is very important and that the 360 GPU is well ahead of the PS3 GPU, but without CPU power the sophistication, AI, and detail could not be produced. Also, there is not much that can be done about the GPU right now since it is already in the console.

I also agree that very few programs available today for a PC or Mac use more than one processor core. The reason is twofold. First, there is the complexity and manpower needed to develop programs that work with either single or multiple cores. Second, there are too few multi-core systems to make the development profitable. It will be a long time before more than one or two programs use quad-core capability, due to its extreme rarity.

In the case of the game consoles, the same multi-core hardware is available to all customers, so developers have a reason (and a profit motive) to develop multi-core applications. Just like HDTV, it is a chicken-and-egg issue: very few multi-core applications will be developed until there are a lot of consoles out there. Eventually, to survive, most applications that aim for a high degree of sophistication and a large amount of detail will have to use multiple cores.

Developing any application for multi-core use is difficult. However, developing a multi-core application that uses the SPEs is extremely complex. Just imagine that an SPE does not even have more memory than the original MS-DOS systems from 1981. Some of those initially had 128 KB, but most quickly went to 256 KB, then 512 KB, and finally the maximum of 640 KB. Then there is the constant use of DMA to transfer data to and from the SPE, which is very slow and complex. Finally, synchronization is complex and is usually controlled by the PPE core. In comparison, the Xenon processor uses common memory and has a large amount of memory per process. It can use a memory queue to communicate and can therefore be totally asynchronous.

For example, on the 360, during a certain part of the program the AI process could be generating frames faster than the graphics rendering process can process them. In this case there isn't any great concern. When the graphics rendering process finishes processing the last frame, it just deletes that information from the queue and starts processing the next frame. No lockstep synchronization between the two processes is necessary. However, if the graphics rendering process gets too far behind, noticeable lag could start to occur. So the AI process would just temporarily suspend itself if more than an acceptable number of frames are in the queue.
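The "suspend if the queue gets too long" behavior maps directly onto a bounded queue. Here is a minimal Python sketch with a hypothetical limit of 3 queued frames. queue.Queue does its own internal locking, which plays the role of the semaphores mentioned in the next paragraph. Neither thread has to coordinate with the other beyond put() and get().

```python
import queue
import threading
import time

MAX_QUEUED_FRAMES = 3            # hypothetical "acceptable lag" limit
TOTAL_FRAMES = 10

# maxsize makes put() block while the queue is full, which is the
# "AI temporarily suspends itself" behavior described above.
frames = queue.Queue(maxsize=MAX_QUEUED_FRAMES)
rendered = []
max_backlog = 0

def ai_process() -> None:
    global max_backlog
    for n in range(TOTAL_FRAMES):
        frames.put(n)            # suspends here if the renderer is too far behind
        max_backlog = max(max_backlog, frames.qsize())
    frames.put(None)             # sentinel: no more frames

def render_process() -> None:
    while (frame := frames.get()) is not None:   # get() removes it from the queue
        time.sleep(0.01)         # the renderer is slower than the AI
        rendered.append(frame)   # stand-in for drawing the frame

ai = threading.Thread(target=ai_process)
gpu = threading.Thread(target=render_process)
ai.start()
gpu.start()
ai.join()
gpu.join()

print(rendered)      # frames 0..9, in order
print(max_backlog)   # never more than MAX_QUEUED_FRAMES
```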

On a PS3 this is much more complex. Data is transferred via DMA, and the transfer is usually initiated by the PPE. It usually won't be initiated until the SPE is idle; otherwise it would probably corrupt the data currently in use in the SPE's memory. The PPE could transfer the block of data to a different part of the SPE's memory, but with the very limited memory available to the SPE that may not be feasible. Also, the PPE cannot easily determine what is happening in the SPE's memory. On the 360, by contrast, processes can acquire semaphores during critical code while the queue is being updated. I'm sure there is some safe mechanism for performing that type of operation on the PS3. However, it is probably quite complex and was probably not designed with asynchronous operation in mind.

Rhegaana
11-28-06, 10:31 AM
It seems like the 360 is a winner in every aspect. Is there a situation where the PS3 hardware has an advantage? BTW I'm no techie or programmer, just curious so I can wow my friends during gatherings!

mboojigga
11-28-06, 11:08 AM
It seems like the 360 is a winner in every aspect. Is there a situation where the PS3 hardware has an advantage? BTW I'm no techie or programmer, just curious so I can wow my friends during gatherings!


:D You're silly.

Jetrii
11-28-06, 01:49 PM
Excellent post, mterzich. Perhaps there is something you could answer for me: how did Sony come up with 1.8 teraflops for the RSX? It appears that the X1950XTX and 8800GTX can achieve around 500-600 GFLOPS respectively. Seems like they just pulled that number out of their ass.

skogan
11-28-06, 02:16 PM
Interesting thread, though I only understand half of it. :)

mterzich
11-28-06, 02:29 PM
Excellent post, mterzich. Perhaps there is something you could answer for me: how did Sony come up with 1.8 teraflops for the RSX? It appears that the X1950XTX and 8800GTX can achieve around 500-600 GFLOPS respectively. Seems like they just pulled that number out of their ass.
I'm not an expert on GPUs, so I can't really answer that, but it does appear that they pulled it out of the air. The following link gives a good analysis of the 2 GPUs.

http://dpad.gotfrag.com/portal/story/35372/?spage=7

DLove23
11-28-06, 02:45 PM
It really is surprising to me that the PS3 had a year after the 360 launched and its hardware is still inferior. I'm no fanboy of either, but right now I really don't mind that I didn't pick up a PS3 at launch. I wanted one and figured it would be the next big thing in gaming, but it appears I already have that console in the Xbox 360. The only thing the PS3 appears to have going for it is a built-in Blu-ray drive, but if Blu-ray doesn't catch on, oh boy. I'll tell ya what, I sure am enjoying my HD-DVD drive for the 360. I think the HD-DVD movie selection out there right now looks just as good as, if not better than, Blu-ray's. The PS3 may be the most wanted unit right now in terms of demand, what with the barnstorming of stores and crazy eBay auctions. But you gotta think that Microsoft has really come out of this smelling like a rose. The 360 not only has a year's head start in the marketplace and a superior online gaming platform in Live. It is also the superior console, even though the PS3 came out a year later.

Jetrii
11-28-06, 03:04 PM
I'm not an expert on GPUs, so I can't really answer that, but it does appear that they pulled it out of the air. The following link gives a good analysis of the 2 GPUs.

http://dpad.gotfrag.com/portal/story/35372/?spage=7

Yeah, I read that last week. Pretty much confirmed what I already knew.

Some people don't understand how a console released a year after the Xbox 360 could possibly be inferior. Simply put, while the console was released a year later, the technology is not a year ahead. The specs for the PS3 were pretty much finalized back in March. Sony was able to add small things (such as HDMI on the core model), but they couldn't change the processor/GPU/RAM.

JData
11-28-06, 03:13 PM
Yea, I read that last week. Pretty much confirmed what I already knew.

Some people don't understand how a console released a year after the Xbox 360 could possibly be inferior. Simply put, while the console was released a year later, the technology is not a year ahead. The specs for the PS3 were pretty much finalized back in March. Sony was able to add small things (such as HDMI to the core) but they couldn't change the processor/GPU/Ram.


I think there was a design flaw, major or minor, in the PS3 that pushed the launch back from spring to fall, judging from what I gather from "Darknight's" posts in the PS3 forum. Hence why they couldn't deliver at launch.

mterzich,

I applaud your efforts in explaining the programming differences between the two products. I am still curious on the processing or GPU power of the PS3 if it can actually play games at 1080P or just upscale them from 720P.

I recall another developer saying that the 360 has a lot more power than the PS3 to do 1080p easily.

mterzich
11-28-06, 03:20 PM
It seems like the 360 is a winner in every aspect. Is there a situation where the PS3 hardware has an advantage? BTW I'm no techie or programmer, just curious so I can wow my friends during gatherings!
IMO the only advantage of the PS3 is that it has 8 cores instead of the 3 cores on the 360, but if that power can't be harnessed, or is extremely difficult to harness, it means very little.

There are so many numbers games being played by both parties. Floating-point performance is another example; it is expressed in GFLOPS (billions of floating-point operations per second). The 360 claims 115 GFLOPS whereas the PS3 claims 218 GFLOPS. That would amount to 38.3 GFLOPS per core on the 360. Assuming the PPEs are the same on the PS3 and 360, that would make the PS3's PPE core approximately 38.3 GFLOPS and each SPE approximately 25.6 GFLOPS by my calculations, but others have calculated it a different way.
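The per-core split in the paragraph above can be reproduced with a few lines of arithmetic. This is a minimal sketch that encodes the post's own assumptions (the PS3's PPE is rated the same as one 360 core, and the rest of the PS3 figure is divided evenly over 7 SPEs); the function name is just for illustration.

```python
def per_core_gflops(xbox_total: float = 115.0, ps3_total: float = 218.0,
                    xbox_cores: int = 3, spes: int = 7):
    """Split the claimed peak GFLOPS per core, assuming (as the post does)
    that the PS3's PPE is rated the same as one Xbox 360 core."""
    xbox_per_core = xbox_total / xbox_cores   # 115 / 3
    ppe = xbox_per_core                       # assumed identical to a 360 core
    per_spe = (ps3_total - ppe) / spes        # remainder shared evenly by SPEs
    return xbox_per_core, ppe, per_spe


core, ppe, spe = per_core_gflops()
print(f"360 core: {core:.1f}  PS3 PPE: {ppe:.1f}  per SPE: {spe:.1f} GFLOPS")
```

With the unrounded 38.33 this gives about 25.7 GFLOPS per SPE, close to the post's approximate 25.6. It only reproduces the marketing arithmetic; it says nothing about what throughput a game can actually sustain.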

The problem is that the quoted GFLOPS are spread among cores and are for vector operations (massive amounts of repetitive data using exactly the same calculation have to be streamed through the vector unit to achieve anything close to those figures). Only in physics, nuclear research, weather prediction, etc. is there that much repetitive data available that could possibly use multiple cores.

On a games console it would be extremely difficult to keep even one core executing vector code, let alone multiple cores simultaneously. In reality most developers cannot use vector code (the calculations are not repetitive and the amount of data is not vast). So when they need floating-point capability, they use the non-vector capabilities of the processors, which typically give about 5 GFLOPS or less per core.
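To see why vector code dominates the headline numbers, here is a rough peak-throughput sketch. It assumes a 3.2 GHz core whose vector unit issues one 4-wide single-precision fused multiply-add (counted as 2 flops per lane) every cycle, versus a scalar unit issuing one FMA per cycle; both are best-case assumptions for illustration, not measured figures.

```python
def peak_gflops(clock_ghz: float, lanes: int, flops_per_lane_per_cycle: int = 2) -> float:
    """Theoretical peak = clock x SIMD lanes x flops per lane per cycle.
    A fused multiply-add (a*b + c) is counted as 2 flops."""
    return clock_ghz * lanes * flops_per_lane_per_cycle


# Assumption: one 4-wide single-precision FMA per cycle on the vector unit.
vector_peak = peak_gflops(3.2, lanes=4)
# Assumption: one scalar FMA per cycle on the non-vector unit (still a best case).
scalar_peak = peak_gflops(3.2, lanes=1)
print(f"vector peak: {vector_peak:.1f} GFLOPS, scalar peak: {scalar_peak:.1f} GFLOPS")
```

Under these assumptions the vector peak comes out at 25.6 GFLOPS, the same as the per-SPE estimate above, but only if every single cycle issues a full 4-wide FMA on streamed, repetitive data. The scalar best case of 6.4 GFLOPS is already in the same range as the "about 5 GFLOPS or less per core" mentioned in the post, and branchy game code would usually land below it.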

However, if developers can find data like that, which can be coded for the vector units and spread among many cores, the PS3 will have the advantage, since the SPEs were designed for exactly those operations and there are a large number of SPEs.

As long as applications use only one core, the PS3 and 360 are on a pretty level playing field, and the PS3 could be expected to outsell the 360 due to fan loyalty. However, once multiple cores are used on the 360, it should control the market for game sophistication, detail, and performance.

Jetrii
11-28-06, 03:34 PM
IMO the only advantage to the PS3 is that it has 8 cores instead of the 3 cores on the 360 but if the power is not harnessable or extremely difficult to harness, it means very little.

There are so many numbers games being played by both parties. Floating-point performance is another example; it is expressed in GFLOPS (billions of floating-point operations per second). The 360 claims 115 GFLOPS whereas the PS3 claims 218 GFLOPS. That would amount to 38.3 GFLOPS per core on the 360. Assuming the PPEs are the same on the PS3 and 360, that would make the PS3's PPE core approximately 38.3 GFLOPS and each SPE approximately 25.6 GFLOPS by my calculations, but others have calculated it a different way.


The PS3 does not have 8 cores. It has 1 core and 7 SPEs. As for gigaflops, they mean nothing. The PS2's CPU has 2X more gigaflops than the Xbox's CPU and it wasn't as powerful. Gigaflops are a very crude way to measure performance.

mterzich
11-28-06, 03:58 PM
I am still curious about the processing or GPU power of the PS3: can it actually play games at 1080p, or does it just upscale them from 720p?

I really don't know the answer to that but I expect that games could be developed for 1080p. The question will be how much detail can be added before major issues start to occur with the GPU.

A more interesting observation is the different ways that the 360 and PS3 appear to handle scaling. Again I'm not a gamer so I am going on what other people say.

On the 360, I believe that if your console is set to 1080i and a 720p game is played, the 360 will scale the output to 1080i (correct me if I am wrong). If you look at the average GPU on a PC, I don't think there is such a capability. I think the GPU will normally be instructed to set its output to the game's resolution, and all data will be passed at the game resolution. It appears that the 360 has the capability to scale the data to whatever resolution is configured on the console. Is this done by the GPU or by the CPU? Normally scaling is performed on the raw data, so usually it is done by the GPU, but I expect that it could be done by the CPU working on the pre-rendered graphics. Is it possible that there is a CPU call to a library such as NewResolutionDataPtr = ChangeResolution(CurrentResolutionDataPtr, Game Resolution), which would return the same pointer if the resolutions are the same or else scale the data to the correct resolution? If this scheme were used, that code would probably use a large amount of CPU power, so it should be run on another core.
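Purely to illustrate the library call imagined above, here is a CPU-side sketch. The name change_resolution mirrors the hypothetical ChangeResolution; it is not a real Xbox 360 or DirectX API, and nearest-neighbour sampling stands in for whatever filter a real scaler would use.

```python
from typing import List

Frame = List[List[int]]  # rows of packed pixel values (e.g. 0xRRGGBB)


def change_resolution(frame: Frame, width: int, height: int) -> Frame:
    """Hypothetical counterpart of the post's ChangeResolution(): hand back the
    same frame object if it is already the target size, otherwise return a
    nearest-neighbour scaled copy."""
    if not frame or not frame[0]:
        raise ValueError("frame must be non-empty")
    src_h, src_w = len(frame), len(frame[0])
    if (src_w, src_h) == (width, height):
        return frame  # like returning the same pointer: no work done
    scaled: Frame = []
    for y in range(height):
        src_row = frame[y * src_h // height]  # nearest source row
        scaled.append([src_row[x * src_w // width] for x in range(width)])
    return scaled


tiny = [[1, 2],
        [3, 4]]
print(change_resolution(tiny, 4, 2))  # -> [[1, 1, 2, 2], [3, 3, 4, 4]]
```

Even this simplest filter touches every output pixel: a 1920x1080 target is about two million pixels per frame, 60 times a second, which is why the post suggests giving such work its own core.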

On the PS3, if your HDTV does not support 720p, the console resolution is set to 1080i, and a game that was developed using 720p data is played, the PS3 will downscale the data to 480p. If the scaling has to take place on the raw data, will Sony be able to upgrade the GPU firmware to fix the problem? If the scaling can be performed by the CPU, does an SPE have enough memory available to hold the library as well as one input frame and one output frame?
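The last question can be roughed out with arithmetic. Each SPE has 256 KB of local store that has to hold code and data together. This minimal sketch assumes 32-bit pixels, a 720x480 output for 480p, and (hypothetically) half the local store reserved for the library code and stack; under those assumptions whole frames are far too large, so an SPE scaler would have to stream strips of rows in and out.

```python
LOCAL_STORE_BYTES = 256 * 1024   # per-SPE local store, shared by code and data
BYTES_PER_PIXEL = 4              # assumption: 32-bit colour


def frame_bytes(width: int, height: int) -> int:
    return width * height * BYTES_PER_PIXEL


src = frame_bytes(1280, 720)     # one 720p input frame
dst = frame_bytes(720, 480)      # one 480p output frame (assumed 720x480)
print(f"input {src} B + output {dst} B fits: {src + dst <= LOCAL_STORE_BYTES}")


def rows_per_strip(src_w: int, dst_w: int, budget: int) -> int:
    """Rough count of (input row + output row) pairs that fit in a data budget.
    720 -> 480 lines needs 3 input rows per 2 output rows, so this is approximate."""
    return budget // ((src_w + dst_w) * BYTES_PER_PIXEL)


# Hypothetical split: leave half the local store for the library code and stack.
print(rows_per_strip(1280, 720, LOCAL_STORE_BYTES // 2))
```

Two full frames come to roughly 5 MB against a 256 KB local store, while about 16 row pairs fit in the assumed data budget, so a strip-at-a-time design is the only kind that could work on an SPE.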

mterzich
11-28-06, 04:10 PM
The PS3 does not have 8 core. It has 1 core and 7 SPEs.
Both SPEs and PPEs are Processor Elements, which are cores. The main difference between an SPE and a PPE is the internal bus and supporting elements, but the instruction processor element is almost identical.

Jetrii
11-28-06, 04:19 PM
Both SPEs and PPEs are Processor Elements, which are cores. The main difference between an SPE and a PPE is the internal bus and supporting elements, but the instruction processor element is almost identical.

That is the thing: when people refer to them as 'cores', they are under the impression that each individual SPE is identical to one of the 360's cores. 3 cores or 8 cores... Someone without much knowledge on the topic could get the wrong impression.

mterzich
11-28-06, 04:23 PM
That is the thing: when people refer to them as 'cores', they are under the impression that each individual SPE is identical to one of the 360's cores. 3 cores or 8 cores... Someone without much knowledge on the topic could get the wrong impression.
That is why Sony marketing promotes the "8-core" Cell processor: to confuse people. Also, the average person does not know the difference between a PPE and an SPE anyway, so it doesn't make much difference.

Extra
11-28-06, 07:53 PM
I don't think either the CPU or GPU handles the scaling on the 360.

They have a hardware scaler outside of that.

But the fact is both consoles are rather weak for 1080p. Their memory buses are both 128-bit (as opposed to 256 bits or more on high-end PC cards), and they lack the brute force to push that resolution. The PS3 GPU WILL be a lot weaker than the 7800 due to the 128-bit bus alone; however, the 360 is a bit better equipped to handle this bottleneck because it's more efficient.

To put it in perspective: the width of the memory bus really comes into play at high resolutions. At lower resolutions there will be next to no difference between otherwise identical 128-bit and 256-bit cards, but once you crank up the resolution and add FSAA+AF etc., the 256-bit card will pull ahead by a huge margin. That's how it goes.
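A rough sketch of why bus width matters more as resolution rises: peak bandwidth is the bus width in bytes times the effective transfer rate, while the pixels to be written grow 2.25x going from 720p to 1080p. The 1400 MT/s transfer rate below is an illustrative assumption, not a quoted spec for either console or card.

```python
def bandwidth_gb_s(bus_bits: int, mega_transfers_per_s: float) -> float:
    """Peak bandwidth in GB/s = (bus width in bytes) x (transfers per second)."""
    return bus_bits / 8 * mega_transfers_per_s * 1e6 / 1e9


ASSUMED_RATE_MTPS = 1400  # illustrative effective memory transfer rate only
narrow = bandwidth_gb_s(128, ASSUMED_RATE_MTPS)
wide = bandwidth_gb_s(256, ASSUMED_RATE_MTPS)
pixels_720p = 1280 * 720
pixels_1080p = 1920 * 1080
print(f"128-bit: {narrow:.1f} GB/s, 256-bit: {wide:.1f} GB/s")
print(f"1080p/720p pixel ratio: {pixels_1080p / pixels_720p}")
```

At the same memory clock, doubling the bus width doubles peak bandwidth, while moving to 1080p needs 2.25x the pixels before FSAA multiplies the per-pixel traffic further; that is the squeeze the post describes.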

Blkout
11-28-06, 08:34 PM
But the fact is both consoles are rather weak for 1080p.


Maybe that's because 1080p doesn't really matter. 1080p is a marketing gimmick and Microsoft knew it. The only reason they recently added support is to keep Sony from throwing it out there as one of their "advantages."

Fact is, 720p and 1080i are going to be around for many more years and those are the standards for HD today.

darthrsg
11-28-06, 09:08 PM
Please, please, please post this in the PS3 Forum. The replies will be priceless. Facts like these keep the pimp hand strong.

5150
11-28-06, 09:22 PM
Is that English?

mterzich
11-28-06, 10:38 PM
I don't think either the CPU or GPU handles the scaling on the 360.

They have a hardware scaler outside of that.

Usually scaling is performed by a hardware scaler. If you look at an HDTV, STB, or upscaling DVD player, the hardware scaler sits in the chain either at or after the point where the raw data (a bitmap of the screen) is created. On the 360, this would seem to put the hardware scaler directly in the GPU.

In the case of the 360, I don't believe that a standard hardware scaler could be implemented before the GPU receives the data, since at that point the data is not in raw form.

It also doesn't seem possible that a hardware scaler can be implemented on the output of the GPU like an external scaler either. The reason that I say that is that it would appear that the data at that time could be basically corrupted. The following example is the reason that the data may be corrupted.

The GPU may be set to output 720p. Therefore it is expecting data that will have a maximum resolution of 1280x720.
The GPU then receives game data that was created for 1080i.
If scaling is not performed while the data is being received, what does the GPU do with the data that exceeds those parameters?

Therefore I have come to the following assumptions.

It seems highly improbable that the GPU chip was redesigned to include a hardware scaler.
It is possible that the GPU firmware was changed (in a way that differs from a standard PC GPU) to allow it to receive data of any resolution and then software-scale it to the desired output.
A new hardware scaler that works on data other than raw data could possibly have been designed and placed between the CPU and GPU.
The CPU performs all scaling on non-raw data before sending the data to the GPU.

It seems that the PS3 currently has limited scaling capabilities. If a game application is producing 720p data, the output resolution of the GPU will be changed to 720p. If the HDTV does not support 720p (many older HDTVs do not), the output resolution of the GPU will be set to 480p and the 720p data will be scaled to 480p. I don't know whether the scaling is performed by the CPU, the GPU, or a hardware scaler, but some believe that the PS3 does not have a hardware scaler.
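The two behaviours being compared can be summarised as a pair of decision functions. This only models what posters in this thread report (the post itself is unsure where the PS3's scaling happens); the function names and mode strings are hypothetical.

```python
from typing import Set, Tuple


def ps3_output_mode(game_mode: str, tv_modes: Set[str]) -> str:
    """PS3 behaviour as reported in this thread (not checked against firmware):
    use the game's native mode if the TV accepts it, otherwise fall back to 480p."""
    return game_mode if game_mode in tv_modes else "480p"


def xbox360_output_mode(game_mode: str, console_setting: str) -> Tuple[str, bool]:
    """360 behaviour as reported: always output the console's configured mode.
    Returns (output mode, whether the game's frames must be rescaled)."""
    return console_setting, game_mode != console_setting


# An older HDTV that accepts 480p and 1080i but not 720p, playing a 720p game.
older_tv = {"480p", "1080i"}
print(ps3_output_mode("720p", older_tv))
print(xbox360_output_mode("720p", "1080i"))
```

For the older 1080i-only set, the PS3 model falls back to 480p, while the 360 model keeps 1080i and flags that the 720p frames need rescaling.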

rocko1290
11-28-06, 11:33 PM
good thread

HorrorScope
11-28-06, 11:49 PM
One can say this or that is factual or not, but there is a certain reality here: it can't be impossible to program for, otherwise there wouldn't be any games for it. Since there are games, it goes without saying that the CPU can be, and already is being, harnessed. That will only improve; maybe there is a bottleneck here or there, but the programmers are understanding it and, like anything, will get better as they go. I felt there was too much verbiage devoted to how hard and complex it is to program for. But good stuff nonetheless.

mterzich
11-29-06, 12:43 AM
One can say this or that is factual or not but there is a certain reality here and that is it can't be impossible to program for otherwise there wouldn't be any games for it. Since there are games then it goes without saying the cpu can and is being harnessed, already. That will only improve, so maybe there is a bottleneck here or one over there, but the programmers are understanding it and like anything will get better as they go. I felt there was too much verbiage used towards how hard and complex it was to program for. But good stuff nonetheless.
No one ever stated that the PS3 was impossible or even difficult to program. In fact if an application is using only the PPE core, it is no more difficult to program than the Xbox 360.

Since the vast majority of 360 games are currently single core, porting of those games should be very simple and the PS3 should catch up to the Xbox 360 very quickly in the games available and the quality of the games.

A single PPE core has a clock frequency of 3.2 GHz compared to the PS2's 295 MHz processor, which is roughly 11x more CPU power than the PS2 if you go by clock frequency alone. However, the actual CPU power may be more or less than that depending on the internal design of each processor. That is one hell of a lot of CPU power and should be enough for most applications to create a lot of sophistication. I expect that most developers for both game consoles will try to create single-core applications so that porting will be very easy.
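The clock-only estimate, and the caveat about internal design, can be written out as follows: throughput is roughly clock rate times instructions completed per cycle (IPC). 3200 / 295 is about 10.8, which the post rounds to 11x; the IPC values below are invented placeholders, not measurements of either chip.

```python
PPE_CLOCK_MHZ = 3200      # Cell PPE / Xbox 360 core clock
PS2_EE_CLOCK_MHZ = 295    # PS2 main processor clock, as quoted in the post

clock_ratio = PPE_CLOCK_MHZ / PS2_EE_CLOCK_MHZ


def relative_throughput(clock_ratio: float, ipc_new: float, ipc_old: float) -> float:
    """Speed-up ~ clock ratio x (instructions per cycle, new / old)."""
    return clock_ratio * ipc_new / ipc_old


print(f"clock alone: {clock_ratio:.1f}x")
# Hypothetical IPC values, only to show how internal design shifts the estimate.
print(f"with 20% lower IPC: {relative_throughput(clock_ratio, 0.8, 1.0):.1f}x")
print(f"with 20% higher IPC: {relative_throughput(clock_ratio, 1.2, 1.0):.1f}x")
```

An in-order core that stalls more often has a lower IPC, which is exactly how the real gain can come out "more or less" than the clock ratio suggests.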

The only current application that I suspect may use multi-core is Gears of War. But even that application may still have enough CPU power to use only one core. It is hard to determine. Only in the future will it be determined if the PS3 is designed well enough to compete effectively with the 360.

However, both games consoles are well designed for the immediate future and one may not show any advantage over the other during that time. It was never meant to imply that one concept is superior to the other in the near term.

Likewise, the fact that the 360 uses a DVD drive and the PS3 a Blu-ray drive for games will not make much of a difference in the immediate future. Most game developers for both consoles will keep their disc-space needs within the maximum allowed on the 360. However, just like the different processors, in the future an application may be written for the PS3 that exceeds the capacity of the 360's DVD, giving the PS3 an advantage.

I suspect that exceeding a one-core design or using more disc space than a 360 can accommodate will happen more by accident than by design. A game designer who wants to create a killer application may believe that he has enough single-core CPU power or enough DVD space, so the design process goes ahead. Finally all the pieces are put together and it is found that a performance problem is occurring or the 360's DVD space is too small. Instead of wasting all that development time and cutting back on sophistication, the product may ship only on the PS3 if the 360's DVD capacity is too small, and if the performance is bad, the multi-core capability may be implemented.

Fangrim
11-29-06, 01:41 AM
There are too many "I assume"s and "I suspect"s in the OP's post.

No doubt the 360 and PS3 are close, but I don't think the PS3 is as hard to program as some people would like others to believe.

Reading blogs and interviews with people who've actually programmed the 2 CPUs (not just read about them), lots agree that the Cell isn't easy, true, but it's far from impossible.

Lots of programmers see it as a challenge and compare programming the Cell with the PS2 (actually, some think the Cell is easier to grasp than the PS2).

Sony dropped the ball somewhat by not giving devs the tools they need, but when this gets fixed, I believe that the PS3 will put some distance between itself and the 360. It's a close race, but I think that the PS3 does have an edge over the 360.

Comparing ports of games won't be fair, as a game almost always looks better on the system it was originally developed for than on the system it is ported to.

And yes - I'm a Sony fan(boy), but I do respect the 360 as an awesome platform. There's just nothing I've seen on the 360 so far that makes me go "OOOH!". Sony's always delivered, and I believe they'll continue to do so.

Extra
11-29-06, 02:20 AM
^^ Aren't you "assuming" too much yourself? What do you base your opinion on? For every dev that says "programming for the Cell is OK", I bet I can find at least 5 devs that say programming for it is a nightmare.

Besides, it's not like people are guessing at it from thin air. When you analyse the structure it's not that hard to see what prompts people to say that.

No one ever said it's impossible. In fact it's pretty damn easy if you just use the PPE. The difficulty has always been in maximizing performance, and the indisputable fact is it's easier to maximize the performance of a Xenon than of the Cell.

You might take it as the cell has "more potential", but just know that somewhere down the line there might be a case of diminishing returns where the additional time spent will simply not be worth it.

JerryNY
11-29-06, 02:38 AM
The most important issue in the entire situation is INERTIA. Programmers know what they know and like to continue doing what they know best. The reaction to multicore systems from developers has been somewhat mixed with regard to the paradigm shift from single-threaded to multi-threaded systems. John Carmack, creator of Doom and one of the most knowledgeable people on the planet about 3D game engines, seems to have little love for the onslaught of multicore systems on both the desktop and in consoles.

You have to remember it wasn't so much a choice of going parallel as it was, and is, a matter of necessity. If Intel, AMD, IBM et al. could continue to keep ramping up the GHz without creating small furnaces that suck enough power to keep a small town lit, they would be doing so today. The huge jumps in raw chip speed may be over for a while, and the only way to get more power is to add more cores, which is really just another way of saying "let's throw more chips at a problem, not necessarily faster ones".

The PS3 has a somewhat exotic architecture (not so much in the GPU department, IMHO), which is impressive, but the inertia I mentioned in the first line of my rambling post here is the problem. The 360 has been out for a year now and comes with excellent development tools AFAIK, and why shouldn't it? MS has been making tools for game development on the Win/PC platform for years now, and to most game houses developing on the 360 probably feels like developing on a specialized PC with frozen specifications. Even with all this going for it, barely any 360 games are reported to use much more than one core in reality. Inertia rears its ugly head again.

The situation actually will improve for the 360 as more and more multicore PCs are out there and more and more game engines are written to take advantage of this fact. The 360 has 3 cores, so they will get more use if the ported game engine already has the architecture to make use of more than one core. The situation on the PS3 is somewhat different. It has only one general-purpose CPU plus 6-7 SPEs, which are really specialized co-processors that cannot, and most likely will not be able to, handle certain kinds of code.

Programming a multi-threaded application on a console with 3 general-purpose CPUs like the 360 will inherently benefit from the experience that a programmer gets working on PCs with multiple cores. That same experience will not help a programmer working on the PS3's architecture.
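To make the contrast concrete, here is a simplified Python sketch of the two programming models. It is an analogy only: Python threads stand in for cores (the GIL means no real speed-up), the chunk size stands in for an SPE's small local store, and list slicing stands in for DMA transfers; none of it is actual Cell or Xbox 360 SDK code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def shade(x: float) -> float:
    """Stand-in for some per-element game calculation."""
    return x * 0.5 + 1.0


def shared_memory_style(data: List[float], workers: int = 3) -> List[float]:
    """360-style: identical general-purpose cores see the same main memory,
    so each worker reads and writes the shared lists directly."""
    out = [0.0] * len(data)

    def work(first: int) -> None:
        # Interleave elements across workers: worker k handles k, k+workers, ...
        for j in range(first, len(data), workers):
            out[j] = shade(data[j])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, range(workers)))
    return out


def local_store_style(data: List[float], chunk: int = 4, workers: int = 6) -> List[float]:
    """SPE-style (greatly simplified): a worker may only compute on a small
    private buffer, so each chunk is copied in, processed, and copied back."""
    out = [0.0] * len(data)

    def work(start: int) -> None:
        local = data[start:start + chunk]        # stand-in for a DMA transfer in
        local = [shade(x) for x in local]        # compute only on the private copy
        out[start:start + len(local)] = local    # stand-in for a DMA transfer out

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, range(0, len(data), chunk)))
    return out


values = [float(i) for i in range(10)]
print(shared_memory_style(values) == local_store_style(values))
```

Both produce the same result, but the second version forces the programmer to think about buffer sizes and explicit copies, which is the kind of experience that multi-core PC work does not provide.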

The situation is not all bad though as first party exclusive titles on each console will most likely be impressive. The real problem is with multi-platform games. If developers take a lowest common denominator approach, as many bean counters like to dictate, they won't be spending too much time optimizing for much more than one CPU and this probably hurts the PS3 more than it does the 360.

-Jerry C.

Fangrim
11-29-06, 03:09 AM
^^ Aren't you "assuming" too much yourself? What do you base your opinion on? For every dev that says "programming for the Cell is OK", I bet I can find at least 5 devs that say programming for it is a nightmare.

Besides, it's not like people are guessing at it from thin air. When you analyse the structure it's not that hard to see what prompts people to say that.

No one ever said it's impossible. In fact it's pretty damn easy if you just use the PPE. The difficulty has always been in maximizing performance, and the indisputable fact is it's easier to maximize the performance of a Xenon than of the Cell.

You might take it as the cell has "more potential", but just know that somewhere down the line there might be a case of diminishing returns where the additional time spent will simply not be worth it.

As I said in my previous post, my assumptions are based on interviews and articles from real developers. Some devs have spoken against the Cell processor, but on the other hand, some have stated that programming it is a challenge, as it is with every new platform. No one ever said it was going to be easy, but I don't think it's as hard as this thread is trying to make it sound.

I do agree that the structure of the Cell poses a bigger challenge than the Xenos, and that it will take a bigger effort to utilize its power. With that said, the 360 doesn't have a lot to show when it comes to utilizing its own power. A year in, and there's only a handful of games that stand out from the rest.

No game on the PS3 surpasses Gears of War (which is what most people use as a benchmark of the 360's capabilities these days), but Resistance: Fall of Man is not that far behind, and that's a launch title.

Lots of comparisons of dual-platform games (like Ridge Racer, Call of Duty 3, Fight Night 3, etc.) have been made, and the resemblance is pretty close, but remember that the PS3 version is a port of the 360 version, and a port very rarely utilizes any of the advantages of its new system.

The Cell is harder, but I believe that if the devs make the effort to understand it and unlock its power, it'll prove to be more powerful than the Xenos.

Either way, the race is close.

Blkout
11-29-06, 04:32 AM
And yes - I'm a Sony fan(boy), but I do respect the 360 as an awesome platform. There's just nothing I've seen on the 360 so far that makes me go "OOOH!". Sony's always delivered, and I believe they'll continue to do so.


The fact that you said you're a Sony fanboy explains your total blindness to reality and ignorance to what's been posted here. You need not continue to post on this subject.

yuzna
11-29-06, 04:34 AM
All this really means very little. Sure, the PS3 is a lot more complicated to program for, but this will eventually be made easier by programming software and tools. As far as the Xbox 360 goes, it definitely is a strong machine and seems better constructed than the PS3, though as I said this means little. The consumer will be the final judge, and since a lot of titles are coming out for the PS3, plus its enormous fanbase, this will be a close battle indeed :)

Cynic
11-29-06, 06:12 AM
No game on the PS3 surpasses Gears of War (which is what most people use as a benchmark of the 360's capabilities these days), but Resistance: Fall of Man is not that far behind, and that's a launch title.

Gears seems to be just about as much a "launch title" as Resistance; after all, both games had been in development since before the launch of each console. Unfortunately, I don't know the exact dates, but I hardly believe Gears has had a full year more development time than Resistance. In other words, if the 360 had only launched this month, Gears would have been a "launch title" as well.


Like someone already pointed out earlier in this thread: "Some people don't understand how a console released a year after the Xbox 360 could possibly be inferior. Simply put, while the console was released a year later, the technology is not a year ahead. The specs for the PS3 were pretty much finalized back in March. Sony was able to add small things (such as HDMI to the core) but they couldn't change the processor/GPU/Ram".

I'd go even further and say that both platforms' specs were pretty much finalised back in May 2005, so it's not the same as when Dreamcast's specs were announced in May 98, PS2's in March 99, and Xbox/GameCube's in 2000/2001 (sorry, once again I don't know the exact dates).

IMHO, PS3 and 360 seem to be as "same generation" as they can be, much more so than Dreamcast/PS2/Xbox/GameCube.

Scotty L
11-29-06, 09:45 AM
Maybe that's because 1080p doesn't really matter. 1080p is a marketing gimmick and Microsoft knew it.
Says you.

http://en.wikipedia.org/wiki/1080p#Storage_format

However, the second generation American/Japanese HD DVD players and the first generation of European HD DVD players (both launched in Q4-2006) support direct output of 1080p signal.
There is a difference. Above 50", 720p just doesn't look quite as sharp. And interlacing is a bandwidth-saving, quality-reducing technique that needs to die, the sooner the better.
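The bandwidth saving from interlacing is easy to quantify: 1080i sends alternating 540-line fields, so at 60 fields per second it carries half the pixels of 1080p at 60 full frames per second. A quick sketch, assuming 60 Hz formats and ignoring blanking intervals:

```python
def pixel_rate(width: int, lines_per_pass: int, passes_per_second: int) -> int:
    """Active pixels delivered per second for a given scan format."""
    return width * lines_per_pass * passes_per_second


formats = {
    "720p60": pixel_rate(1280, 720, 60),    # 60 full frames
    "1080i60": pixel_rate(1920, 540, 60),   # 60 fields of 540 lines each
    "1080p60": pixel_rate(1920, 1080, 60),  # 60 full frames
}
for name, rate in formats.items():
    print(f"{name}: {rate / 1e6:.1f} Mpixels/s")
```

1080i only slightly exceeds 720p60 in pixel rate, which is the sense in which it saves bandwidth; the cost is that each moment in time shows only half the lines.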

BuGsArEtAsTy
11-29-06, 10:14 AM
The most important issue in the entire situation is INERTIA. Programers know what they know and like to continue doing what they know best. The reaction to multicore systems from developers has been somewhat mixed with regard to the paradigm shift from single threaded to multi-threaded systems. John Carmack, creator of Doom and one of the most knowledgeable people on the planet about 3D game engines, seems to have little love for the onslaught of multicore systems on both the desktop and in consoles.
OTOH, Carmack has also said he likes the 360's symmetric design, and the fact that the 360's development tools and technical documentation are the most robust.

5150
11-29-06, 10:21 AM
The fact that you said you're a Sony fanboy explains your total blindness to reality and ignorance to what's been posted here. You need not continue to post on this subject.

It's comments like this that make it so difficult to have an intelligent discussion on the subject. This is a good discussion. Don't derail it with this kind of petty, childish BS.

OfficerDibble
11-29-06, 11:13 AM
As I said in my previous post, my assumptions are based on the interviews and articles from real developers. Some devs have spoken against the Cell processor, but on the other hand, some has stated that programming it is a challenge, as it is with every new platform. Noone ever said it was going to be easy, but I don't think it's as hard as this thread is trying to make it sound.


From an interview with EA Canada Producer Michael Blank re Fight Night Round 3 on the PS3. I cannot post the link. Go to IGN and search for Fight Night Round 3: The PS3 Interview.

IGN: Are the visual improvements specifically thanks to "The power of PS3" or is it simply because the developers had more time to work on next-gen hardware?

Blank: The answer is both. More time allows us to think about how to do things in different ways. At the same time, the PS3 is a powerful system and there are things we can do on this system that are unique. Each system has its advantages but both systems are really powerful tools that help us to make the great games we want to make.

IGN: Well then, be as frank as possible: How difficult or easy was it to port the game from Xbox 360 to PlayStation 3?

Blank: Making the game work on the PS3 was not an easy feat but this is the experience all game makers have at every hardware transition. The PS3 works differently than the 360 in many respects. That being said, once the initial learning curve was overcome, we've become very adept at figuring out how to get the most out of the platform.

mrcalypso
11-29-06, 11:27 AM
I find it funny when people say that the same games look better on the 360 because they are ported over. Does anybody remember the last generation, where in almost every single port from PS2 to Xbox the Xbox version was graphically superior, sometimes by a large amount? Also, when they say that Gears has better graphics because the 360 was out a year before: when the Xbox first came out, its graphics were superior, especially in games like Halo.

OfficerDibble
11-29-06, 11:58 AM
I find it funny when people say that the same versions of games look better on the 360 because they are ported over. Does anybody remember the last generation where in almost every single port from ps2 to xbox, the xbox version was graphically superior sometimes by a large amount.

Given the Xbox was released over 18 months after the PS2 why is that a surprise?

mboojigga
11-29-06, 12:21 PM
Given the Xbox was released over 18 months after the PS2 why is that a surprise?


It shouldn't be, just as Splinter Cell started the trend, along with Ghost Recon, of showing the difference when games came out as Xbox exclusives and then moved to the PS2, or how games like Grand Theft Auto looked better on the 360 than on the PS2. Honestly, I don't think it really would have mattered how long it took the Xbox to come out. It was clearly superior to the PS2 from a technical standpoint and, again, easier to program for. Same situation as before, except the PS3 came out after the 360 and some games look worse than on the 360. We might as well accept the fact that between both consoles we will see some great-looking games, from multiplatform titles to exclusives for each one. When the PS3 exclusives I want arrive, I will be picking it up for sure, and I'll probably do the same thing I did last generation: pick up the multiplatform and exclusive games for the 360 and the exclusive games for the PS3, for obvious reasons.

OfficerDibble
11-29-06, 12:44 PM
It shouldn't be. Splinter Cell, along with Ghost Recon, started the trend of showing the difference when games launched as Xbox exclusives and then moved to the PS2, and games like Grand Theft Auto looked better on the 360 than on the PS2. Honestly, I don't think it would have mattered how long it took the Xbox to come out. It was clearly superior to the PS2 from a technical standpoint, and again easier to program for.

At a time when processor power and GPUs were continually improving, the fact that the Xbox was released over 18 months after the PS2 gave MS the time they needed to develop a technically superior console, so again, why should anyone be surprised that Xbox games look superior to PS2 games? It's like arguing that the P4 is a superior CPU to the P3.

With regard to the PS3 and Xbox 360, that's not the case; there is little to choose between them technically.

JData
11-29-06, 12:48 PM
I don't see it that way. I think it is more like this:

With the PS3 delayed, they were given more time to 'fine-tune and optimize' their product(s).


____________________________

Back to programming.

Is there a game out there besides "Blue Dragon" that needs a lot of space? The only things I can think of that need that much space are FMV and textures.

Now, in the PC world, I know MS's FSX (Flight Simulator X) comes on 2 DVDs because of its terrain data and textures.

Daekwan
11-29-06, 01:11 PM
I don't see it that way. I think it is more like this:

With the PS3 delayed, they were given more time to 'fine-tune and optimize' their product(s).


____________________________

Back to programming.

Is there a game out there besides "Blue Dragon" that needs a lot of space? The only things I can think of that need that much space are FMV and textures.

Now, in the PC world, I know MS's FSX (Flight Simulator X) comes on 2 DVDs because of its terrain data and textures.


Which is exactly why the PS3 had a 1.1 update waiting for it on its release day. There simply isn't an excuse; both systems have been in development for the same period of time.

The whole idea of the 1 year head start is nothing but a lame excuse.

MisterNJ
11-29-06, 01:25 PM
http://img62.exs.cx/img62/295/ogre3nf.th.jpg
NEEEEERRRRRRDDS

LOL

I don't really know much about the technical stuff---all I know is, I own 10 games and a whole bunch of arcade games and the machine takes up a lot of my free time.

ANGELUS
11-29-06, 01:38 PM
As I said in my previous post, my assumptions are based on interviews and articles from real developers. Some devs have spoken out against the Cell processor, but on the other hand, some have stated that programming it is a challenge, as it is with every new platform. No one ever said it was going to be easy, but I don't think it's as hard as this thread is trying to make it sound.

I do agree that the structure of the Cell poses a bigger challenge than the Xenon (the 360's CPU), and that it will take a bigger effort to utilize its power. With that said, the 360 doesn't have a lot to show when it comes to utilizing its own power either. A year in, there's only a handful of games that stand out from the rest.

No game on the PS3 surpasses Gears of War (which is what most people use as the benchmark of the 360's capabilities these days), but Resistance: Fall of Man is not that far behind, and that's a launch title.

Lots of comparisons of dual-platform games (like Ridge Racer, Call of Duty 3, Fight Night 3, etc.) have been made, and the resemblance is pretty close, but remember that the PS3 version is a port of the 360 version, and a port very rarely utilizes any of the advantages of the system.

The Cell is harder, but I believe that if the devs make the effort to understand it and unlock its power, it'll prove to be more powerful than the Xenon.

Either way, the race is close.

Gears is WAY beyond Resistance (just look at the textures in Resistance, which are blurry up close)... But forget about Gears for a sec: I have seen no PS3 games that look as technically impressive as Kameo, which was a launch title. Those who disagree, please remember I'm only talking about the technical aspects of graphics like texture quality and framerate, not whether you happen to like the art style or want to fly around as a fairy :D.

Historically speaking, the later-released console has always been more powerful and had better-looking games than the competition, right from launch.

SNES did over Genesis
PS2 did over DC
Xbox did over PS2 (Sony even said that the Xbox launched a year after PS2 so they expected it to be better)

But this time around it seems that they are at least even, and until I DO see something running on it that looks as good as Gears, or hell, even some games with really sharp textures (maybe the single most important thing to make your game look good), I can't even consider buying one to keep.

I'm actually kinda bummed about it, because I really wanted one just based on the PS2 and how much I loved that system. I even pre-ordered one a year ahead of time. The fact that a guy like me doesn't want one should worry Sony greatly...

BTW we aren't the only ones who noticed this: http://www.1up.com/do/feature?cId=3155393

Ding62
11-29-06, 01:41 PM
A few random comments:

Some 360 games do use multiple cores. Geometry Wars might be the earliest and most trivial example. (The warping "gravity grid" calculation that underlies all the action has a core all to itself, for instance.) I do agree that no game has fully exploited all three cores, and most games don't even come close.

I believe that most modern PC video cards include a hardware scaler. I'm not sure whether it resides on the GPU or on its own chip further downstream. Scalers became important with the advent of fixed-resolution LCD monitors in the PC space. Nowadays, one can choose to let the video card scale all content to a single display resolution, or let the scaler in the display itself do it. (In my case, I let the monitor do the scaling, since I can't detect any quality difference between the two scalers in the chain.)
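As a rough illustration of what a scaler does, here is a minimal nearest-neighbor sketch in C that stretches one scanline to a wider output, for example a 1280-pixel line to 1920 pixels. The function name is hypothetical, and real hardware scalers use filtered (bilinear or multi-tap) interpolation rather than simply repeating pixels, so treat this only as a model of the resampling step.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nearest-neighbor resample of one row of 32-bit pixels.
   Each output pixel copies the source pixel whose position maps onto it. */
static void scale_row_nearest(const uint32_t *src, size_t src_w,
                              uint32_t *dst, size_t dst_w)
{
    for (size_t x = 0; x < dst_w; x++) {
        size_t sx = x * src_w / dst_w;   /* integer mapping, always < src_w */
        dst[x] = src[sx];
    }
}
```

Scaling the row {10, 20, 30, 40} to six pixels this way gives {10, 10, 20, 30, 30, 40}; a filtered scaler would blend neighbors instead of duplicating them.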

Anybody that thinks Resistance looks "almost as good" as GoW, is deluding themselves IMO.

DaveC19
11-29-06, 02:20 PM
There are too many "I assume" and "I suspect" statements in the OP's post.

Sony dropped the ball somewhat by not giving devs the tools they need, but when this gets fixed, I believe that the PS3 will put some distance between itself and the 360. It's a close race, but I think that the PS3 does have an edge over the 360.



Well, since it has been stated that the 360's GPU is superior to the PS3's, how do you know that the opposite won't be true, and that the 360 will put some distance between itself and the PS3?

mterzich
11-29-06, 02:21 PM
For single-core applications, porting between platforms should be a fast and easy operation. The primary differences are in how the code interfaces with the libraries, since the APIs (Application Programming Interfaces) of the two consoles' libraries pass parameters differently. However, both APIs usually have the same capabilities. Typically developers use the same source for both consoles but use #ifdef directives to select which library to call.

Example:

#ifdef PS3
    /* code to call the PS3 library */
#else
    /* code to call the 360 library */
#endif

When compiling the code for the PS3, the PS3 flag is defined (typically on the compiler command line, e.g. -DPS3), so only the PS3 branch is compiled; when compiling the code for the 360, the PS3 flag is not defined, so only the 360 branch is compiled.
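A compilable version of the same pattern, as a sketch: platform_name() is a hypothetical wrapper standing in for whatever library call each console's SDK actually provides. Only one branch of the #ifdef survives preprocessing, so the shared code that calls the wrapper is identical on both consoles.

```c
#include <stdio.h>

/* Hypothetical wrapper: on a real console each branch would call that
   platform's own library; here each branch just reports which one was built. */
static const char *platform_name(void)
{
#ifdef PS3
    return "PS3";        /* compiled only when the PS3 flag is defined */
#else
    return "Xbox 360";   /* compiled when the PS3 flag is not defined */
#endif
}

/* Shared game code: identical source on both consoles. */
static void print_platform(void)
{
    printf("Running on %s\n", platform_name());
}
```

Compiling the file with -DPS3 selects the PS3 branch; compiling without it selects the 360 branch.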

Since both consoles will be running the application on a single core, any observable difference will be due to the supporting hardware, such as the GPU, or to compiler optimization.

If Microsoft is providing Visual Studio as the development platform, that is one sweet development platform. It is highly doubtful that Sony will ever be able to create a development platform that works as well as Visual Studio, given the number of years Microsoft has spent developing it.

A good development platform could make developers more than twice as productive as a poorly designed one. I would suspect that single-core applications meant to run on both game consoles would always be developed on the 360 first and then ported to the PS3, due to the superior development tools. It wouldn't surprise me if even single-core applications designed only for the PS3 were developed first for the 360 and then ported to the PS3, due to the superior development tools for the 360.

Srecko1
11-29-06, 02:29 PM
@mterzich

Even to someone like me it's obvious that there isn't much difference between these two consoles, but I have to admit I am a little disappointed with Sony. I think they underestimated the 360, and even though I am not an MS fan, I am glad they were wrong.
In the end it's all about games! I think the PS2 is the best example of that.

Since it seems that you know a lot about this stuff and you gave some comments about future consoles, I'd like to ask you: is it possible to predict how powerful consoles will be after this generation?
At least the processors?

Bailey151
11-29-06, 02:31 PM
Lots of programmers see it as a challenge, and compare programming the Cell with the PS2 (actually, some think the Cell is easier to grasp than the PS2).

Sony dropped the ball somewhat by not giving devs the tools they need, but when this gets fixed, I believe that the PS3 will put some distance between itself and the 360. It's a close race, but I think that the PS3 does have an edge over the 360.
Sorry, it's just not possible. It will work for now, but the 360 has legs that the PS3 can never match. From the beginning I've suspected Sony screwed the pooch with this architecture; it's always looked like a pricey decoder and not a gaming platform. The DMA calls alone will kill any chance at truly superb performance. No unified RAM? Sony, what were you thinking? Were you thinking?
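To make the DMA point concrete, here is a sketch in C of the pattern SPE code has to follow: data lives in main memory and must be copied into a small local store before it can be worked on, so programmers typically double-buffer, requesting the next chunk while processing the current one. dma_get() is a hypothetical stand-in built on memcpy, so here the copies happen synchronously; on real hardware the Cell SDK exposes asynchronous transfers through MFC calls such as mfc_get, and that overlap is what hides the transfer latency. The chunk size is an assumption.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 1024  /* elements per transfer; assumed size that fits in local store */

/* Hypothetical stand-in for an SPE DMA "get": real hardware would start an
   asynchronous transfer from main memory into local store; this just copies. */
static void dma_get(float *local, const float *main_mem, size_t n)
{
    memcpy(local, main_mem, n * sizeof *local);
}

/* Sum an array in "main memory" by streaming it through two small local
   buffers (double buffering). Not reentrant: the buffers are static. */
static double stream_sum(const float *main_mem, size_t total)
{
    static float local[2][CHUNK];   /* two buffers standing in for local store */
    double sum = 0.0;
    size_t offset = 0;
    size_t n = total < CHUNK ? total : CHUNK;
    int cur = 0;

    if (n > 0)
        dma_get(local[cur], main_mem, n);          /* prime the first buffer */

    while (n > 0) {
        size_t next_off = offset + n;
        size_t left = total - next_off;
        size_t next_n = left < CHUNK ? left : CHUNK;

        if (next_n > 0)                            /* request the next chunk... */
            dma_get(local[cur ^ 1], main_mem + next_off, next_n);
        for (size_t i = 0; i < n; i++)             /* ...while summing this one */
            sum += local[cur][i];

        offset = next_off;
        n = next_n;
        cur ^= 1;
    }
    return sum;
}
```

Every access to data outside the local store has to go through a transfer like this, which is the extra bookkeeping a unified-memory design avoids.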

Thanks for the link to the GPU article; it seems like the same age-old architecture battle NV & ATI have been having for years (at least since the 9800 series)... apparently NV is a tad slow on the uptake :D

Blank: Making the game work on the PS3 was not an easy feat but this is the experience all game makers have at every hardware transition. The PS3 works differently than the 360 in many respects. That being said, once the initial learning curve was overcome, we've become very adept at figuring out how to get the most out of the platform.
Programmer's translation - we figured out how to squeak it out of the PPE core ;)

Bertil
11-29-06, 02:43 PM
... It's like arguing the P4 is a superior CPU to the P3.
...


Well, most agree the P3 was a better design than the P4, but I get your point. :)

DLove23
11-29-06, 02:55 PM
I'm starting to get the feeling that, while still a very capable game console, the PS3 was less about pushing the limits of gaming and more a Trojan horse to get hordes of Blu-ray players into households to help that format win the next-gen format war.

mterzich
11-29-06, 03:27 PM
Even to someone like me it's obvious that there isn't much difference between these two consoles, but I have to admit I am a little disappointed with Sony. I think they underestimated the 360, and even though I am not an MS fan, I am glad they were wrong.
In the end it's all about games! I think the PS2 is the best example of that.

Why was the Cell processor chosen as the design? I suspect that the design of the Cell processor was defined by the chief engineer at Sony (probably due to ego) and that subordinates didn't dare to question his authority. Management was not technical enough to understand the consequences of the design, and the price/performance was acceptable.

Something similar happened in the early 1970s at Control Data Corporation. Seymour Cray was the chief engineer and was able to bring a new, most powerful general-purpose computer to market about every 18 months. However, at that time he developed an extremely powerful vector-type computer that on paper blew away his previous general-purpose computers. The problem was that vector machines had only a niche market, so sales were few. Management decided that Seymour needed to get back to developing general-purpose computers, but Seymour liked the vector systems, so they parted ways and Seymour started Cray Research. Unfortunately for Control Data, they could never again find a strong leader for the engineering department, and development cycles lengthened. Eventually Control Data broke up into small pieces. The Cray name lives on today as Cray Inc., still a viable company, but in a small niche market.

Since it seems that you know a lot about this stuff and you gave some comments about future consoles, I'd like to ask you: is it possible to predict how powerful consoles will be after this generation?
At least the processors?
I think some people may look at Gears of War, which probably uses only about a third of the available processor power, and think that there won't be any need for a future console. However, give developers a platform with unlimited resources available and they will use it. Predicting the power of future consoles will come down to the improvements in processors over the next few years. Game console manufacturers are very price-conscious, so they can never be cutting-edge. This is why the current processors are stripped-down versions of the PowerPC.

Currently Intel seems to be producing reasonably priced and relatively powerful dual-core processors. In a few years quad-core processors should be fairly cheap and have enough power for the next generation of consoles. Intel could be an option, but don't count out AMD, IBM, or MIPS. The game console manufacturers could choose an off-the-shelf processor or possibly a specially designed processor, as in the current generation. It will be based on the best price/performance. I wouldn't venture to predict what will happen at that time.

Bailey151
11-29-06, 03:29 PM
I'm starting to get the feeling that, while still a very capable game console, the PS3 was less about pushing the limits of gaming and more a Trojan horse to get hordes of Blu-ray players into households to help that format win the next-gen format war.
That's always been my opinion..............hmmmmm, let's see.............

Decoding involves processing vast amounts of repetitive data, and the SPEs are superb at???????

I remember reading the first PS3 architecture leaks and thinking, "Why are they making a ripping box and not a gaming console?" (Its power is much more biased toward decoding-type operations, IMHO.)

..........unlimited resources available and they will use it.
Nah, no way - nobody will ever need more than 640K :D

And let's not forget MS is at its core a software company. It's easy to see why they would see the benefits of the 360's architecture (and ATI's GPU). Sony is an entertainment company; they would likely see the media processing power as very attractive.

Cynic
11-29-06, 03:35 PM
For single core applications, porting between platforms should be a fast and easy operation.
[...] It wouldn't surprise me if even single-core applications designed only for the PS3 were developed first for the 360 and then ported to the PS3, due to the superior development tools for the 360.

One thing that bothers me a bit in this generation is how the development of multiplatform titles may turn out.
Previously, with PS2-Xbox ports, I got the impression that developers could try to make good use of PS2's hardware because, since Xbox was more powerful and easier to develop for, they had the "extra space" needed to port the game without major compromises.

But now that 360 and PS3 seem more similar to each other than their predecessors, do you think there's a risk that most developers will be afraid of exploiting either 360's or PS3's specific advantages simply because it would mean that porting would then be much more difficult, if not impossible?
Will the "lowest common denominator" play a bigger role now than during the previous generation?

Srecko1
11-29-06, 03:41 PM
@mterzich

Well, yes, I got your point.

Also, I wanted to ask you this. According to Sony, the PS3 is 40 times "stronger" than the PS2. Do you think the generation after this one will be that much stronger than the current one, or won't there be any need for that much power?
And I might be wrong, but it seems that technological (CPU, GPU) development might slow down a bit in the future.

mterzich
11-29-06, 03:44 PM
That's always been my opinion..............hmmmmm, let's see.............

Decoding involves processing vast amounts of repetitive data, and the SPEs are superb at???????

I remember reading the first PS3 architecture leaks and thinking, "Why are they making a ripping box and not a gaming console?" (Its power is much more biased toward decoding-type operations, IMHO.)
At first glance what you are saying appears 100% correct. However, with only 256 KB of memory available in an SPE, how do you fit a program, one 720p MPEG frame (about 166 Kbit, roughly 21 KB, on average at 10 Mbit/s and 60 frames per second) and one output buffer (about 3 MB) in memory at the same time? Even segmenting the process into 7 processes, that would seem to require about 500 KB of memory per SPE.
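The memory budget in that argument can be checked with a few lines of C. This sketch assumes 60 frames per second and 4 bytes per output pixel (neither is stated in the post), and it models the naive approach the argument describes, where each SPE holds its whole share of the output frame in local store at once.

```c
#define LOCAL_STORE_BYTES (256UL * 1024UL)   /* SPE local store: 256 KB */

/* Average compressed frame size in bytes: bitrate spread evenly over frames. */
static unsigned long avg_frame_bytes(unsigned long bits_per_sec, unsigned long fps)
{
    return bits_per_sec / fps / 8;
}

/* Uncompressed output frame size in bytes. */
static unsigned long frame_buffer_bytes(unsigned long w, unsigned long h,
                                        unsigned long bytes_per_pixel)
{
    return w * h * bytes_per_pixel;
}

/* Share of the output frame each SPE would hold if the frame is split evenly. */
static unsigned long per_spe_bytes(unsigned long frame_bytes, unsigned long spes)
{
    return (frame_bytes + spes - 1) / spes;   /* round up */
}
```

Under these assumptions a 10 Mbit/s stream averages 20,833 bytes per frame, the 1280x720 output buffer is 3,686,400 bytes, and a seventh of it is 526,629 bytes, roughly twice the 262,144-byte local store, which is consistent with the "about 500 KB per SPE" estimate.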

Fangrim
11-29-06, 03:50 PM
The fact that you said you're a Sony fanboy explains your total blindness to reality and ignorance of what's been posted here. You need not continue to post on this subject.

Erm - just because I am partial to Sony, I'm hopefully still allowed to have an opinion.

If you actually read what I wrote, you'll notice that I do respect the 360, albeit I feel the PS3 has bigger potential.

I don't see myself as blind nor ignorant - however, by posting your comment, I do feel that you have those traits.

Well since it has been stated that the 360's GPU is superior to the PS3's how do you know that the opposite won't be true? That the 360 will put some distance to the PS3?

I don't know. It might do just that; it certainly has some good games to show. That aside, I do feel that the PS3 *also* has games coming up that will show the PS3's power.

Regarding the comparison of GoW vs. R:FoM, it's all in the eye of the beholder. GoW is a very impressive game, but R:FoM is also impressive in its own right. Give the PS3 a year, and I believe that you'll see PS3 games just as technically advanced as GoW, if not better.

mterzich
11-29-06, 03:55 PM
One thing that bothers me a bit in this generation is how the development of multiplatform titles may turn out.
Previously, with PS2-Xbox ports, I got the impression that developers could try to make good use of PS2's hardware because, since Xbox was more powerful and easier to develop for, they had the "extra space" needed to port the game without major compromises.

But now that 360 and PS3 seem more similar to each other than their predecessors, do you think there's a risk that most developers will be afraid of exploiting either 360's or PS3's specific advantages simply because it would mean that porting would then be much more difficult, if not impossible?
Will the "lowest common denominator" play a bigger role now than during the previous generation?
Single-core applications will be the norm for developers that want to support both platforms, so don't expect much difference between most games. Developing code to use the multi-core capabilities of both consoles will require totally different code bases, dramatically increased development time, 2 design cycles, 2 full QA cycles, 2 full debug cycles, etc. So expect multi-core applications to be developed for only one console (most probably the 360, due to its easier development/design cycle).

Bailey151
11-29-06, 04:20 PM
At first glance what you are saying appears 100% correct. However, with only 256 KB of memory available in an SPE, how do you fit a program, one 720p MPEG frame (about 166 Kbit, roughly 21 KB, on average at 10 Mbit/s and 60 frames per second) and one output buffer (about 3 MB) in memory at the same time? Even segmenting the process into 7 processes, that would seem to require about 500 KB of memory per SPE.
I was basing my thoughts on the processor's strengths and not the actual feasibility :)

Erm - just because I am partial to Sony, I'm hopefully still allowed to have an opinion
Damn, I would hope so - a single opinion world is pretty boring.

dramatically increased development time, 2 design cycles, 2 full QA cycles, 2 full debug cycles, etc.
Yeah, that's going to cost way too much $$$$$ - you'd have to sell a ton to make a profit.

Give the PS3 a year, and I believe that you'll see PS3 games just as technically advanced as GoW, if not better.
Maybe, maybe not. IMHO the GPU will limit the upper end of the PS3's capabilities......but once you're at these resolutions does it really matter all that much?

OfficerDibble
11-29-06, 04:28 PM
Which is exactly why the PS3 had a 1.1 update waiting for it on its release day. There simply isn't an excuse; both systems have been in development for the same period of time.

The whole idea of the 1 year head start is nothing but a lame excuse.

No, it doesn't. It depends on when the development kits were finished, and if the PS3 is more difficult to program, then obviously it will take longer for developers to get more out of the hardware.

mterzich
11-29-06, 04:32 PM
Well, yes, I got your point.

Also, I wanted to ask you this. According to Sony, the PS3 is 40 times "stronger" than the PS2. Do you think the generation after this one will be that much stronger than the current one, or won't there be any need for that much power?
And I might be wrong, but it seems that technological (CPU, GPU) development might slow down a bit in the future.
In theory Sony is probably correct but it is all in the calculations.

The PS2 had a MIPS processor clocked at about 295 MHz, and the PS3 uses a Cell processor clocked at 3.2 GHz. If you divide the clock speeds, the Cell runs about 11x faster than the MIPS. If you multiply that by the number of cores (8 cores total), that equals 88 times the power of the MIPS. However, the MIPS processor is a fully featured processor (although not as well designed or as powerful as current fully featured processors at that clock speed), while the Cell's cores are stripped-down, in-order processors. That probably reduces the processing power to about 45% or less of a fully featured processor, which gives 88 * 0.45 = 39.6. However, using only one core (which will be the case for most applications) gives an effective increase in processing power of about 5x (11 * 0.45 = 4.95).

The 360 also has a performance increase of about 5x per core over the PS2, or a total of about 15x using all three cores.
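The back-of-the-envelope estimate above can be written out as a tiny C function. The 45% in-order efficiency factor is the post's own guess, not a measured value, and the PS2 clock is taken as the Emotion Engine's 294.912 MHz; with unrounded clock ratios the results come out slightly lower than the rounded figures (about 39x for all eight Cell cores, about 4.9x for one core, and about 14.6x for the 360's three cores).

```c
/* Rough speedup estimate: clock ratio x cores used x efficiency of an
   in-order core relative to a fully featured one (an assumed factor). */
static double est_speedup(double new_mhz, double old_mhz,
                          int cores, double in_order_efficiency)
{
    return (new_mhz / old_mhz) * cores * in_order_efficiency;
}
```

For example, est_speedup(3200.0, 294.912, 8, 0.45) gives about 39.06, and passing 1 or 3 cores instead gives about 4.88 and 14.65. The point of writing it out is that every factor is a multiplier, so a wrong efficiency guess scales all three results equally.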

Currently there are 2.5 GHz dual-core fully featured PowerPCs and lower-frequency fully featured quad PowerPCs available, but both are currently too expensive for a game console. In a few years, a 3.2 GHz fully featured quad-core PowerPC may be available and reasonably priced for a game console. That should give enough usable power increase for the successors to both the PS3 and the 360. It wouldn't give a large increase in total power over the PS3, but it would give a much more significant increase in easily usable power. That much processing power or more should be available from many processor vendors at a reasonable price in a few years.

As far as the GPU, that is the area that needs the most improvement. The problem is cost but I'm sure that will come down.

thatdude90210
11-29-06, 04:36 PM
Gears is WAY beyond Resistance (just look at the textures in Resistance, which are blurry up close)... But forget about Gears for a sec: I have seen no PS3 games that look as technically impressive as Kameo, which was a launch title. Those who disagree, please remember I'm only talking about the technical aspects of graphics like texture quality and framerate, not whether you happen to like the art style or want to fly around as a fairy :D.
That brings up another question I saw somewhere else. Does the PS3 use some sort of filtering optimization like "brilinear" filtering? "Brilinear" is a mix of trilinear and bilinear filtering that GPU makers use as a performance optimization. If it's overdone, it looks very poor, more like bilinear filtering.

Because if you look at this video comparison at gametrailers.com (http://www.gametrailers.com/player.php?id=14934&type=wmv&pl=game) of Tony Hawk on the PS3 and the 360, you can clearly see the moving line in front of the skater where the ground texture goes blurry in the PS3 parts, as if it's using bilinear filtering, whereas the 360 parts look like normal trilinear filtering (http://en.wikipedia.org/wiki/Texture_filtering#Trilinear_filtering).
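The difference between the two filtering modes comes down to how much of the next (blurrier) mip level gets blended in. Here is a simplified model in C: trilinear blending ramps smoothly across the whole interval between two mip levels, while the hypothetical brilinear_weight() only blends inside a narrow band and otherwise samples a single level, which is what produces a visible line where the texture suddenly switches to the blurrier level. The band width is an illustrative parameter, this is not any vendor's actual heuristic, and nothing here says what the PS3 actually does.

```c
/* Fraction of the coarser mip level to blend in, given the fractional
   level of detail (0 = finer level, 1 = coarser level). */
static float trilinear_weight(float lod_frac)
{
    return lod_frac;   /* blend continuously across the whole interval */
}

/* "Brilinear" model: use a single mip level (plain bilinear) for most of the
   interval and blend only inside a band of width `band` centred on 0.5. */
static float brilinear_weight(float lod_frac, float band)
{
    float lo = 0.5f - band * 0.5f;
    float hi = 0.5f + band * 0.5f;
    if (lod_frac <= lo) return 0.0f;   /* finer level only */
    if (lod_frac >= hi) return 1.0f;   /* coarser level only */
    return (lod_frac - lo) / band;     /* short linear ramp */
}
```

With band = 1.0 the ramp covers the whole interval and matches trilinear; shrinking the band toward 0 approaches bilinear filtering with an abrupt switch between mip levels, which is the "moving line" effect described above.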

Extra
11-29-06, 04:52 PM
PS3 is not "40" times more powerful than PS2. IMO it's always erroneous to attempt to summarize the difference between 2 systems in one number like that. PS3 might be "40" times more powerful than the PS2 in ONE specific measure of performance, but as a whole the system is in no way 40 times stronger.

It's difficult to anticipate the performance of the next generation, mainly because GPUs at the moment are much less predictable than CPUs. GPU development can totally blindside you: I sure as hell did not anticipate that nVidia would release the 8800 series with a unified shader architecture. And what happened to the geometry shaders?

AMD and ATI are also currently investigating CPU+GPU hybrids - there might be a lot of exciting things that could happen there. But we won't know for a while yet.

I also want to add that many 360 launch games looked amazing in certain areas. Perfect Dark Zero, for example, had some simply astonishing textures in the environment. And as someone mentioned earlier, Kameo is amazing as well (let's face it, Rare is amazing at hardware ;) Viva Piñata looks amazing yet again).
PS3's launch did not even "beat" the 360's launch in graphics as far as I'm concerned, what makes people think it will outdo the 360 later down the line?

Oh wait, I think I just answered my own question. It's because of people like the developer of Killzone stating that the in-game graphics WILL look as good as the E3 trailer, of course.

Srecko1
11-29-06, 04:52 PM
@mterzich

One more thing. Have you seen the Killzone 2 trailer? Is it really possible that such graphics can "run" on the PS3?

mterzich
11-29-06, 05:00 PM
One more thing. Have you seen the Killzone 2 trailer? Is it really possible that such graphics can "run" on the PS3?
I am not a gamer, so the only time I see the PS3 or 360 is in stores. I actually get a lot more information about how things are working from this forum. Of course I have to filter out the crap and try to understand what people are talking about. I haven't seen the Killzone trailer.

Srecko1
11-29-06, 05:09 PM
I am not a gamer, so the only time I see the PS3 or 360 is in stores. I actually get a lot more information about how things are working from this forum. Of course I have to filter out the crap and try to understand what people are talking about. I haven't seen the Killzone trailer.
Sorry

Extra
11-29-06, 05:13 PM
One more thing. Have you seen the Killzone 2 trailer? Is it really possible that such graphics can "run" on the PS3?

I don't believe it for a second.

There are way too many subtle things in the E3 trailer that simply cannot be rendered in real time right now:
-Volumetric clouds that look even BETTER than the Crytek engine's...
-The smoothness of the characters (especially faces) indicates a VERY high poly count. This is not ZBrush normal-mapping cheating on details; this is the real deal.
-Look at those soft shadows. Uhh, WOW? Accurate soft shadows are essentially impossible to render in real time right now, whether on the PC OR a console.

And yes, I'm calling their developers on it. They're full of crap when they say the game will look as good as the trailer did. I have experience with 3D modelling, and right off the bat, from looking at their E3 trailer, I know the polycount was wack.

mterzich
11-29-06, 05:22 PM
I am not a gamer, so the only time I see the PS3 or 360 is in stores. I actually get a lot more information about how things are working from this forum. Of course I have to filter out the crap and try to understand what people are talking about. I haven't seen the Killzone trailer.
All sophisticated high-detail applications can be developed for either the 360 or PS3 using a single core. But since real-time performance is the big issue, the application may just run too slowly or have too many issues with the GPU to be any good.

On older consoles, developers may try to optimize code to improve the frame rate and reduce stalls (if the bottleneck is the processor and not the GPU). But you can only go so far by optimizing code (maybe only a 1 or 2 fps increase). After that you may need a more powerful processor (which you don't have), so the only option is to segment the application for parallel processing to use the other cores. In theory this can be done on either the 360 or the PS3.
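Segmenting work for parallel processing can be as simple as splitting an array into non-overlapping slices, one per core. This sketch uses POSIX threads on a general-purpose multi-core machine; NUM_WORKERS = 3 mirrors the 360's three cores, and the entity-update loop is a hypothetical stand-in for real game work. It does not model the PS3's SPEs, which would also need the data copied into their local store.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 3   /* e.g. one per 360 core; an assumption for illustration */

typedef struct {
    float *pos;
    const float *vel;
    size_t begin, end;
    float dt;
} slice_t;

/* Each worker integrates its own non-overlapping slice, so no locking is needed. */
static void *update_slice(void *arg)
{
    slice_t *s = arg;
    for (size_t i = s->begin; i < s->end; i++)
        s->pos[i] += s->vel[i] * s->dt;
    return NULL;
}

/* Split one frame's update across NUM_WORKERS threads and wait for all of them.
   Returns 0 on success, -1 if a thread could not be created. */
static int parallel_update(float *pos, const float *vel, size_t n, float dt)
{
    pthread_t tid[NUM_WORKERS];
    slice_t slice[NUM_WORKERS];
    size_t per = (n + NUM_WORKERS - 1) / NUM_WORKERS;
    int created = 0;

    for (int w = 0; w < NUM_WORKERS; w++) {
        size_t b = (size_t)w * per;
        size_t e = b + per;
        if (b > n) b = n;
        if (e > n) e = n;
        slice[w] = (slice_t){ pos, vel, b, e, dt };
        if (pthread_create(&tid[w], NULL, update_slice, &slice[w]) != 0)
            break;
        created++;
    }
    for (int w = 0; w < created; w++)   /* always join what was started */
        pthread_join(tid[w], NULL);
    return created == NUM_WORKERS ? 0 : -1;
}
```

Because the slices never overlap, the threads need no synchronization beyond the final join; build with -pthread on most Unix-like toolchains.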

Srecko1
11-29-06, 05:25 PM
I don't believe it for a second.

There are way too many subtle things in the E3 trailer that simply cannot be rendered in real time right now:
-Volumetric clouds that look even BETTER than the Crytek engine's...
-The smoothness of the characters (especially faces) indicates a VERY high poly count. This is not ZBrush normal-mapping cheating on details; this is the real deal.
-Look at those soft shadows. Uhh, WOW? Accurate soft shadows are essentially impossible to render in real time right now, whether on the PC OR a console.

And yes, I'm calling their developers on it. They're full of crap when they say the game will look as good as the trailer did. I have experience with 3D modelling, and right off the bat, from looking at their E3 trailer, I know the polycount was wack.
If Gears of War can look that good, then maybe Killzone 2 can come close to that trailer in quality.

Srecko1
11-29-06, 05:27 PM
All sophisticated high-detail applications can be developed for either the 360 or PS3 using a single core. But since real-time performance is the big issue, the application may just run too slowly or have too many issues with the GPU to be any good.

On older consoles, developers may try to optimize code to improve the frame rate (if the bottleneck is the processor and not the GPU). But you can only go so far by optimizing code (maybe only a 1 or 2 fps increase). After that you may need a more powerful processor (which you don't have), so the only option is to segment the application for parallel processing to use the other cores. In theory this can be done on either the 360 or the PS3.
Anyway, please check out this "famous" Killzone 2 trailer:

http://www.gametrailers.com/gamepage.php?fs=1&id=1668

Browse down a little and you will see video clips; click the first one, where the guy with a gun is.

Extra
11-29-06, 05:37 PM
If Gears of War can look that good, then maybe Killzone 2 can come close to that trailer in quality.

Gears of War looks very nice, yes, thanks to normal mapping. The texture quality and polygon detail of GoW's original models (in ZBrush) look very impressive.

The difference is that the Killzone 2 trailer looks that good (better than GoW's in-game graphics; technically speaking there's no contest), and gets that "CG" look, through brute-force polycount combined with soft shadows.
Those two elements are the most prominent things separating real-time graphics from "CG" right now.
When stripped of those two elements (and I guarantee you they will be), Killzone 2 will no longer look impressive.

Edit: I should add, by "no longer look impressive" I mean comparatively speaking. The game might still look nice, but it's not going to touch the trailer. There's only so much normal mapping can fake before the limitations of what can be done in realtime today will be shown.
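Normal mapping keeps the low-polygon surface and reads a per-pixel normal out of a texture for lighting. A minimal C sketch of that per-pixel step, assuming tangent-space normals stored as 8-bit RGB and a unit-length light direction in the same space (the names are hypothetical, and a real shader would usually renormalize the decoded normal): only the shading changes, while the geometry and silhouette stay as flat as before, which is why there is a limit to what it can fake.

```c
typedef struct { float x, y, z; } vec3;

/* Decode an 8-bit-per-channel normal-map texel from [0,255] into [-1,1]. */
static vec3 decode_normal(unsigned char r, unsigned char g, unsigned char b)
{
    vec3 n = { r / 127.5f - 1.0f, g / 127.5f - 1.0f, b / 127.5f - 1.0f };
    return n;
}

/* Lambertian diffuse term: the per-pixel normal from the map, not the flat
   triangle normal, decides how brightly this pixel is lit. */
static float diffuse(vec3 n, vec3 light_dir)
{
    float d = n.x * light_dir.x + n.y * light_dir.y + n.z * light_dir.z;
    return d > 0.0f ? d : 0.0f;   /* surfaces facing away get no light */
}
```

The texel (128, 128, 255) decodes to a normal pointing almost straight out of the surface, so a light shining straight down that normal gives full brightness, and a light from behind gives none.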

mterzich
11-29-06, 05:41 PM
Anyway, please check out this "famous" Killzone 2 trailer:

http://www.gametrailers.com/gamepage.php?fs=1&id=1668

Browse down a little and you will see video clips; click the first one, where the guy with a gun is.
Your guess is as good as mine on that one. It shows a lot of detail and action and looks quite impressive. Also, they must have a prototype working, since they made the trailer. The real question is whether they can get it working satisfactorily, without stalls and at a decent frame rate, on one core. If not, they will have to figure out a way to use the SPEs.

I think the developers know that they may have performance issues, so they don't dare estimate a release date. Usually when features are the only issue (not performance), developers can estimate a ship date and be fairly close. Also, when it is features only, they can deliver the product without all the envisioned features to meet the estimated release date.

mking2673
11-29-06, 05:48 PM
When talking about Killzone 2, let's not forget that it was shown at E3 in May of 2005 and we have not seen so much as a screenshot since. There was word this week that they are in fact still working on it, but let's be honest: the video has nothing to do with what the game will look like.

mterzich
11-29-06, 05:57 PM
When talking about Killzone 2, let's not forget that it was shown at E3 in May of 2005 and we have not seen so much as a screenshot since. There was word this week that they are in fact still working on it, but let's be honest: the video has nothing to do with what the game will look like.
If they showed that in May 2005 and still do not have an estimated release date, I suspect the chances of it being released at that quality are not very good.

Daekwan
11-29-06, 06:50 PM
No, it doesn't. It depends on when the development kits were finished, and if the PS3 is more difficult to program, then obviously it will take longer for developers to get more out of the hardware.

So I guess next year, when games on the 360 still look as good as or better than they do on the PS3 in most cases, you will still be crying:

The 360 is in its 3rd year of development, while the PS3 is just hitting its 2nd year...


Like I said, it's always gonna be their excuse for coming up short... and it's an excuse most people won't accept.

Jetrii
11-29-06, 07:00 PM
Resistance was announced in March 2005, right around the same time the early GoW screenshots started appearing. While Resistance is a launch title, it has been in development for a long time.

mterzich
11-29-06, 07:03 PM
Important Update Of Original Post

Since I'm not a game developer, when I made the original post I did not understand how a game application was structured, so I assumed that current game applications were single-threaded except for some minor parallelism. Since then I have been reading about some basic concepts of game development, and it appears that games are usually developed using multiple threads. What does that mean? It means that if a game application is already multithreaded, it takes very little effort to make it multi-core, as long as the game console's cores are general purpose. The only difference between using a thread and using a different core is that synchronization and semaphore use can be more difficult.

That means that the 360 can outperform the PS3 with only a few minor changes in the code. The source code would remain the same but would have a few more #ifdef directives.

On the 360, this does not triple the power, but it may give it significantly more power than the PS3, even in its early stages of development. So I've changed my mind and suspect that most 360 applications are already multi-core. This means that on the PS3, frame rates may be lower and stalls may occur more often than in the same game on the 360. That will only happen in games that require more power than the PS3's PPE core can provide.
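Why three cores do not simply triple the power can be made concrete with Amdahl's law (my addition, not something from the post): if only a fraction p of each frame's work can run in parallel, the ideal speedup on n cores is 1 / ((1 - p) + p / n). The fractions below are hypothetical, chosen only to show the shape of the curve.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup when `parallel_fraction` of the work is spread evenly
    over `cores` cores and the rest stays serial (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


# Hypothetical parallel fractions for a game's per-frame work.
for p in (0.5, 0.8, 0.95):
    print(f"p={p:.2f}: 3 cores give {amdahl_speedup(p, 3):.2f}x")
```

Only a perfectly parallel workload (p = 1) reaches 3x on three cores; any serial remainder pulls the speedup below that.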

The following is an extract from an article describing the threads typically used in a game application.

http://www.pcquest.com/2004/images/Netgraajpg_may2k4.gif

Each block represents a thread and each arrow represents the way in which these threads communicate. The synchronization routines aren't exactly a thread that the user is aware of; they are the routines that keep calling all other threads and make sure that inconsistent data is not accessed. They also contain level data, triggers for certain threads, collision detection, common data such as health and ammo counts, and anything else that has to be shared. This also contains the critical section, the communication medium between threads. The graphics thread could be implemented using OpenGL and the sound using DirectSound. Of these two, the sound thread does not require much CPU time compared to the massive job of the graphics thread. It would, thus, be wise to set the thread priority of the sound thread to a lower value and that of the graphics thread to a higher one. The AI thread has to be placed at a priority comparable to the graphics thread, as the game cannot move on without the opponent's move.
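The thread layout in that extract can be sketched in Python (my choice of language; the article shows no code). The `SharedGameState` class, the thread functions and the numbers are hypothetical. A `threading.Lock` stands in for the critical section. Python's `threading` module exposes no thread priorities, so the priority advice is not modelled, and CPython's global interpreter lock means this shows the structure rather than a real multi-core speedup.

```python
import threading
from typing import Tuple


class SharedGameState:
    """Data shared by several threads (health, frame count), guarded by a
    lock that plays the role of the article's critical section."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.health = 100
        self.frames_rendered = 0

    def apply_damage(self, amount: int) -> None:
        with self._lock:  # one thread at a time may change the shared data
            self.health = max(0, self.health - amount)

    def record_frame(self) -> None:
        with self._lock:
            self.frames_rendered += 1

    def snapshot(self) -> Tuple[int, int]:
        with self._lock:  # read both values as one consistent pair
            return self.health, self.frames_rendered


def ai_thread(state: SharedGameState, ticks: int) -> None:
    for _ in range(ticks):
        state.apply_damage(1)  # hypothetical: an opponent lands a hit each tick


def graphics_thread(state: SharedGameState, frames: int) -> None:
    for _ in range(frames):
        state.snapshot()       # hypothetical: read a consistent view to draw from
        state.record_frame()


def run(ticks: int = 30, frames: int = 60) -> Tuple[int, int]:
    state = SharedGameState()
    workers = [
        threading.Thread(target=ai_thread, args=(state, ticks)),
        threading.Thread(target=graphics_thread, args=(state, frames)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return state.snapshot()


print(run())  # (70, 60)
```

Because every access to the shared data goes through the lock, the final numbers are the same no matter how the two threads interleave.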

The AI and graphics threads are the important ones, and the biggest gain comes from placing them on different cores. The other threads use very little power and exist because of the need for parallelism. This means that the AI can be assigned to one core and the graphics to another. The other threads can be assigned, as hardware or software threads, to whichever core is using the least power, or to the third core.
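One way to picture that placement is a greedy assignment: pin AI and graphics to their own cores, then put each remaining thread on whichever core is currently least loaded. This is a hypothetical Python sketch; the load figures are invented for illustration, and a real console SDK would use its own thread-affinity API, which is not shown here.

```python
from typing import Dict, List, Optional


def assign_threads(loads: Dict[str, float], cores: int = 3,
                   pinned: Optional[Dict[str, int]] = None) -> Dict[int, List[str]]:
    """Greedy placement: pinned threads go to their fixed core; every other
    thread goes to whichever core carries the least load so far."""
    pinned = pinned or {}
    core_load = [0.0] * cores
    placement: Dict[int, List[str]] = {c: [] for c in range(cores)}
    for name, core in pinned.items():
        placement[core].append(name)
        core_load[core] += loads[name]
    # Place the heaviest remaining threads first so they spread out better.
    rest = sorted((n for n in loads if n not in pinned), key=lambda n: -loads[n])
    for name in rest:
        target = min(range(cores), key=lambda c: core_load[c])
        placement[target].append(name)
        core_load[target] += loads[name]
    return placement


# Hypothetical relative CPU loads per thread (not measured figures).
loads = {"ai": 0.6, "graphics": 0.8, "sound": 0.1, "input": 0.05, "network": 0.1}
print(assign_threads(loads, pinned={"ai": 0, "graphics": 1}))
```

With these numbers, AI and graphics keep cores 0 and 1 to themselves, and the light threads all end up on the third core.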

Srecko1
11-29-06, 07:24 PM
So I guess next year, when games on the 360 still look as good as or better than they do on the PS3 in most cases, you will still be crying:

The 360 is in its 3rd year of development, while the PS3 is just hitting its 2nd year...


Like I said, it's always gonna be their excuse for coming up short... and it's an excuse most people won't accept.
Well, I don't agree with you. RFOM is a first-generation game for the PS3 and looks awesome. GEOW is the only game on the 360 that looks and feels better than RFOM, not counting Oblivion, which is not exclusive and will be arriving on the PS3. If you take a look at screenshots of many PS3 exclusive titles, many of them, even though they are first-generation games, look as good as or better than the 360 second-generation games that will be released next year.

Srecko1
11-29-06, 07:29 PM

On the 360, this does not triple the power, but it may give it significantly more power than the PS3, even in its early stages of development. So I've changed my mind and suspect that most 360 applications are already multi-core. This means that on the PS3, frame rates may be lower and stalls may occur more often than in the same game on the 360. That will only happen in games that require more power than the PS3's PPE core can provide.



Well, this can already be seen in some games: frame-rate problems in PS3 games versus the same games on the 360. BUT LET'S NOT FORGET THOSE GAMES ARE RUSHED PORTS FROM THE 360. Your point is good, but we will have to wait at least one year to confirm it.

Jetrii
11-29-06, 07:29 PM
Well, I don't agree with you. RFOM is a first-generation game for the PS3 and looks awesome. GEOW is the only game on the 360 that looks and feels better than RFOM, not counting Oblivion, which is not exclusive and will be arriving on the PS3. If you take a look at screenshots of many PS3 exclusive titles, many of them, even though they are first-generation games, look as good as or better than the 360 second-generation games that will be released next year.


Most of the current PS3 games are ports. If you look at games being released on both consoles, the 360 version seems to look/perform a bit better for the most part. As for games that were already out on the 360, of course they are going to look better on the PS3; the developers had extra time to improve them.

Also, RFOM has been in development for around 2 years. While it does look really good, it doesn't really prove much.

I'm not saying the PS3 is bad; it's a good console. I'm just annoyed by the whole "RFOM looks almost as good as GoW and it's a launch title."

Guess what, GoW is Epic's first 360 game as well.

Srecko1
11-29-06, 07:35 PM
Most of the current PS3 games are ports. If you look at games being released on both consoles, the 360 version seems to look/perform a bit better for the most part. As for games that were already out on the 360, of course they are going to look better on the PS3; the developers had extra time to improve them.

Also, RFOM has been in development for around 2 years. While it does look really good, it doesn't really prove much.

I'm not saying the PS3 is bad; it's a good console. I'm just annoyed by the whole "RFOM looks almost as good as GoW and it's a launch title."

Guess what, GoW is Epic's first 360 game as well.
If you ask me, it's stupid to compare these two games head to head. They are two different styles/genres, but looking at them as "games", GEOW doesn't offer that much more than RFOM. Actually, RFOM offers more hours of play: a longer single player campaign, a better story and AMAZING multiplayer, with up to 40 players. RFOM fails on only one point: it doesn't offer anything revolutionary or new, like GEOW does. GEOW is an amazing game, but it has too many flaws and, if you ask me, it's way overhyped. A short single player campaign, a mediocre story and pretty decent (nothing spectacular) multiplayer that can get frustrating are its weak points.

mterzich
11-29-06, 07:39 PM
Well, this can already be seen in some games: frame-rate problems in PS3 games versus the same games on the 360. BUT LET'S NOT FORGET THOSE GAMES ARE RUSHED PORTS FROM THE 360. Your point is good, but we will have to wait at least one year to confirm it.
But without using SPEs, the only way to make the PS3 version slightly better would be to optimize the code, and that may also make the 360 run better. So without a total rewrite to use the SPEs, the PS3 will only run slightly better, and only if the code can be optimized. I don't understand what you would be waiting one year for, since most applications will not be rewritten to use the SPEs.

JerryNY
11-29-06, 07:43 PM
One thing being left out of the discussion, which was mentioned in the original post and in one of the links in the OP, is the OS and the resources it requires on each platform. The 360 uses 32MB of the 512MB of available RAM and 3% of CPU time on core 1 and core 2, and nothing on core 3. This of course means that a game dev has at least 100% of core 3, plus whatever else they care to use, without even wasting a brain cell thinking about it. The PS3's OS gobbles up 32MB of the 256MB of GDDR3 and an ADDITIONAL 64MB of the 256MB of XDR memory. 1 SPE of the 7 is reserved for the OS, so in effect 6 are available to game devs, but one other SPE can be taken over by the OS at a moment's notice, without the game having a choice in the matter. This probably means many game devs will target 5 SPEs as a maximum, to prevent problems with the OS yanking away an SPE and hurting performance. An OS can't run entirely on the SPEs because of all the generalized code it contains, so all the RAM the OS uses up probably means a fairly significant amount of CPU power is being used as well. The 360's 3 general-purpose cores prevent this from being an issue.
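The resource figures in the post above can be tallied as follows (Python; the numbers are exactly the ones quoted in the post and are not independently verified).

```python
# OS reservations as quoted in the post (MB and SPE counts).
XBOX360_RAM, XBOX360_OS = 512, 32
PS3_GDDR3, PS3_GDDR3_OS = 256, 32
PS3_XDR, PS3_XDR_OS = 256, 64
PS3_SPES, SPE_RESERVED, SPE_PREEMPTIBLE = 7, 1, 1

xbox_free = XBOX360_RAM - XBOX360_OS
ps3_free = (PS3_GDDR3 - PS3_GDDR3_OS) + (PS3_XDR - PS3_XDR_OS)
# SPEs a game can rely on if it avoids the one the OS may take back.
spes_guaranteed = PS3_SPES - SPE_RESERVED - SPE_PREEMPTIBLE

print(f"360 RAM left for games: {xbox_free} MB")
print(f"PS3 RAM left for games: {ps3_free} MB")
print(f"PS3 SPEs a game can count on: {spes_guaranteed}")
```

On these figures a 360 game has 480 MB to work with against 416 MB on the PS3 (224 MB of GDDR3 plus 192 MB of XDR), and 5 SPEs it can safely plan around.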

-Jerry C.

JerryNY
11-29-06, 07:45 PM
One other interesting thing is the fact that Sony allows for easy Linux installation on the PS3. Much of the mystery of what the PS3 is really capable of should be "discovered" by Linux geeks poring over the hardware. It should be fascinating to see what they can do with it.

-Jerry C.

JasZ
11-29-06, 07:50 PM
But without using SPEs, the only way to make the PS3 version slightly better would be to optimize the code, and that may also make the 360 run better. So without a total rewrite to use the SPEs, the PS3 will only run slightly better, and only if the code can be optimized. How else can they make it better without a major rewrite? So I don't understand what you would be waiting one year for, since most applications will not be rewritten to use the SPEs.




So, how does the new multi-core information you detailed above affect your analysis of the future potential of each system with regard to exclusive first-party games developed specifically for that system?

mterzich
11-29-06, 07:58 PM
So, how does the new multi-core information you detailed above affect your analysis of the future potential of each system with regard to exclusive first-party games developed specifically for that system?
It sounds like it may scare the hell out of developers committed to Sony. Japanese companies have a commitment to each other but that can only be carried so far. If developing a killer application takes 2x or more the cost and time on the PS3 compared to the 360, they'll probably start to rethink their strategy.

Srecko1
11-29-06, 08:02 PM
It sounds like it may scare the hell out of developers committed to Sony. Japanese companies have a commitment to each other but that can only be carried so far. If developing a killer application takes 2x or more the cost and time on the PS3 compared to the 360, they'll probably start to rethink their strategy.
Aren't you exaggerating a little? Yes, it's well known that it's easier to develop for the 360, but it's not as hard to develop for the PS3 as many want to think. Actually, many developers say that it is hard to develop for the PS3, but that it was harder on the PS2 in the beginning. They only need some time to find an easier way. BTW, the PS2 was also the hardest machine to develop for in the previous gen.

mterzich
11-29-06, 08:07 PM
Aren't you exaggerating a little? Yes, it's well known that it's easier to develop for the 360, but it's not as hard to develop for the PS3 as many want to think. Actually, many developers say that it is hard to develop for the PS3, but that it was harder on the PS2 in the beginning. They only need some time to find an easier way. BTW, the PS2 was also the hardest machine to develop for in the previous gen.
The SPE just does not have the memory to do very much that is useful as far as a game application is concerned. I'm not as concerned about the lack of shared memory or the use of DMA (Direct Memory Access) as I am about the 256KB of memory that is available in each SPE. Am I exaggerating? No, I'm not. 2x the cost and time is a conservative estimate.

I know what it is like not to have enough memory as a developer. Back in the 80s, I took an IBM 3270 application running under PC-DOS (on an Intel 80286 processor) that was using between 400 and 600 KB of memory (out of a maximum of 640 KB of directly usable memory) and segmented it to run in less than 100 KB so that a second application could run at the same time. I used UMB and extended memory as overlay areas (a paging mechanism). After 12-hour days, 6 days a week, and tens of thousands of code changes, I was finally able to deliver the product after 18 months.

In some ways the Cell processor would not be as difficult, since it has a PPE core with a large amount of memory to control the process, move memory around, and activate SPEs. However, in other ways programming the Cell will be even more difficult, since the developer needs to find a large number of small code segments that can be executed in parallel to save power (the 3270 program did not have any parallel execution code). Also, managing that much moving data (the results of each process) will create synchronization issues and require a large amount of management code.
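A sketch of the working pattern the 256 KB limit forces: stream a large array through a small local buffer one chunk at a time. This is plain Python standing in for SPE code; the slice copies stand in for DMA transfers, and the 64 KB reserved for code and stack is a hypothetical figure.

```python
from typing import Callable, List

LOCAL_STORE_BYTES = 256 * 1024     # SPE local store size quoted in this thread
CODE_AND_STACK_BYTES = 64 * 1024   # hypothetical: room kept for code and stack
ELEMENT_BYTES = 4                  # assume 32-bit values


def chunk_elements(buffers: int = 2) -> int:
    """Elements per data buffer when the free local store is split into `buffers`."""
    data_bytes = LOCAL_STORE_BYTES - CODE_AND_STACK_BYTES
    return data_bytes // (buffers * ELEMENT_BYTES)


def stream(main_memory: List[float], kernel: Callable[[float], float]) -> List[float]:
    """Process a large array one local-store-sized chunk at a time.

    Each pass copies a chunk in (standing in for a DMA get), computes on the
    local copy, and copies the results out (standing in for a DMA put).
    With two buffers, real SPE code would fetch the next chunk while computing
    on the current one; this sketch runs the steps in sequence.
    """
    n = chunk_elements()
    out: List[float] = []
    for start in range(0, len(main_memory), n):
        local = main_memory[start:start + n]    # "DMA get"
        local = [kernel(x) for x in local]      # work done inside the local store
        out.extend(local)                       # "DMA put"
    return out


data = [float(i) for i in range(100_000)]
result = stream(data, lambda x: x * 0.5)
print(chunk_elements(), len(result))
```

With these assumptions each buffer holds 24,576 values, so the 100,000-element array takes five round trips; the management code the post mentions is everything that schedules and tracks those trips.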

ProjectEF
11-29-06, 08:08 PM
Aren't you exaggerating a little? Yes, it's well known that it's easier to develop for the 360, but it's not as hard to develop for the PS3 as many want to think. Actually, many developers say that it is hard to develop for the PS3, but that it was harder on the PS2 in the beginning. They only need some time to find an easier way. BTW, the PS2 was also the hardest machine to develop for in the previous gen.

Man..you must really love Sony :)

Srecko1
11-29-06, 08:13 PM
Man..you must really love Sony :)
Far from that! Since they delayed the release for us Europeans, I lost my interest, and I must say I am a little mad at them. But that doesn't mean I have to agree with everything that is against Sony. And none of the things I wrote are lies I came up with just to defend Sony.

mterzich
11-29-06, 08:59 PM
Even if a person is not technically knowledgeable, I think he can relate to the issues that are involved with each SPE only having 256KB of memory.

To illustrate the point, in 1981 the IBM PC had a maximum of 640KB of main memory. If you had a hard drive, it was 5 MB. Since an SPE has 256KB of memory, that would be the same as giving someone today a 5 MB hard drive, which is 1/4000th the size of the 20 GB hard drive in the 360. What can you use a 5 MB hard drive for today? Nothing. So 256KB of memory is also close to nothing in today's development environment.
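The ratios behind the analogy can be checked quickly. The post's 1/4000 uses decimal units (20,000 MB / 5 MB); in binary units it comes out at 4096. For comparison, the same calculation between one SPE's 256 KB and the PS3's 256 MB of XDR main memory, and between 256 KB and the old 640 KB PC limit, is also shown.

```python
KB, MB, GB = 1024, 1024 ** 2, 1024 ** 3

disk_ratio = (20 * GB) / (5 * MB)        # 360's 20 GB drive vs. a 5 MB drive
store_ratio = (256 * MB) / (256 * KB)    # PS3's 256 MB XDR vs. one 256 KB local store
pc_ratio = (256 * KB) / (640 * KB)       # one local store vs. the 640 KB PC limit

print(disk_ratio, store_ratio, pc_ratio)  # 4096.0 1024.0 0.4
```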

ProjectEF
11-29-06, 09:07 PM
Just because things are against Sony doesn't mean they aren't true. If we were to take the badges off each system, I think a lot more people would agree with that. The debate isn't really about each company, but rather whether what they did is actually the best thing for "gaming". Please forgive me for misinterpreting your posts though :o

mterzich
11-29-06, 09:22 PM
Just because things are against Sony doesn't mean they aren't true. If we were to take the badges off each system, I think a lot more people would agree with that. The debate isn't really about each company, but rather whether what they did is actually the best thing for "gaming". Please forgive me for misinterpreting your posts though :o
Actually, I interpreted his posts as very good questions and not a fanboy attitude. He kept me on my toes to make sure that I could back up what I was posting. I think the thread has been lucky so far in that it has usually been reasonably debated. In other words, it hasn't gone to hell yet.

JData
11-29-06, 09:41 PM
If you ask me, it's stupid to compare these two games head to head. They are two different styles/genres, but looking at them as "games", GEOW doesn't offer that much more than RFOM. Actually, RFOM offers more hours of play: a longer single player campaign, a better story and AMAZING multiplayer, with up to 40 players. RFOM fails on only one point: it doesn't offer anything revolutionary or new, like GEOW does. GEOW is an amazing game, but it has too many flaws and, if you ask me, it's way overhyped. A short single player campaign, a mediocre story and pretty decent (nothing spectacular) multiplayer that can get frustrating are its weak points.


Graphics are subjective aren't they?

You say they're on par with or near GOW, while someone else I know with a PS3 thinks RFOM's graphics are equal to COD2's.

I wouldn't exactly consider RFOM a title that will move a lot of consoles in comparison to GOW, would you? GOW will push console sales, and the proof is in the numbers, which could be misleading due to the holiday season, but the fact remains that the 360 is selling at nearly 5-7 times the rate of the PS3.

In certain aspects, I do concur with your assessment of GOW. The fact that it was developed by the team that released the Unreal Tournament series explains the short SP campaign.

mterzich
11-29-06, 10:13 PM
Well, this can already be seen in some games: frame-rate problems in PS3 games versus the same games on the 360. BUT LET'S NOT FORGET THOSE GAMES ARE RUSHED PORTS FROM THE 360. Your point is good, but we will have to wait at least one year to confirm it.
If the difference in frame rates is small, that may be due to differences in the GPU. However, if it becomes significant, it is more likely due to differences in available processor power.

schticker
11-29-06, 10:15 PM
How is this four pages? Sony fanboys need to own up to reality: the 360 is the boss right now.

EVERYONE knew that the PS3 would be a hassle to program for, just like its predecessor. The company that makes computer software knows how to program for hardware, which is something Sony can only hope to effectively emulate.

I WANT to see something for the 360 approach GoW graphically. Go ahead, do it! :o

QUICKerQuestion
11-29-06, 10:33 PM
Wow, this thread is very informative and interesting. One question though (because I don't have time to do the research): what is the difference between PPEs and SPEs? Is it that one is more suitable for floating point operations and the other for vector calculations, or what? :confused:

5150
11-29-06, 10:37 PM
Even if a person is not technically knowledgeable, I think he can relate to the issues that are involved with each SPE only having 256KB of memory.

Having been involved in computing back when 256KB was a very, very large amount of RAM, I'd like to ask you a question if you don't mind: What is your background, and what are your qualifications to speak on the subject? Depending on what an SPE is being asked to do, 256KB of memory may be far more than needed.

Having followed this thread, most of it seems to be informed speculation with a healthy dose of uninformed speculation, otherwise known as a WAG. What is being discussed has gone beyond hardware specifications into software design and design optimization. If a person hasn't actually programmed for the designs in question, I don't think conclusions arrived at via speculation are useful. Just my opinion; many people will choose to like it or dislike it based upon their brand preference.

We will rarely hear a cross-platform developer speak candidly about the platforms. It would be against that developer's interest to do so. They would be the ones that could speak with authority and credibility on this topic, but I can't imagine why one of them would. Someone quoted an IGN interview earlier--does anyone think that a developer would have anything other than nice things to say? They have their honest opinions, but the moment they voice them they put the support they get from the engineers at Sony and MS in jeopardy.

This is all a very nice discussion, but any conclusions (and likely many of the "facts") involved are speculative to a degree that makes their value quite dubious.

mterzich
11-29-06, 10:52 PM
Wow, this thread is very informative and interesting. One question though (because I don't have time to do the research): what is the difference between PPEs and SPEs? Is it that one is more suitable for floating point operations and the other for vector calculations, or what? :confused:
The PPE is a general purpose processor with access to a large amount of main memory. If there are multiple PPEs, such as in the 360, they share the common main memory. The SPE has almost the same execution unit as a PPE, with very little difference in performance and capability, except that the SPEs' execution units do not have branch prediction (that is mostly solved by the compiler, which inserts branch-hint instructions into the code). However, an SPE has only a small amount of memory (256KB). An SPE cannot communicate directly with other SPEs or the PPE via any shared memory. Therefore there is an Element Interconnect Bus (EIB), which allows data and code to be transferred via DMA (Direct Memory Access, similar to the DMA used to transfer data between a hard drive and main memory) between the PPE's main memory and the SPE's memory.

SPEs and the PPE have vector units and floating point capabilities of approximately the same speed.

The following are the primary disadvantages of SPEs.

Small amount of memory.
Cannot access main memory directly.
Must use DMA to transfer code and data between the main memory and SPE memory (usually done by the PPE).
No branch prediction capabilities (the compiler can reduce the performance degradation significantly, but not 100%).
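On the missing branch prediction: a common way to avoid unpredictable branches on hardware like this is to compute both candidate results and pick one with a bit mask instead of an `if`. The SPU instruction set has mask-producing compare instructions and a select-bits instruction (`selb`) for this pattern; the Python below only illustrates the bit trick itself, not SPU performance, and the helper names are hypothetical.

```python
def select(a: int, b: int, mask: int) -> int:
    """Bitwise select: bits of `a` where `mask` is 1, bits of `b` where it is 0."""
    return (a & mask) | (b & ~mask)


def compare_mask(condition: bool) -> int:
    """All ones (-1 in two's complement) when true, 0 when false."""
    return -int(condition)


def branchless_max(a: int, b: int) -> int:
    return select(a, b, compare_mask(a > b))


values = [3, -7, 12, 0]
limit = 5
# Clamp each value to `limit` without an if/else per element.
clamped = [select(limit, v, compare_mask(v > limit)) for v in values]
print(clamped)  # [3, -7, 5, 0]
```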

The following are the advantages of SPEs.

High speed memory (SRAM instead of SDRAM as in main memory).
Large cost savings over PPEs: 234 million transistors in the 9-core Cell processor (PS3; one core is disabled) compared to 165 million transistors in the 3-core Xenon processor (360), i.e. 26 million per core on the PS3 and 55 million per core on the 360.

However, the Cell processor has 1 PPE and 8 SPEs, and the PPE probably accounts for about 70 million transistors because its count has to include the L2 cache, bus controller, and DMA. Therefore each SPE core has approximately 20.5 million transistors.

20.5 million transistors per SPE core on the PS3 includes the following.

1 execution unit.
256 KB SRAM.
1/8th of the transistors for the EIB and DMA.

55 million transistors per core on the 360 includes the following.

1 execution unit.
No local SRAM.
1/3rd of the transistors for 1 MB L2 cache.
1/3rd of the transistors for bus controller and DMA.

So you can see that the cost per core on the PS3 would be about 1/2 of the cost per core on the 360. However, the total cost of the Cell processor would be about 40% more than the cost of a Xenon processor. But since Sony allows IBM to disable one SPE core, IBM can increase its yield rate dramatically by disabling the defective SPE (sometimes it will be a non-defective one). In turn, IBM probably reduces the price to about the same as the Xenon processor's. So Sony gets 8 cores for about the same price as 3 PPE cores.
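The transistor arithmetic above can be reproduced directly. The totals are the ones quoted in the post, and the 70 million for the PPE is the post's own rough estimate, not a published figure. The "about 1/2" matches the 26 million average per Cell core; using the 20.5 million SPE estimate instead gives closer to 37%.

```python
# Figures quoted in the post (millions of transistors); the PPE share is the
# post's own rough estimate.
CELL_TOTAL = 234
XENON_TOTAL = 165
CELL_CORES, XENON_CORES = 9, 3   # Cell: 1 PPE + 8 SPEs (one SPE disabled in the PS3)
PPE_ESTIMATE = 70
SPE_COUNT = 8

avg_cell_core = CELL_TOTAL / CELL_CORES              # 26.0
avg_xenon_core = XENON_TOTAL / XENON_CORES           # 55.0
per_spe = (CELL_TOTAL - PPE_ESTIMATE) / SPE_COUNT    # 20.5
cell_extra = CELL_TOTAL / XENON_TOTAL - 1            # about 0.42, i.e. ~40% more

print(f"average core ratio: {avg_cell_core / avg_xenon_core:.2f}")  # 0.47
print(f"SPE vs Xenon core:  {per_spe / avg_xenon_core:.2f}")        # 0.37
print(f"Cell vs Xenon total: +{cell_extra:.0%}")                   # +42%
```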

The practice of disabling a core is becoming more common. For example, Intel disables one core of an Intel Core Duo processor when one is defective and sells it as an Intel Core Solo processor. So if you purchase an Intel Core Solo processor, you are really purchasing a Core Duo processor with a disabled defective core.

Extra
11-29-06, 11:12 PM
Since we're spending quite a bit of time talking about graphics, I thought I'd throw in a few more tidbits.

Gears of War:
1) If you pay attention, 3 of the cutscenes in GOW (that I've noticed; there could be more, but I've only completed the SP once so far) are actually not rendered in real time. They are (some spoilers, but I assume just about everyone has beaten GOW by now):
-Kim getting killed by Ramm
-End of Act 3 when you see the resonator activate
-The final scene with the missiles going off
I can assure you those scenes are FMVs made with in-game resources, but not actually rendered in real time by the 360 hardware. And the reason why they are FMVs? Well...
2) In the actual real-time cutscenes, characters are rendered at much higher detail than the normal gameplay models. This is what causes the frame-rate fluctuations in cutscenes: I can tell that as soon as 3 or more characters are on screen, the frame rate starts dropping significantly. Epic did a great job hiding this by keeping fewer characters on screen during cutscenes. This also partially explains why the 3 FMV scenes I mentioned above were not rendered in real time (the other reason being to save on texture loading time).
Keep in mind that although the detail is higher in the cutscenes, it is still nowhere NEAR the original ZBrush models. Not even close.
3) One major reason why GOW looks so good is that Epic did an exceptional job with LOD (level of detail depending on distance from the "viewer"). They use low-res textures at long range and switch to high-res textures as you get close (sometimes you can catch the moment it switches if you pay attention, but overall it's very seamless and well done). While this is not a new technique by any means, GOW is one of the best examples of making excellent use of it.
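The distance-based switching described in point 3 can be sketched as a lookup against distance thresholds. The thresholds and texture sizes are hypothetical; this is the generic technique, not Epic's implementation.

```python
from bisect import bisect_right

# Hypothetical switch distances (in metres) and the texture used in each band.
LOD_DISTANCES = [10.0, 30.0, 80.0]
LOD_TEXTURES = ["2048x2048", "1024x1024", "512x512", "256x256"]


def pick_lod(distance: float) -> str:
    """The closer the viewer, the higher the texture resolution."""
    return LOD_TEXTURES[bisect_right(LOD_DISTANCES, distance)]


for d in (5.0, 25.0, 50.0, 200.0):
    print(d, pick_lod(d))
```

Real engines often add some hysteresis or blend between levels so the switch is harder to spot; that is left out here.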

And just a bit of a throwback to the Killzone trailer and the polycount difference between CG and real time today:
Many people may not realize, when they see a movie like Shrek, just how many polygons Shrek is composed of. I can't tell exactly, but I WILL say without a doubt that it's A LOT, a lot more than what's possible in real time today. In fact, Toy Story still cannot be done in real time today, and polycount is one of the reasons. When you look at polycount, SMOOTHNESS is what gives it away. Sure, the character design of Shrek might seem simple, but it's exactly this "roundness" that tells you how many polygons you're actually seeing.
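To put a number on "smoothness": when a round shape is approximated by a regular polygon inscribed in a circle of radius r, the largest gap between polygon and circle is r(1 - cos(π/n)) for n edges. Quadrupling the edge count cuts that gap by roughly 16 times, which is why smooth silhouettes cost so many polygons. This is a 2D simplification of the point, not a measurement of any film or game model.

```python
import math


def max_gap(radius: float, segments: int) -> float:
    """Largest distance between a circle and a regular polygon with
    `segments` edges inscribed in it (the sagitta of one edge)."""
    return radius * (1 - math.cos(math.pi / segments))


for n in (8, 32, 128, 512):
    print(f"{n:4d} edges: max error {max_gap(1.0, n):.6f} of the radius")
```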
Here is a very good example of high res model compared to in-game model:
Time Shift character model poly differences (http://screenshots.teamxbox.com/gallery/1293/TimeShift/p7/)
You will observe that the in-game model looks pathetic at this stage. But no worries, normal mapping does wonders (and the developer might choose to increase the polycount a bit if necessary). GOW's in-game models do appear to sport higher polycounts than those TimeShift models, though.
Now go back and look at that Killzone trailer. In-game model? Hell no, not even close.

This is why I'm skeptical when companies like EA announce that Fight Night Round 3 uses 2 million polys on each character. No, they don't. They probably use 2 million or more on the ZBrush models, but definitely not on the in-game models. One only needs to look at how screwed up their models' shoulders look when they raise their arms to know that's a load of you-know-what.

mboojigga
11-29-06, 11:49 PM
No, it doesn't. It depends on when the development kits were finished, and if the PS3 is more difficult to program, then obviously it will take longer for developers to get more out of the hardware.


I get the feeling this is going to be the same excuse for the next 5 years.

mterzich
11-30-06, 12:28 AM
Having been involved in computing back when 256KB was a very, very large amount of RAM, I'd like to ask you a question if you don't mind: What is your background, and what are your qualifications to speak on the subject? Depending on what an SPE is being asked to do, 256KB of memory may be far more than needed.

Having followed this thread, most of it seems to be informed speculation with a healthy dose of uninformed speculation, otherwise known as a WAG. What is being discussed has gone beyond hardware specifications into software design and design optimization. If a person hasn't actually programmed for the designs in question, I don't think conclusions arrived at via speculation are useful. Just my opinion; many people will choose to like it or dislike it based upon their brand preference.

We will rarely hear a cross-platform developer speak candidly about the platforms. It would be against that developer's interest to do so. They would be the ones that could speak with authority and credibility on this topic, but I can't imagine why one of them would. Someone quoted an IGN interview earlier--does anyone think that a developer would have anything other than nice things to say? They have their honest opinions, but the moment they voice them they put the support they get from the engineers at Sony and MS in jeopardy.

This is all a very nice discussion, but any conclusions (and likely many of the "facts") involved are speculative to a degree that makes their value quite dubious.
As I have stated several times in this thread, I am neither a game developer nor a gamer. However, I have worked in the computer industry for about 35 years: first as a hardware engineer (mostly with supercomputers) and for the last 20 years as a software developer. As a software developer I have worked on the internals of Unix, Pick, MS-DOS, CPM, and C/CPM. I've developed IBM 3270 applications on the Pick, Windows, Novell, and MS-DOS operating systems. I've developed other communications-related applications such as VPN, DHCP, modem pooling, and Novell NDS. I've developed low-level drivers on the Windows, Unix, Novell, Pick, MS-DOS, CPM, and C/CPM operating systems. I've developed normal applications on all of those systems plus web-based server applications, and over the years I have used C, C++, C#, Java, Intel and Motorola assembly languages, and several other languages for development. I've developed with hardware ICEs as well as many other development platforms. The last time I had any experience developing graphics was on a Tektronix display in the 80s.

I will be the first to admit that I don't know much about graphics or AI development. So everything is pure speculation. But without a game developer that has developed a game application using SPEs, we can only make our best judgment based on our own experiences developing different applications.

The hardware specifications of a game console don't mean much unless you can discuss them in relation to the intended software. That is why discussing software in relation to the platform is critical. If we only went by specifications, machines such as vector machines or 512-processor machines would appear to far exceed the high-end general-purpose computers used by most large companies. They have their purpose, but they are a niche market.

I suspect that there are a lot of algorithms used in the AI and graphics portions of a game application. For the SPEs to pay off, I assume that these algorithms are quite small, that there are hopefully a lot of them, that their code and input and output data can fit into an SPE, that they can be executed in parallel, and that the DMA operations that constantly transfer code and data to the SPEs will be beneficial for performance (there would be no benefit if the transfer overhead were about the same as the execution time saved).
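The break-even condition in parentheses can be written as a small test. This is a hypothetical Python sketch with invented timings: without overlap the SPE path costs transfer plus compute time; with double buffering the transfer can hide behind the compute, so the cost is whichever is larger.

```python
def offload_pays_off(local_ms: float, spe_compute_ms: float,
                     transfer_ms: float, overlapped: bool = False) -> bool:
    """Rough break-even test for moving one job from the PPE to an SPE.

    Without overlap the SPE path costs transfer time plus compute time.
    With double buffering the transfer can hide behind the compute, so the
    cost is whichever of the two is larger. Equal cost means no benefit.
    """
    if overlapped:
        spe_path = max(transfer_ms, spe_compute_ms)
    else:
        spe_path = transfer_ms + spe_compute_ms
    return spe_path < local_ms


# Hypothetical timings in milliseconds, for illustration only.
print(offload_pays_off(2.0, 1.5, 1.0))                   # False: 2.5 ms > 2.0 ms
print(offload_pays_off(2.0, 1.5, 1.0, overlapped=True))  # True: 1.5 ms < 2.0 ms
```

This simple comparison ignores that the PPE is free to do other work while the SPE runs, which in practice is a large part of the reason to offload at all.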

So the best we can do is speculate. The following are the only things we know for certain:

We know Cell development can't use the approach the 360 can use, with very large chunks of time-consuming parallel code such as the AI on one core and the graphics code on another core.
We know that most normal library calls cannot be performed from an SPE due to the size of most libraries. For example, you couldn't use OpenGL calls, network calls, etc. Even if those calls were part of the operating system, the SPE couldn't make them, because the OS runs in the PPE's memory and the SPE can't get to that memory.
We know that the SPE has very limited memory resources.
We know the SPE cannot communicate directly with main memory.
One developer who works on both game consoles (and remains anonymous) thinks that in a couple of years some high-budget company will develop using SPEs. Why so long into the future? We don't know. ( http://www.hardcoreware.net/reviews/review-348-1.htm )
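To make the memory constraint concrete: each SPE has a 256 KB local store that must hold both its code and its data, and it reaches main memory only through DMA. The sketch below, with hypothetical names and sizes, splits a job into chunks that fit, reserving room for two chunks at once so one can be processed while the next is transferred (double buffering). It ignores DMA alignment and per-transfer size limits.

```python
LOCAL_STORE_BYTES = 256 * 1024  # one SPE's local store: code + data

def plan_chunks(total_items, item_in_bytes, item_out_bytes, code_bytes,
                buffers=2, local_store=LOCAL_STORE_BYTES):
    """Split a job into (start, count) chunks small enough to DMA into a
    single SPE's local store. buffers=2 models double buffering, so two
    chunks' worth of input and output space must fit at the same time."""
    free = local_store - code_bytes
    per_item = (item_in_bytes + item_out_bytes) * buffers
    if free <= 0 or per_item > free:
        raise ValueError("code plus a single item does not fit in local store")
    per_chunk = free // per_item
    return [(start, min(per_chunk, total_items - start))
            for start in range(0, total_items, per_chunk)]

# Hypothetical job: 10,000 items of 16 bytes in / 16 bytes out, 64 KB of code
chunks = plan_chunks(10_000, item_in_bytes=16, item_out_bytes=16,
                     code_bytes=64 * 1024)
```

Every extra kilobyte of code shrinks the space left for data, which is one reason large libraries are a poor fit for an SPE.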

Srecko1
11-30-06, 03:52 AM
Graphics are subjective, aren't they?

GOW will push console sales, and the proof is in the numbers, which could be misleading due to the holiday season, but the fact remains: the 360 is selling at nearly 5-7 times the rate of the PS3.



The only reason this is happening is that there is a shortage of PS3s. I think even you know that. So to say that the 360 is outselling the PS3 is not realistic. It's true, but there are reasons why it is like that.
Also, sales don't tell you much. The PS2 is outselling the 360; does that mean the PS2 is a better console? Simple answer is NO!

mboojigga
11-30-06, 04:18 AM
Um, there is a difference between a shortage and delivering. They just launched and didn't deliver as planned. You can't call it a shortage when they just couldn't produce enough for the launch. It is pretty obvious now that those numbers were never reachable, and I have stated that I didn't even believe those numbers were going to make it in just 40 days from launch.

Srecko1
11-30-06, 04:48 AM
No matter what, it's stupid to say the 360 is outselling the PS3 when there are no PS3s on the market. Delivered or not, there are no PS3s out there, or if there are, they are in SMALL numbers.
At least wait until the end of 2007; then we will know for sure who is outselling whom.

Red Cell
11-30-06, 05:20 AM
Anyway, please check out this "famous" trailer of Killzone 2:

http://www.gametrailers.com/gamepage.php?fs=1&id=1668

Scroll down a little and you will see video clips; click the first one, where the guy with a gun is.

Looks real nice. That level of graphics might be possible on the PS3, with 4 years of dev time.

Sony is known for their advertising using CGI to wow/fool the masses.
Remember FFVII? Most of the FFVII TV ads were 100% CGI.

I still wish the Ultra 64 would have made it to market with FFVII, instead of the watered down N64.

Srecko1
11-30-06, 06:19 AM
Yes, but in the end FF7 was, or still is, the best FF game.

Anyway, people even said that Motorstorm can't look as good as it does in the video trailers, but the demo looks at least 90% as good as the E3 2005 video trailers.

But I see your point. Sony exaggerates, a lot! Like calling the PS2 an "emotion engine" or a "super computer", or the PS3 "4D", and other crap. I personally never paid much attention to those things; I look at games, and if I like them, good; if not, I don't give a damn about the "emotion engine". Hey, "EMOTION" is a little too much!

OfficerDibble
11-30-06, 09:18 AM
So I guess next year when games on the 360 still look as good or better in most cases than what they do on the PS3, you will still be crying:

The 360 is in its 3rd year of development while the PS3 is just hitting its 2nd year..


No, I won't. If that's the case, since I already have a 360, I probably won't get a PS3.

JData
11-30-06, 09:35 AM
The only reason this is happening is that there is a shortage of PS3s. I think even you know that. So to say that the 360 is outselling the PS3 is not realistic. It's true, but there are reasons why it is like that.
Also, sales don't tell you much. The PS2 is outselling the 360; does that mean the PS2 is a better console? Simple answer is NO!


Yes, there is a PS3 shortage, and Sony is slow to ramp up production to meet potential sales.

Why bring up the PS2? Isn't it in a different 'generation' and pay scale? If you want to be objective, keep it objective. For the past several years during the Holiday Season, the Xbox was the number 1 selling console in CONUS.

You can't excuse Sony for not having their estimated numbers. They didn't have them and just dropped the ball on that one. Who's taking advantage of it? Nintendo and Microsoft.

Microsoft could not meet the early demand, but there was no 'next generation' console on the market at the time, thus no competition.

Bottom line:

A sale is a sale. If you don't have it, you don't have a sale. It is that black and white.


Now let's get back to the topic.

Bailey151
11-30-06, 09:39 AM
looks real nice. that lvl of graphics might be possible on the PS3, with 4 years dev time.
Sorry, don't think so - it just doesn't have the horsepower. Trailers like this are simply CGI "movies" & have little if any relation to the actual game.

While the PS3 may be able to do some neat tricks with physics type effects it just won't be able to achieve high pixel count graphics with lots of effects.

It will be what it is now, a very nice 720p console that plays movies very well. Given the GPU you have two choices - lower the resolution & up the features like HDR OR up the resolution & drop the features. Go to the store & buy a 7800 series chip - now put as much horsepower as you want behind them.....in games like Oblivion you'll never get playable frame rates in 1080p with all the features turned on.

Personally I don't view this as a bad thing, 720p can look very nice with tons of features. R:FoM pretty well proves this.

And sorry the PS3 will be a PITA for development. The non-unified memory alone tells me this.

Easy example -

The 360 is like having 3 people in the same room processing papers & pulling those papers from a central table. It's easy to control who processes what & share information.

The PS3 is like having 5 - 7 people processing papers in separate rooms & having the papers messengered from room to room. It's much harder to determine who's doing what & effectively manage the workload.
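The "central table" in the 360 half of this analogy corresponds to a shared work queue that any worker can pull from, with no up-front decision about who handles what. A minimal Python sketch with hypothetical names; because of CPython's global interpreter lock, it shows the structure of the approach rather than a real parallel speedup.

```python
import queue
import threading

def process_shared_table(papers, workers=3):
    """Several workers pull items from one central, thread-safe queue and
    record which worker handled each item in a shared dict."""
    table = queue.Queue()
    for paper in papers:
        table.put(paper)
    results = {}
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                paper = table.get_nowait()  # grab the next paper off the table
            except queue.Empty:
                return                      # table is empty: this worker is done
            with lock:
                results[paper] = name

    threads = [threading.Thread(target=worker, args=(f"worker{i}",))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

handled = process_shared_table(range(100))
```

Load balancing falls out automatically: whichever worker is free takes the next item.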

My experience is very much like mterzich's, though I disagree on the DMA/memory issue. When the PC used segmented memory (extended/expanded), it was a royal PITA (IMHO). Running a single app was okay thanks to page swaps, but running multiple apps was tough, as keeping track of who was using what didn't always happen :) If you'll recall, this was the norm until DOS was removed from underneath WinDoze. The earlier versions were always plagued with "3 finger salute" issues.

The tiny memory will also plague development. I'm also not a game dev, but if I was given that hardware & thought about segmenting code into 128 parts, I'd say "screw that, I'll write for a single processor & make it the best I can". The amount of RAM per SPE is just too small to get anything done & would require WAY too much time to figure out reasonable segmenting.

And to those who think "it will get better w/ time": I doubt it, the hardware just isn't there. The real issue is that Sony A) really doesn't understand programming or hardware, as evidenced by the fact that none of their products have been dev friendly, and B) designed the architecture & then waited an excessive amount of time to stuff BluRay into the box = hardware is no longer relevant.

But then I don't think it's a bad box, it just doesn't have the legs that the 360 does. IMHO it will produce some very nice games - 720p on a large screen w/ lots of features can look really nice.

Disclaimer - I am NOT a fanboy, I own neither console. Personally I don't think MS should have released the 360 on the 90nm die - the upcoming die should have been the release.

OfficerDibble
11-30-06, 09:54 AM
I get the feeling this is going to be the same excuse said for the next 5 years

The final PS3 development kits were shipped in June 2006.

Cynic
11-30-06, 10:19 AM
@mterzich

What do you think about 360/PS3's capabilities regarding AI and physics processing?
I've read multiple times that these consoles' CPUs aren't very good (or are just plain bad) at dealing with this part of game code. Unfortunately, I can't remember any sources, except for an Ars Technica article about the 360's CPU (Xenon). I can't post the link because I have fewer than 5 posts, but if you go to Ars' site, type "xenon" in the search box, choose "Inside the Xbox 360, Part II: the Xenon CPU", and move forward to the conclusion, you can read what I'm talking about.

I've also read comments about Xenon being only twice as fast (regarding non-graphics code performance) as the Xbox's 733 MHz Pentium 3; is it really that bad? And does this claim apply to only a single PPE, or to all 3 of them working together?

Regarding the Cell, I've also read a few times that, despite it being even worse than Xenon for AI, its SPEs can actually be very good at physics processing, if enough effort is put into it, obviously. What do you reckon?

Lastly, and assuming the GameCube/Wii CPUs are out-of-order (I don't know if they are or not), can the sub-1 GHz CPU in Wii perform comparably to the CPUs in 360 and PS3 in AI and/or physics code (or at least a lot closer than the graphics code will ever perform)?

Srecko1
11-30-06, 11:53 AM
Yes, there is a PS3 shortage, and Sony is slow to ramp up production to meet potential sales.

Why bring up the PS2? Isn't it in a different 'generation' and pay scale? If you want to be objective, keep it objective. For the past several years during the Holiday Season, the Xbox was the number 1 selling console in CONUS.

You can't excuse Sony for not having their estimated numbers. They didn't have them and just dropped the ball on that one. Who's taking advantage of it? Nintendo and Microsoft.

Microsoft could not meet the early demand, but there was no 'next generation' console on the market at the time, thus no competition.

Bottom line:

A sale is a sale. If you don't have it, you don't have a sale. It is that black and white.


Now let's get back to the topic.
Objective? Sorry man, but you are not objective. You ask why I used the PS2 as an example? Well, it's the best example of how sales can't tell much or mean anything. Yes, the 360 is outselling the PS3 right now, but there is a reason for that. Also, the 360 is doing BADLY in Asia/Japan, the second market after the USA.
You guys (who know a lot about specs) are giving great info to all of us, but most gamers (like two of my brothers) don't care if Sony is weaker than the 360; they just want Sony because it offers some great exclusives that they want, and most future console buyers think the same.
The PS1 sold about 102 million units, the PS2 has sold 111 million units so far, and the original Xbox sold "only" 24 million units, so your theory that the Xbox is selling better than the PS2 is not true. You can find these sales numbers on the internet.
So far the 360 has sold around 7.7 million (this is a nice and reliable site: http://nexgenwars.com/ ), which is a great number, but so did the Dreamcast in its first year, until the PS2 was released. I am not saying that the same thing will happen to the 360 as it did to the Dreamcast, no way, since the 360 is just too damn good, but let's just not underestimate the Sony brand and the PS3!
Also, the Wii will catch up with the 360 pretty soon. They have around 4 million units to send around the world before the end of the year.
And I am not excusing Sony for not having enough consoles on the market; the same or a similar situation happened with the PS2, and look at it now.

The only realistic time to judge how sales are going for all 3 console makers is the end of 2007.

And I want to make it clear that I am not a FANBOY of any system. Eventually I will get all 3 of them.

OfficerDibble
11-30-06, 02:59 PM
http://www.gamevideos.com

Search for Afrika then click on the HD trailer.

Rendered or realtime?

Srecko1
11-30-06, 03:39 PM
Well, that's easy even for me: rendered.

snyper1982
11-30-06, 03:44 PM
At a time when processor power and GPUs were continually improving, the fact that the Xbox was released over 18 months after the PS2 gave MS the time they needed to develop a technically superior console, so again, why should anyone be surprised that Xbox games look superior to PS2 games? It's like arguing the P4 is a superior CPU to the P3.

With regards to the PS3 and Xbox 360 that's not the case, there is little to choose between them technically.


I find this quote funny. This is why they abandoned the NetBurst (P4) architecture and moved BACK to a more P3-like design in the new Core 2 Duos, right? The P4 is not a superior design.

mterzich
11-30-06, 04:54 PM
What do you think about 360/PS3's capabilities regarding AI and physics processing?
I've read multiple times that these consoles' CPUs aren't very good (or are just plain bad) at dealing with this part of game code. Unfortunately, I can't remember any sources, except for an Ars Technica article about the 360's CPU (Xenon). I can't post the link because I have fewer than 5 posts, but if you go to Ars' site, type "xenon" in the search box, choose "Inside the Xbox 360, Part II: the Xenon CPU", and move forward to the conclusion, you can read what I'm talking about.

I've also read comments about Xenon being only twice as fast (regarding non-graphics code performance) as the Xbox's 733 MHz Pentium 3; is it really that bad? And does this claim apply to only a single PPE, or to all 3 of them working together?

Regarding the Cell, I've also read a few times that, despite it being even worse than Xenon for AI, its SPEs can actually be very good at physics processing, if enough effort is put into it, obviously. What do you reckon?

Lastly, and assuming the GameCube/Wii CPUs are out-of-order (I don't know if they are or not), can the sub-1 GHz CPU in Wii perform comparably to the CPUs in 360 and PS3 in AI and/or physics code (or at least a lot closer than the graphics code will ever perform)?
The article that you pointed to was about the ability to acquire memory consistently at high speed more than anything else. Physics code really comes down to the use of vector units, since physics is usually based on a large amount of repetitive data that can be segmented relatively easily to run in parallel or, possibly, in a segmented multi-core serial fashion. So a Cell processor, with so many cores and vector units, can be made to run faster than the Xenon processor if enough small segments of code can be found that will run in either fashion.
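Why physics segments well: if each particle's update depends only on its own state, the array can be cut into independent chunks and each chunk handed to a different core or SPE. A minimal sketch using a semi-implicit (symplectic) Euler step; the integrator, the 2D particles and all names are illustrative assumptions, not code from any console SDK.

```python
def integrate(positions, velocities, dt, gravity=-9.81):
    """One semi-implicit Euler step for a list of (x, y) particles:
    update velocities first, then move positions with the new velocities."""
    new_v = [(vx, vy + gravity * dt) for vx, vy in velocities]
    new_p = [(x + vx * dt, y + vy * dt)
             for (x, y), (vx, vy) in zip(positions, new_v)]
    return new_p, new_v

def integrate_in_chunks(positions, velocities, dt, chunk_size):
    """The same step split into independent chunks. No particle depends on
    another, so each chunk could run on a separate core and the results
    are simply concatenated in order."""
    out_p, out_v = [], []
    for i in range(0, len(positions), chunk_size):
        p, v = integrate(positions[i:i + chunk_size],
                         velocities[i:i + chunk_size], dt)
        out_p.extend(p)
        out_v.extend(v)
    return out_p, out_v

pos = [(float(i), 0.0) for i in range(10)]
vel = [(1.0, 0.0)] * 10
whole = integrate(pos, vel, 0.1)
chunked = integrate_in_chunks(pos, vel, 0.1, chunk_size=3)
```

Code with cross-dependencies (collisions between particles, AI agents reacting to each other) does not split this cleanly, which is where the segmenting effort discussed in this thread comes in.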

The article seems to be exaggerated and seems to be based more on the perspective of the author than on the real-world needs of a game console. I have very few qualms with either the Xenon or Cell execution units and cache. It is expected that those types of processors will change the number of stages in the pipelines, have less cache memory than desired, and have poorly designed branch prediction. IMO, in the large scheme of things, those shortcomings have a relatively small impact on a game console. Probably a much bigger issue (and by far the greatest performance loss) with either the Xenon or Cell cores is the lack of out-of-order execution. But overall, with all the degrading of the cores, a PPE core still appears to execute at approximately 40% of the speed of the latest full-featured processor running at the same clock frequency.

You might think that is very significant, and in many ways it is. However, when you look at price/performance considerations, that is pretty good, especially considering that the Xenon has 3 PPE cores and the Cell has 8 cores. If the applications can be written to use all cores effectively in parallel, the cost per unit of performance is greatly reduced compared with a single-core full-featured processor at the same clock frequency.
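The price/performance argument can be made concrete with the roughly 40% per-core figure above, combined with Amdahl's law for code that is only partly parallel. This is a sketch that assumes identical cores; the parallel fractions in the example are hypothetical inputs.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speed-up over one core when only part of the work
    can be spread across the cores."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

def full_core_equivalents(cores, per_core_fraction, parallel_fraction):
    """Chip throughput measured in full-featured cores at the same clock."""
    return per_core_fraction * amdahl_speedup(parallel_fraction, cores)

ideal = full_core_equivalents(3, 0.40, 1.0)          # about 1.2
half_parallel = full_core_equivalents(3, 0.40, 0.5)  # about 0.6
```

With perfectly parallel code, three in-order cores at 40% each add up to about 1.2 full-featured cores; if only half the work can be spread out, the same chip delivers about 0.6, which is why usability of the cores matters so much.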

My original post was based more on the usability of the multiple cores, since that is where I believe the primary benefits arise. Without usability, multi-core processors are a total waste. I suspect that the Cell processor is even further degraded than the Xenon processor. For instance, I suspect that there is a different number of stages in the vector units, since the Cell has about 26 GFLOPS per core and the Xenon has about 38 GFLOPS per core. I also expect that an SPE has a very small amount of L1 cache and lacks any form of branch prediction (which is compensated for by the compiler). However, in the large scheme of things, I don't think it is that important.

It's hard to tell exactly how each PPE core will perform in comparison to a 733 MHz Pentium 3 processor, but I suspect that one PPE core will be somewhere in the range that you stated relative to the Intel processor. So all 3 cores together would probably have between 6x and 8x the power of the original Xbox processor. I'm basing that on the benchmarks that were run between the PPE on the Cell processor and a full-featured PowerPC. There probably are some performance differences between the Cell PPE and the Xenon PPE, but probably not significant ones.

http://www.geekpatrol.ca/2006/11/playstation-3-performance/

Using that benchmark and various other benchmarks with Intel processors (that's not too accurate, since there are compiler and benchmark code differences between benchmarks on different platforms) and the current cost of processors (cost may reflect performance, because manufacturers have to remain competitive), it is pretty well established that the Xenon and Cell processors are underpowered in relation to full-featured processors. However, they should be sufficient for this generation of game consoles if the power of each processor can be harnessed. Also, SPEs probably don't perform as well as PPEs, but it would be difficult to tell since comparable benchmarks can't be run on an SPE.

The Wii processor is expected to have about 1/2 of the performance of one PPE core on the cell or xenon processor.

It all comes down to price/performance. A full-featured single-core 2.4 GHz PowerPC (IBM 970GX) sells for about $223 in quantities, and its core is about 2x as powerful as one PPE core on the 3.2 GHz Xenon processor. Microsoft may be able to get that price down a bit, but I doubt that it would approach the price of the Xenon processor (probably significantly below $100).

http://www.ppcnux.com/modules.php?name=News&file=article&sid=6562
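Working this through per PPE-equivalent of performance, using the figures in the post. The Xenon price is the post's own "significantly below $100" guess, so $100 is treated here as an upper bound; the function name is hypothetical.

```python
def cost_per_ppe_equivalent(price_usd, ppe_equivalents):
    """Dollars paid per unit of performance, where 1.0 = one PPE core."""
    return price_usd / ppe_equivalents

# IBM 970GX: about $223, roughly 2 PPE cores' worth of performance
ppc970gx = cost_per_ppe_equivalent(223, 2)            # 111.5
# Xenon: 3 PPE cores for (at most) about $100
xenon_upper_bound = cost_per_ppe_equivalent(100, 3)   # about 33.33
print(f"970GX: ${ppc970gx:.2f}, Xenon: under ${xenon_upper_bound:.2f} per PPE-equivalent")
```

On those assumptions the 970GX costs more than three times as much per unit of performance, provided all three Xenon cores are actually kept busy.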

mterzich
11-30-06, 07:10 PM
My experience is very much like mterzich's, though I disagree on the DMA/memory issue. When the PC used segmented memory (extended/expanded), it was a royal PITA (IMHO). Running a single app was okay thanks to page swaps, but running multiple apps was tough, as keeping track of who was using what didn't always happen :) If you'll recall, this was the norm until DOS was removed from underneath WinDoze. The earlier versions were always plagued with "3 finger salute" issues.

The tiny memory will also plague development. I'm also not a game dev, but if I was given that hardware & thought about segmenting code into 128 parts, I'd say "screw that, I'll write for a single processor & make it the best I can". The amount of RAM per SPE is just too small to get anything done & would require WAY too much time to figure out reasonable segmenting.

I assume when you are referring to page swaps, you are referring to the original Windows 3.1 and not DOS. If that is what you are referring to, that could create unexpected problems because of locking of memory that had to be done with some realtime applications plus issues with the memory management done by the OS.

You are right that most people will not even attempt to segment a program. Before I changed the IBM 3270 program to allow for swapping of segments, 3 other engineers made an effort and gave up on the project after about 100 code changes. When I looked at the project, I immediately knew that thousands of lines of code would need to be changed to complete it.

If the 3270 application had been a normal business application, there wouldn't have been too much of a problem. A normal business application just has to be segmented according to functionality, and then a normal overlay scheme can be implemented. However, this was a real-time communications program, and any of the functions of the full program could be called whenever a network interrupt occurred or the user sent a packet.

Therefore any data that was common between code segments had to be kept in memory at all times. This meant that all such data had to be referenced double-indirectly through a table (e.g. **ptrToTable->Data1), since the table and the separate data could reside in any part of memory (not always the same place), and the pointers in the table had to be set up when the application was started. So, for example, when a network interrupt occurred, the program had to save the assembly registers, pull the network driver into memory, patch the location that pointed to the table, and then execute the network driver code. If the driver called another segment, special code had to be created to save the parameters, pull the new segment in over the top of the network driver, and then call the new segment with the saved parameters. When the new segment returned to the network driver code, a special return routine was called: the return parameter was saved, the network driver was brought back into memory, the parameter was reloaded, and a branch was made back to the network driver.
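Python has no raw pointers, so in this sketch dictionary lookups stand in for the two levels of indirection behind **ptrToTable->Data1. All class and method names are hypothetical. What it demonstrates is the property the paragraph relies on: shared data can move in memory and every segment still finds it, because only the table entry changes.

```python
class SharedTable:
    """Stand-in for the resident table of pointers to shared data.
    `memory` plays the role of RAM; `slots` maps each shared datum's name
    to its current address in that RAM."""

    def __init__(self):
        self.memory = {}
        self.slots = {}

    def place(self, name, address, value):
        self.memory[address] = value
        self.slots[name] = address

    def move(self, name, new_address):
        old = self.slots[name]
        self.memory[new_address] = self.memory.pop(old)
        self.slots[name] = new_address  # only the table entry is updated


class CodeSegment:
    def __init__(self, name):
        self.name = name
        self.table = None  # patched each time the segment is paged in

    def load(self, table):
        self.table = table

    def read(self, datum):
        # Two hops: table -> address -> value
        return self.table.memory[self.table.slots[datum]]


table = SharedTable()
table.place("Data1", 0x100, 42)
net_driver = CodeSegment("network_driver")
net_driver.load(table)        # "patch" the segment's pointer to the table
before = net_driver.read("Data1")
table.move("Data1", 0x200)    # data relocated; the segment code is unchanged
after = net_driver.read("Data1")
```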

So, basically, a virtual memory operating system in which only one frame could reside in memory at a time was created. These were the things that had to be done:

Segment the code into about 100 segments.
Create a table of pointers that pointed to all data that was common between any segments.
Create a pointer in each code segment that pointed to the table. That pointer would be patched each time the code segment was brought into memory.
Modify all code that originally used the values that were shared between segments to reference them through the table.
Create specialized routines that always remained in memory to page code segments into memory and pass parameters. One routine had to be created for each type of call that would be made (e.g. Call2ParameterStringCode(ModuleNameToCall, EntryPointToCall, StringParameter1, StringParameter2) would pass two string parameters to the called routine).
Modify all cross-segment calls to use those routines (e.g. if the call was originally Display3270Screen(ptrToData), it would be changed to Call1ParameterStringCode(ModuleNameToCall, Display3270Screen, ptrToData)).
Create specialized return routines that reside in memory and return to a segment that had initially made a cross-segment call, so that its code can be paged back into memory and the return parameter can be passed back. One routine had to be created for each type of return (e.g. ReturnToPageCodeWith1IntegerParameter(IntParameterToReturn) would return an integer to the original caller).
Modify all returns that return to a paged code segment to use the modified return code (e.g. an original return(1) would be changed to ReturnToPageCodeWith1IntegerParameter(1)).
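Taken together, the call and return routines in the list above amount to an overlay manager: only one segment is resident, and every cross-segment call pages in the callee, runs it, and pages the caller back in before handing back the result. A minimal Python sketch with hypothetical names; Python manages the call stack itself, so the assembly-level stack adjustment is not modelled.

```python
class OverlayManager:
    """Only one code segment is resident at a time. call_cross() plays the
    role of the Call...Code routines, and restoring the caller afterwards
    plays the role of the ReturnToPageCode... routines."""

    def __init__(self, segments):
        self.segments = segments   # segment name -> {entry name: function}
        self.resident = None
        self.swaps = 0

    def page_in(self, name):
        if self.resident != name:
            self.resident = name   # overwrite whatever segment was loaded
            self.swaps += 1

    def call_cross(self, target, entry, *args):
        caller = self.resident                  # remember who called
        self.page_in(target)
        result = self.segments[target][entry](self, *args)
        if caller is not None:
            self.page_in(caller)                # bring the caller back first
        return result                           # then pass back the result


def on_packet(mgr, data):
    # network-driver segment calls into the display segment
    shown = mgr.call_cross("display", "show", data)
    return len(shown)

def show(mgr, data):
    return data.upper()

mgr = OverlayManager({"net": {"on_packet": on_packet},
                      "display": {"show": show}})
mgr.page_in("net")
n = mgr.call_cross("net", "on_packet", "hello")
```

The swap counter shows why the real scheme needed segments to stay resident for a while: one packet here already costs two extra swaps.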

Those are the basics of the code changes that were required. There was much more: for example, the code had to be segmented in such a way that a segment would stay in memory for at least a millisecond at a time; otherwise the overhead of the paging would reduce performance very drastically. Also, assembly language routines had to be developed to adjust the stacks on cross-segment calls and returns.

Typically, when one network packet was received, code segments were swapped in at a very fast rate, and the same code segment might be swapped in several times while processing just one packet. Fortunately the 3270 communicated at only 9600 baud, so even though I was paging in several hundred frames per second during a file transfer, the performance degradation was not noticeable.
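Rough arithmetic behind "not noticeable". The line framing is an assumption: with asynchronous 8N1 framing (10 bits on the wire per byte), 9600 baud carries 960 bytes/s, about 1 ms per byte, while a synchronous link would carry 8 bits per byte. The swap cost in the example is hypothetical.

```python
def bytes_per_second(baud, bits_per_byte=10):
    """Line throughput; 10 bits/byte assumes 8N1 async framing
    (1 start bit + 8 data bits + 1 stop bit)."""
    return baud / bits_per_byte

def paging_overhead(swaps_per_second, swap_cost_s):
    """Fraction of each second spent swapping segments in and out."""
    return swaps_per_second * swap_cost_s

line_rate = bytes_per_second(9600)        # 960.0 bytes/s
ms_per_byte = 1000 / line_rate            # about 1.04 ms per byte
# Hypothetical: 300 swaps/s at 0.5 ms each would use 15% of the CPU
overhead = paging_overhead(300, 0.0005)
```

At roughly one byte per millisecond, the line itself leaves a lot of processor time between bytes for the swapping described above.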

Cynic
11-30-06, 07:12 PM
However, when you look at price/performance considerations, that is pretty good, especially considering that the Xenon has 3 PPE cores and the Cell has 8 cores. If the applications can be written to use all cores effectively in parallel, the cost per unit of performance is greatly reduced compared with a single-core full-featured processor at the same clock frequency. Also, SPEs probably don't perform as well as PPEs, but it would be difficult to tell since comparable benchmarks can't be run on an SPE.
Thank you very much for the detailed reply.

So, if Microsoft or Sony were to include a single-core out-of-order CPU, which according to Carmack's speech at QuakeCon 2005 would have been a better option from a developer's perspective, the systems would be much more expensive for the consumer (or sold at a greater loss), and the end results wouldn't be much better than, or even as good as, what can be achieved with these cheaper alternatives (assuming there will always be a handful of developers capable of harnessing these CPUs' power regardless of how difficult/expensive/time-consuming it is)?
In other words, should we (the gamers) be glad Microsoft and Sony went multicore?

And what about the bandwidth differences between 360 and PS3?
IIRC, PS3 has 2 buses, for a total of about 50 GB/s total system bandwidth (22 GB/s to the video RAM and the rest for the main memory); 360 has the embedded DRAM in the GPU which supposedly gives it 256 GB/s bandwidth for the frame buffer, but only 22 GB/s between Xenon and the main memory. It's usually reported that PS3 is severely bottlenecked in its graphics memory, but what about the 360 main bandwidth of "only" 22 GB/s?
Who "wins" in this case? 360 partisans would argue it's clearly the Microsoft machine because 22+256 > 50, but I'm having a hard time believing it's that simple.

darthrsg
11-30-06, 08:16 PM
One of the best threads ever...

mterzich
11-30-06, 08:27 PM
So, if Microsoft or Sony were to include a single-core out-of-order CPU, which according to Carmack's speech at QuakeCon 2005 would have been a better option from a developer's perspective, the systems would be much more expensive for the consumer (or sold at a greater loss), and the end results wouldn't be much better than, or even as good as, what can be achieved with these cheaper alternatives (assuming there will always be a handful of developers capable of harnessing these CPUs' power regardless of how difficult/expensive/time-consuming it is)?
In other words, should we (the gamers) be glad Microsoft and Sony went multicore?

And what about the bandwidth differences between 360 and PS3?
IIRC, PS3 has 2 buses, for a total of about 50 GB/s total system bandwidth (22 GB/s to the video RAM and the rest for the main memory); 360 has the embedded DRAM in the GPU which supposedly gives it 256 GB/s bandwidth for the frame buffer, but only 22 GB/s between Xenon and the main memory. It's usually reported that PS3 is severely bottlenecked in its graphics memory, but what about the 360 main bandwidth of "only" 22 GB/s?
Who "wins" in this case? 360 partisans would argue it's clearly the Microsoft machine because 22+256 > 50, but I'm having a hard time believing it's that simple.
Using a full-featured 2.5 GHz PowerPC would currently probably add about another $100 or more to the cost of the console. A year ago, when Microsoft delivered the 360 system, it probably would have added $200. These companies can only bleed so much. Adding another $200, or even $100, to the cost of the Sony console would not have been possible, since I doubt that a gamer would like to pay $699 or $799 for a PS3. According to most estimates, Sony is currently losing about $250 per console. From a business aspect, this is unsustainable. By all projections, Sony would lose about $2.5 billion per year initially if they sold 10 million systems per year. I doubt that they would get more than $15 (and probably a lot less) per 3rd-party game, so they would need to sell at least 17 3rd-party games to every owner to break even.
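The break-even arithmetic in the paragraph above, using its own estimates ($250 loss per console, 10 million consoles per year, at most $15 per 3rd-party game). The function names are hypothetical.

```python
import math

def annual_hardware_loss(loss_per_console, consoles_per_year):
    return loss_per_console * consoles_per_year

def games_to_break_even(loss_per_console, revenue_per_game):
    # Round up: a fraction of a game cannot be sold
    return math.ceil(loss_per_console / revenue_per_game)

loss = annual_hardware_loss(250, 10_000_000)   # 2,500,000,000 dollars
games = games_to_break_even(250, 15)           # 250 / 15 = 16.7, rounded up to 17
```

If the per-game take is lower than $15, as the post suspects, the required attach rate climbs accordingly (for example, $10 per game would need 25 games per owner).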

If both consoles can harness the power, they should perform very well. I suspect that most games on the 360 are already using the 2nd core fairly efficiently (or can easily be made to). I discovered the other day that the basic structure of a game program is to put the AI and graphics generation on different threads. If a game was originally developed according to that basic structure, converting it to dual core shouldn't take a developer more than a couple of days (it is only a matter of synchronization and semaphore usage being slightly more difficult). Since the AI and graphics are the real power functions, two cores should probably be fairly well balanced. So as far as the 360 is concerned, I think it was an extremely wise decision to implement multi-core.
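The AI-thread/graphics-thread split described above can be sketched with a bounded queue as the synchronization point: the AI thread produces a new world state each frame, and the render thread consumes it. The names and the "world state" are hypothetical, and under CPython's global interpreter lock this shows the structure, not a real two-core speedup.

```python
import queue
import threading

def run_frames(num_frames):
    """AI/update on one thread, rendering on another. A bounded queue
    hands each finished world state to the renderer, so the threads only
    synchronize at the hand-off."""
    handoff = queue.Queue(maxsize=2)
    rendered = []

    def ai_thread():
        enemy_x = 0
        for frame in range(num_frames):
            enemy_x += 1                             # "think" for this frame
            handoff.put({"frame": frame, "enemy_x": enemy_x})  # blocks if renderer lags
        handoff.put(None)                            # sentinel: no more frames

    def render_thread():
        while True:
            state = handoff.get()
            if state is None:
                return
            rendered.append((state["frame"], state["enemy_x"]))  # "draw" it

    threads = [threading.Thread(target=ai_thread),
               threading.Thread(target=render_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return rendered

frames = run_frames(5)
```

Each frame's state is a fresh object, so the renderer never reads data that the AI thread is still modifying; that is most of the synchronization work in this simple design.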

As far as the PS3, maybe those developers know something that I don't know and would be able to use the SPEs effectively without a great deal of effort. Even if they were able to do that, it would be hard to understand how it would be an easy implementation keeping all 6 game SPEs (or even 2) fully utilized.

The issue of bandwidth is very partisan, and I'd really rather not get into it too much, since it probably doesn't have a major impact on how well the two consoles will perform (a lot of nitpicking). However, as far as the GPU is concerned, it appears that the 360 has the upper hand there.

But when it comes to the GPUs, the more powerful the better but again at what cost. Using a very powerful latest generation GPU is not feasible from a cost basis.

snyper1982
11-30-06, 09:58 PM
Well, I don't agree with you. RFOM is a first-generation game for the PS3 and looks awesome. GEOW is the only game on the 360 that looks and feels better than RFOM, not counting Oblivion, which is not exclusive and will be arriving on the PS3. If you take a look at screenshots of many PS3 exclusive titles, many of them, even though they are first-generation games, look better than or as good as the 360 second-generation games that will be released next year.



Gears IS a first-gen game..... It is Epic's first game out for the 360.

Jetrii
11-30-06, 10:19 PM
Here is a (crude) comparison of the PS3's CPU and a 1.6 GHz G5. The benchmarks are only for the PPE, though, so it's a 3.2 GHz in-order-execution core vs. a 1.6 GHz out-of-order G5.


http://www.geekpatrol.ca/2006/11/playstation-3-performance/

dpe8598
11-30-06, 10:22 PM
Ultimately, the product is all that matters: what games have come out. It's OK to speculate a bit on what might come out, but the 360 and PS3 are so close technically that it is a bit futile.

I own both systems, and I have to say that right now the 360 has a lot of great games. In the PS3's favor, I have to say that I didn't expect any of the games to be that great, and Resistance and the Motorstorm demo are incredible. The biggest thing I like about those 2 games is the gameplay. Incredibly smooth and a ton of fun.

I have to admit though, I'm on my way home (on the train) right now and when I get home I'll be playing the 360. PS3 has its work cut out if it wants to gain my favor in the next year.

JerryNY
11-30-06, 11:56 PM
And what about the bandwidth differences between 360 and PS3?
IIRC, the PS3 has 2 buses, for about 50 GB/s of total system bandwidth (22 GB/s to the video RAM and the rest for main memory); the 360 has embedded DRAM in the GPU, which supposedly gives it 256 GB/s of bandwidth for the frame buffer, but only 22 GB/s between Xenon and main memory. It's usually reported that the PS3 is severely bottlenecked in its graphics memory, but what about the 360's main bandwidth of "only" 22 GB/s?
Who "wins" in this case? 360 partisans would argue it's clearly the Microsoft machine because 22 + 256 > 50, but I'm having a hard time believing it's that simple.

Not to nitpick, but I think the 360 has 32GB/s, plus 256GB/s for the 10MB of eDRAM on the GPU die. For some of the same reasons that the PS3's GPU is thought to be somewhat bottlenecked, the 360's most certainly is not. The beauty of the 360's entire CPU-GPU architecture is that, unlike the PS3, you can use pretty much every last bit of the full 32GB/s. Think about what that means for a second. Say I have the CPU humming away, spitting out 32GB/s to that lovely 10MB of eDRAM, which can handle 256GB/s. That means the GPU can handle everything the CPU can dish out, and while it is waiting for its bandwidth-challenged compatriot to feed its ravenous appetite, it can do nice things with those future pixels. If you want 4xAA, or anything else that needs to reprocess those pixels over and over again, you are in great shape. It can work over the data roughly 8 times (256/32) before it starts to get backed up. This basically means the 360's GPU can do, penalty-free, many things that normally slow a GPU down.
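A quick check of the headroom arithmetic, taking the 32 GB/s and 256 GB/s figures in this post at face value (they are the poster's numbers, not verified specs): dividing the eDRAM bandwidth by the rate at which data arrives gives how many times each incoming byte could be touched before the eDRAM, rather than the feed, becomes the bottleneck.

```python
# Figures as quoted in the post above (GB/s); not verified hardware specs.
FEED_GBPS = 32.0            # rate at which data arrives at the GPU, per the post
EDRAM_INTERNAL_GBPS = 256.0 # eDRAM bandwidth, per the post

def edram_passes_per_byte(feed_gbps: float, edram_gbps: float) -> float:
    """How many read/write passes each incoming byte can get in eDRAM
    before eDRAM bandwidth, not the feed, becomes the limit."""
    return edram_gbps / feed_gbps

passes = edram_passes_per_byte(FEED_GBPS, EDRAM_INTERNAL_GBPS)
print(f"{passes:.1f} passes per byte")  # 8.0 passes per byte
```

Something like 4xAA, which revisits each pixel several times, fits inside that budget, which is the point the post is making.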

-Jerry C.

schticker
12-01-06, 01:48 AM
One other interesting thing is the fact that Sony allows for easy Linux installation on the PS3. Many of the mysteries of what the PS3 is really capable of should be "discovered" by Linux geeks poring over the hardware. It should be fascinating to see what they can do with it.

-Jerry C.

Which matters not if game developers can't make use of it. Raw numbers and stats don't mean a hill of beans if it doesn't translate to in-game action.

wreckshop
12-01-06, 02:11 AM
Isn't it an accepted fact that CELL is more powerful than XeCPU? Even John Carmack has said so, and he usually has nothing good to say about the PS3. MS does have the advantage of better development tools, though; this is where MS's experience as a software provider is evident. But in terms of raw performance, the PS3 has the advantage.

CELL has more than enough power to decode TWO simultaneous 40 Mbps AVC streams. In contrast, decoding a single HD DVD stream is "difficult" on XeCPU.

Once developers figure out how to utilize the SPEs effectively, you will see the difference in games: things like physics simulations, AI routines, particle effects, etc.

mterzich
12-01-06, 02:27 AM
So what does it mean that the 360 uses a more standard 3 core processor and the PS3 uses the specialized cell processor? If all things were equal, it would seem that the 360 had the advantage. However, since all things are not equal, I suspect that it is still too early to draw any conclusions.

Why did Sony use the cell processor?

Since it is pretty much agreed that the Cell processor is rather difficult to develop for, I've pondered this question and still cannot come up with any reason that makes a lot of sense.

Sony has some very bright people so it doesn't make much sense that they couldn't see that there would be issues with the cell processor.

Did Sony marketing pressure the engineering department to put a large number of cores in the PS3 so that they could market it as the game console of the future? Did engineering then produce a design that was cost-effective enough to satisfy the marketing department but not necessarily well designed as a game console?
Did the chief engineer go off on a tangent while none of the subordinates dared to question the design and management didn't understand its potential consequences?
Did Sony believe that developing games for a non-standard processor would be a non-issue?
Did Sony believe that they would be able to deliver the PS3 to market at the same time as or before the 360? If they did, were they trying to eliminate Microsoft as a competitor entirely by developing a processor with a non-standard design? Did they assume that if the PS3 was delivered at the same time as the 360, developers would not build games around any of the 360's capabilities and would use only the PS3's enhanced capabilities for killer applications? Did Sony assume that their fan base was so large that, even if games were designed only for single-core capability, Microsoft could not survive?

More Questions Than Answers

At this time it is difficult to draw any conclusions.

I suspect if Microsoft had used the cell processor for the 360 and Sony had used a more standard processor for the PS3 and Microsoft had a 1 year lead to market as they currently have, Microsoft would probably have a very hard time competing in the game console market in the future. However, even with being very late to market with a non-standard processor, it appears that Sony will still be a major competitor.
Even though it appears that the 360 is better designed for game development than the PS3, if developers design their applications for the lowest common denominator, using only single-core capabilities, Sony should probably maintain its dominance in the market.
What effect will price have on market share?
How will an integrated BD player on the PS3 or an external HD-DVD on the 360 be perceived by the gamers?
How will the capabilities of HDMI and 1080p for games be perceived by gamers?
How will the style of the consoles be perceived by the gamers?
Will it be possible to convince the Sony fan base to abandon Sony even if differences in games between the consoles are significantly in favor of the 360?
Which console will have more and better exclusive killer applications?

Extra
12-01-06, 02:30 AM
^^ Cell is more powerful than Xenon AT CERTAIN things. It's undeniable that it has more raw floating-point calculation power than Xenon.

But game code relies on more than just floating point, so it's difficult to sum up the real-world performance difference based on one aspect.

mterzich
12-01-06, 02:44 AM
Isn't it an accepted fact that CELL is more powerful than XeCPU? Even John Carmack has said so, and he usually has nothing good to say about the PS3. MS does have the advantage of better development tools, though; this is where MS's experience as a software provider is evident. But in terms of raw performance, the PS3 has the advantage.

CELL has more than enough power to decode TWO simultaneous 40 Mbps AVC streams. In contrast, decoding a single HD DVD stream is "difficult" on XeCPU.

Once developers figure out how to utilize the SPEs effectively, you will see the difference in games: things like physics simulations, AI routines, particle effects, etc.
I agree that the Cell processor probably has 2x or more the total processing power of the Xenon processor. However, due to the limited memory (256KB) of each SPE, it would be impossible to fit the decoding code, one 720p MPEG frame (166 KB per frame at 10 mb/s) and one output frame (3 MB) into that memory at one time. That is a total of 3.2 MB or more. Therefore, even if the input frame and output buffer were divided into 7 equal parts, that would still require about 500 KB of memory in each SPE. I suppose the PPE could divide the input frame and output buffer into 14 equal parts and move data to and from each of the 7 SPEs twice per frame.
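The memory budget in this post can be redone as a small calculation. This is a sketch using the post's own figures (166 KB in, 3 MB out); the 64 KB allowed for decoder code and stack is a made-up assumption, since nobody in the thread knows the real size.

```python
import math

LOCAL_STORE_KB = 256  # local store per SPE

def slices_needed(working_set_kb: float, code_kb: float, num_spes: int,
                  local_store_kb: float = LOCAL_STORE_KB) -> int:
    """Smallest number of equal data slices, rounded up to a whole number of
    passes over all SPEs, such that one slice plus the code fits in one SPE."""
    free_kb = local_store_kb - code_kb
    if free_kb <= 0:
        raise ValueError("the code alone does not fit in the local store")
    slices = math.ceil(working_set_kb / free_kb)
    passes = math.ceil(slices / num_spes)
    return passes * num_spes

# Figures from the post: 166 KB input frame + 3 MB output frame.
working_set_kb = 166 + 3 * 1024                                  # 3238 KB
print(f"{working_set_kb / 7:.0f} KB per SPE if split 7 ways")    # 463 KB per SPE if split 7 ways
# HYPOTHETICAL: assume the decoder code and stack take 64 KB.
n = slices_needed(working_set_kb, code_kb=64, num_spes=7)
print(f"{n} slices, {n // 7} passes per SPE per frame")          # 21 slices, 3 passes per SPE per frame
```

With the 14-slice split suggested in the post, each slice is about 231 KB, leaving only about 25 KB of the 256 KB for code and stack; that is why the hypothetical 64 KB of code pushes the count to three passes.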

wreckshop
12-01-06, 02:48 AM
Well, this is a comparison of CPUs, right? CELL destroys XeCPU in FP performance. What does XeCPU excel at that CELL doesn't? I'm pretty sure CELL has better integer performance too.

wreckshop
12-01-06, 02:51 AM
I agree that the Cell processor probably has 2x or more the total processing power of the Xenon processor. However, due to the limited memory (256KB) of each SPE, it would be impossible to fit the decoding code, one 720p MPEG frame (166 KB per frame at 10 mb/s) and one output frame (3 MB) into that memory at one time. That is a total of 3.2 MB or more. Therefore, even if the input frame and output buffer were divided into 7 equal parts, that would still require about 500 KB of memory in each SPE. I suppose the PPE could divide the input frame and output buffer into 14 equal parts and move data to and from each of the 7 SPEs twice per frame.

Sorry, but what's your point? CELL can decode 2 40 Mbps AVC 1080p streams using only 3 SPEs. Oh, and also, SPEs don't have cache; they have local store.

Extra
12-01-06, 03:10 AM
From what I understand, Xenon is much better than Cell at branch prediction, however bad both are at it (compared to out-of-order CPUs). The fact that only the PPE in the Cell is capable of branch prediction, while the SPEs lack this function entirely, gives Xenon 3x the branching power, since all 3 cores in Xenon are PPEs.

A lot of non-graphics code is branch-intensive: AI, game control, and to some degree physics.

mterzich
12-01-06, 03:39 AM
Sorry, but what's your point? CELL can decode 2 40 Mbps AVC 1080p streams using only 3 SPEs.
Are you saying that there is currently code available that does that, or is the statement just based on processing power requirements? I assumed that the PS3 currently uses hardware decoders. If you think differently, I would like to know. I would also be interested to know whether anyone has ever produced a decoder that uses multiple SPEs, because it would be very interesting to see how they accomplished it, even on a theoretical basis, given the memory limitations of the SPEs.

I was using 7 SPEs just to point out possible complexities, since I'm not even sure it is possible with the limited memory in the SPEs. I could have used 3 SPEs in the example, with more passes per frame. So the point is: is it possible, with the limited memory available in the SPEs, to segment everything to such extremes, or are we only talking about processing power requirements?

I assume that MPEG frames have dependencies both within a frame and between frames, such as within a GOP. Since I have never developed an MPEG decoding program, I'm not sure how tight those dependencies are within a frame or between frames. If the dependencies are very tight and stretch over the complete frame, it may be impossible, or at least extremely difficult, to segment the frame into many pieces to be handled by different cores in parallel. I'm just not sure. However, it may be possible to have different cores work on different frames in parallel, which would let the Xenon processor have each core working on a different frame, since there is plenty of memory. I'm not sure exactly how they would perform the operation on the Xenon processor, but I'm assuming each core gets its own frame. If there are dependencies between frames, it may be possible to communicate between the PPE cores using common memory.
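The frame-per-core idea for Xenon is easy to sketch, since each core can hold a whole frame in main memory. This is a minimal Python sketch assuming a hypothetical decode_frame() stand-in, and it ignores inter-frame dependencies entirely. Python threads will not give a real CPU speedup here, so it only shows how the work would be divided among the three cores.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_CORES = 3  # Xenon's three identical general-purpose cores

def decode_frame(index: int, data: bytes) -> str:
    """HYPOTHETICAL stand-in for a real decoder; it only reports the frame size."""
    return f"frame {index}: {len(data)} bytes"

def decode_stream(frames: list[bytes]) -> list[str]:
    # Each worker gets a whole frame, so nothing has to be carved up to fit a
    # small local store. This ignores inter-frame dependencies (frames that
    # reference earlier frames), which a real decoder must respect.
    with ThreadPoolExecutor(max_workers=NUM_CORES) as pool:
        # map() yields results in input order, even if frames finish out of order.
        return list(pool.map(decode_frame, range(len(frames)), frames))

print(decode_stream([b"\x00" * 10, b"\x00" * 20, b"\x00" * 30, b"\x00" * 5]))
```

If frames did depend on each other, the workers would have to wait on shared state in common memory, which is the complication the post raises.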

I'm also not sure that a segmented output buffer (part of a raw bitmap) can be used. Since the information in an MPEG frame is not sequential from left to right and top to bottom, it seems that a partial bitmap output buffer could not be used; instead, some type of intermediate code would need to be generated, which the PPE would then have to decode and merge with the buffers produced by all passes of all SPEs for that frame.

Is it possible that the decoder application performs the decoding in a sequential fashion? By that I mean the PPE feeds chunks of MPEG data to the first SPE, which does a partial calculation on each chunk and passes the output to a second SPE, which does more calculations on those results and passes them to a third SPE, which performs the final calculations and passes the final results back to the PPE, which produces the bitmap. In this way the MPEG data is processed sequentially, only small chunks of data are in any SPE's memory at any time, all SPEs are executing at the same time once the stream is moving, and no bitmap is required in any of the SPEs.
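The "sequential" scheme described here is what is usually called a pipeline. Here is a minimal Python sketch of it, assuming three hypothetical stage functions as placeholders for the partial calculations. Each thread plays one SPE, a feeder thread plays the PPE handing out chunks, and the main thread plays the PPE collecting results. The small bounded queues stand in for the limited local store: only a couple of chunks sit between any two stages at once. Python threads will not run CPU-bound work in parallel, so this shows the structure, not the speedup.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream

def stage(fn, inbox: queue.Queue, outbox: queue.Queue) -> None:
    """One 'SPE': take a small chunk, transform it, pass it downstream."""
    while True:
        chunk = inbox.get()
        if chunk is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            return
        outbox.put(fn(chunk))

def run_pipeline(chunks, stage_fns, maxsize=2):
    # One bounded queue between each pair of neighbours.
    queues = [queue.Queue(maxsize=maxsize) for _ in range(len(stage_fns) + 1)]
    workers = [threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
               for i, fn in enumerate(stage_fns)]
    for w in workers:
        w.start()

    def feed():  # the 'PPE' handing out chunks
        for c in chunks:
            queues[0].put(c)
        queues[0].put(SENTINEL)

    feeder = threading.Thread(target=feed)
    feeder.start()
    results = []  # the 'PPE' collecting final results
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    feeder.join()
    for w in workers:
        w.join()
    return results

# HYPOTHETICAL stand-ins for the three partial calculations in the post.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
print(run_pipeline(range(5), stages))
```

Because each stage is a single worker reading a FIFO queue, chunks come out in the same order they went in, which matters for a video stream.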

JerryNY
12-01-06, 04:39 AM
Which matters not if game developers can't make use of it. Raw numbers and stats don't mean a hill of beans if it doesn't translate to in-game action.

Very true, but in the context of a thread driven by a comparison of different CPU architectures, I meant that it will be fascinating to see what the PS3 is capable of, not just for gaming.

Well, this is a comparison of CPUs, right? CELL destroys XeCPU in FP performance. What does XeCPU excel at that CELL doesn't? I'm pretty sure CELL has better integer performance too.

I would be careful about throwing around the term "destroying" when talking about FP performance between the two. Apple's use of the AltiVec SIMD unit in their G4- and G5-based offerings showed that in massively parallel code AltiVec was a monster and could "destroy" all x86-based SIMD units from Intel and AMD, but this only really helped in things like video editing and encoding. The problem is that you have to look beyond the superficial: things like the precision of those FP units and how exactly you keep them fed without starving them as they zip through the code. Ultimately, AltiVec never seemed to benefit gaming on the Mac side of things very much at all. I tend to look at the SPEs in a similar fashion: spectacularly good at what they do, but somewhat limited in the scope of what they can be fed.

An interesting side note is that when Apple was having major problems cramming G5s into portables (the PPC970 is derived from the heavy-iron POWER4 monster, so it was not originally designed with even one iota of thought toward power savings), IBM was trying to push CELL on them for the portable space. Apple apparently passed on it because of severe limitations with general code and the lack of OOOE. Keep in mind that a desktop PC/Mac has to deal with lousy, branchy code more than a game console does, at least theoretically.


Why did Sony use the cell processor?

Since it is pretty much agreed that the Cell processor is rather difficult to develop for, I've pondered this question and still cannot come up with any reason that makes a lot of sense.

Sony has some very bright people so it doesn't make much sense that they couldn't see that there would be issues with the cell processor.

---

What effect will price have on market share?
How will an integrated BD player on the PS3 or an external HD-DVD on the 360 be perceived by the gamers?
How will the capabilities of HDMI and 1080p for games be perceived by gamers?
How will the style of the consoles be perceived by the gamers?
Will it be possible to convince the Sony fan base to abandon Sony even if differences in games between the consoles are significantly in favor of the 360?
Which console will have more and better exclusive killer applications?

I do wonder if Sony's insistence on putting a BD player in every PS3 will make the PS3 a better media machine at the expense, literally, of a reasonably priced console. How much would a BD-less PS3 cost? Wouldn't that have priced it down there with the 360? It even raises dev costs, since BD discs cost more to make, and devs need to charge more for their games to make up for it or eat the cost. This may not be all that much money, but more money is more money.

They don't even seem all that serious about making it the center of your living room, given the lack of an IR receiver. Right now I have a 360 hooked up with the HD DVD add-on, controlled by my Harmony 880, and all is well. You take the PS3 home, and after gaming you decide you want to watch a nice BD movie, but until you can get the Sony Bluetooth remote for it, you have to use a game controller to run it. Even once you have the BT remote, that is another remote you have to use, even if you have a nice universal IR one. Their whole strategy eludes me. The PlayStation division of Sony is apparently the only one making money. Using their only current money maker to push a next-gen movie disc format that may not even be relevant in 5-10 years, when everyone just wants to download their movies, seems like a bad idea.

-Jerry C.

Bailey151
12-01-06, 09:09 AM
First of all I'd like to say thanks to all who contributed to this thread - nice to have a detailed discussion of hardware without any of the "nee-ner....nee-ner" garbage.

Sony has some very bright people so it doesn't make much sense that they couldn't see that there would be issues with the cell processor.
I don't doubt that they're smart; I just see it as different backgrounds. Sony is an entertainment company - that's the way they approach design. None of their consoles have been "easy to program". They seem to have been designed with entertainment in mind. MS, on the other hand, is a software company with a long history of developing code for various hardware. They have a great deal of experience with "resource pinch" & "future proofing" (no such thing, but I lack a better term). It only makes sense that their console would be easier to program & have a more standard architecture. One could make the argument that MS is the creator of DX10 = they have an inside track on future GPU requirements = able to make a better choice. Sony simply took the best bang-for-the-buck available at the time.

Personally, from what I've read (here & prior), I think the number of processors is simply marketing (& Sony KNOWS marketing) - much akin to the GHz war Intel waged with the P4. The P4 was simply awful from a pure design standpoint, but you could ramp the GHz through the roof = the average Joe simply buys by the "number" = tons of sales. Few companies can compete with Sony when it comes to slick marketing.

Apple's use of the AltiVec SIMD unit in their G4- and G5-based offerings showed that in massively parallel code AltiVec was a monster and could "destroy" all x86-based SIMD units from Intel and AMD, but this only really helped in things like video editing and encoding.
Yep, outside of a narrow set of applications they're pretty lame processors......but.....they do what gaming consoles need very well & they can be affordable when stripped down = good for gamers :)

I assumed that the PS3 currently uses hardware decoders.
Hmmmmm......we can guess that they use software scalers.....thread (http://www.avsforum.com/avs-vb/showthread.php?t=759951)

IMHO the PS3 was designed to be a media device, good but not revolutionary gaming performance & solid media playback.

Waters_10
12-01-06, 10:19 AM
I don't want to go too much off-topic, but I have to make a comment regarding PS2 sales.

PS1 sold about 102 million units and PS2 about 111 million units so far, while the (original) Xbox sold "only" 24 million units, so your theory that the Xbox is selling better than the PS2 is not true.

Yes, PS2 is still selling like hot cakes! It's just incredible! But don't you think that whoever is buying a PS2 right now (or bought one this year, for that matter) is not even considering buying a 360 or a PS3 in the near future? I don't think somebody who bought a PS2 this year is considering a PS3/360 for now, whether they prefer Sony over MS or vice versa! So the huge monthly sales of PS2 hardware should concern both MS and Sony! Sure, Sony is making money from those sales, and some of these people will buy anyway after one or two price cuts, but don't forget that Sony has to catch up with the one-year head start MS had, and next year's installed base can drive publisher decisions one way or the other.

I'm just surprised by the number of people who are happy enough with current-gen stuff! I'm sure high next-gen prices have a lot to do with it.

JData
12-01-06, 10:37 AM
Objective? Sorry man but you are not objective.

snip....


Alright, man. You've missed my point. I'll pass on continuing the discussion, since I don't want to hijack the thread.

Cynic
12-01-06, 10:52 AM
Sony is an entertainment company - that's the way they approach design. None of their consoles have been "easy to program".
I don't think that's true. AFAIK, the original PlayStation was very easy to program for, which resulted in giving the Sega Saturn a run for its money. When Nintendo 64 arrived, it was already too late for the race.

PS1 was easy to program for because it needed to be; PS2 didn't. When it launched, "PlayStation" was already the second or third biggest brand in the world, it was bound to be successful. Same with the PS3.

wreckshop
12-01-06, 11:25 AM
Are you saying that there is currently code available that does that, or is the statement just based on processing power requirements? I assumed that the PS3 currently uses hardware decoders. If you think differently, I would like to know. I would also be interested to know whether anyone has ever produced a decoder that uses multiple SPEs, because it would be very interesting to see how they accomplished it, even on a theoretical basis, given the memory limitations of the SPEs.


There is an interesting article published on PC Watch (Japanese) detailing the AV capabilities of the PS3. Here is a translated excerpt (the translation is not mine):

H.264 decoding itself was not very difficult for Cell with moderate optimization, and they could play a movie in real time on the first try, unlike the very difficult SACD optimization. However, because they began development without knowing the final Blu-ray standard, they set a very high goal: decoding 2 full HD H.264 streams at 40 Mbps simultaneously. Besides, the clock speed of the devkit was lower than that of the final product, which made development difficult. The current decoder can decode full HD H.264 with 3 SPEs.

wreckshop
12-01-06, 11:28 AM
I would be careful about throwing around the term "destroying" when talking about FP performance between the two.

IBM's own published figures for FP performance, IIRC:

CELL: 238 GFLOPS
XeCPU: 77 GFLOPS

To me, that's destroying.

Bailey151
12-01-06, 11:55 AM
I don't think that's true. AFAIK, ..........................

PS1 was easy to program for because it needed to be; PS2 didn't. When it launched, "PlayStation" was already the second or third biggest brand in the world, it was bound to be successful. Same with the PS3.
Oops :o forgot about "da one". The 3 will be successful, but I think the difficult programming will hurt them this time. Aside from the few platform exclusives (& there are very few this time), nobody is going to spend the time; it's just too expensive. As noted at the beginning, we'll likely see many more generically coded programs.

IBM's own published figures for FP performance, IIRC:

CELL: 238 GFLOPS
XeCPU: 77 GFLOPS

To me, that's destroying.
Well, that & $4.50 will get you a cup of coffee. FP performance just isn't the determining factor (anymore). To me the Cell, though tough(er) to develop for, will do just fine; it's the GPU I see as the limiting factor, it just doesn't have it. The SPEs may or may not be able to take up some of the slack, but I'm just not convinced that they can do the "final pretty work" on the graphics.

Jetrii
12-01-06, 12:09 PM
IBM's own published figures for FP performance, IIRC:

CELL: 238 GFLOPS
XeCPU: 77 GFLOPS

To me, that's destroying.


That is the theoretical peak performance. Using a Cell with 8 SPEs, IBM was only able to achieve 150 GFLOPS in a perfect environment. Gaming is far from perfect. I suspect that the Cell processor is not going to pass 60-70 GFLOPS in games. That said, Xenon won't pass 60-70 either.
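Putting the numbers from this exchange side by side helps: the 238 and 77 GFLOPS peaks quoted above, the 150 GFLOPS figure as stated in this post, and 65 GFLOPS as an arbitrary midpoint of the 60-70 guess (not a measurement). The sketch just expresses each as a fraction of peak.

```python
def efficiency(achieved_gflops: float, peak_gflops: float) -> float:
    """Fraction of theoretical peak actually delivered."""
    return achieved_gflops / peak_gflops

CELL_PEAK, XENON_PEAK = 238.0, 77.0  # peaks as quoted in the thread
IBM_MEASURED = 150.0                 # per the post above
GAME_GUESS = 65.0                    # midpoint of the 60-70 guess; not measured

print(f"Cell, IBM test:    {efficiency(IBM_MEASURED, CELL_PEAK):.0%}")  # 63%
print(f"Cell, game guess:  {efficiency(GAME_GUESS, CELL_PEAK):.0%}")    # 27%
print(f"Xenon, game guess: {efficiency(GAME_GUESS, XENON_PEAK):.0%}")   # 84%
```

Taken at face value, the guess implies Xenon running much closer to its peak than Cell does, which is really what the argument over "destroying" comes down to.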

Cynic
12-01-06, 12:12 PM
IBM's own published figures for FP performance, IIRC:

CELL: 238 GFLOPS
XeCPU: 77 GFLOPS

To me, that's destroying.
Emotion Engine: 6.2 GFLOPS
Xbox CPU: 2.9 GFLOPS

Once again, more than twice as much. It can't be that easy comparing the two.

muzzakus
12-01-06, 01:25 PM
And let's not forget MS is at its core a software company. It's easy to see why they would see the benefits of the 360's architecture (& ATI's GPU). Sony is an entertainment company - they would likely see the media processing power as very attractive.

Wow, that's so true.

mterzich
12-01-06, 01:34 PM
Hmmmmm......we can guess that they use software scalers.....thread (http://www.avsforum.com/avs-vb/showthread.php?t=759951)
there is an interesting article published on PC Watch (japanese) detailing the AV capabilities of PS3. here is a (not mine) translated excerpt:
Thanks, guys, for the information. Now if only Sony would issue a White Paper detailing how they accomplished it. It would be interesting for non-game developers like us, as well as for game developers, to know how it was done. It would be nice to know whether they did it in a parallel or a segmented sequential fashion. If they did it in a parallel fashion and there were dependencies, how were the dependencies handled? Also, if it was done in a parallel fashion, how did they handle the output buffer issue?

Without a White Paper, game developers may be left scratching their heads. With one, they would at least have some idea how to implement a real-world case, even if it doesn't map directly onto a game application.

wreckshop
12-01-06, 01:35 PM
That is the theoretical peak performance. Using a Cell with 8 SPEs, IBM was only able to achieve 150 GFLOPS in a perfect environment. Gaming is far from perfect. I suspect that the Cell processor is not going to pass 60-70 GFLOPS in games. That said, Xenon won't pass 60-70 either.

I believe you are mistaken. CELL-optimized applications can achieve near theoretical maximum performance if most of the work can be offloaded to the SPEs.

Emotion Engine: 6.2 GFLOPS
Xbox CPU: 2.9 GFLOPS

Once again, more than twice as much. It can't be that easy comparing the two.

What does that have to do with the performance of CELL vs XeCPU?

muzzakus
12-01-06, 01:42 PM
One other interesting thing is the fact that Sony allows for easy Linux installation on the PS3. Many of the mysteries of what the PS3 is really capable of should be "discovered" by Linux geeks poring over the hardware. It should be fascinating to see what they can do with it.

-Jerry C.


Excellent point. The homebrew enthusiasts and the hardcore demo coders may tap into some of the PS3's goodness and produce something not yet seen. However, this is all theory. But if anyone were to prove the PS3's immense power, these guys will. The only issue is that games (being an enterprise to make money) may never follow suit due to the economics.

Muz

JerryNY
12-01-06, 01:44 PM
IBM's own published figures for FP performance, IIRC:

CELL: 238 GFLOPS
XeCPU: 77 GFLOPS

To me, that's destroying.

Well, those numbers are pretty, but the CELL in the PS3 doesn't use 8 SPEs. Those numbers are derived by extrapolating what 1 SPE does and multiplying it by 8 to come to a nice big number. This means that theoretically each SPE is capable of 238 GFLOPS / 8, which comes out to 29.75 GFLOPS. One SPE is disabled right off the bat to improve yields and lower costs, so when we drop to 7 the total figure drops to 208.25 GFLOPS. A game developer has access to only 6 of the SPEs; one is reserved for the OS, so it cannot even be touched. That is another 29.75 GFLOPS, dropping the total to 178.5 GFLOPS as the real-world theoretical maximum. Now, this isn't the only thing that will limit real-world GFLOPS. The PS3 OS reserves the right to yank a second SPE out of use at a moment's notice without choice, so you can lose another SPE, and another 29.75 GFLOPS, at any time. Do you see where this is headed? With a few keystrokes I chopped ~90 GFLOPS off the CELL, and this doesn't even take into account that the original number of 238 is a multiple of a SINGLE SPE's maximum theoretical number. To make matters worse, all 8 SPEs (only 7 in the PS3 are active) share a common 512KB L2 cache, which was fully available to a single SPE to get the nice 29.75 GFLOPS number. Sharing that cache with a bunch of other SPEs will only cause the numbers to dive even further. Pretty numbers usually hide an ugly truth behind them.
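Here is the subtraction in this post, redone as a tiny calculation. It follows the post's own simplification of treating the quoted 238 GFLOPS as eight equal SPE shares, so it ignores any PPE contribution. The "second SPE reserved by the OS" case is the poster's claim, which is questioned later in the thread.

```python
PEAK_ALL_8_SPES = 238.0          # figure quoted in the thread
PER_SPE = PEAK_ALL_8_SPES / 8    # 29.75, following the post's method

def usable_gflops(active_spes: int, reserved_spes: int) -> float:
    """Theoretical peak left for a game when `reserved_spes` of the
    `active_spes` are held by the OS."""
    return (active_spes - reserved_spes) * PER_SPE

print(usable_gflops(7, 0))  # 208.25: one of 8 SPEs disabled for yield
print(usable_gflops(7, 1))  # 178.5: one more reserved for the OS
print(usable_gflops(7, 2))  # 148.75: if the OS also claims a second SPE
print(PEAK_ALL_8_SPES - usable_gflops(7, 2))  # 89.25 GFLOPS "chopped off"
```

The last line is where the post's "~90 GFLOPS" comes from.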

All is not so gloomy though. The 360's CPU can't really reach its theoretical maximum either but its less exotic architecture and lack of GPU bottleneck offset this somewhat.

-Jerry C.

muzzakus
12-01-06, 01:46 PM
It sounds like it may scare the hell out of developers committed to Sony. Japanese companies have a commitment to each other but that can only be carried so far. If developing a killer application takes 2x or more the cost and time on the PS3 compared to the 360, they'll probably start to rethink their strategy.


I'm not so sure. The Sony badge will carry it through regardless. The games will still be there, but they will be done economically, to run on just the one core. Sony will have their games and will have the sales, especially in Japan. The only thing Sony will lack is that the games will be inferior to the 360's throughout its whole lifespan. But in Japan, for the most part, they will not even know or care.

Muz

mterzich
12-01-06, 01:50 PM
Excellent point. The homebrew enthusiasts and the hardcore demo coders may tap into some of the PS3's goodness and produce something not yet seen. However, this is all theory. But if anyone were to prove the PS3's immense power, these guys will. The only issue is that games (being an enterprise to make money) may never follow suit due to the economics.

Muz
That is true, but first someone has to modify the Linux operating system to provide support libraries to handle the loading, monitoring, and execution of SPEs. Is the version of Linux used on the PS3 an open architecture version (e.g. not something like Red Hat)?

muzzakus
12-01-06, 01:52 PM
Having been involved in computing back when 256KB was a very, very large amount of RAM, I'd like to ask you a question if you don't mind: What is your background, and what are your qualifications to speak on the subject? Depending on what an SPE is being asked to do, 256KB of memory may be far more than needed.

Having followed this thread, I find most of it to be informed speculation with a healthy dose of uninformed speculation, otherwise known as a WAG. What is being discussed has gone beyond hardware specifications into software design and design optimization. If a person hasn't actually programmed for the designs in question, I don't think the conclusions they arrive at via speculation are useful. Just my opinion; many people will choose to like it or dislike it based upon their brand preference.

We will rarely hear a cross-platform developer speak candidly about the platforms. It would be against that developer's interest to do so. They would be the ones that could speak with authority and credibility on this topic, but I can't imagine why one of them would. Someone quoted an IGN interview earlier--does anyone think that a developer would have anything other than nice things to say? They have their honest opinions, but the moment they voice them they put the support they get from the engineers at Sony and MS in jeopardy.

This is all a very nice discussion, but any conclusions (and likely many of the "facts") involved are speculative to a degree that makes their value quite dubious.

Nah, that's rubbish, mate. If I were a console developer and knew anything about the ins and outs of this hardware, I'd spill the beans, even if anonymously. Let's not get crazy - books will be published for these "computers": textbooks, whitepapers. It's not a big secret when it's in plain sight.

Muz

mterzich
12-01-06, 01:55 PM
To make matters worse, all 8 SPEs (only 7 in the PS3 are active) share a common 512KB L2 cache
-Jerry C.
Actually, I misstated earlier that they share a 512KB L2 cache. That is not true, since the SPEs use a different mechanism and don't really use cache. Sorry about my mistake; I have already corrected it in earlier documents.

muzzakus
12-01-06, 02:23 PM
That is true, but first someone has to modify the Linux operating system to provide support libraries to handle the loading, monitoring, and execution of SPEs. Is the version of Linux that is used for the PS3 an open-architecture version (e.g., not something like Red Hat)?

Perhaps then with a stray development kit...natively. Not sure how tight Sony's policies are however. Potentially a "stray" kit may not ever even have a hope of existing in the first place.

Muz

JerryNY
12-01-06, 02:29 PM
Actually, I misstated earlier that they share a 512KB L2 cache. That is not true: they use a different mechanism and don't really use a cache. Sorry about the error; I have already corrected it in earlier documents.


Ok, thanks for the correction.

http://www-128.ibm.com/developerworks/power/library/pa-cellperf/

I am not sure if someone posted this link but it has a nice flowchart and some nice info on the CELL.

-Jerry C.

wreckshop
12-01-06, 02:51 PM
The PS3 OS reserves the right to yank a second SPE out of use at a moment's notice, without choice, so you can lose another SPE, and another 29.75 GFLOPS, at any time.

proof?

To make matters worse, all 8 SPEs (only 7 are active in the PS3) share a common 512KB L2 cache, which was fully available to a single SPE to get the nice 29.75 GFLOPS number. Sharing that cache with a bunch of other SPEs will only cause the numbers to dive even further. Pretty numbers usually hide an ugly truth behind them.

???? Each SPE has 256KB of local store (NOT CACHE) and can access main memory directly through DMA. Once again, if an application is well optimized for Cell and offloads a lot of work to the SPEs, then you will see close to theoretical maximum performance. Why do you think the scientific computing community is so excited about the prospect of using CELL?

mterzich
12-01-06, 03:43 PM
Ok, thanks for the correction.

http://www-128.ibm.com/developerworks/power/library/pa-cellperf/

I am not sure if someone posted this link but it has a nice flowchart and some nice info on the CELL.

-Jerry C.
That is a good flowchart. I hadn't seen it before.

mterzich
12-01-06, 04:01 PM
proof?
The article is a little confusing, and I'm not sure whether he meant 1 of the 7, or another 1 of the 7 in addition to the one that is already reserved all the time by the OS.

The costs for the PS3’s operating system are as follows

32 MB of the 256 MB of available GDDR3 memory off the RSX chip
64 MB of the 256 MB of available XDR memory off the Cell CPU
1 SPE of 7 constantly reserved
1 SPE of 7 able to be "taken" by the OS at a moment's notice (games have to give it up if requested)

http://dpad.gotfrag.com/portal/story/35372/?spage=6

mterzich
12-01-06, 04:31 PM
I suspect that IBM had some ulterior motives when designing the Cell. To cover the development costs, they developed something that could be used for more than a game console, and it appears that they are going to use the Cell in the world's largest non-general-purpose supercomputer.

http://en.wikipedia.org/wiki/IBM_Roadrunner

Jetrii
12-01-06, 04:40 PM
I believe you are mistaken. CELL-optimized applications can achieve near theoretical maximum performance if most of the work can be offloaded to the SPEs.

IBM did optimize for the Cell when testing, yet they only reached around 150 GFLOPS. It doesn't matter how much optimization is done in a game; it will never pass 100 GFLOPS on the Cell. Games just don't work like that. Game code is very chaotic.

mterzich
12-01-06, 05:48 PM
Cost of PS3 Operating System

32 MB of the 256 MB of available GDDR3 memory off the RSX chip
64 MB of the 256 MB of available XDR memory off the Cell CPU
1 SPE of 7 constantly reserved
1 SPE of 7 able to be "taken" by the OS at a moment's notice (games have to give it up if requested)
I have a few questions.

The above talks about GDDR3 and XDR memory. Isn't all the main memory XDR memory? Is the author just referring to it as GDDR3 memory since it is used by the GPU?
The above would indicate that only 160 MB is available to the PPE for the game application. Does that sound correct?
Does anyone want to venture a guess why 64 MB of memory would be used for the operating system? I would imagine the memory buffers for communications to SPEs would need to be allocated in the game application's memory. You would think that 8 MB would be enough for the OS and disc buffers. After all, it is not a virtual OS and is barely a multitasking system. Either the OS must be using very large disc buffers, or all the libraries are part of the OS instead of being individually included in the game application when needed.

Cost of 360 Operating System

32 MB of the 512 MB of available GDDR3 RAM
3% CPU time on Core1 and Core2 (nothing is reserved on Core0)

There doesn't seem to be any memory allocated for the OS and disc buffers. Why? Is there a separate memory for the OS and buffers? In the 360, I imagine the core OS is kept in non-volatile memory such as EPROM, since the console is not required to have a hard drive, but you would think the OS along with its buffers would be transferred to and set up in main memory.

Extra
12-01-06, 06:13 PM
Hmm, that was something that was discussed on the Beyond3D forum a bit recently. It comes down to the "96 MB" taken by the PS3 OS being pure speculation, with no substantial proof supporting it.

Blkout
12-01-06, 06:52 PM
It's comments like this that make it so difficult to have an intelligent discussion on the subject. This is a good discussion. Don't derail it with this kind of petty, childish BS.


Don't quote me unless you have something intelligent to say.

Blkout
12-01-06, 06:53 PM
Says you.

http://en.wikipedia.org/wiki/1080p#Storage_format



Says almost everyone, even the experts. Don't let marketing turn you into a sucker.

Blkout
12-01-06, 06:56 PM
Erm - just because I am partial to Sony, I'm hopefully still allowed to have an opinion.

If you actually read what I wrote, you'll notice that I do respect the 360, albeit I feel the PS3 has bigger potential.

I don't see myself as blind nor ignorant - however, by posting your comment, I do feel that you have those traits.



I would certainly expect you to feel that way since I cut your throat. Again, save your posts for the PS3 forum, where they would be better appreciated.

wreckshop
12-01-06, 11:17 PM
The article is a little confusing, and I'm not sure whether he meant 1 of the 7, or another 1 of the 7 in addition to the one that is already reserved all the time by the OS.

The costs for the PS3’s operating system are as follows

Well, one SPE is always used for the OS and most likely security functions. I would guess that if a second SPE is given up, the situation would be similar to when the Guide button is pressed on the 360 controller, so it doesn't matter if the game has to give up the SPE, since you can't play the game anyway.

Cynic
12-02-06, 12:01 AM
The above talks about GDDR3 and XDR memory. Isn't all the main memory XDR memory? Is the author just referring to it as GDDR3 memory since it is used by the GPU?
The above would indicate that only 160 MB is available to the PPE for the game application. Does that sound correct?

No, 256 MB are XDR @ 1GHZ to the Cell, the other 256 MB are GDDR3 @ 700 MHz to RSX (though recently there have been unconfirmed reports of it being downgraded to 650 MHz, and RSX's clock to 500 MHz).
So apparently it would be 192 MB available for Cell and 224 MB for RSX.
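A quick check of the arithmetic behind these figures, as a minimal Python sketch. The reservation sizes are the reported ones quoted earlier in the thread (not confirmed by Sony), and the constant and function names are illustrative only. The 160 MB figure from the question above is what you get if both reservations are subtracted from a single 256 MB pool instead of from two separate ones.

```python
# Reported PS3 OS reservations (figures quoted in this thread, not official).
XDR_TOTAL_MB = 256    # XDR main memory attached to the Cell
GDDR3_TOTAL_MB = 256  # GDDR3 video memory attached to RSX
OS_XDR_MB = 64        # reported OS share of XDR
OS_GDDR3_MB = 32      # reported OS share of GDDR3

def available(total_mb: int, reserved_mb: int) -> int:
    """Memory left for the game after the OS takes its share of one pool."""
    return total_mb - reserved_mb

cell_mb = available(XDR_TOTAL_MB, OS_XDR_MB)
rsx_mb = available(GDDR3_TOTAL_MB, OS_GDDR3_MB)
# The misreading: both reservations taken out of the XDR pool alone.
single_pool_mb = XDR_TOTAL_MB - OS_XDR_MB - OS_GDDR3_MB

print(cell_mb, rsx_mb, single_pool_mb)  # 192 224 160
```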

mterzich
12-02-06, 12:48 AM
No, 256 MB are XDR @ 1GHZ to the Cell, the other 256 MB are GDDR3 @ 700 MHz to RSX (though recently there have been unconfirmed reports of it being downgraded to 650 MHz, and RSX's clock to 500 MHz).
So apparently it would be 192 MB available for Cell and 224 MB for RSX.
Thanks. I was misreading what the author posted. So, if I read it correctly, both the 360 and the PS3 have a total of 512 MB of memory, but the 360 uses GDDR3 memory that is shared between the processor and the GPU, whereas the PS3 has two memory buses, with one bus containing 256 MB of XDR memory and the other 256 MB of GDDR3 memory. Is that correct?

Extra
12-02-06, 02:00 AM
Thanks. I was misreading what the author posted. So, if I read it correctly, both the 360 and the PS3 have a total of 512 MB of memory, but the 360 uses GDDR3 memory that is shared between the processor and the GPU, whereas the PS3 has two memory buses, with one bus containing 256 MB of XDR memory and the other 256 MB of GDDR3 memory. Is that correct?

Yes, but the XDR runs at 3.2 GHz, not 1 GHz.

talbain
12-02-06, 03:02 AM
No, 256 MB are XDR @ 1GHZ to the Cell, the other 256 MB are GDDR3 @ 700 MHz to RSX (though recently there have been unconfirmed reports of it being downgraded to 650 MHz, and RSX's clock to 500 MHz).
So apparently it would be 192 MB available for Cell and 224 MB for RSX.


It's not unconfirmed. Sony quietly announced these revisions to the clock speeds several months ago...

Cynic
12-02-06, 08:10 AM
Yes, but the XDR runs at 3.2 GHz, not 1 GHz.
I stand corrected.
I have no idea where I got that 1 GHz from, sorry.

Cynic
12-02-06, 08:46 AM
It's not unconfirmed. Sony quietly announced these revisions to the clock speeds several months ago...
Could you please point to a source? All the PS3 sites I went to still claim the 550/700 MHz figures.

Scotty L
12-02-06, 09:16 AM
Says almost everyone, even the experts. Don't let marketing turn you into a sucker.
k, sounds like you don't have a TV that accepts and displays 1080p. try saving up because this gimmick really looks great!

Daekwan
12-02-06, 11:54 AM
k, sounds like you don't have a TV that accepts and displays 1080p. try saving up because this gimmick really looks great!

Put your TV beside one accepting and displaying the same image in 720p, view them at your normal viewing distance, and you'd be rather surprised at the difference... or should I say 'lack thereof'.

I'm not calling it a gimmick, but trust me, the difference is nowhere near the hype unless you are sitting ridiculously close to the screen or have a display over 60".

krinkle
12-02-06, 08:37 PM
Here is a three-page article/interview featuring Insomniac (they developed Resistance). Anyway, these are people who ARE extremely successful, experienced game developers, and who ACTUALLY DO have hands-on experience making games for the Cell processor; the OP of this thread has neither of those things and only speculates.

Insomniac does not agree with many of the points expressed by the OP, although they do agree with some. The main difference however, is that Insomniac thinks the Cell has the capability to be an awesome gaming chip!

Game programs have historically been written as serial sequences of instructions. With the exception of graphics rendering, which has long been off-loaded to specialized chips, the program generally requires that the processor do one thing at a time. But the Cell offers the chance to accelerate gaming by executing some parts of the program in parallel.

Moving to a system where there are potentially nine processors computing at once improves the game play dramatically, but it also vastly increases the complexity of the programming challenge. In game-development lingo, the work of a program is divided into what are called systems. A system can range from a bit of code that draws a puddle to something more broad and complex, like computing the gore-splattering details of the meeting of a bullet and an alien’s leg. Last summer, Insomniac was in the process of moving roughly 12 major systems onto the SPEs at once.

Bringing a game to life on the Cell is a tightly syncopated process. It revolves around queries sent by the PPE to the team of SPEs. The queries request information on real-time events—such as at what point a piece of shrapnel hits an object in the game. By off-loading functions to the eight SPEs, the system has more time to devote to the delectable details: more characters onscreen, more complex artificial intelligence routines, more realistic skeletal structures within the game characters’ bodies, and, overall, an environment that makes you feel like you are inside the game. There is surround sound instead of stereo, with over 100 sounds happening simultaneously versus the 24 possible on the PlayStation 2. In Ratchet & Clank on the PlayStation 2, Insomniac was able to support 10 players competing against each other online; in Resistance: Fall of Man, they set a new record—40. On the PlayStation 2, a character could have only 30 different animated movements in a game; on the PS3, there are over 330. The payoff: the characters won’t resemble awkward manikins anymore—they grimace, scowl, and follow you with their eyes.

So how do programmers decide which tasks the PPE should be doing and which tasks the SPE should tackle? Alex Hastings asks two key questions about each task to make the choice. First, would the task take up a lot of the PPE’s time if it ran there? And second, is it the kind of job that lends itself to what the SPEs do well? “We looked for the easiest wins first,” he says. Animation and calculating collisions between objects are perfect fits, says Hastings. So those are the primary jobs Resistance doles out to the SPEs.

Even such perfect fits require some compromises. Ideally, software could automatically allocate tasks to whichever of the SPEs has the most time on its hands, but in order to simplify the programming, Insomniac was forced to dedicate two SPEs exclusively to collisions. Two processors are needed in the most demanding situation, one with lots of players, monsters, and bullets all moving around at once. “In games you’re more concerned about the worst-case scenario rather than the average,” explains Hastings. If you aim for the average, then there will be many times when the processors can’t finish the job in time and the game stalls. But by the same reasoning, most of the time those two processors aren’t being used to the fullest.

Ultimately, in future games, Insomniac will try to get almost all tasks running on the SPEs. “The holy grail that people writing games on the Cell are ultimately trying to reach is to get…the real highest-level decision making [onto the SPEs],” says Hastings. “I think that, based on where we are now, that’s still a few years away.”

READ THE TRUTH HERE --->http://www.spectrum.ieee.org/dec06/4745

muzzakus
12-02-06, 09:02 PM
Here is a three-page article/interview featuring Insomniac (they developed Resistance). Anyway, these are people who ARE extremely successful, experienced game developers, and who ACTUALLY DO have hands-on experience making games for the Cell processor; the OP of this thread has neither of those things and only speculates.

Insomniac does not agree with many of the points expressed by the OP, although they do agree with some. The main difference however, is that Insomniac thinks the Cell has the capability to be an awesome gaming chip!



READ THE TRUTH HERE --->http://www.spectrum.ieee.org/dec06/4745


STOP LISTENING TO THE FUD!


I'm confused. I just read that 3-page article, and to me it backs up the points raised in this thread. Am I missing something? The whole thing reads as a big whinge about how difficult it all is and how it's going to take some years to get to the bottom of it all.

I mean, the reality here is: what is the benchmark? The work that Insomniac did on the Cell: could they have done it better on the 360, faster, quicker, smarter?

Insomniac is a first-party Sony developer; if they themselves don't use the Cell, then God help the PS3. Multiplatform games, unfortunately, will not follow suit, I suspect.

I have been reading lots of other material and formulating my own ideas on this, and it seems from my perspective that the PS3 is a vehicle for establishing Sony's patented Blu-ray. The machine is designed around decoding video streams; the gaming is shoe-horned in. It's essentially a Blu-ray player with gaming ability.

Wrong or right, it's my opinion only, although a somewhat educated one after closely following all this for a while.

Muz

JerryNY
12-02-06, 09:20 PM
READ THE TRUTH HERE --->http://www.spectrum.ieee.org/dec06/4745


STOP LISTENING TO THE FUD!

Then why are you posting FUD lol ;) Resistance is a very nice game, no doubt, but Insomniac is hardly a disinterested third party. Do they even make games for anyone other than Sony? The article finishes by talking about how much potential the CELL has, and I agree wholeheartedly that it has tons of potential. The article mentions the challenges of getting all the SPEs working to their utmost, but I am not sure I want to invest in a $600 console right now when Mr. Hastings is quoted as saying “I think that, based on where we are now, that’s still a few years away.” A few years can be an eternity for a console, especially since Microsoft decided to shrink the MTBNCs (Mean Time Between New Consoles) ;) .

It was mentioned above that the CELL is being looked on with excitement by the scientific community, but don't confuse that with the PS3 specifically. They are looking forward to massive supercomputers with more CELLs in them than you can shake a stick at. Why do you think it is called the CELL in the first place? The PS3's choice of CELL is one of the more interesting ones they could have made, but you pay for being exotic in more ways than money, not to mention they coupled a bleeding-edge CPU technology with a GPU whose tech has already seen its better days.

-Jerry C.

mterzich
12-02-06, 10:15 PM
Warning: this document is very technical.

After much pondering and reading, I think I have figured out 90% of the peculiarities involved in developing MPEG decoding applications using SPEs.

Issues Using SPEs
The average developer normally does not envision developing an application within the limitations of the SPE. I suspect that Sony had to find someone with a hardware engineering background, a software engineering background, and experience developing video decoders to implement them using SPEs.
Development using SPEs is not simple.
Less power-intensive decoders such as MPEG2 can be developed using only one SPE, but more demanding decoders such as MPEG4 may require 3 or more SPEs.
Due to dependencies of the video image in relation to the MPEG stream, the application would probably be developed in a parallel fashion (each SPE decodes part of the same frame using the same video buffer).
DMA is used to read data from main memory into the SPE's memory as well as to write data from the SPE's memory to main memory.
When programming DMA transfers, performance will be better with large block transfers than with small block transfers.
Performance will be better if the transfer starts on 128-byte boundaries in both main memory and the SPE's memory.
When using DMA to read data, the programmer should expect a large number of clocks of overhead to initiate the DMA transfer, a significant amount of time for the transfer to complete, and then more time to get the application restarted once the transfer is complete.
Even if a read transfer is initiated on a 128-byte boundary, at least 90 clocks will pass before the 1st byte arrives in the SPE's memory. Once the 1st byte arrives, the remaining data will arrive at the maximum 25.6 GB/s transfer rate if there are no bus conflicts.
One DMA write of 128 bytes or less can be very fast (no latency) and can possibly transfer at the full 25.6 GB/s if memory conflicts do not occur. However, multiple blocks of any size that are written one right after another, or intermixed with main memory reads, can and usually will cause a large performance degradation.
The effect of DMA reads on processor performance can be reduced greatly with program sophistication. As an example, a DMA read can be initiated for a 2nd buffer while the SPE is executing code to decode the 1st buffer. When the SPE finishes decoding the 1st buffer, it can then initiate a DMA read to refill the 1st buffer and immediately start decoding the 2nd buffer. This technique is known as ping-pong (double) buffering, and the only performance hit is the cost of initiating the DMA operation. However, the SPE code still has to be careful, since the buffer may not be in the SPE's memory by the time it is ready to decode that buffer due to memory conflicts.
Video decoder applications inherently take a performance hit due to the potentially very large number of DMA operations that may be required for handling the video buffer. Sophistication can reduce the degradation, but cannot eliminate most of it for decoder-type applications. Other uses for SPEs will more than likely have fewer serious performance issues, if any at all. At the same time, there is other code that is not suited at all to SPE processing and could probably never be implemented on an SPE.
If DMA is handled improperly and sophistication is not used, up to 70% of the total processor power could be lost.
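To put rough numbers on the DMA points above, here is a minimal Python cost model. It assumes the figures quoted in this post (at least 90 clocks of read latency per transfer, 25.6 GB/s at 3.2 GHz, which works out to 8 bytes per clock, and 128-byte boundaries) and it ignores bus conflicts and any overlap between outstanding transfers. It is an illustration of the reasoning, not a measurement of real Cell hardware, and all names are hypothetical.

```python
LINE = 128            # alignment boundary that gives the best DMA efficiency
LATENCY_CLOCKS = 90   # minimum read latency quoted above, paid per transfer
BYTES_PER_CLOCK = 8   # 25.6 GB/s at 3.2 GHz = 8 bytes per clock

def align_up(n: int, boundary: int = LINE) -> int:
    """Round n up to the next multiple of `boundary`."""
    return (n + boundary - 1) // boundary * boundary

def lines_touched(addr: int, size: int) -> int:
    """How many 128-byte lines a transfer of `size` bytes starting at `addr` spans."""
    return (addr + size - 1) // LINE - addr // LINE + 1

def read_cost_clocks(total_bytes: int, transfers: int) -> int:
    """Idealized read cost: each transfer pays the fixed latency,
    then the data streams at the full rate (no overlap, no conflicts)."""
    streaming = -(-total_bytes // BYTES_PER_CLOCK)  # ceiling division
    return transfers * LATENCY_CLOCKS + streaming

print(align_up(1), align_up(128), align_up(129))               # 128 128 256
print(lines_touched(0, 128), lines_touched(64, 128))           # 1 2
print(read_cost_clocks(4096, 1), read_cost_clocks(4096, 256))  # 602 23552
```

Under these assumptions, reading 4 KB as one transfer costs about 602 clocks, while reading the same 4 KB as 256 separate 16-byte transfers costs about 23,552 clocks, almost all of it latency. A misaligned 128-byte transfer also spans two lines instead of one. That is the reasoning behind favoring large, aligned blocks.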

Basic Understanding of MPEG Encoding

Before developing an MPEG decoder application, you have to understand what is in the MPEG stream. The MPEG stream contains macroblocks that each describe changes to specific areas of the previous screen. If a macroblock simply defines data that changed from the previous screen to the current screen, all that has to happen is that the data in that area is overwritten with the new data. However, if the macroblock contains motion compensation data, the original area of the previous screen has to be read and then changed according to the new motion compensation data.

Developing an MPEG2 Decoder Application

The following describes how to develop a decoder program using SPEs, based on a decoder that IBM wrote and tested. MPEG2 is an easy and fast stream to decode, and the example may need only one SPE, since with sophistication, proper byte alignment, and optimization in the code, an HD MPEG2 stream may be decodable in real time. According to IBM, a stream averaging about 18 Mb/s can be decoded at up to about 75 frames per second with one SPE. However, since the decoder application was only tested using a simulator and the maximum frame time was apparently never calculated, 2 or possibly even 3 SPEs may be required, because frames with a large amount of motion compensation could cause frames to be dropped in a real-time situation. This is due to the heavy use of DMA under those conditions. Decoding MPEG4 streams will require 3 SPEs; each SPE will run similar code, but each will decode a different part of the same frame.

The following is a block diagram of an MPEG2 decoder.

http://www-128.ibm.com/developerworks/power/library/pa-cellperf/figure5.gif

If you have difficulty reading the text on the above block diagram, see the following link below and scroll down to near the bottom of the page.

http://www-128.ibm.com/developerworks/power/library/pa-cellperf/

In main memory, the PPE will first create a buffer (aligned on a 128-byte boundary) and move one frame from the MPEG2 stream into that buffer. Next, the PPE will create a properly aligned video buffer (2,764,800 bytes for 720p or 6,220,800 bytes for 1080i or 1080p) and clear it. Next, the SPE is started and told to decode the MPEG2 frame from the location where it resides in main memory and to use the main memory video buffer for output. The problem is that a normal raster-scan video buffer (left to right, then top to bottom) is not well suited, since the number of DMA operations required would be very high due to the small contiguous data blocks involved in MPEG streams. It would take a large number of DMA reads and writes of small amounts of data to create the screen, and performance would degrade significantly. Therefore, the developer will need to reorganize the video buffer to suit the macroblock concept, where each macroblock is contiguous data in the video buffer. You may say that is stupid, because if that video buffer were sent to the GPU, what you saw wouldn't make any sense (it would just be garbage). That is true, so after the video buffer has been completely decoded for the frame, the PPE would have to reorganize it into a raster-scan buffer. The following is a more detailed description of the reorganization.

Motion compensation required small pixel block transfers (for example, 16x16 pixels) from system memory to local store in order to construct the predicted blocks. If a video frame were stored in raster scan manner, then motion compensation would require numerous small DMA transfers (for example, 16 transfers of 16 bytes). In the Cell BE, however, the DMA transfers are most efficient when performed on 128B naturally aligned boundaries. In order to increase the efficiency, the data structure was rearranged to put each macroblock in a 384B contiguous area, consisting of a 16x16 pixel luminance block and two 8x8 pixel chrominance blocks (in the case of the 4:2:0 format). With this arrangement, a macroblock could be retrieved from the main memory to local store with fewer 128B DMA transfers.
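The rearrangement described in that quote can be sketched in Python, assuming planar 4:2:0 frames whose width and height are multiples of 16. `to_macroblocks` packs each macroblock into one contiguous 384-byte record (256 bytes of luma followed by 64 bytes for each of the two chroma blocks), which is exactly three 128-byte lines; `to_raster` is the inverse step the PPE would perform before display. The function names and the byte order inside a macroblock are assumptions for illustration, not IBM's actual data structure.

```python
MB_BYTES = 384  # 16x16 luma (256 B) + two 8x8 chroma blocks (64 B each), 4:2:0
LINE = 128      # DMA-friendly transfer and alignment unit

def to_macroblocks(y, u, v, width, height):
    """Pack planar 4:2:0 raster data into contiguous 384-byte macroblocks.

    Assumes width and height are multiples of 16.
    """
    cw = width // 2  # chroma plane width
    out = bytearray()
    for my in range(height // 16):
        for mx in range(width // 16):
            for row in range(16):  # 16 rows of 16 luma bytes
                start = (my * 16 + row) * width + mx * 16
                out += y[start:start + 16]
            for plane in (u, v):   # then 8 rows of 8 bytes from each chroma plane
                for row in range(8):
                    start = (my * 8 + row) * cw + mx * 8
                    out += plane[start:start + 8]
    return bytes(out)

def to_raster(blocks, width, height):
    """Inverse of to_macroblocks: rebuild raster-order Y, U and V planes."""
    cw = width // 2
    y = bytearray(width * height)
    u = bytearray(cw * (height // 2))
    v = bytearray(cw * (height // 2))
    mbs_x = width // 16
    for i in range(len(blocks) // MB_BYTES):
        my, mx = divmod(i, mbs_x)
        mb = blocks[i * MB_BYTES:(i + 1) * MB_BYTES]
        for row in range(16):
            start = (my * 16 + row) * width + mx * 16
            y[start:start + 16] = mb[row * 16:(row + 1) * 16]
        for p, plane in enumerate((u, v)):
            base = 256 + p * 64  # U block follows luma, V block follows U
            for row in range(8):
                start = (my * 8 + row) * cw + mx * 8
                plane[start:start + 8] = mb[base + row * 8:base + (row + 1) * 8]
    return bytes(y), bytes(u), bytes(v)

# Usage: a 32x32 test frame holds 2x2 = 4 macroblocks.
w, h = 32, 32
y = bytes(i % 256 for i in range(w * h))
u = bytes((3 * i) % 256 for i in range(w * h // 4))
v = bytes((5 * i) % 256 for i in range(w * h // 4))
packed = to_macroblocks(y, u, v, w, h)
print(len(packed), MB_BYTES // LINE)  # 1536 3
```

With this layout, fetching one macroblock means three aligned 128-byte transfers instead of sixteen 16-byte luma transfers plus the chroma rows.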
So after the SPE starts to execute, it would create either one or two properly aligned input buffers. One buffer could be used if there were enough available memory to always accommodate a complete MPEG2 frame, or if the developer was not interested in writing very sophisticated code (perhaps as much as 200 KB may be available if the code and other data structures do not occupy much memory, but as little as 20 KB if they occupy a large amount). However, two input buffers (each half the size of the available memory) would be created if the developer decided to use the ping-pong method for better performance.
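A minimal sketch of carving the free local store into one or two input buffers, assuming each buffer size should be kept a multiple of 128 bytes. The 200 KB and 20 KB figures are the post's rough estimates of free space, and the function names are hypothetical.

```python
LOCAL_STORE = 256 * 1024  # total SPE local store in bytes (shared by code, data and buffers)
LINE = 128                # keep buffer sizes a multiple of the DMA-friendly boundary

def align_down(n: int, boundary: int = LINE) -> int:
    """Round n down to a multiple of `boundary`."""
    return n - n % boundary

def input_buffers(free_bytes: int, ping_pong: bool) -> list:
    """Sizes of the input buffer(s) carved from the local store left after code and data."""
    if not 0 < free_bytes <= LOCAL_STORE:
        raise ValueError("free space must be between 1 byte and the size of local store")
    if ping_pong:
        half = align_down(free_bytes // 2)
        return [half, half]  # one buffer is decoded while the other is being filled
    return [align_down(free_bytes)]

print(input_buffers(200 * 1024, ping_pong=False))  # [204800]
print(input_buffers(200 * 1024, ping_pong=True))   # [102400, 102400]
print(input_buffers(20 * 1024, ping_pong=True))    # [10240, 10240]
```

The trade-off is visible in the numbers: ping-pong halves the amount of the frame that can be resident at once in exchange for hiding DMA latency behind decoding.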

Next, the SPE would DMA part or all of the MPEG2 frame into its memory, depending on whether it fits. If two buffers were used, the SPE would initiate a DMA read for the second buffer but not wait for it to complete. The SPE would now decode the first macroblock in the first buffer. When the decoding for that macroblock is completed and motion compensation is not required, the video information would simply be written via DMA to the main memory video buffer using the developer's new video buffer layout (several DMA operations may be required for each macroblock). If motion compensation is required, several DMA operations may be needed to read the part of the video buffer that relates to that macroblock into the SPE's memory. The SPE program would then modify those blocks of data with the motion compensation information and DMA the result back to the main memory video buffer (again, several DMAs may be required).

Next, the SPE will check whether there are more macroblocks in the input buffer. If there are, another macroblock decode will be performed. If there are not and the frame is not completely processed, it will initiate another DMA read. If a second input buffer is available, it will start decoding that buffer while the DMA read is being performed. If a second buffer is not available, the SPE will wait until the DMA is complete before processing more macroblocks.
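The loop described above, in its ping-pong form, can be modeled in Python. Here a single-worker `ThreadPoolExecutor` stands in for the SPE's DMA engine, `fetch` stands in for a DMA read from main memory, and `decode` is a placeholder that just sums the bytes of a buffer. Real SPE code would issue DMA commands through the Cell SDK instead, which this sketch does not attempt to show; all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(stream: bytes, index: int, chunk_size: int) -> bytes:
    """Stand-in for a DMA read of one input buffer's worth of the frame."""
    return stream[index * chunk_size:(index + 1) * chunk_size]

def decode(chunk: bytes) -> int:
    """Placeholder for decoding the macroblocks held in one input buffer."""
    return sum(chunk)

def ping_pong_decode(stream: bytes, chunk_size: int) -> list:
    """Decode one buffer while the next one is being transferred."""
    n_chunks = -(-len(stream) // chunk_size)  # ceiling division
    results = []
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending = dma.submit(fetch, stream, 0, chunk_size)  # fill the first buffer
        for i in range(n_chunks):
            current = pending.result()  # blocks only if the transfer is still in flight
            if i + 1 < n_chunks:
                # Start filling the other buffer before decoding this one.
                pending = dma.submit(fetch, stream, i + 1, chunk_size)
            results.append(decode(current))
    return results

print(ping_pong_decode(bytes(range(10)), 4))  # [6, 22, 17]
```

The only wait is in `pending.result()`, which corresponds to the case mentioned earlier where the next buffer has not arrived by the time the SPE is ready for it.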

If the frame has been completely processed, the SPE stops execution, and the PPE then creates a standard video buffer from the one created by the SPE and issues that video buffer to the GPU. The PPE will then move the next MPEG frame from the stream into the input buffer and start the SPE again, telling it to decode this frame using the same input buffer and the last video buffer that was originally created by the SPE.
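The buffer sizes quoted in this post can be checked with a little arithmetic. The 2,764,800 and 6,220,800 byte figures correspond to 3 bytes per pixel (24-bit color) in raster order, while the macroblock layout stores 384 bytes per 16x16 macroblock, i.e. 4:2:0 at 1.5 bytes per pixel. If the display buffer really is 24-bit, the PPE step would be a color conversion as well as a reordering; that is an inference from the numbers, not something the post states. Function names are illustrative.

```python
def raster_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Size of a raster-scan frame buffer (3 bytes/pixel matches the post's figures)."""
    return width * height * bytes_per_pixel

def macroblock_bytes(width: int, height: int) -> int:
    """Size of a macroblock-ordered 4:2:0 buffer at 384 bytes per 16x16 macroblock."""
    mbs_x = -(-width // 16)   # round up to whole macroblocks
    mbs_y = -(-height // 16)  # 1080 lines round up to 68 rows (1088 lines)
    return mbs_x * mbs_y * 384

for name, (w, h) in {"720p": (1280, 720), "1080": (1920, 1080)}.items():
    print(name, raster_bytes(w, h), macroblock_bytes(w, h))
# 720p 2764800 1382400
# 1080 6220800 3133440
```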

Developing an MPEG4 Decoder Application

Developing an MPEG4 decoder application should not be much more difficult than developing an MPEG2 decoder application. The primary differences are that the processing requirements are greater, so multiple SPEs will be required; the decoder algorithms are different; and the defined layout of the video buffer may be different.

In the case of MPEG4, 3 SPEs will be required to decode MPEG4 in real time. The primary difference is that 3 totally separate main memory MPEG buffers will be created (1 for each SPE). Each of the buffers will contain 1/3 of an MPEG frame. Each SPE will be told which buffer to use, but all will use exactly the same video buffer. There isn't any conflict between the three SPE processes, since they are working on separate parts of the frame and will change different areas of the video buffer.
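A minimal sketch of dividing one frame among 3 SPEs by macroblock rows, so that each SPE writes a disjoint band of the shared video buffer. Splitting by whole macroblock rows is an assumption for illustration (the post does not say how the frame is divided), and the sketch only guarantees that writes do not overlap; motion-compensated reads can still reach into another SPE's band, so they would need to come from a reference frame that no SPE is writing.

```python
def split_rows(mb_rows: int, workers: int = 3) -> list:
    """Split `mb_rows` macroblock rows into `workers` contiguous, non-overlapping bands.

    Returns half-open (start, end) row ranges; earlier workers get one extra
    row when the rows do not divide evenly.
    """
    base, extra = divmod(mb_rows, workers)
    bands, start = [], 0
    for w in range(workers):
        count = base + (1 if w < extra else 0)
        bands.append((start, start + count))
        start += count
    return bands

print(split_rows(45))  # 720p, 45 macroblock rows: [(0, 15), (15, 30), (30, 45)]
print(split_rows(68))  # 1080 coded as 1088 lines, 68 rows: [(0, 23), (23, 46), (46, 68)]
```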

Review of Difficulties

Limited SPE memory requires segmentation of input and output buffers.
The use of DMA can cause performance issues.
Alignment of memory boundaries is important.
Sophistication can improve performance (ex. ping-pong method).
Changing the concept of how the video buffer is organized improves performance.
Concepts are different than what most developers are used to.
Implementation is much more difficult than on a 3-PPE processor. On a PPE processor, the whole MPEG frame is directly accessible at one time, a standard raster-scan layout is used for the video buffer, boundary alignment is not important, and sophistication is not required. Developing a decoder application on a PPE is pretty standard programming.

Useful Links

http://www-128.ibm.com/developerworks/power/library/pa-cellperf/
http://en.wikipedia.org/wiki/Synergistic_Processor_Element

mterzich
12-02-06, 10:34 PM
Here is a three page article/interview from Insomniac. (They developed Resistance). Anyway, these are people who ARE extremely successful experienced game developers, and ACTUALLY DO have hands on experience making games for the Cell processor, the OP of this thread has niether of those things and only speculates.

Insomniac does not agree with many of the points expressed by the OP, although they do agree with some. The main difference however, is that Insomniac thinks the Cell has the capability to be an awesome gaming chip!



READ THE TRUTH HERE --->http://www.spectrum.ieee.org/dec06/4745
I applaud him for his honesty. There is not one single thing that he stated in the article that I disagree with or contradicts what I have posted.

JData
12-02-06, 10:40 PM
Krinkle,

I doubt you'll see Insomniac talk badly about the PS3's design; they would more probably criticize Sony over the screwup on the 1080i situation.


Muzzak:

I have been reading lots of other material and formulating my own ideas on this, and it seems from my perspective that the PS3 is a vehicle for establishing Sony's patented Blu-ray. The machine is designed around decoding video streams; the gaming is shoe-horned in. It's essentially a Blu-ray player with gaming ability.


You may be onto something based on mterzich's post.

Bailey151
12-03-06, 08:17 AM
Muzzak:

I have been reading lots of other material and formulating my own ideas on this, and it seems from my perspective that the PS3 is a vehicle for establishing Sony's patented Blu-ray. The machine is designed around decoding video streams; the gaming is shoe-horned in. It's essentially a Blu-ray player with gaming ability.


You may be onto something based on mterzich's post.
Which is what I've thought from the beginning, if for no other reason than they delayed the launch waiting for the BD end to be ready.

On the plus side those who wanted the PS3 as a budget BD player got exactly what they wanted :D

JData
12-03-06, 09:05 AM
Which is what I've thought from the beginning, if for no other reason than they delayed the launch waiting for the BD end to be ready.

On the plus side those who wanted the PS3 as a budget BD player got exactly what they wanted :D


Great vision Bailey! :)

Shape
12-03-06, 01:26 PM
Check this out:
http://www.joystiq.com/2006/12/02/e3s-ps3-motor-storm-presentation-vs-reality/

Note that all of the major explosions, where lots of physics is going on, are done in slow motion in the real game.

I really wonder if that was out of necessity, rather than for the effect.

mterzich
12-03-06, 03:10 PM
I suspect that there is a lot of physics in a game application, and Sony will have to produce superior applications if they want to compete with Microsoft for killer applications. With enough resources (developers, training, and money), the PS3 is one powerful game console. Even without a major amount of parallel physics code in a game application, the SPEs can be programmed to run other code. If all the SPEs can be used efficiently (physics code or not), the PS3 should be able to have more sophisticated killer applications than the 360.

All the physics code in the world in a game application will not make any difference if the code cannot be executed in parallel. So the primary objective of developers would be to find code (physics or not) that can run in parallel and is small enough to fit into SPE memory. Dependencies are the primary reason that code cannot run in parallel: if the data needed for a calculation is not yet available, the calculation cannot be performed. It is like trying to prepare your taxes a year early; it is impossible because you don't yet know what your income, deductions, etc. will be.
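To make the dependency point concrete, here is a minimal C sketch with hypothetical function names (nothing here comes from Sony's or IBM's libraries). `integrate_range` updates each element using only its own inputs, so the array could be split into ranges handed to different cores or SPEs in any order. `running_total` needs the previous iteration's result before it can compute the next one, which is exactly the kind of chain that cannot simply be spread across cores.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: element i depends only on pos[i] and vel[i],
   so any range [begin, end) could be processed on a different core. */
static void integrate_range(float *pos, const float *vel, float dt,
                            size_t begin, size_t end)
{
    for (size_t i = begin; i < end; i++)
        pos[i] += vel[i] * dt;
}

/* Dependent iterations: total[i] needs total[i - 1] first,
   so the loop cannot simply be split across cores. */
static void running_total(const float *in, float *total, size_t n)
{
    assert(n > 0);
    total[0] = in[0];
    for (size_t i = 1; i < n; i++)
        total[i] = total[i - 1] + in[i];
}
```

Calling `integrate_range` on the second half of an array and then on the first half gives the same answer as one call over the whole array; that order independence is what makes the work safe to run in parallel.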

The major question is how will those applications be financed and will they be profitable. Some companies that developed earlier applications had to sell off the company to raise the cash to develop the application. Reserve cash is very important and without it a killer application cannot be developed.

If a killer application costs $10 million more to develop on the PS3 than on the 360 for the same results, and $20 million more to produce superior results, will game companies risk the money to develop the game? Will Sony selectively finance game development by giving some companies low-cost loans or outright gifts? Outright cash gifts, just to get some really killer applications on the market, may not be as far-fetched as they seem.

The rising cost of video game development is a much talked about topic lately, especially as team sizes grow, more and more assets are needed for a project, and consumers start expecting more and more from their gaming experiences. All told, the overall cost of development for a next-gen game can eclipse $20 million.


This is clearly where Nintendo's Wii has an advantage, however. According to THQ Chief Executive Brian Farrell, while an investment in an Xbox 360 or PS3 game might be in the range of $12 million to $20 million on average, the money required for developing a title on the Wii can be as little as half that (or less), with an investment generally ranging from $5 million to $8 million. "It's that order of magnitude lower," Farrell explained to Reuters.

mterzich
12-04-06, 04:23 AM
This is a long document designed for the semi-technical reader. It goes through basic memory and processor theory so that such readers can grasp the more complex topic of branch prediction. If you are highly technical, you may wish to skip further down the document.

When reading different articles, there have been conflicting opinions concerning branch prediction in the SPEs such as the following.

SPEs cannot assist with AI code since an SPE does not have branch prediction capabilities.
Compiler options can insert branch prediction instructions into the application to assist with branch prediction.

Because of the second statement, I’ve always defended the SPEs’ lack of branch prediction capabilities. Even the first statement is not true: even without branch prediction, AI code could still be off-loaded to the SPEs as long as the SPEs were executing in parallel with the PPE core. Even if an SPE ran at ½ the speed of the PPE or less, it would still produce beneficial results as long as it was executing in parallel. However, it would not make sense to execute AI code in the SPE if the PPE was idle.
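The claim that a half-speed SPE still helps can be checked with a little arithmetic. The sketch below assumes (hypothetically) that the work divides cleanly, that the PPE processes one unit per time step, and that the SPE runs at `spe_speed` times that rate; it ignores DMA and synchronization overhead. Both processors finish together when the SPE gets a share f = s / (1 + s) of the work.

```c
#include <assert.h>

/* Time to finish `work` units when a fraction `spe_share` goes to an SPE
   running at `spe_speed` (PPE = 1.0) and both run in parallel. */
static double parallel_time(double work, double spe_share, double spe_speed)
{
    assert(spe_speed > 0.0 && spe_share >= 0.0 && spe_share <= 1.0);
    double ppe_time = (1.0 - spe_share) * work;
    double spe_time = spe_share * work / spe_speed;
    return ppe_time > spe_time ? ppe_time : spe_time;   /* the slower one decides */
}

/* Share that makes both finish together: (1 - f) = f / s, so f = s / (1 + s). */
static double balanced_share(double spe_speed)
{
    assert(spe_speed > 0.0);
    return spe_speed / (1.0 + spe_speed);
}
```

With `spe_speed` = 0.5, the balanced share is 1/3 and 90 units of work finish in about 60 time steps instead of 90, a 1.5x gain. Giving all of the work to the SPE (share 1.0) takes 180 steps, which matches the point that running the code on the SPE while the PPE sits idle makes no sense.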

Another question was how IBM created an SPE core using less than ½ the number of transistors of the PPE core. Removing only the branch prediction capabilities would not accomplish that. Was there some association between branch prediction capabilities and the vastly reduced transistor count?

The only way to determine that was to look at the design of the SPE core and compare it to the design of the PPE core. However, the design specifications available to the public are limited in detail, so there isn’t any way to be 100% sure. Still, with enough experience in hardware design, a person can make some assumptions about how the timings are produced, and those assumptions would more than likely be true.

After a thorough investigation of the design, it becomes obvious that many of the transistors that were removed have a major impact on branch prediction or on any sort of memory access performed by the SPE.

Basic Memory Concepts for Processor Access

Memory access times are the main reason that branch prediction is performed. If memory access were instantaneous or nearly instantaneous, branch prediction would matter far less. Therefore, a basic understanding of memory concepts is needed to understand the performance impact of having no branch prediction.

For processors, memories are layered in the system in such a way as to give the illusion of high speed even for the slowest memories. The layers run from slowest to fastest, so that the processor usually communicates only with the fastest memory, which is fed by a slower memory, which is fed by an even slower memory, and so on until the chain reaches the slowest memory. Each memory gets smaller as the chain goes from slowest to fastest. With a proper hardware design, a processor may seldom stall (become idle) waiting on memory. With an improper design (a missing or incorrectly sized memory in the chain, or a memory in the chain that is too slow), the processor may stall on a regular basis. In the worst designs (fortunately none of that type are produced today), the processor can slow to a crawl (99% of its power could be lost), while in the best designs it could execute at near 100% efficiency.

Memory Types

The slower the memory, the less power it uses and the less heat it generates.

There are only two basic types of memory normally used in today’s desktop computers and game consoles (specialty memories exist but are not used for processor performance). They are the following.

DRAM (Dynamic Random Access Memory) - This type of memory is primarily used as the main system memory. Almost all main memories use this basic technology but enhance it to make the memory appear faster. In reality, all of those memories use this very slow 60-nanosecond technology. This type of memory also requires refresh cycles, which reduce overall performance even further. This type of memory is always external to the processor chip and resides on the bus.
SRAM (Static Random Access Memory) – All other memories in the chain between the main memory and the processor usually use this technology. This memory is much faster than DRAM, between 10 and 500 times faster. Most of the time this type of memory is an integral part of the processor chip, but it can be external in some cases.

The following are some of the possible uses of memory in the memory chain and some of the types that are available.

Main Memory – This memory is very slow (usually 30 nanoseconds, or about 100 clocks at 3.2 GHz, for the first piece of data to arrive at the destination). However, after the first access, depending on the design of the memory, this type of memory can stream data at very high rates (usually 1 clock cycle per memory location, which is usually 64 bits or 8 bytes).

SDRAM is the basic DRAM memory with access times of 60 nanoseconds.
DDR2 is double data rate technology. The core is still 60 nanoseconds, but tricks were played to make it work at twice that speed.
GDDR3 memory uses DDR2 technology as the core memory, but other techniques such as bank phasing and channels allow high-speed streaming. These memories are primarily used for graphics memory.
XDR memory uses techniques such as bank phasing and channels to allow high-speed streaming. These memories are usually used for main memory.

L2 Cache – This is an SRAM-type memory of moderate speed (usually about 2-5 nanoseconds, or about 6-16 clock cycles at 3.2 GHz). The memory size is usually 512 KB, 1 MB, or 2 MB. The memory is organized in blocks that reflect different parts of the main memory.
L1 Cache – This is an SRAM-type memory of high speed (usually 0.3 nanoseconds, i.e. 300 picoseconds, or faster when the clock speed is 3.2 GHz, so the processor can usually access it with no more than a 1-clock stall). The memory size is usually about 32 KB of instruction cache and 32 KB of data cache, but could be other sizes. The memory is organized in blocks that reflect different parts of the main memory and usually also reflect different parts of the L2 cache.
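The clock counts above come from multiplying the latency in nanoseconds by the clock rate in GHz (at 1 GHz one cycle lasts exactly 1 ns). A minimal helper, assuming round-to-nearest:

```c
#include <assert.h>

/* Convert a latency in nanoseconds to whole clock cycles at clock_ghz.
   At 1 GHz one cycle is 1 ns, so cycles = ns * GHz. */
static long ns_to_clocks(double ns, double clock_ghz)
{
    assert(ns >= 0.0 && clock_ghz > 0.0);
    return (long)(ns * clock_ghz + 0.5);   /* round to nearest cycle */
}
```

At 3.2 GHz this gives 96 clocks for 30 ns main memory, 6 and 16 clocks for 2 ns and 5 ns L2 cache, and 1 clock for 0.3 ns L1 cache.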

The L1 and L2 caches are organized into blocks. For example, a block could contain 32 sequentially addressed instructions or 32 64-bit data words, but it could also be organized to hold more or fewer sequential instructions or data words.

Putting the Memory Chain Together

The following is an example for a general-purpose processor. Whenever a block is described as being transferred, a copy of the data is sent and the source still contains the same data.

Normally the processor will request instructions before it needs them. However, if an instruction does not arrive back at the processor in time, a stall will occur.
During each clock cycle the processor checks whether it wants more instructions placed in the instruction pipeline (see instruction pre-fetch and branch prediction later in this document), and if so, it issues a request to the L1 instruction cache for that instruction block. If the instruction block is in the L1 instruction cache, the block will be transferred to the instruction pipeline, replacing the least recently used block of instructions in the pipeline, and the processor will not stall.

If the instruction block cannot be found in the L1 cache, the L1 cache control will request it from the L2 cache. If the block of instructions is found in the L2 cache, it will be sent to the L1 cache, replacing the least recently used block in the L1 cache, and the requested block will then be sent to the instruction pipeline, replacing the least recently used block in the pipeline. In this case, the processor can stall for up to 6-16 clocks, depending on the speed of the L2 cache and whether the processor had enough instructions in the pipeline to continue executing.

Finally, if the instruction cannot be found in the L2 cache, a request will be made on the memory bus for the block from main memory. The main memory will trigger a cycle to retrieve the block and return it to the L2 cache, which will replace its least recently used block, send the block to the L1 instruction cache, which will replace its least recently used block, and then send the requested block of instructions to the instruction pipeline, which will replace its least recently used block. In this case, a stall of up to 300 clock cycles may occur, depending on main memory speed and whether there is a bus conflict caused by the GPU, DMA, or other devices accessing memory at the same time. Even in this case, the processor may not stall if enough instructions were already available to keep executing.
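The walk through the memory chain can be modeled in a few dozen lines of C. This is a toy model, not the design of any real processor: each level is a small fully associative cache with least-recently-used replacement, the capacities are tiny so the behavior is easy to follow, and the stall costs (16 clocks for an L2 hit, 100 for main memory) are the rough figures from this post, ignoring bus conflicts. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS 64

/* One cache level: fully associative, least-recently-used replacement. */
typedef struct {
    size_t capacity;                     /* number of blocks this level holds */
    long tag[MAX_BLOCKS];                /* block number in each slot, -1 = empty */
    unsigned long last_used[MAX_BLOCKS]; /* access counter value at last use */
} Level;

static unsigned long tick;               /* global access counter for LRU order */

static void level_init(Level *c, size_t capacity)
{
    assert(capacity > 0 && capacity <= MAX_BLOCKS);
    c->capacity = capacity;
    for (size_t i = 0; i < capacity; i++) {
        c->tag[i] = -1;
        c->last_used[i] = 0;
    }
}

/* Returns 1 on a hit. On a miss, copies the block into the LRU slot and returns 0. */
static int level_access(Level *c, long block)
{
    size_t victim = 0;
    tick++;
    for (size_t i = 0; i < c->capacity; i++) {
        if (c->tag[i] == block) {        /* hit: mark as most recently used */
            c->last_used[i] = tick;
            return 1;
        }
        if (c->last_used[i] < c->last_used[victim])
            victim = i;                  /* remember the least recently used slot */
    }
    c->tag[victim] = block;              /* miss: overwrite the LRU slot */
    c->last_used[victim] = tick;
    return 0;
}

/* Stall cycles for one block fetch, walking L1 -> L2 -> main memory.
   A miss at a level leaves a copy of the block in that level. */
static int fetch_block(Level *l1, Level *l2, long block)
{
    if (level_access(l1, block)) return 0;    /* L1 hit: no stall */
    if (level_access(l2, block)) return 16;   /* L2 hit: block copied into L1 */
    return 100;                               /* filled from main memory into L2 and L1 */
}
```

With a 2-block L1 and a 4-block L2, the first fetch of a block costs 100 clocks, a repeat costs nothing, and a block that was pushed out of L1 by two newer blocks comes back from L2 for 16 clocks.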

If all that was designed were the memory chain capabilities and nothing else, there would still be a large number of stalls. So most processors have instruction pre-fetch, branch prediction, as well as out-of-order execution capabilities to resolve the stall problems.

Instruction Pre-Fetch

Most, if not all, modern processors have instruction pre-fetch capabilities. This capability is designed into the processor to retrieve sequentially addressed instructions that may be needed in the future. While instructions are executing, the instruction currently being translated in the instruction pipeline will check whether the next sequential block of instructions is in the instruction pipeline. If the next instruction block is not in the pipeline, it will request that block from the L1 cache. If the L1 cache has the block, it will transfer the block to the instruction pipeline, replacing the least recently used block; otherwise the block will eventually be received from the memory chain and then transferred to the instruction pipeline.

This mechanism ensures that the instruction pipeline will always have the next sequential instruction available whenever the processor requests it. As long as the processor executes instructions that are addressed sequentially, the processor will never stall waiting for an instruction.
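A rough model of why sequential code rarely waits but branches do: the hypothetical sketch below buffers just the current block plus, when pre-fetch is on, the next sequential block (assumed to always arrive in time), and counts how many block fetches would stall for a trace of instruction-block numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Counts instruction-block fetches that would stall for a trace of block numbers.
   Toy model: the buffer holds the block being executed plus, when pre-fetch is on,
   the next sequential block, which is assumed to arrive before it is needed. */
static int count_stalls(const long *trace, size_t n, int prefetch)
{
    long current = -1, prefetched = -1;   /* -1 = nothing buffered */
    int stalls = 0;
    assert(trace != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        long block = trace[i];
        if (block != current && block != prefetched)
            stalls++;                      /* not buffered: wait on the memory chain */
        current = block;
        prefetched = prefetch ? block + 1 : -1;
    }
    return stalls;
}
```

For a straight run of blocks 0, 1, 2, 3 the model stalls 4 times without pre-fetch and only once with it. A trace that jumps from block 1 to block 5 stalls twice even with pre-fetch, which is the gap branch prediction is meant to close.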

Branch Prediction

Although code is usually executed sequentially, a large number of branches (redirection to another piece of code that is not in the sequential path) usually do occur in the code. Sometimes branches can occur on a regular basis such as function calls, loops, compares, and returns. If the processor did not support branch prediction, every branch could cause the processor to stall. Branch prediction is based on the length of the instruction pipeline (number of instructions in the pipeline) and the parallel sensing for branch instructions that are sequentially addressed in relation to the current instruction being executed. If the pipeline is very large and the complete pipeline is sensed for branch instructions, branch prediction will drastically reduce the possibility of stalls.

Like instruction pre-fetch, branch prediction requests a block of instructions from the L1 instruction cache, and that block will eventually replace the least recently used block in the instruction pipeline. In a well-designed processor with excellent branch prediction capabilities, stalls should seldom occur when branches happen.

Both the PPE core of the cell processor and the PPE cores in the xenon processor have branch prediction capabilities, but these were scaled down from the PowerPC processor to save on cost. I suspect that the size of the instruction pipeline was reduced, the number of instructions in a block may have been reduced, and the number of instructions sensed simultaneously was reduced. The SPEs do not have any hardware branch prediction capabilities.
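For readers who have not seen a branch predictor, here is the classic textbook scheme: a table of 2-bit saturating counters indexed by branch address. This is a generic illustration only; it is not a claim about what the PPE actually implements, and the table size is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE 16   /* hypothetical number of predictor entries */

/* One 2-bit counter per entry: 0-1 predict "not taken", 2-3 predict "taken".
   Static storage starts at 0, i.e. "strongly not taken". */
static unsigned char counters[TABLE_SIZE];

static int predict(size_t branch_addr)
{
    return counters[branch_addr % TABLE_SIZE] >= 2;
}

/* Move the counter one step toward the actual outcome, saturating at 0 and 3. */
static void train(size_t branch_addr, int taken)
{
    unsigned char *c = &counters[branch_addr % TABLE_SIZE];
    if (taken && *c < 3)
        (*c)++;
    else if (!taken && *c > 0)
        (*c)--;
}

/* Simulates a loop-closing branch: taken (iterations - 1) times, then not taken,
   with the whole loop run `repeats` times. Returns the number of mispredictions. */
static int loop_mispredictions(size_t branch_addr, int iterations, int repeats)
{
    int misses = 0;
    assert(iterations > 0 && repeats > 0);
    for (int r = 0; r < repeats; r++) {
        for (int i = 0; i < iterations; i++) {
            int taken = (i < iterations - 1);
            if (predict(branch_addr) != taken)
                misses++;
            train(branch_addr, taken);
        }
    }
    return misses;
}
```

A loop branch taken 9 times and then not taken, run 3 times through a cold table, mispredicts 5 times out of 30: twice while the counter warms up and once at each loop exit. Each misprediction is a point where the processor fetched the wrong block and may stall.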

Out-Of-Order Execution

Out-of-order execution is primarily designed to increase performance by executing instructions in parallel. If a processor has 10 execution units (out-of-order execution) instead of 1 execution unit (in-order execution), performance could increase by as much as 10-fold. In reality the increase is much lower. Out-of-order execution allows the processor to execute several instructions in the instruction pipeline in parallel. However, if the source data for an instruction depends on the results of another instruction that has not completed yet, the dependent instruction has to wait until the instruction producing the results is finished.

An in-order execution processor will stall if an instruction needs data that is not in the L1 data cache, since the data will need to be retrieved from the L2 cache or main memory. Data requests, just like instruction requests, go through the memory chain.

An out-of-order execution processor can reduce the length and number of stalls caused by missing data in the L1 data cache by executing other instructions while that instruction waits for the data to be retrieved from one of the memories.

The out-of-order execution capabilities were stripped out of the xenon and cell processors to save cost.
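The difference between in-order and out-of-order execution can be sketched with a toy timing model. It is hypothetical and idealized: one instruction enters per cycle, the out-of-order case has unlimited execution units, and the 100-cycle load latency is the rough main-memory figure from earlier in this post.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INSTRS 64

/* One instruction in a hypothetical straight-line program. */
typedef struct {
    int latency;     /* cycles from start until its result is ready */
    int depends_on;  /* index of the instruction whose result it needs, or -1 */
} Instr;

/* Returns the cycle at which the last result becomes ready.
   Both modes: instruction i cannot start before cycle i (one fetched per cycle)
   or before its input is ready.
   in_order != 0: it also cannot start until the instruction ahead of it has started.
   in_order == 0: idealized out-of-order core with unlimited execution units. */
static int finish_time(const Instr *prog, size_t n, int in_order)
{
    int ready[MAX_INSTRS];
    int prev_start = -1, finish = 0;
    assert(n <= MAX_INSTRS);
    for (size_t i = 0; i < n; i++) {
        int start = (int)i;
        int dep = prog[i].depends_on;
        assert(dep < (int)i);                /* inputs come from earlier instructions */
        if (dep >= 0 && ready[dep] > start)
            start = ready[dep];              /* wait for the input value */
        if (in_order && start <= prev_start)
            start = prev_start + 1;          /* cannot pass a stalled instruction */
        prev_start = start;
        ready[i] = start + prog[i].latency;
        if (ready[i] > finish)
            finish = ready[i];
    }
    return finish;
}
```

For a program whose first instruction is a load that misses to main memory, followed by one instruction that uses the loaded value and two that do not, the in-order model finishes at cycle 103 and the out-of-order model at cycle 101. More telling, the two independent instructions finish at cycles 3 and 4 out of order, but not until cycles 102 and 103 in order, because they are stuck behind the stalled instruction.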

Xenon Processor

Cores: 3 general-purpose cores (PPE)
L1 Cache: 32 KB of instruction cache and 32 KB of data cache per core
L2 Cache: 1 MB of cache shared between all 3 cores.
Main Memory: 512 MB of GDDR3 memory shared between processor, GPU, and DMA.
Memory bandwidth: 21.6 GB/s.
Hardware Threads: 2 per core (6 total)
Out-of-order execution: No
Branch Prediction: Yes but stripped down from PowerPC.
Instruction Pre-Fetch: Yes
Instruction Size: 32 bits (64-bit architecture)

The following are some performance issues for the 360.

Only 1 MB of L2 cache is shared between all three cores. The minimum L2 cache desired per core is usually 512 KB. However, when only 1 or 2 cores are in use, 1 MB or 512 KB respectively will be available per core. Also, since this is designed as a game console and not a multitasking operating system, all cores could be executing the same game code and using the same game data, so 1 MB should be sufficient. The reduction of branch prediction capabilities probably will not cause significant performance degradation. Not having out-of-order execution will probably reduce performance by 50%.

Cell Processor

PPE

Cores: 1 general purpose core (PPE)
L1 Cache: 32 KB of instruction cache and 32 KB of data cache
L2 Cache: 512 KB of cache
Main Memory: 256 MB of XDR memory and 256 MB of GDDR3 memory shared between processor, GPU, and DMA.
Memory bandwidth: 25.6 GB/s.
Hardware Threads: 2 total
Instruction Pre-Fetch: Yes
Instruction Size: 32 bits (64-bit architecture)

SPE

Cores: 7 useable specialized cores (SPE)
L1 Cache: None
L2 Cache: None
SPE Memory: 256 KB SRAM (7 total)
Memory bandwidth: 51.2 GB/s.
Hardware Threads: 1 (7 total)
Instruction Pre-Fetch: Yes
Instruction Size: 32 bits (128-bit registers)

The cell PPE core should perform similarly to a xenon PPE core, except that the xenon has 1 MB of L2 cache available for single-core operation. Since the SPE does not have any L1 cache (L2 cache is not important), it is important that the SPE memory is fast. The design used SRAM, which makes the memory fairly fast; it appears to be 5-nanosecond memory (about the same as L2 cache). So it appears that if an instruction is not in the processor's instruction pipeline, a stall of 16 clocks (a maximum of 32 clocks if DMA was active to the SPE memory) could occur, instead of the maximum of over 300 clocks on the PPE. However, without branch prediction, stalls could occur significantly more frequently than on the PPE.

Since the compiler has the capability of projecting branches and then inserting branch prediction instruction (called a hint according to IBM) in the code, it would at first appear to be a non-issue. However, without L1 cache and probably a pretty small pipeline, the compiler could insert too many instruction block requests from memory causing more stalls than would have occurred even if the compiler option was not used. In the case of the PPE core, the worst that could happen if the same condition were to occur would be a one-cycle stall waiting for the instruction to be retrieved from the L1 instruction cache.
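In C, GCC (and Clang) let a programmer state which way a branch is expected to go with the `__builtin_expect` extension. Whether a particular SPU compiler turns this into hint instructions, and how well it places them, is an assumption here that would need checking against IBM's SDK documentation; the function below is only a hypothetical example of marking a rare case.

```c
#include <assert.h>
#include <stddef.h>

/* GCC/Clang extension: tells the compiler which outcome is expected,
   so it can lay out (and, on targets that support it, hint) the likely path. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Sums values, skipping negative "sentinel" entries that are expected to be rare. */
static long sum_valid(const int *v, size_t n)
{
    long total = 0;
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(v[i] < 0))   /* rare case: tell the compiler not to favor it */
            continue;
        total += v[i];
    }
    return total;
}
```

The result is the same with or without the macro; only the compiler's code layout and hinting decisions change.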

Conclusions

It is now clear how IBM produced each SPE core using less than ½ the number of transistors: by eliminating the L2 cache, the L1 cache, branch prediction, and 1 hardware thread, and by greatly reducing the complexity, while in turn adding 256 KB of SRAM.

I suspect that when code is written for the SPE, the developer should try to minimize the use of branches wherever possible. The number of branches can be reduced by writing inline code rather than calling functions, classes, or libraries. Also, the developer should use as few while, for, if/else, do, and switch statements as possible, since these produce branch code.
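Many small if/else decisions can be turned into straight-line code with a select operation: compute both candidates, build an all-ones or all-zeros mask from the condition, and combine them with bitwise AND/OR. The SPU instruction set has a select-bits instruction for this (exposed as `spu_sel` in IBM's SPU intrinsics); the portable C sketch below only mimics the idea on 32-bit integers, and a compiler may already generate branch-free code for the comparisons.

```c
#include <assert.h>
#include <stdint.h>

/* Branch-free select: returns a when cond is nonzero, otherwise b.
   The condition becomes an all-ones or all-zeros mask, so no jump is needed. */
static int32_t select_i32(int cond, int32_t a, int32_t b)
{
    uint32_t mask = 0u - (uint32_t)(cond != 0);            /* 0xFFFFFFFF or 0 */
    uint32_t bits = ((uint32_t)a & mask) | ((uint32_t)b & ~mask);
    return (int32_t)bits;  /* GCC/Clang convert back with two's-complement wraparound */
}

/* Clamp without if/else, built from two selects. */
static int32_t clamp_i32(int32_t x, int32_t lo, int32_t hi)
{
    assert(lo <= hi);
    x = select_i32(x < lo, lo, x);
    return select_i32(x > hi, hi, x);
}
```

Both candidates are always computed, so this pays off when they are cheap and the branch would be hard to predict.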

It is difficult to determine how much the performance of an SPE is affected by the lack of branch prediction when normal general purpose code is executed even with the compilers ability to insert branch prediction instructions. Is it just a few percent or is it as much as 50%?

GlennRW
12-04-06, 07:08 AM
It revolves around queries sent by the PPE to the team of SPEs. The queries request information on real-time events—such as at what point a piece of shrapnel hits an object in the game. By off-loading functions to the eight SPEs, the system has more time to devote to the delectable details: more characters onscreen

The truth right?

Truth is, Insomniac is a first-party dev and HAS to say good things about the cell. Another truth is that they lied in the above statement where they said "8 SPEs", when 1 SPE is disabled and another is reserved solely for the OS to run on.

I wonder how he, as someone who has ACTUAL hands-on experience with the cell, managed to program for a disabled SPE, or the reserved one for that matter.

That's what you get for relying on first-party Sony information. :)

mterzich
12-04-06, 01:16 PM
According to documents, Sony uses 1 SPE for the operating system and can acquire a second SPE at a moment's notice.

I've worked on the internals of many operating systems and cannot think of anything that an SPE-like processor could be used for in an operating system such as the one in a game console. Typically, while a game application is running, the operating system is used to handle the interrupts for game controls and to process network packets for online play. These functions do not require a great amount of processor power and are not well suited for an SPE.

So I wondered what Sony may be using the SPEs for. The only thing I could figure out is that they may not really be using the SPEs for the operating system but instead for library support. I wonder if Sony discovered that one of their libraries (such as the graphics library) can benefit from parallel processing. If that was the case, Sony could be loading a set of graphics routines into an SPE at game startup, and whenever a graphics call is made, the call may start that SPE to assist with the processing while the PPE processes part of the graphics call in parallel.

As for the SPE that can be acquired at a moment's notice, I suspect that Sony currently is not using that SPE but has defined that specification for possible future library support. However, it is hard to imagine that a great many libraries could benefit from such a concept.

If Sony could effectively utilize SPEs in this way, this type of implementation would be transparent to the developer but yet could significantly decrease the amount of processor power required by the PPE core.

So is Sony currently attempting to develop new libraries (for example, a set of common AI libraries that could benefit from this concept) that could create a performance benefit?

If such libraries could be created, this could reduce game development time, eliminate complexity of game development, and at the same time significantly improve performance.

mterzich
12-04-06, 02:16 PM
And what about the bandwidth differences between 360 and PS3?
IIRC, PS3 has 2 buses, for a total of about 50 GB/s total system bandwidth (22 GB/s to the video RAM and the rest for the main memory); 360 has the embedded DRAM in the GPU which supposedly gives it 256 GB/s bandwidth for the frame buffer, but only 22 GB/s between Xenon and the main memory. It's usually reported that PS3 is severely bottlenecked in its graphics memory, but what about the 360 main bandwidth of "only" 22 GB/s?
Who "wins" in this case? 360 partisans would argue it's clearly the Microsoft machine because 22+256 > 50, but I'm having a hard time believing it's that simple.
Sorry I put off answering your question. I did that because I was more concerned at the time with understanding the effect of the SPEs, and because any reply I made on this issue would be very speculative; I don't have the means (design documents, etc.) to back up my assumptions.

This is really a difficult one, so I'm not going to draw any conclusions. We know that Microsoft worked very closely with ATI on the design of the GPU, just as Sony attempted to design their GPU (but Sony eventually ended up using basically an off-the-shelf GPU). To understand the issue, I guess we have to try to understand why Microsoft did not use an off-the-shelf design. I would think that you can get a better-performing GPU at a lower cost if the GPU is customized to fit your system exactly. I'm sure that Microsoft realized that a shared memory concept could have a detrimental effect on performance.

I know very little about GPU architecture, so if anyone does, please give a hardware overview. All I know is that a GPU performs a lot of parallel processing, but I'm not sure whether it is performed by multiple cores or by dedicated hardware, and whether it works with vector units (streaming data). It is important that we understand the concept used by the 360's GPU, since that is the only way we can try to understand the performance impact of the bus.

Why does any of that matter? If vector operations are used and the data is acquired from a shared bus, performance could slow down significantly. If the processing is performed by non-vector multi-core processors, most performance degradation can be greatly reduced by using properly sized and sufficiently fast L1 and L2 caches.

Then you have the issues with the cell processor. According to some documents, the operating system uses part of the GDDR3 memory. What effect will this have on that bus? Also, since data is normally going to be created in the XDR memory and passed to the GPU, the XDR bus also needs to have few conflicts. So what happens if the game application happens to be using a large number of SPEs? Since using SPEs requires DMA activity on the XDR bus for loading and saving code and data, that DMA activity will cause conflicts on the XDR bus.

Without enough knowledge, it is very difficult to speculate.

Jules343
12-04-06, 02:49 PM
I think MS still had a bad taste in their mouth over the whole xbox IP issue and wanted to ensure that it didn't happen again and thus sought a new GPU with ATi. Just my theory.

I'm also guessing that Sony didn't have too many options. MS working with ATi counts them out, and NVIDIA was probably too far away with the G80 to use it in the PS3; G80 cards showed up around the same time as the PS3. So they went with the high-end NVIDIA product that was current at the time of PS3 development. Again, this is my theory.

mterzich
12-04-06, 02:58 PM
Why should that be any different for Sony than everything else they did with the PS3? They had to have Blu-ray even though it was having production problems, they had to have the cell processor even though it had yield problems, and they had to have HDMI 1.3 even though the specs weren't complete. So why couldn't they also have a GPU that wasn't yet designed?

briankmonkey
12-04-06, 03:02 PM
The truth right?

Truth is, Insomniac is a first-party dev and HAS to say good things about the cell. Another truth is that they lied in the above statement where they said "8 SPEs", when 1 SPE is disabled and another is reserved solely for the OS to run on.

I wonder how he, as someone who has ACTUAL hands-on experience with the cell, managed to program for a disabled SPE, or the reserved one for that matter.

That's what you get for relying on first-party Sony information. :)

Velodyne, actually that is not true. Insomniac is a 3rd party dev, they are completely independent.

Resistance is a great game by the way, just get past the first level which is a bit bland, after that it rocks :)

FrankJ.Cone
12-04-06, 03:07 PM
Insomniac works exclusively on Sony platforms, and Sony is the publisher of all their console games. Not exactly independent in the same way EA, Activision, or UBI are.

Jules343
12-04-06, 03:33 PM
Perhaps they discussed the options with NVIDIA, and NVIDIA said it couldn't meet their deadline or wanted to focus on a product for the PC market. Maybe it just came down to dollars. Just my theory. Cell had low yields, but it was further along than the G80, i.e. in production. Blu-ray issues were tied to a short supply of blue diodes, but again, that was at least in production. HDMI 1.3 support is probably not as difficult to change in later design stages as something like a GPU.

Hey, I wish Sony would have used something better than the G70, but it didn't happen. Anyway, I think this discussion should stay in the realm of the technical.

mterzich
12-04-06, 06:07 PM
There is an interesting tidbit of information about the design of the 360 L2 cache.

Procedural synthesis

1 MiB L2 cache (lockable by the GPU) running at half-speed (1.6 GHz)

For the Xbox 360, Microsoft has drawn on recent research in computer graphics to enable a new method for game programming. In traditional games, all content is statically stored and generally immutable; that is, textures, meshes, and other game content is stored on a storage medium. As complexity in each rises, the demand for storage rises as well. A newer approach to generating content is utilised for Xbox 360 titles, a method referred to by Microsoft as procedural synthesis. Procedural synthesis is an approach to generating game content via algorithms. For example, trees are one of the most complicated objects to render in a game, due to their organic complexity. A game with only one model for a tree will appear odd, as nature is far more random; the game loses some of its immersion as a result. Instead, a general recursive algorithm will generate the tree's model and textures, so that each tree looks different from the next, and do so with high efficiency. The Xbox 360's architecture was designed with this approach in mind. When running procedural synthesis algorithms, one of the Xenon CPU's cores may "lock" a portion of the 1 MB shared L2 cache. When locked, a segment of cache no longer contains any prefetched instructions or data for the CPU, but is instead used as output space for the procedural synthesis thread. The Xenos GPU can then read directly from this locked cache space and render the procedurally generated objects. The rationale behind this design is that procedurally generated game content can be streamed directly from CPU to GPU, without incurring additional latency by being stored in system RAM as an intermediary step. 
The downside to this approach is that when part of the L2 cache is locked, there is even less data immediately available to keep the 3 symmetric cores in the Xenon CPU running at full efficiency (1 MB of shared L2 is already a rather small amount of cache for 3 symmetric cores to share, especially considering that the Xenon CPU does not support out-of-order execution to more efficiently use available clock cycles).

Procedural synthesis is also found outside of the Xbox 360 in the advanced freeware FPS game .kkrieger, where such techniques have reduced the size of the visually stunning game to a mere 96 kilobytes. Other interesting examples of procedural synthesis can be seen in various demoscene demos. The PlayStation 3 also has impressive procedural synthesis capabilities, but the technical implementation differs significantly.
I wonder if this capability is currently being used by the 360 GPU. If so, it would be one of the reasons why Microsoft needed a GPU designed specifically for the 360:

Specially designed GPU that has the ability to lock the xenon L2 cache (may be as simple as allowing the GPU to interrupt the CPU).
Specially designed GPU that has some high speed method to access the xenon L2 cache (may not require changes if design already allowed for access to a high speed bus).

IBM was required to change the design of the xenon processor, departing from the standard PowerPC, to do the following.

Design a new instruction to lock and unlock part of the L2 cache.
Design the L2 cache to allow part of the cache to be locked.
Design the L2 cache to work differently for the part that is locked.
Design in changes that would allow the locked part of the L2 cache to be read external to the chip.

Very clever if it works. I would not be too concerned about the performance of the xenon processor if 256KB of cache was locked. If the performance benefit to the GPU significantly outweighs the temporary slight slowdown of the xenon processor, the tradeoff would be worth it.

With 256KB of the L2 cache locked during single-core operation of the xenon processor, the performance hit for that core should be extremely small (less than 1% under most conditions). Even if all three cores were executing, my experience suggests that the performance hit should be less than 10% under most conditions while the L2 cache is locked. Of course, the cache must be unlocked when it is no longer needed, but if bottlenecks are caused by the GPU and not the CPU, this feature could alleviate that problem.

I suspect that the way it currently works is that the game developer would lock part of the L2 cache using a core of the xenon processor. The cache would now appear to the developer as a specific memory space and he would create code that produced procedural synthesis data in that memory space using structures or arrays in the locked L2 cache. After the procedural synthesis data is generated, the CPU would notify the GPU to process the procedural synthesis data that is located at a specific memory location. As soon as the GPU starts processing the data, the xenon core could continue executing new code in parallel. When the GPU finishes processing that data, it unlocks the cache.

To the developer, the only difference in the code would probably be that the data is stored in cache memory instead of main memory, and that he would need to lock part of the L2 cache to access that memory (the L2 cache would then appear as part of the normal memory space, but addressed beyond main memory). After the data is created, the xenon processor would probably call the GPU using an ordinary graphics call such as ProcessProceduralSynthesisData(PointerToData). The procedural synthesis routine in the GPU could easily tell that the pointer it was passed points to cache memory, since it would not point into the main memory space, so at the end of the routine it would release the locked L2 cache.
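The workflow guessed at above (lock a region, generate into it, pass its address to the GPU, let the GPU notice the address lies beyond main memory and release the lock) can be simulated in plain C. This is a sketch under loudly stated assumptions: the address map, lock_l2_region, in_locked_l2, gpu_process_procedural_data (a stand-in for the hypothetical ProcessProceduralSynthesisData), and cpu_generate_and_submit are all hypothetical names invented here, not the real Xbox 360 interface, and the "GPU" simply sums the data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address map: 512 MB of main memory, with the locked L2 window
   addressed just above it. None of these values come from real Xbox 360 docs. */
#define MAIN_MEM_BYTES  (512u * 1024u * 1024u)
#define L2_WINDOW_BASE  MAIN_MEM_BYTES
#define L2_LOCK_BYTES   (256u * 1024u)
#define L2_LOCK_FLOATS  (L2_LOCK_BYTES / sizeof(float))

static float l2_window[L2_LOCK_FLOATS];  /* stands in for the locked cache lines */
static bool  l2_locked = false;

/* Hypothetical: a CPU core locks part of the shared L2 and gets its base address. */
static uint32_t lock_l2_region(void) {
    assert(!l2_locked);
    l2_locked = true;
    return L2_WINDOW_BASE;
}

/* True if addr falls in the locked window rather than in main memory. */
static bool in_locked_l2(uint32_t addr) {
    return addr >= L2_WINDOW_BASE && addr < L2_WINDOW_BASE + L2_LOCK_BYTES;
}

/* Hypothetical GPU entry point: "renders" by summing the data, and releases
   the lock because the address lies beyond main memory. */
static float gpu_process_procedural_data(uint32_t addr, size_t count) {
    float sum = 0.0f;
    if (in_locked_l2(addr)) {
        size_t first = (addr - L2_WINDOW_BASE) / sizeof(float);
        assert(first + count <= L2_LOCK_FLOATS);
        for (size_t i = 0; i < count; i++)
            sum += l2_window[first + i];
        l2_locked = false;   /* the GPU unlocks the cache when it is done */
    }
    return sum;
}

/* CPU side: lock, generate data into the window, then hand the address to the GPU. */
static float cpu_generate_and_submit(size_t count) {
    uint32_t base = lock_l2_region();
    size_t first = (base - L2_WINDOW_BASE) / sizeof(float);
    assert(first + count <= L2_LOCK_FLOATS);
    for (size_t i = 0; i < count; i++)
        l2_window[first + i] = (float)i;   /* placeholder for procedurally generated data */
    return gpu_process_procedural_data(base, count);
}
```

Calling cpu_generate_and_submit(4) fills the window with 0, 1, 2, 3, the simulated GPU returns 6.0, and the lock is released, so a second call can lock the region again.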

I wonder if even IBM knew why they were designing features into the xenon processor that allowed part of the L2 cache to be locked and a high speed read only external bus from that cache. Microsoft may have given IBM another reason for the needed design changes so that Sony wouldn't build that capability into the cell processor.

muzzakus
12-05-06, 02:14 AM
Ah yes, procedural synthesis. I did wonder what eventually became of that. Seems nothing as yet. Perhaps it's the old case of PC holding the 360 back. Hopefully 360 exclusives will start harnessing some of this potential.

Hey I just realised that's the same argument as for the PS3 !

I guess the caveat here is that the 360's superior dev tools and simpler architecture may make it an actual reality.

Muz

briankmonkey
12-05-06, 02:45 AM
Ah yes, procedural synthesis. I did wonder what eventually became of that. Seems nothing as yet. Perhaps it's the old case of PC holding the 360 back. Hopefully 360 exclusives will start harnessing some of this potential.

Hey I just realised that's the same argument as for the PS3 !

I guess the caveat here is that the 360's superior dev tools and simpler architecture may make it an actual reality.

Muz

Doesn't ES:IV use procedural synthesis (via SpeedTree)? I thought I read that it is used for the trees and shrubbery. Could that also be the cause of some of the framerate issues in the game?

muzzakus
12-05-06, 03:08 AM
Doesn't ES:IV use procedural synthesis (via SpeedTree)? I thought I read that it is used for the trees and shrubbery. Could that also be the cause of some of the framerate issues in the game?


Sorry, what is ES:IV ?

mterzich
12-05-06, 04:31 AM
Ah yes, procedural synthesis. I did wonder what eventually became of that. Seems nothing as yet. Perhaps it's the old case of PC holding the 360 back. Hopefully 360 exclusives will start harnessing some of this potential.

Hey I just realised that's the same argument as for the PS3 !

I guess the caveat here is that the 360's superior dev tools and simpler architecture may make it an actual reality.

Muz
I forgot that game developers have to support PCs too. I suspect that both game consoles' GPUs already support procedural synthesis, and if that is the case, developers would be able to use the same source code with only a couple of changes implemented with #ifdef directives. However, they would probably then need a totally different source module for PC support (coding similar capabilities two different ways).
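The #ifdef approach described above might look roughly like this in C. The macro names TARGET_XBOX360 and TARGET_PS3, the Backend enum, and procedural_backend are hypothetical placeholders, not real SDK symbols; the sketch only shows one shared source file selecting a console path at compile time, with the PC build falling through to a separately written module.

```c
#include <assert.h>

/* Which procedural synthesis path this build uses. */
typedef enum { BACKEND_XENON_LOCKED_L2, BACKEND_CELL_SPE, BACKEND_PC_GENERIC } Backend;

/* TARGET_XBOX360 and TARGET_PS3 are hypothetical macros a build script would
   define on the compiler command line (for example -DTARGET_XBOX360). */
static Backend procedural_backend(void) {
#if defined(TARGET_XBOX360)
    return BACKEND_XENON_LOCKED_L2;   /* console path: write into locked L2, hand to GPU */
#elif defined(TARGET_PS3)
    return BACKEND_CELL_SPE;          /* console path: generate on an SPE, DMA results out */
#else
    return BACKEND_PC_GENERIC;        /* PC path: a separately written module */
#endif
}
```

Built without either macro, procedural_backend() returns BACKEND_PC_GENERIC.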

FrankJ.Cone
12-05-06, 05:59 AM
Sorry, what is ES:IV ?

Oblivion

schticker
12-05-06, 12:51 PM
I don't doubt that they're smart, I just see it as different backgrounds. Sony is an entertainment company; that's the way they approach the designs.

This is the heart of the issue. They're like BOSE or bad Italian restaurants: All we have to do is tell you how awesome we are, and that makes it so. MS is actually concerned about making consoles to program for, and great titles.


IMHO the PS3 was designed to be a media device, good but not revolutionary gaming performance & solid media playback.

Which at its core isn't concerned with either. It is designed to be a Trojan horse platform for Blu-Ray.

evader45
12-05-06, 03:20 PM
Doesn't ES:IV use procedural synthesis (via SpeedTree)? I thought I read that it is used for the trees and shrubbery. Could that also be the cause of some of the framerate issues in the game?


I suspect that the framerate issues in Oblivion are due to the fact that it was basically a launch title and not directly related to the use of procedural synthesis.

Framerate issues are not uncommon in launch software as we've seen with both the Xbox 360 and the PS3.

micah3sixty
12-05-06, 03:36 PM
Been reading this thread for a while now. Just thought I would contribute that a new Xbox Live Arcade game coming out this week uses the Unreal 3 engine, is smaller than 50 MB, and uses procedural synthesis to get there. It's called RoboBlitz, as seen in the article below.

http://news.teamxbox.com/xbox/12374/Xbox-Live-Arcade-Wednesdays-RoboBlitz/

JData
12-05-06, 05:30 PM
Nice find!

Cynic
12-05-06, 05:44 PM
More here: http://www.bit-tech.net/news/2006/10/04/Game_file_sizes_could_soon_be_70_smaller/

Red Cell
12-05-06, 07:04 PM
I forgot that game developers have to support PCs too. I suspect that both game consoles' GPUs already support procedural synthesis, and if that is the case, developers would be able to use the same source code with only a couple of changes implemented with #ifdef directives. However, they would probably then need a totally different source module for PC support (coding similar capabilities two different ways).

That's why GOW is using a single core.
Epic was/is still uncertain about porting that game over to PCs, whose majority user base is on single-core processors.

Jetrii
12-05-06, 07:06 PM
That's why GOW is using a single core...

What are you talking about? At E3, Epic said that their engine was running on 2 cores. They have since updated that to 3 cores. While GoW may not be using every core to its limit, they are using 3.

Shape
12-05-06, 07:43 PM
What are you talking about? At E3, Epic said that their engine was running on 2 cores. They have since updated that to 3 cores. While GoW may not be using every core to its limit, they are using 3.

Yeah, Unreal Engine 3 is definitely multi-threaded. I've seen it mentioned often.

Even multi-threaded applications can run fine on single core processors. But multi-core is definitely preferred.

mterzich
12-07-06, 03:14 AM
This document is based on the timings of the SPEs. It may not be accurate if the information supplied by IBM is not accurate.

It appears that the published peak floating point performance of 25.6 GFLOPS per SPE on the cell processor (218 GFLOPS per processor) is greatly exaggerated. The real effective peak floating point performance appears to be closer to the following.

The maximum effective peak performance of one SPE appears to be less than 1/4th of the published figure, or less than 6.4 GFLOPS.

If all cores of the cell or xenon processor were executing vector floating point operations simultaneously, the effective peak performance would probably appear to be as follows for all cores.

The maximum effective peak performance of all 6 SPE cores available to a game application on a cell processor, executing vector floating point operations simultaneously, would probably be less than 30 GFLOPS.

How could IBM's published figures be so far off the actual figures? It appears that they were produced in an unrealistic environment: the data required for the vector floating point operations seems to have been prefetched into the processor before those operations were executed. It also appears that the test may have depended heavily on FMADD instructions, which count as 2 floating point operations but need the same number of memory accesses as one floating point operation. Such a test indicates how fast the floating point operations can be performed, but all of its data is confined to the processor.
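The 25.6 GFLOPS per-SPE figure can be reconstructed from the clock rate, assuming the commonly described SPE layout: a 128-bit SIMD unit that works on four 32-bit floats per instruction and can issue one fused multiply-add per cycle, with each FMADD counted as two floating point operations. The small C helper below (peak_gflops is a name made up for this sketch) shows how much of the headline number comes from that counting rule.

```c
#include <assert.h>

/* Theoretical peak = clock (GHz) x SIMD lanes x flops per lane per cycle.
   A fused multiply-add is conventionally counted as 2 flops per lane. */
static double peak_gflops(double clock_ghz, int simd_lanes, int flops_per_lane_per_cycle) {
    assert(clock_ghz > 0.0 && simd_lanes > 0 && flops_per_lane_per_cycle > 0);
    return clock_ghz * simd_lanes * flops_per_lane_per_cycle;
}
```

peak_gflops(3.2, 4, 2) gives 25.6, the published per-SPE figure; counting each FMADD as a single operation, peak_gflops(3.2, 4, 1), gives 12.8.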

The SiSoftware (Sandra) benchmark tests for the PC use a similar concept, but they confine the data to the processor and the L1 data cache. This gives a better indication of the processor's floating point capabilities, but even this test is not much of a real-world test.

Other benchmark suites are usually much more accurate indicators of real performance, since they often execute code extracted from real applications and then publish only the time it takes to execute that code. These benchmarks tend to account for all the resources in a system, not just the processor, or the processor and L1 cache, in isolation.

To understand why it is so important to consider all the resources in the system, and not just the processor power and possibly the L1 data cache, you need to understand how the operation is performed. Published floating point figures are for operations performed in the vector units. If the same floating point operations were performed using the non-vector capabilities of the processor, performance would probably be about 1/5th of that or less. Vector units require high-speed streaming data to be efficient. When normal (non-vector) calculations are performed in today's systems, the next data is usually already in the processor by the time the calculation needs it, because the non-vector instructions are relatively slow and because of the processor design and the streaming capabilities of the memory. Since vector operations are very fast, getting data to and from the processor at a very high rate becomes critical. As an example, a normal calculation using non-vector math might look like this:

for (j = 0; j < 1000; j++)
    Results[j] = ((Operand1[j] * 20) / (Operand2[j] / 3.1658)) / Operand3[j];

The above performs the same calculation on every element of Operand1, Operand2, and Operand3, placing each result in the corresponding element of Results. Under most conditions, once the first elements of Operand1, Operand2, and Operand3 have been retrieved, the data for the next pass will already be available, because non-vector math is slow and because of the streaming capabilities of the memory and the processor and memory design.

The same operation can be performed using the vector unit as follows.

Multiply each element in Operand1 by 20 and place the results back in Operand1.
Divide each element in Operand2 by 3.1658 and place the results back in Operand2.
Divide each element in Operand1 by each element of Operand2 and place the results in Results.
Divide each element of Results by each element of Operand3 and place the results in Results.
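The two approaches can be compared in a runnable C sketch. The "vector" version here is written as ordinary whole-array loops rather than real SIMD intrinsics, so it reproduces the memory traffic pattern of the four steps above (each pass streams whole arrays in and out), not actual SIMD speed. The array length, the function names single_pass and four_passes, and the byte-counting convention (4 bytes per float read or written) are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define N_ELEMS 1000

/* Non-vector style: one pass; each element of each array is touched once. */
static void single_pass(const float *op1, const float *op2, const float *op3, float *results) {
    for (size_t j = 0; j < N_ELEMS; j++)
        results[j] = ((op1[j] * 20.0f) / (op2[j] / 3.1658f)) / op3[j];
}

/* Vector-unit style: the four whole-array steps listed above. op1 and op2 are
   overwritten in place. Returns total bytes read + written, at 4 bytes per float. */
static size_t four_passes(float *op1, float *op2, const float *op3, float *results) {
    size_t bytes = 0;
    for (size_t j = 0; j < N_ELEMS; j++)      /* step 1: op1 *= 20 */
        op1[j] *= 20.0f;
    bytes += N_ELEMS * 8;                     /* read op1, write op1 */
    for (size_t j = 0; j < N_ELEMS; j++)      /* step 2: op2 /= 3.1658 */
        op2[j] /= 3.1658f;
    bytes += N_ELEMS * 8;                     /* read op2, write op2 */
    for (size_t j = 0; j < N_ELEMS; j++)      /* step 3: results = op1 / op2 */
        results[j] = op1[j] / op2[j];
    bytes += N_ELEMS * 12;                    /* read two arrays, write one */
    for (size_t j = 0; j < N_ELEMS; j++)      /* step 4: results /= op3 */
        results[j] /= op3[j];
    bytes += N_ELEMS * 8;                     /* read results and op3... */
    return bytes;
}
```

Both versions produce the same results, but the four-pass version moves 36 bytes per element (36,000 bytes for 1000 elements) against 16 bytes per element for the single pass (three operand reads and one result write), which is why the memory path dominates vector performance. Note the last pass is counted at 8 bytes only if results stays in a register-friendly form; reading results, reading op3, and writing results would strictly be 12 bytes, so treat these counts as a rough lower bound.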

The two examples above produce the same results. The maximum rate at which a processor can execute floating point operations is limited by the speed of the bus that feeds it. In the case of the cell processor, the bus between an SPE and its SRAM (the SPE's main memory; the SPE has no L1 cache) runs at 51.2 GB/s. A vector operation against a constant, or a pack or unpack instruction, must retrieve one 32-bit operand (4 bytes) and save one 32-bit result (4 bytes), for a total of 8 bytes per operation. A math operation on two operands must retrieve two 32-bit operands (8 bytes) and save one 32-bit result (4 bytes), for a total of 12 bytes per operation.

Dividing 51.2 GB/s by 8 bytes gives a maximum of 6.4 GFLOPS; dividing by 12 bytes gives about 4.3 GFLOPS. Even these figures are higher than can be expected, since the data originally has to be acquired from the PPE's main memory, and the results returned to it, at a rate of 25.6 GB/s.
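The same arithmetic as a small C helper, under the post's own assumption of one floating point operation per element and no data reuse (bandwidth_bound_gflops is a name made up for this sketch). It also shows what happens if the same loop is fed straight from the 25.6 GB/s main memory.

```c
#include <assert.h>

/* Ceiling on sustained GFLOPS when each floating point operation must move
   bytes_per_flop bytes over a link of bandwidth_gb_s GB/s, with no data reuse. */
static double bandwidth_bound_gflops(double bandwidth_gb_s, double bytes_per_flop) {
    assert(bandwidth_gb_s > 0.0 && bytes_per_flop > 0.0);
    return bandwidth_gb_s / bytes_per_flop;
}
```

With the SPE figures, bandwidth_bound_gflops(51.2, 8.0) gives 6.4 and bandwidth_bound_gflops(51.2, 12.0) about 4.27; at the 25.6 GB/s main memory rate those ceilings halve, to 3.2 and about 2.13.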

In the case of the xenon processor, the vector unit works primarily with the L1 data cache. As long as the total data size (operands and results) does not exceed the size of the L1 data cache (32 KB per core), the vector operations can execute at a very high speed. The L1 cache should have a higher transfer rate than the SRAM in the SPE. (A small memory on a processor chip is almost always faster than a larger one; if a 256 KB memory ran at the same speed as a 32 KB memory, that spot on the chip would get very hot and consume a large amount of power.) The L1 data cache would be expected to have at least 2x the transfer rate of the SRAM in an SPE, or about 102.4 GB/s or more. At 2x, this would produce 12.8 GFLOPS or about 8.5 GFLOPS when the vector unit is working with the L1 data cache, depending on which floating point instruction is used.

Although the initial data must be acquired from main memory and the results sent back to main memory, which is fairly slow (22.4 GB/s), this is still faster than the equivalent operation on the SPEs. As soon as the vector calculations start, the data streams into the L1 data cache and on to the execution unit, so the xenon core executes floating point instructions in parallel with the data being streamed, whereas the SPE must wait for all of its data to arrive before it starts executing the floating point instructions (more sophisticated techniques may be able to reduce that latency). It is impossible to determine the floating point performance of the xenon processor without knowing the transfer rate of the L1 cache.

If the total amount of data (operands and results) exceeds the size of the L1 data cache, floating point performance can suffer badly. In that case, the least recently used data is overwritten in the L1 cache (if that data was results data, it is first written to the L2 cache), and new, more time-consuming accesses are required to fetch the data from the L2 cache. Getting data from the L2 cache can slow floating point operations significantly, making them slower than on an SPE. In the worst case, data could be purged from both the L1 and L2 caches, so it must again be acquired from the much slower main memory. Although main memory appears to be fairly fast, it has built-in latency because it uses very slow memory technology at its core.
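A quick way to apply the 32 KB rule: count the bytes per element that a loop keeps live (4 bytes per 32-bit operand array plus 4 per result array) and see how many elements fit before the working set spills out of L1. This C sketch (max_elements_in_l1 is a made-up name) ignores associativity, cache-line alignment, and anything else sharing the cache, so it is an optimistic ceiling rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>

#define L1_DATA_BYTES (32u * 1024u)   /* per-core L1 data cache size from the text */

/* Largest element count whose operands + results fit in L1 at the same time.
   operands and results are the number of 32-bit arrays touched per element. */
static size_t max_elements_in_l1(unsigned operands, unsigned results) {
    size_t bytes_per_element = 4u * (size_t)(operands + results);
    assert(bytes_per_element > 0);
    return L1_DATA_BYTES / bytes_per_element;
}
```

For a two-operand operation (12 bytes per element) about 2,730 elements fit; the 1000-element loop shown earlier, with three operand arrays and one result array (16 bytes per element), needs 16,000 bytes and fits, with room for up to 2,048 elements.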

The maximum effective performance figures for single-core operation assume floating point instructions that cause the least amount of data transfer. The figures will be lower if the floating point operations being executed use two data operands.

Earlier in this document it was stated that performance does not necessarily go up by 6x if all 6 game console SPEs are executing vector floating point operations at the same time. The estimate for that case is an extremely rough guess, based on the fact that main memory bus activity increases dramatically during vector operations as more cores are added, possibly reducing the performance of the other cores. The real figures could be higher or lower than quoted. Even attempting to determine a reasonable performance figure when multiple cores are executing vector code is almost impossible, due to memory conflicts caused by the GPU, DMA, and the other executing cores.

Increasing the transfer speed of the cell processor's SPE SRAM could possibly improve performance. However, you sometimes run into diminishing returns: a faster SRAM may help the performance of one core but may not increase overall performance when all cores are executing, since the shared bus may already be saturated. The larger the data blocks used in floating point operations, the greater the chance of performance issues.

It was stated earlier in the document that the PPE's main memory is very slow, yet it runs at about half the speed of the SPE's SRAM (25.6 GB/s versus 51.2 GB/s). However, the PPE's memory can be slowed dramatically by conflicts from the GPU, DMA, other cores, etc. There is also a large latency when a block of main memory is first accessed (typically 100 clocks or more).
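A simple way to see why that start-up latency matters is to model one block transfer as a fixed latency plus size divided by bandwidth. The 100-cycle latency at 3.2 GHz and the 25.6 GB/s rate are the figures from the text; the model itself and the name effective_gb_s are assumptions, and the sketch ignores contention from the GPU, DMA, and other cores.

```c
#include <assert.h>

/* Effective bandwidth (GB/s) of one transfer of `bytes`, given a fixed start-up
   latency in CPU cycles and a peak streaming rate. Note 1 GB/s == 1 byte/ns. */
static double effective_gb_s(double bytes, double latency_cycles, double clock_ghz, double peak_gb_s) {
    assert(bytes > 0.0 && clock_ghz > 0.0 && peak_gb_s > 0.0);
    double latency_ns  = latency_cycles / clock_ghz;   /* cycles / (cycles per ns) */
    double transfer_ns = bytes / peak_gb_s;            /* bytes / (bytes per ns) */
    return bytes / (latency_ns + transfer_ns);
}
```

With those inputs a 128-byte transfer achieves only about 3.5 GB/s, while a 16 KB block reaches about 24.4 GB/s, which is why large, planned transfers matter for both the SPEs and the vector units.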

The following link lists Linpack benchmark results for computers from many different companies. You will see that they all show GFLOPS far below peak in this more real-world situation. The N=100 results are run with the unmodified reference code, while the N=1000 results allow optimized code, so that is where vector units would normally show up. The peak performance column is the manufacturer's theoretical peak. If no N=1000 result is shown, the tested processor probably does not have a vector unit, or no optimized run was submitted.

http://performance.netlib.org/performance/html/PDSbrowse.html
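For reference, Linpack results are converted from run time into a rate using the standard operation count for solving a dense n-by-n system by LU factorization, 2n^3/3 + 2n^2 floating point operations. The C sketch below (function names made up here) applies that convention; the 1-second run time in the test is invented purely to show the arithmetic.

```c
#include <assert.h>

/* Standard Linpack operation count for an n x n dense solve: 2n^3/3 + 2n^2. */
static double linpack_flops(double n) {
    return 2.0 * n * n * n / 3.0 + 2.0 * n * n;
}

/* Reported rate in MFLOPS for a run that took `seconds`. */
static double linpack_mflops(double n, double seconds) {
    assert(n > 0.0 && seconds > 0.0);
    return linpack_flops(n) / seconds / 1e6;
}
```

For n = 1000 the count is about 668.7 million operations, so the Apple G4's 478 MFLOPS corresponds to a run of roughly 1.4 seconds.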

Many of the systems listed are older or low-frequency systems, so their GFLOPS may be very low. Look up the Apple G4 (a PowerPC chip): it is a 533 MHz system with a performance of only 478 MFLOPS (N=1000). Multiplying by 6 to scale it to about a 3.2 GHz clock gives about 3 GFLOPS (478 x 6 = 2868 MFLOPS).

Also look at the IBM RS/6K Power 3 systems with 16, 12, 8, 4, and 1 processors. The peak performance of the 16-processor system is 24 GFLOPS, but the measured performance is about 1/3rd of that (about 0.5 GFLOPS per processor). This is probably a bus issue: as the number of processors decreases, measured performance gets closer to peak, until the 1-processor system comes close to its peak performance (1.2 GFLOPS).

You'll notice that most of the N=1000 results are no more than 1x the processor's clock frequency (one floating point operation per cycle), although newer processors such as the Intel Pentium 4 show performance in multiples of the clock speed. Going above 1x normally requires either parallel vector units or vector units with double, quadruple, or more stages. Apparently one of those two approaches is designed into the SPE cores; otherwise IBM could not make its claims.

http://www.realworldtech.com/page.cfm?ArticleID=RWT072405191325&p=2

The following is a link to the Intel Xeon processor used in servers. Look at the differences between the first two processors.

http://support.intel.com/performance/server/xeon/hpcapp.htm

Processor 1:

2 Processors with 1 core each
3.60 GHz
L2 cache - 2 MB
800 MHz bus - 12.5 GB/s
6.5 GFLOPS per core

Processor 2:

Dual Core
3.00 GHz
L2 cache - 4 MB
1333 MHz bus x 2 - 21 GB/s per bus, 42 GB/s total
19 GFLOPS per core

A little more than three times the bus speed produced three times the GFLOPS, even with a lower processor frequency. Are the differences due to the bus speed, or due to differences in the vector units and L1 cache speed between the two single-core processors and the dual-core processor? I suspect it is the vector units and L1 cache speed. To achieve those kinds of results with the dual-core processor, it appears that Intel put more than 6 vector units, or more than 6x the number of stages in the vector unit, in each core. The L1 data cache in the dual-core processor would also need to be about 2x as fast as what I projected for the xenon processor.

It appears that IBM may not have been able to increase the SPE SRAM access speed beyond its current speed because of the SRAM's comparatively large size on the processor chip, so they created a 51.2 GB/s bus to match the memory. Although that is a pretty fast memory, it is relatively slow compared to the transfer rate of most L1 data caches. If the memory closest to the core is not fast enough, the achievable GFLOPS rate will suffer. So why did IBM build vector units into the SPEs that rival Intel's top Xeon servers in GFLOPS performance but in practice cannot be used any better than a much more basic vector unit? Was it done solely for marketing? Is there a mistake, or something left out of the IBM documentation? Is there really an L1 cache in the SPE that is not documented?

http://www-128.ibm.com/developerworks/power/library/pa-cellperf/

This document could be inaccurate if the IBM documentation is inaccurate. If there is L1 cache in the SPE, that could change the calculations completely.

Edit: Removed references to xenon processor performance, since they were pure speculation; the transfer rate of the L1 cache is not stated in IBM documentation.

muzzakus
12-07-06, 03:55 AM
mterzich,

You really need to publish your findings on a site somewhere. Maybe submit them to some web publication even?

All this excellent highly detailed work will eventually get buried under countless threads on VGA and Gears of War, or whatever the flavour of the month is.

Make it timeless !

Muz

JData
12-07-06, 10:16 AM
Thank you for your work and the information mterzich

Bailey151
12-07-06, 11:25 AM
Thank you for your work and the information mterzich
Amen! Thanks much, I've certainly learned a lot.

mterzich
12-07-06, 01:50 PM
Removed references to the xenon processor's performance from the previous document, since they were purely speculative: the transfer rate of the L1 cache is not defined in IBM documentation. Depending on that rate, the xenon processor could perform close to its documented figures or perform poorly. Too much speculation about that transfer rate would only cause confusion.

Although the performance references to the xenon processor have been removed, it would be surprising if the xenon processor could reach IBM's documented performance, since those are figures that high-performance, expensive processors would envy. But who knows.

Also placed warning messages in that document that indicate that the results are based on the timings given in IBM documentation. If the published IBM documentation is incorrect or missing important information, the analysis would be incorrect.

There are also many updates to the document so if you have previously read that document, please reread that document to get those changes.

ArchieGates
12-07-06, 03:52 PM
Round one has clearly gone the 360's way:

http://www.gamespot.com/features/6162742/index.html?tag=topslot;title;1&om_act=convert&click=topslot

Nearly every game looks better on the 360 so far. It makes a lot of sense given the findings in the original post here. When it comes to practical, real-world horsepower, the 360 is the clear winner. The Cell processor just isn't going to enable any advantage in games because the SPEs aren't as useful as the Xenon's 3 symmetrical cores.

Jetrii
12-07-06, 04:14 PM
mterzich, I know it may be a lot to ask, but could you possibly write a small summary of your findings? I've been keeping up with this thread and while I understand most of the technical information, I feel like I've missed some important things. I understand if you're sick of this topic :)

mterzich
12-07-06, 06:42 PM
I know it may be a lot to ask, but could you possibly write a small summary of your findings? I've been keeping up with this thread and while I understand most of the technical information, I feel like I've missed some important things. I understand if you're sick of this topic :)
The following are my basic conclusions, based on public IBM documents and other documents available on the internet. If there were differences between the documents, I tried to evaluate which was correct and would normally choose an IBM document if available. If the documents are incorrect or missing critical information, some of my conclusions may be wrong.

If all the processor power of the cores of the cell processor can be efficiently harnessed, the cell should produce overall more processing power than the xenon processor.
The xenon processors 3 PPE cores are easier to program than the cell processors SPEs.
If a game application is written according to the basic concepts of game design, the 360 should be able to harness some of the power of additional cores easily with little effort.
Even on the 360, it will be difficult to harness close to 100% of the processor power available in all three cores.
SPEs do not have branch prediction capabilities. Compilers can insert branch hint instructions to try to address the potential performance problems, but this does not solve them completely. SPEs can be used for general purpose code but may not perform as well as desired due to the lack of branch prediction.
Both the cell and the xenon processors are stripped down versions of the PowerPC. A PPE core on either console will run at less than 1/2 the speed of a PowerPC of the same clock frequency.
The xenon processor is designed fairly close to a standard general purpose multi-core processor whereas the cell is designed as a specialized purpose multi-core processor.
The biggest issue with the cell processor as far as game development is concerned is the small memory size (256 KB) in the SPEs.
Other issues that make the SPEs difficult to program are the lack of direct access to the PPE's main memory and the required use of DMA to transfer data.
The required use of DMA to transfer data between the SPE and the PPEs main memory can have a performance impact on the processor.
The GPU appears to be a better/faster design on the 360. However, other things need to be taken into consideration such as shared memory bus, DMA activity, the effect of multi-core processing, etc. to determine which performs best. It is very difficult to come to a conclusion in that area.
A processor can perform very well using floating point operations but must use the vector capabilities to achieve published GFLOPS. However most code cannot use the vector capabilities and therefore must use non-vector floating point capabilities greatly reducing the maximum GFLOPS. Even when vector operations can be performed, the code must be reorganized to use those capabilities.
Vector operations are very memory intensive and therefore a good memory/bus design is very important so that data is available at the vector unit as needed to get the maximum GFLOPS possible.
SPEs are most efficient running physics code that can be segmented for SPE use and run in parallel. However, not all physics code is suitable for an SPE. For example, if you only needed to know the current position of an object, that would probably not be worth offloading, but if you needed the complete trajectory (all points) of an object, that would probably be very suitable code for an SPE.
The developer needs to find code that will execute for a reasonable amount of time on the SPE. If that time is too short, it may be counterproductive to transfer data and code to the SPE, start the SPE, and then, when execution is complete, transfer the results back to the PPE's memory.
The PPE cores in the xenon and cell processors have had out-of-order (parallel) execution stripped out, so they execute in order. This is the most significant degradation of the PPE cores.
Branch prediction has been slightly reduced for PPE cores in both processors.
The xenon processor and 360 GPU have a feature that allows high speed transfers through the L2 cache to the GPU for procedural synthesis data. It should be easy for the developer to use and should increase graphics performance.
SPE performance does not appear to come anywhere near the published GFLOPS specifications, based on the bus speed between the SRAM and the SPE's execution unit. This is the conclusion that bothers me the most. Am I correct, or are the public IBM documents in error, or possibly missing the information needed to draw that conclusion?
Xenon performance in terms of GFLOPS cannot be accurately predicted due to lack of information about the L1 cache's bus speed.
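The point about giving an SPE enough work can be framed as a simple break-even test: offloading pays off only if the SPE path (moving code and data in, starting the SPE, computing, and moving results back) takes less time than just doing the work on the PPE. This C sketch uses a made-up OffloadCost struct, and every number in the example is a hypothetical placeholder, not a measured Cell figure; it also ignores that the PPE can do other work while the SPE runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical costs, all in microseconds. */
typedef struct {
    double ppe_compute_us;   /* time if the PPE does the work itself */
    double spe_compute_us;   /* time for the SPE to do the same work */
    double dma_in_us;        /* code + input data into the SPE's local memory */
    double dma_out_us;       /* results back to the PPE's main memory */
    double startup_us;       /* starting / signalling the SPE */
} OffloadCost;

/* Offload only if the whole SPE path is faster than computing on the PPE. */
static bool offload_worthwhile(OffloadCost c) {
    double spe_total = c.dma_in_us + c.startup_us + c.spe_compute_us + c.dma_out_us;
    return spe_total < c.ppe_compute_us;
}
```

With these placeholder numbers, a job that takes 5 microseconds on the PPE is not worth sending to an SPE if the SPE path totals 9 microseconds, while a 500-microsecond job that the SPE finishes in 100 microseconds (107 in total) clearly is.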

Jetrii
12-07-06, 07:02 PM
Thanks mterzich. Great work :)

Cynic
12-07-06, 08:21 PM
Indeed, thank you very much for all the information you've provided in this thread.

talbain
12-07-06, 11:24 PM
yeah, all i can say is wow this thread has turned out great. i really have to give a hand to everyone here. this thread proves that there can be a civilized discussion of the two systems without resorting to nonsense. outstanding job to everyone, particularly mterzich!

Artmic
12-07-06, 11:36 PM
Very good read indeed !

JData
12-08-06, 01:26 PM
Hey mterzich,

Have you visited Microsoft's XNA website for 360 developers? It has tracks from the Game Developers Conference covering programming, hardware, graphics and such. There's a wealth of information there.

Here's a link:

http://msdn.microsoft.com/directx/presentations/

Chubzilla06
12-09-06, 12:44 AM
how true is this about the 360 cpu

the 3 core CPU which is a flawed design in and of itself runs at 3 GHz but the Northbridge limits it to half that. More fluff from the PR folks.

dukmahsik
12-09-06, 09:50 AM
how true is this about the 360 cpu

the 3 core CPU which is a flawed design in and of itself runs at 3 GHz but the Northbridge limits it to half that. More fluff from the PR folks.

did that come from Sony? lol

Edit: Look, here's the deal. BOTH systems are really, really on par with each other, and in some areas one console will shine a BIT over the other. It's been said by many, many developers. You guys can argue specs back and forth until death, but the proof of the pudding is in the games. So far, most of the multiplatform games have shown better on the 360 for multiple reasons: better dev environment, 360 as lead platform, the 360 is more flexible overall and has more available memory, etc. etc. Even Madden 07, developed separately by two different teams around the same time, yields a better game on the 360. FNR3 is better on the 360 even with 6-8 more months of dev time. That is the truth; the reviews are out there, google it.

Here is ANOTHER dev speaking about both systems:

The X360 GPU is slightly more flexible due to the subtle differences between the respective architectures of the nVidia and ATI chipset. Certain shader restrictions are lifted or easier to work around on X360. The bandwidth of the X360 GPU is pretty massive too, though partly this is balanced out by the differences in VRAM layout of the two systems that means the Xbox needs to move graphics data around the system more than the PS3 does. While the GPUs are fast and have a big impact on the visual quality and level of game effects, other factors such as general processing power and memory architecture also come into play.

http://blogs.guardian.co.uk/games/archives/2006/12/07/ps3_vs_xbox_360_a_developer_speaks.html

mterzich
12-10-06, 06:08 PM
Have you visited Microsoft's XNA website for the 360 developers? They have tracks there from the Game Developer's conference that depicts programming, hardware, graphics and such. There's a wealth of information found there.

Here's a link:

http://msdn.microsoft.com/directx/presentations/

The following are my opinions relating to the development platforms.

It appears that Microsoft is pulling far into the lead over Sony by providing superior software tools for developing game applications.

Microsoft has an integrated development environment called Visual Studio that has been under development for over 10 years. Over the years Visual Studio has been refined by adding support for developing client/server applications (both web and intranet applications), support for debugging several applications simultaneously, support for new languages such as C#, Visual J#, and Visual Basic, support for setting breakpoints before an application is loaded into memory, refined support for multiprocessor/multi-core capabilities, and many other features. Microsoft currently uses Visual Studio for game development.

Because the Xbox 360 uses a standard 3-core processor design, game developers can develop, debug, and test the game using the Windows operating system. The main requirement is that the development system's processor has at least as many cores as the developer wants to use in the game. The Intel Core 2 Quad processor, with 4 cores, should be sufficient to develop any game application destined for the 360.

It was probably quite easy for Microsoft to add Visual Studio support for developing 360 game applications. C and C++ are the languages primarily used by game developers because they are relatively easy to port between game consoles. Since Visual Studio already supported those languages, Microsoft only had to provide libraries (graphics and other game-related libraries) and C and C++ compiler options to produce native PowerPC code.

Sony, on the other hand, would almost certainly require developers to develop game applications directly on the PS3 because of its non-standard design. I suspect that is the main reason the PS3 supports the Linux operating system and large external hard drives, and that Linux is used to develop game applications on the PS3. Sony would also have needed to build a development platform that could target the PPE and the SPEs simultaneously.

Now that Microsoft has delivered a very good development platform to game developers, it has embarked on providing an even better one that, from Microsoft’s point of view, could lock developers into its game console. Microsoft currently has the following development platforms under development.

XNA Game Studio Express

Available to students and hobbyists for game development (in beta but available)
Free – anyone can download the development platform
Currently only supports the C# language
Currently the developer cannot sell or distribute the application

XNA Game Studio Professional

Currently under development
Available to anyone including students, hobbyists, and small independent game developers
Games can be certified for distribution and sale
Will probably support the C# language

XNA Game Studio

Currently available to developers of Xbox 360 and/or Windows games
Games can be certified for distribution and sale
Currently supports C, C++, and C#

It appears that the primary reason the 3 platforms are under development is to fully support the C# classes. Most of the supported classes would be similar to the graphics and game control libraries currently available for C and C++, and more sophisticated classes for those libraries, as well as other capabilities, would probably also be implemented. C# is a very powerful language. If the classes are implemented correctly, with a good deal of sophistication and usability, and provide the capabilities the developer needs, development time can be reduced significantly, less debugging will be required, fewer bugs will ship in released applications, and code readability will improve compared to developing the same application in C or C++ with the libraries. However, C# is a proprietary Microsoft language. Microsoft could also easily provide Visual J# (Java-like) and Visual Basic for developers who prefer a Java- or BASIC-like language, since both can use the same classes. However, both of those languages are also proprietary to Microsoft.

It seems that Microsoft is trying to get developers to use C# so that game development will be locked into Microsoft's operating systems. This strategy may work (especially for exclusive games and/or multi-core code) because of the greatly reduced development and debugging time, and because multi-core applications cannot easily be ported to the PS3 even when written in C or C++. However, if Sony delivers a game console with a more standard architecture in the future, 360 game applications written in C# would be very difficult to port to it.

If a game is developed for a single core in C or C++, cross-platform ports are easy, since both Microsoft and Sony support C and C++ for game development. If the libraries provided by Microsoft and Sony were exactly the same, porting a single-core application would normally require only a recompile, followed by quality assurance testing.

However, the libraries provided by the two companies are probably not the same; at a minimum they have different routine names and pass data differently. Also, one company may provide libraries that the other does not, requiring the programmer to write the missing code for that console. Even so, this is usually a fairly simple port, and it gives a good indication of how the two consoles compare in single-core and GPU performance.

Usually, for single-core applications, the same source can be used for both game consoles by using #ifdef directives, which compile code differently depending on which symbols are defined.
Example:

#ifdef PS3
    /* call the PS3 graphics library here */
#else
    /* call the 360 graphics library here */
#endif
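A slightly fuller sketch of the same idea in C++, assuming hypothetical names throughout: `ps3_gfx_draw`, `x360_gfx_draw`, `platform_draw_sprite`, and the `PS3`/`XBOX360` symbols are invented for illustration, not real SDK calls or compiler definitions. The game code calls one wrapper, and the preprocessor selects the console-specific body, which also absorbs the differences in routine names and argument passing mentioned above. A stub branch lets the shared code build and run on a PC.

```cpp
#include <cassert>
#include <string>

// Hypothetical engine-side description of a sprite to draw.
struct Sprite {
    int x;
    int y;
    int texture_id;
};

#if defined(PS3)
// Hypothetical PS3 library routine: texture first, then coordinates.
void ps3_gfx_draw(int texture_id, int x, int y);
#elif defined(XBOX360)
// Hypothetical 360 library routine: packed 16-bit x/y, then texture.
void x360_gfx_draw(long packed_xy, int texture_id);
#endif

// Records the last call made by the desktop stub so it can be checked.
std::string last_draw_call;

// The only drawing routine the game code calls; the preprocessor picks
// the console-specific body at compile time.
void platform_draw_sprite(const Sprite& s) {
    assert(s.texture_id >= 0);
#if defined(PS3)
    ps3_gfx_draw(s.texture_id, s.x, s.y);
#elif defined(XBOX360)
    long packed = ((static_cast<long>(s.x) & 0xFFFF) << 16) |
                  (static_cast<long>(s.y) & 0xFFFF);
    x360_gfx_draw(packed, s.texture_id);
#else
    // Desktop stub: lets the shared code be built and tested on a PC.
    last_draw_call = "stub:" + std::to_string(s.texture_id) + "@" +
                     std::to_string(s.x) + "," + std::to_string(s.y);
#endif
}
```

Only the wrapper changes per console; the rest of the game code compiles unchanged.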

For single-core game applications, it may be beneficial to develop the application first in C or C++ on whichever console has the best development environment and libraries, and then port it to the other console. This way the developers can get the application working quickly with the least manpower needed to develop and debug the code. It could also mean that the version for the console with the best development tools reaches the market first.

I suspect that the biggest issue for Sony's development environment is support for debugging SPEs. Sony has to get debugging working for one PPE and 6 SPEs simultaneously while all of them are executing the game application. This is complex because SPEs have a very limited amount of memory and code is loaded on the fly. The SPE's limited memory leaves no room for a large debugging monitor, and setting SPE breakpoints before the code is loaded is a complex operation. Because of this, I suspect that only limited SPE debugging capabilities are currently available (possibly the developer has to wait until SPE memory is loaded before setting a breakpoint).

Developing applications for SPEs is complicated enough without problems in the development environment. Without a very good development environment, some developers might not even attempt to write code for the SPEs.

Because many libraries (especially graphics libraries) are quite large and require large amounts of data to be sent and returned, using those kinds of libraries may not be possible on SPEs.

I’ve heard comments that the PS2 was difficult to develop game applications for because of its design. I expect that this had nothing to do with the console's design (it was a pretty standard design) and was instead due to a poor development environment and the limited set of libraries provided by Sony.

Languages

What is a high-level computer language? It is a language that describes the operations a computer should perform in a form the developer can understand. A high-level language is not restricted by the architecture or the low-level language of the processor, other than that the processor must have some way of performing what the language requests. A language just tells the computer what to do. The processor does not understand high-level languages, only its own machine instructions (assembly language is their human-readable form), so a compiler translates the high-level language into instructions the processor understands. A compiler may produce anywhere from a few to hundreds of instructions for each high-level statement, and a good compiler produces efficient, fast, and compact code.

C Language

Advantages:

Very fast
Small native code
Requires only the native code and included libraries to execute
Very flexible language; the developer can do anything he wishes
Easy to start developing code

Disadvantages:

Can accidentally destroy data in its own program or in other developers’ code
Can produce intermittent bugs or code that is difficult to debug
Limited language, but can still provide the same capabilities as any other language
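To make the first disadvantage concrete: copying a string into a fixed-size buffer with an unchecked routine such as `strcpy` silently overwrites whatever memory follows the buffer when the source is too long. A minimal C++ sketch of the usual defensive pattern, which checks the destination size before writing; `copy_name` is a hypothetical helper, not a standard function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies src into dst without ever writing more than dst_size bytes.
// Returns true if the whole string fit, false if it had to be truncated.
// An unchecked strcpy(dst, src) would instead overwrite the memory that
// follows dst whenever src is too long.
bool copy_name(char* dst, std::size_t dst_size, const char* src) {
    assert(dst != nullptr && src != nullptr);
    if (dst_size == 0) {
        return false;  // no room even for the terminating NUL
    }
    std::size_t len = std::strlen(src);
    bool fits = len < dst_size;
    std::size_t n = fits ? len : dst_size - 1;
    std::memcpy(dst, src, n);
    dst[n] = '\0';  // always leave a terminated string
    return fits;
}
```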


C++ Language

Similar advantages and disadvantages to C, but adds support for classes.

C# Language

Advantages:

Uses managed code, which normally limits the possibility of serious bugs
Very powerful language if classes are supplied
Fast to develop and debug an application
Automatically performs tasks such as releasing memory when it is no longer needed
Code is easy to follow
Compact pseudo code (intermediate code) is distributed, saving disc space

Disadvantages:

Proprietary language
Probably slower than C or C++
Just In Time (JIT) compiling is performed when running an application
JIT compiling may delay the application from starting for a few seconds
Highly unusual code can be difficult to implement
Requires the runtime monitor, JIT compiler, and classes to be installed on the system or delivered with the application


Visual J# and Visual Basic Languages

Same advantages and disadvantages as C#
A programmer may prefer one of these languages if he is familiar with Java or BASIC

Libraries

Libraries are routines developed by the game console manufacturer and/or by the developers of the game application. The manufacturer's libraries are the ones most game applications will use, such as graphics, network, console control, and sound libraries. The game developer may write other libraries providing functionality that several games under development could use (for example, a library that combines many routines of the manufacturer's graphics library into functionality shared by several games, instead of repeating several hundred lines of code in each game). Libraries are easy to use: the developer calls the needed routine, passing data to it, and the routine returns some other data.
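A minimal sketch of the kind of developer-written library described above. `fill_rect` is a hypothetical stand-in for a manufacturer-supplied graphics routine (here it just writes into an in-memory framebuffer so the example runs anywhere), and `draw_health_bar` is the reusable routine several games could call instead of repeating the drawing code; all names and color values are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Tiny in-memory framebuffer standing in for real video memory.
struct Framebuffer {
    int width;
    int height;
    std::vector<std::uint32_t> pixels;
    Framebuffer(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0) {}
};

// Hypothetical "manufacturer" routine: fills a rectangle, clipped to the screen.
void fill_rect(Framebuffer& fb, int x, int y, int w, int h, std::uint32_t color) {
    int x0 = std::max(x, 0), y0 = std::max(y, 0);
    int x1 = std::min(x + w, fb.width), y1 = std::min(y + h, fb.height);
    for (int row = y0; row < y1; ++row)
        for (int col = x0; col < x1; ++col)
            fb.pixels[row * fb.width + col] = color;
}

// Game-developer library routine built on fill_rect: a bordered health bar.
// health is clamped to [0, max_health]; max_health must be positive.
void draw_health_bar(Framebuffer& fb, int x, int y, int w, int h,
                     int health, int max_health) {
    assert(max_health > 0 && w >= 2 && h >= 2);
    health = std::min(std::max(health, 0), max_health);
    const std::uint32_t border = 0xFFFFFFFF, empty = 0xFF400000, full = 0xFF00C000;
    fill_rect(fb, x, y, w, h, border);                 // outline
    fill_rect(fb, x + 1, y + 1, w - 2, h - 2, empty);  // background
    int filled = (w - 2) * health / max_health;        // proportional fill width
    fill_rect(fb, x + 1, y + 1, filled, h - 2, full);
}
```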

Classes

Classes are easy-to-use building blocks that each provide specific capabilities. They are usually implemented so that the class appears to be just another statement of the language being used, which lets the language be expanded to fit the developer's needs. Classes are generally easier to use, more understandable, and more capable than library calls.
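For example, in C++ a small vector class with overloaded operators lets game code write `position + velocity * dt` as though vectors were a built-in type. `Vec3` and `integrate` are illustrative names, not part of any console SDK.

```cpp
#include <cassert>
#include <cmath>

// Minimal 3-component vector; operator overloading makes it read like a
// built-in type inside ordinary expressions.
struct Vec3 {
    float x, y, z;

    Vec3 operator+(const Vec3& o) const { return {x + o.x, y + o.y, z + o.z}; }
    Vec3 operator-(const Vec3& o) const { return {x - o.x, y - o.y, z - o.z}; }
    Vec3 operator*(float s) const { return {x * s, y * s, z * s}; }

    float dot(const Vec3& o) const { return x * o.x + y * o.y + z * o.z; }
    float length() const { return std::sqrt(dot(*this)); }
};

// One simulation step written with the class: reads like the physics formula.
Vec3 integrate(const Vec3& position, const Vec3& velocity, float dt) {
    assert(dt >= 0.0f);  // this sketch only steps time forward
    return position + velocity * dt;
}
```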

spwolf
12-10-06, 07:33 PM
Are you saying that there is currently code available that does that, or is the statement just based on processing power requirements? I assumed that the PS3 currently uses hardware decoders. If you think differently, I would like to know. I would also be interested to know whether anyone has ever produced a decoder that uses multiple SPEs, because it would be very interesting to learn how they accomplished it, even on a theoretical basis, given the memory limitations of the SPEs.

I was using 7 SPEs just to point out possible complexities, since I'm not even sure it is possible with the limited memory in the SPEs. I could have used 3 SPEs in the example with more passes per frame. So the point is: is it possible, with the limited memory available in SPEs, to segment everything to such extremes, or are we only talking about processor power requirements?

I assume that MPEG frames have dependencies both within a frame and between frames, such as GOPs. Since I have never developed an MPEG decoding program, I'm not sure how tight those dependencies are within a frame or between frames. If the dependencies are very tight and stretched over the complete frame, it may be impossible, or at least extremely difficult, to split the frame into many pieces to be handled by different cores in parallel. I'm just not sure. However, it may be possible to have different cores work on different frames in parallel, which would let the Xenon processor have each core work on a different frame, since there is plenty of memory. I'm not sure exactly how they would do it on the Xenon processor, but I'm assuming each core gets its own frame. If there are dependencies between frames, it may be possible to communicate between the PPE cores using common memory.

I'm also not sure that a segmented output buffer (part of a raw bitmap) can be used. Since the information in an MPEG frame is not sequential from left to right and top to bottom, it seems that a partial bitmap output buffer could not be used; instead, some type of intermediate code would need to be generated, which the PPE would have to decode and merge with the buffers produced by all passes of all SPEs for that frame.

Is it possible that the decoder application performs the decoding sequentially? That is, the PPE feeds chunks of MPEG data to the first SPE, which does a partial calculation on each chunk and passes its output to a second SPE, which does more calculations on those results and passes them to a third SPE, which performs the final calculations and passes the final results back to the PPE, which produces the bitmap. This way the MPEG data is processed as a stream, only small chunks of data are in any SPE's memory at any time, all SPEs execute at the same time once the stream is moving, and no bitmap is required in any SPE.

Yes, BD support in the PS3 is completely software based, and it can indeed decode 2 full 40 Mbps streams using "only" 3 cores, which is impossible for the Xbox 360.

You seem to be throwing a lot of info around without actually knowing much about Cell and how it works...

It is true that the PS3 costs more because they wanted to make it an entertainment center, and there is a good reason for that. I saw Jerry C posting that the PlayStation division is the only Sony division making money; that is actually not true... it is one of the few that are known to lose money. By pushing the BD format via the PS3, they are making sure people spend more money on PS3 software, both games and movies.
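The sequential (pipelined) scheme described in the quoted post above can be sketched on a PC, with ordinary threads standing in for SPEs and queues standing in for the transfers between them. This is an assumption-laden sketch, not Cell SDK code: the three stage functions are placeholder arithmetic rather than real MPEG decoding steps, and `Chunk`, `Channel`, and `run_pipeline` are hypothetical names. Each stage works on one chunk at a time, yet all stages run concurrently once data is flowing.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// One unit of work passed between stages; end == true marks end of stream.
struct Chunk {
    bool end;
    int value;
};

// Blocking queue connecting two stages (unbounded here for brevity; a real
// SPE design would bound it to fit in local memory).
class Channel {
public:
    void push(Chunk c) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(c);
        }
        cv_.notify_one();
    }
    Chunk pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        Chunk c = q_.front();
        q_.pop();
        return c;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Chunk> q_;
};

// One stage: take a chunk, transform it, pass it on, until end of stream.
void run_stage(Channel& in, Channel& out, std::function<int(int)> work) {
    for (Chunk c = in.pop(); !c.end; c = in.pop()) {
        out.push(Chunk{false, work(c.value)});
    }
    out.push(Chunk{true, 0});  // forward the end-of-stream marker
}

// The "PPE" feeds chunks into stage 1 and collects stage 3's results.
std::vector<int> run_pipeline(const std::vector<int>& input) {
    Channel to_s1, s1_to_s2, s2_to_s3, to_ppe;
    // Placeholder arithmetic stands in for real decoding work.
    std::thread s1(run_stage, std::ref(to_s1), std::ref(s1_to_s2),
                   [](int v) { return v + 1; });
    std::thread s2(run_stage, std::ref(s1_to_s2), std::ref(s2_to_s3),
                   [](int v) { return v * 2; });
    std::thread s3(run_stage, std::ref(s2_to_s3), std::ref(to_ppe),
                   [](int v) { return v - 3; });
    for (int v : input) to_s1.push(Chunk{false, v});
    to_s1.push(Chunk{true, 0});
    std::vector<int> results;
    for (Chunk c = to_ppe.pop(); !c.end; c = to_ppe.pop()) {
        results.push_back(c.value);
    }
    s1.join();
    s2.join();
    s3.join();
    assert(results.size() == input.size());  // every chunk came back
    return results;
}
```

Because each channel has a single producer and a single consumer, chunks come out in the order they went in, which is what a decoder emitting frames in sequence would need.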

Shape
12-10-06, 07:43 PM

They could use the Xenos GPU to help decode the video on the Xbox (just as is done on PCs). This is rather complex to code, however; it is definitely easier to just use the processor. A GPU, especially one with unified shaders, is really just a "stream processor."

I'm not sure why it is desirable to decode 2 40 Mbps streams, however. I would think that one would be plenty. :)

Hammer65
12-11-06, 11:01 AM
C# is a fine language for business apps but not really for video games. It's an interpreted language and you do not have the fine control over memory allocation/release that you need in a video game. Even more so in a console. That's why C/C++ are now a separate product and not part of Visual Studio.

micah3sixty
12-11-06, 11:26 AM
Here's some developer talk, done as a Q&A by gamasutra.com, about developing for the PS3 vs the Xbox 360. It seems to confirm most of what is discussed here: the PS3 takes more effort to get the same performance.

http://www.gamasutra.com/php-bin/news_index.php?story=12058

JData
12-11-06, 01:05 PM
C# is a fine language for business apps but not really for video games. It's an interpreted language and you do not have the fine control over memory allocation/release that you need in a video game. Even more so in a console. That's why C/C++ are now a separate product and not part of Visual Studio.

.... So what language do you think game devs should program in? All the game developers that I know work mostly in C.

Hammer65
12-11-06, 01:18 PM
C or C++. It's C#/Java that are not well suited to games, or rather, to games that push the hardware. You most certainly can program games in C# or VB, for that matter.

JData
12-11-06, 02:46 PM
Push hardware?

Have you loaded up MS' FSX or tried some games that pushed the PC hardware? They are out there. LOMAC, in its early release, pushed the hardware. It took at least 2-3 cycles for the video cards to catch up to it. Then the developers started optimizing their code.

Just odd. I am not implying C is the best language, but it seems that there are games that push the hardware. In the simulation genre, it's the software that pushes the hardware, since the developers design several cycles ahead.

I have experienced an abundance of games that 'crawled' when the hardware wasn't up to snuff.

I guess all the current and past game developers I know don't know what they are doing??????

Not trying to disrespect or flame you, but your statement threw me. Maybe I am just misinterpreting what you are saying.

Again, so what language do you prefer?

mterzich
12-11-06, 02:57 PM
C# is a fine language for business apps but not really for video games. It's an interpreted language and you do not have the fine control over memory allocation/release that you need in a video game. Even more so in a console. That's why C/C++ are now a separate product and not part of Visual Studio.

C# is not an interpreted language. Pseudo code (intermediate code) is distributed, but the JIT compiler produces native code. As for the allocation and release of memory, C# is not much different from C or C++, other than that it automatically releases memory for local variables on return from a function or class method (memory that couldn't be used by any language anyway once the routine has exited). Global memory is managed the same way. The biggest difference is that in C# it is difficult to set up a pointer to an area of memory outside the application's workspace or beyond the bounds of an object. In the rare instance that this is desired, C# can call a C or C++ routine to perform that operation.
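For the rare case described above, the usual Windows .NET mechanism is platform invoke (P/Invoke): the C# side declares an `extern` method marked with `[DllImport]`, and the native side exports a function with C linkage. A minimal sketch of the native half, assuming a hypothetical function `sum_samples` built into a hypothetical `native.dll`; whether XNA on the Xbox 360 allows this is not addressed here.

```cpp
#include <cassert>
#include <cstddef>

// On Windows the function would be exported from a DLL with
// __declspec(dllexport); elsewhere the macro expands to nothing.
#if defined(_WIN32)
#define NATIVE_EXPORT __declspec(dllexport)
#else
#define NATIVE_EXPORT
#endif

// C linkage gives the function an unmangled name that a C# [DllImport]
// declaration can bind to. The caller passes a raw pointer and a count,
// the kind of direct memory access that is awkward in safe C#.
extern "C" NATIVE_EXPORT long long sum_samples(const short* samples, int count) {
    assert(samples != nullptr || count == 0);
    long long total = 0;
    for (int i = 0; i < count; ++i) {
        total += samples[i];
    }
    return total;
}
```

The matching C# declaration would look roughly like `[DllImport("native.dll", CallingConvention = CallingConvention.Cdecl)] static extern long sum_samples(short[] samples, int count);`, where C#'s 64-bit `long` matches `long long` and the array is passed as a pointer to its first element.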

It's C#/Java that are not well suited to games. And I should say games that push the hardware. You most certainly can program games in C# or VB for that matter.

Although I tend to agree overall, I also thought the same when C started to be used instead of assembly language for developing operating systems. At that time, some people stated that C could not match the speed of assembly language, but thanks to C's structure and ease of programming, the differences ended up being negligible.

Because of its managed code, C# will be slightly slower than C or C++. However, given the power of the C# language and the reduced time needed to develop and debug code, the advantages of using C# may outweigh the performance and flexibility advantages of C or C++. Also, if a few percent of processor power is lost, it may easily be made up by using the other cores. If development time is reduced significantly, programmers can spend the saved effort using the other cores to get better performance with less manpower.
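As a rough illustration of using the other cores, a sketch that splits a per-object update into three slices (one per Xbox 360 core) and runs them with `std::async`. The workload in `update_range` is a placeholder and `parallel_update` is a hypothetical name; a real engine would more likely reuse a fixed set of worker threads than launch new ones for every update.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Placeholder per-object work: advance each position by its velocity.
void update_range(std::vector<float>& pos, const std::vector<float>& vel,
                  std::size_t begin, std::size_t end, float dt) {
    for (std::size_t i = begin; i < end; ++i) {
        pos[i] += vel[i] * dt;
    }
}

// Splits the update into `workers` contiguous slices, one task per slice.
// The slices do not overlap, so no locking is needed.
void parallel_update(std::vector<float>& pos, const std::vector<float>& vel,
                     float dt, unsigned workers = 3) {
    assert(pos.size() == vel.size() && workers > 0);
    std::vector<std::future<void>> tasks;
    const std::size_t n = pos.size();
    for (unsigned w = 0; w < workers; ++w) {
        std::size_t begin = n * w / workers;
        std::size_t end = n * (w + 1) / workers;
        tasks.push_back(std::async(std::launch::async, update_range,
                                   std::ref(pos), std::cref(vel), begin, end, dt));
    }
    for (auto& t : tasks) t.get();  // wait for every slice to finish
}
```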

I suspect that the major complaints about using C# are that it is a proprietary language, the delay caused by the JIT compiler the first time the application is run, concern over performance, and the requirement that the runtime monitor, JIT compiler, and classes be present on the system.

I'm still on the fence as to whether a significant number of developers will adopt a language such as C# for game development in the near future.

ihyln
12-11-06, 03:31 PM
There are too many "I assume"s and "I suspect"s in the OP's post.

No doubt the 360 and PS3 are close, but I don't think the PS3 is as hard to program as some people would like others to believe.

Reading blogs and interviews with people who've actually programmed the 2 CPUs (not just read about them), lots of them agree that the Cell isn't easy, true, but it's far from impossible.

Lots of programmers see it as a challenge and compare programming the Cell with the PS2 (actually, some think the Cell is easier to grasp than the PS2).

Sony's dropped the ball somewhat by not giving devs the tools they need, but when this gets fixed, I believe the PS3 will put some distance between itself and the 360. It's a close race, but I think the PS3 does have an edge over the 360.

Comparing ports of games won't be fair, as the system a game was originally developed on will almost always look better than the system it's ported to.

And yes, I'm a Sony fan(boy), but I do respect the 360 as an awesome platform. There's just nothing I've seen on the 360 so far that makes me go "OOOH!". Sony's always delivered, and I believe they'll continue to do so.

I have a close friend who is working on a title that's due in March (you can guess), and he has worked on both the PS3 and the 360. He tells me that the 360 devkit is by far the easiest and most robust to program for. As for the PS3, he said it reminded him of the nightmare of the PS2 devkit when it was released. Sony really dropped the ball on this one, and it shows. He said flat out, "there's a reason why GoW looks so good compared to any PS3 launch title."

Hammer65
12-11-06, 03:32 PM
I stand corrected on the JIT nature of C#. I was thinking it required a VM-type process that executed/interpreted the CIL.

You are correct that local variables are allocated on the stack and released when the routine returns. Global or heap memory is explicitly allocated and released (good ole memory leaks). In C#/Java, memory is not truly released until garbage collection runs, and you don't have control over that process. You can give hints, but you don't control when it happens. This is a good thing for most applications, since the programmer no longer has to manage memory, which is a huge source of bugs.
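By contrast, C++ lets the programmer choose the exact release point. A minimal sketch using `std::unique_ptr`, which frees an object as soon as its owning pointer is reset or goes out of scope, so nothing is left waiting for a collector. `Texture` and the `live_textures` counter are illustrative only; the counter just makes each release observable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Counts live textures so the example can show exactly when memory is freed.
int live_textures = 0;

// Illustrative resource type, not from any console SDK.
struct Texture {
    std::vector<unsigned char> pixels;
    explicit Texture(std::size_t bytes) : pixels(bytes) { ++live_textures; }
    ~Texture() { --live_textures; }
    Texture(const Texture&) = delete;             // copying would skew the count
    Texture& operator=(const Texture&) = delete;
};

// Loads a level's textures, frees one early, and frees the rest on return.
// With raw new/delete a forgotten delete would leak; unique_ptr cannot forget.
void play_level() {
    std::vector<std::unique_ptr<Texture>> textures;
    for (int i = 0; i < 4; ++i) {
        textures.push_back(std::make_unique<Texture>(1024));
    }
    assert(live_textures == 4);
    textures[0].reset();  // freed immediately, at a point the programmer chose
    assert(live_textures == 3);
}  // the remaining three are freed here, when `textures` goes out of scope
```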

mterzich
12-11-06, 04:17 PM
Actually it is possible for the developer to force garbage collection in C#.

The garbage collection GC class provides the GC.Collect method, which you can use to give your application some direct control over the garbage collector. In general, you should avoid calling any of the collect methods and allow the garbage collector to run independently. In most cases, the garbage collector is better at determining the best time to perform a collection. In certain rare situations, however, forcing a collection might improve your application's performance. It might be appropriate to use the GC.Collect method in a situation where there is a significant reduction in the amount of memory being used at a defined point in your application's code. For example, an application might use a document that references a significant number of unmanaged resources. When your application closes the document, you know definitively that the resources the document has been using are no longer needed. For performance reasons, it makes sense to release them all at once. For more information, see the GC.Collect Method.

Before the garbage collector performs a collection, it suspends all currently executing threads. This can become a performance issue if you call GC.Collect more often than is necessary. You should also be careful not to place code that calls GC.Collect at a point in your program where users could call it frequently. This would defeat the optimizing engine in the garbage collector, which determines the best time to run a garbage collection.

Also, large blocks are handled slightly differently from small blocks to improve performance.

There is one more performance improvement that you might want to be aware of. Large objects (those that are 20,000 bytes or larger) are allocated from a special large object heap. Objects in this heap are finalized and freed just like the small objects I've been talking about. However, large objects are never compacted because shifting 20,000-byte blocks of memory down in the heap would waste too much CPU time.
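The MSDN advice above about releasing a large group of resources at one known point has a common native-code counterpart in game engines: an arena (linear) allocator that hands out memory from one block and frees everything with a single reset, for example at the end of a level. This is a swapped-in C++ technique for comparison, not something the .NET garbage collector does, and `FrameArena` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linear allocator: allocations bump an offset inside one preallocated block,
// and reset() releases every allocation at once; there is no per-object free.
// Objects placed here should be trivially destructible, since no destructors run.
class FrameArena {
public:
    explicit FrameArena(std::size_t capacity) : buffer_(capacity), offset_(0) {}

    // Returns memory aligned to `align` (assumed to be no larger than
    // alignof(std::max_align_t)), or nullptr if the arena is full.
    void* allocate(std::size_t bytes, std::size_t align = alignof(std::max_align_t)) {
        assert(align > 0);
        std::size_t start = (offset_ + align - 1) / align * align;  // round up
        if (start + bytes > buffer_.size()) return nullptr;
        offset_ = start + bytes;
        return buffer_.data() + start;
    }

    void reset() { offset_ = 0; }  // releases everything in constant time
    std::size_t used() const { return offset_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t offset_;
};
```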

mterzich
12-11-06, 06:50 PM
Someone created some benchmarks to compare Java, C#, and C++. The benchmarks were run over a year ago, so the results may not reflect later enhancements to C#. The following is a link to those benchmarks.

http://www.tommti-systems.de/go.html?http://www.tommti-systems.de/main-Dateien/reviews/languages/benchmarks.html

These are the results using the last (3rd) graph.

C# is slightly faster than C++ performing these functions

Double Math (2%)
Trig Math (14%)

C# is significantly faster than C++ performing these functions

Hash Map (140%)
String Concatenation (187%)

C++ is slightly faster than C# performing these functions

Integer Math (3%)
I/O Operations (7%)
Hash Maps (2%)
Heap Sort (17%)

C++ is significantly faster than C# performing these functions

Long Math (43%)
Arrays (79%)
Exceptions (523%)
List (534%)
Matrix Multiply (515%)
Nested Loops (450%)
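For anyone wanting to try this kind of comparison, a minimal C++ sketch of one of the listed micro-benchmarks (matrix multiply with nested loops) timed with `std::chrono`. It is not the code used by the linked site, timings depend heavily on compiler flags, and no results are claimed here.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Row-major n x n matrix multiply, c = a * b, using three nested loops.
std::vector<double> matmul(const std::vector<double>& a,
                           const std::vector<double>& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)      // i-k-j order walks b row by row
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}

// Times one multiply of two n x n matrices and returns elapsed milliseconds.
double time_matmul_ms(std::size_t n) {
    assert(n > 0);
    std::vector<double> a(n * n, 1.0), b(n * n, 2.0);
    auto start = std::chrono::steady_clock::now();
    std::vector<double> c = matmul(a, b, n);
    auto stop = std::chrono::steady_clock::now();
    volatile double sink = c[n * n - 1];  // keep the result live so it is not optimized away
    (void)sink;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```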

On the surface it appears that C# would not perform well enough for game development. However, if you look at the Java benchmarks (Java is similar to C#) using the Server VM, you will notice that performance improves significantly in some of the areas where C# did not perform well. I'm not exactly sure of the differences between the Server VM and the C# runtime monitor, but I suspect that Microsoft could do something similar to improve performance.

In the areas of matrix multiply, nested loops, and arrays, I suspect the poor performance is caused by the managed code checking that each element access is within the array bounds. Microsoft could provide a compile switch that eliminates these checks for the production release (greatly improving performance), but then, if an array is accessed outside its bounds, debugging may be much more difficult.
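C++ already offers both behaviors discussed here, which gives a feel for what such a switch would trade away: `std::vector::at()` checks the index and throws `std::out_of_range`, while `operator[]` performs no check. A minimal sketch of a compile-time switch in the spirit of the suggestion above; `GAME_CHECKED_INDEX`, `element`, and `sum_grid` are hypothetical names, and nothing here claims Microsoft offers such a switch for C#.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Compile-time switch: build with -DGAME_CHECKED_INDEX=0 for an unchecked release.
#ifndef GAME_CHECKED_INDEX
#define GAME_CHECKED_INDEX 1
#endif

// Element access that is bounds-checked only when the switch is on.
inline float& element(std::vector<float>& v, std::size_t i) {
#if GAME_CHECKED_INDEX
    return v.at(i);  // throws std::out_of_range on a bad index
#else
    return v[i];     // no check: faster, but a bad index is undefined behavior
#endif
}

// Sums a row-major grid through the accessor, as a hot inner loop would.
float sum_grid(std::vector<float>& grid, std::size_t rows, std::size_t cols) {
    assert(grid.size() >= rows * cols);
    float total = 0.0f;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += element(grid, r * cols + c);
    return total;
}
```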

I suspect that if Microsoft completely removed the managed code capability for the final distribution of a game application, there probably wouldn't be any significant difference in performance between C++ and C#. However, removing the managed capability goes against the concept of that type of language, since it becomes very difficult to debug after distribution (the same as C++).

Until we know what Microsoft is doing to improve the performance of C#, it is difficult to determine whether C# is suitable for game development. I suspect that Microsoft is currently working very hard to improve its performance.