To the outside observer, improvements in PC architecture are evolutionary but logical. Processors advance inevitably in speed and performance, in happy accordance with Moore's Law. For Nebojsa Novakovic, a consultant in high-end computing systems, that's hardly the case. The demise of the DEC Alpha processor is a case in point. A performance leader was killed off by corporate whim.
Novakovic argues that despite the turn by AMD to a 64-bit platform, vindicated by the success of the Athlon 64, AMD had better watch out. Intel will be fighting back in 2006 with Merom, Conroe and Woodcrest, as processor design adopts a multicore approach. Even so, life won't be a bed of roses for Intel either. Rumors abound in the industry of the "retirement" of the Intel Itanium, an expensive attempt to break away from the x86 architecture at the high end, and Intel sorely lacks an interconnect that could compete with AMD's HyperTransport.
In this in-depth interview, Novakovic comments on the major technology issues as Intel and AMD square off for the next rounds of the processor wars.
This is Part IV of a five-part interview. Part I appeared on 23 January, Part II on 24 January, and Part III on 25 January. Part V will follow on 27 January.
Q: Clearly what we are now seeing, from both Intel and AMD, is CPU design moving to dual- and multicore solutions. Could this approach drive dramatic processor performance gains? Earlier you pointed out that processor performance, over the past couple of years, has not been keeping up with Moore's Law.
A: I would agree that a multicore approach helps to continue Moore's Law without creating massive thermal problems, but the reality is that the software also has to be moved forward if it is to take advantage of the multicore approach. After all, what is multicore processing? It is symmetric multi-processing (SMP) on a chip, essentially. It's been known for quite a few years now, since SMP systems were first devised for commercial applications, that the real issue becomes the efficiency of the software when it runs on multicores.
The question, "How do you get the software to run efficiently on so many CPUs?" quickly becomes, essentially, "How do you thread your job?" The lessons learned on server platforms could now find application on the desktop because server applications have been multithreaded for quite a while. Games and multimedia applications would benefit from this approach, as would MS Office. After all, every time CPUs are sped up, Microsoft simply creates new, bloated versions of the Office software that then slow down a PC several times. If the CPU becomes twice as fast, Office is four times as bloated. It's an exponential dependency, it seems to me.
The point is that multithreading will be necessary, to extract any real performance advantage from multicore designs, and software vendors should take a careful look at the experiences of their peers in high-performance computing (HPC) and similar areas. Past experience in running software on tens and hundreds of CPUs could well supply a basis for tighter integration between the software and multicore single-chip systems. If that is done well, the multicore approach could well offer many advantages.
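A rough way to see why threading matters so much here is Amdahl's law, a standard result that the interview itself does not name: if only a fraction p of a program's work can run in parallel, n cores give a speedup of at most 1 / ((1 - p) + p / n). The sketch below is illustrative only. The function name and the sample fractions are assumptions chosen for the example, not figures from the interview.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup from Amdahl's law.

    parallel_fraction: share of the work that can be threaded (0.0 to 1.0).
    cores: number of cores the threaded part is spread across.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workloads: unthreaded, half threaded, mostly threaded.
    for p in (0.0, 0.5, 0.9):
        for n in (2, 4):
            bound = amdahl_speedup(p, n)
            print(f"parallel fraction {p:.1f}, {n} cores: at most {bound:.2f}x")
```

Under these assumed numbers, a program that is only half threaded gains at most about 1.33x from a second core, and an unthreaded one gains nothing, which is the sense in which software has to move forward before multicore designs pay off.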
One advantage would be having one core to handle system security, given the vulnerabilities that plague the Windows OS and the x86 architecture, and the multiple antivirus, anti-spyware, anti-malware programs that you now need to run on the system, all of which occupy CPU time. Having a multicore CPU would allow one of the cores to take care of these tasks, while the other cores allow you to actually use the application you hoped to use.
If your PC is already capable of that kind of multitasking, then a multicore CPU would already be suitable for your system, whether Intel or AMD. But the best approach would be a hybrid one, where a single application could be optimized on multiple cores, while multiple applications could be run simultaneously – in other words, a mix of capabilities.
An interesting point, here, is that as you know, we aren't going to see any more support for hyperthreading on Intel chips. I'm not completely against hyperthreading, and multithreading within a single core was always present on IBM chips, also in POWER5 and Alpha EV8, and of course in Sun's UltraSPARC T1, the Niagara, where you have eight cores, each supporting four threads. Multithreading, if executed correctly within a single core, is not such a bad idea, and it does not mean the core will become unnecessarily complicated. Now you have applications such as Google Search, where you don't have much CPU utilization per processor bus, but where you do need a fast response time within each processor bus for fast switching. If you have that, then a multithreaded approach makes very good sense.
I don't exactly like Intel's version of hyperthreading, but it would have been better to have retained some form of evolved, more efficient speculative multithreading in the upcoming Conroe and Woodcrest cores.
Q: Is it still too early for us to know the actual performance specifications of Merom, Conroe and Woodcrest, and what they will be capable of per core?
A: Intel has made the architecture publicly available, but we still don't have any official clock rates for this year. Nevertheless, I think a reasonably optimistic estimate would be around 3.3GHz on the desktop and 2.6GHz on the mobile side. I believe Intel will try to push the clock as high as they can. They won't be able to push the speed to 4.0GHz until 2007 at least, but the Conroe core will be able to run at least 50% faster per clock than the Pentium 4, with half the power consumption – and, of course, with good liquid cooling, you might just be able to hit 4.0GHz on Conroe XE top parts. This move should have happened much earlier.
An interesting point here is that the new core builds on the success of Intel's Israeli design team with the Pentium M. Merom is an enhanced 64-bit Pentium M, and the Pentium M is basically an improved Pentium III, so Merom, the new mobile core, adopts a back-to-the-future approach, and in this case I think that is a smart move. I mean, when I tried running the Pentium M it had a speed of 2.13GHz, and with the FSB at 533MHz it was faster, for most desktop applications, than a Prescott-core Pentium 4 running at 3.6GHz. I have to give credit to the Israeli design team. They have really done a good job.
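The figures quoted in this answer can be turned into simple arithmetic. The sketch below assumes, as a deliberate simplification, that performance is proportional to clock rate multiplied by work done per clock. The function names are illustrative, and the inputs are just the numbers quoted above: a 3.3GHz desktop estimate, 50% more work per clock than the Pentium 4, and a 2.13GHz Pentium M keeping pace with a 3.6GHz Pentium 4.

```python
def equivalent_clock_ghz(clock_ghz: float, per_clock_gain: float) -> float:
    """Clock a baseline chip would need to match a chip that does
    per_clock_gain times as much work per clock at clock_ghz."""
    return clock_ghz * per_clock_gain


def min_per_clock_ratio(low_clock_ghz: float, high_clock_ghz: float) -> float:
    """Minimum per-clock advantage the lower-clocked chip needs
    in order to match the higher-clocked one."""
    return high_clock_ghz / low_clock_ghz


if __name__ == "__main__":
    # Conroe estimate: 3.3GHz at 1.5x the Pentium 4's work per clock.
    conroe = equivalent_clock_ghz(3.3, 1.5)
    print(f"Pentium 4 clock needed to match: {conroe:.2f} GHz")

    # Pentium M at 2.13GHz versus Prescott Pentium 4 at 3.6GHz.
    ratio = min_per_clock_ratio(2.13, 3.6)
    print(f"Per-clock advantage implied: at least {ratio:.2f}x")
```

On this simple model, a 3.3GHz Conroe would be worth roughly a 4.95GHz Pentium 4, and the Pentium M result implies a per-clock advantage of about 1.69x or more on the desktop applications in question. Real workloads depend on memory, cache and bus behavior as well, so these are only ballpark readings of the quoted figures.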
So I would say that Intel has an excellent multicore product, but the core is not the problem. The real problem is the system approach. In the past, for example, I have used Quadrics interconnects from the UK. That is excellent technology, far better than Infiniband for high-end clustering, though in need of marketing to match the technology. Some of the team members date back to the old revolutionary Transputer days, before funding cuts, I think under Maggie Thatcher, killed all that off. This interconnect performs extremely well, despite a slow product cycle, as a result of the system-level design approach. With the Quadrics interconnect, you are not just looking at a cable with two ends.
Again, AMD's HyperTransport is superior to PCI Express, partly due to that system approach, and HyperTransport III enables throughput of up to nearly 50 Gbytes/sec per channel. A theoretical combination of next-generation Quadrics and HyperTransport III could scale such systems to 32,000 CPUs, for example, with excellent, nearly linear, scaling and a low-latency, high-performance global virtual-memory space across the whole cluster, for applications using SHMEM. It would be more difficult – or at least slower – to do that using PCI Express. A direct-switched HyperTransport could also be an ultrafast, NUMA-like solution for linking a small number of large Opteron systems into really monstrous supercomputers, with a single system image if necessary – affordably!
The problem for Intel is that it does not have an interconnect with that kind of capability among its current solutions. Obviously, Intel will need to make a decision about what it is going to do.
Some of the old Alpha design team – who seem to have left Intel now, to do their own thing – were involved in trying to create CSI, a competitor to HyperTransport. This would have been partly based on work that design team did on Alpha EV7, the Alpha 21364 core, which was, in a sense, part of the development process for HyperTransport. But for some reason, they didn't achieve any results. So if the rumors that CSI may be delayed until 2009 are true, then I honestly think it would be better for Intel to adopt HyperTransport, taking the same pragmatic attitude it did when it adopted AMD64.
I have no doubt that Intel is capable of a great interconnect at some point. After all, they are able to make equal or better CPUs than those of AMD, there is no doubt about that. But it may save them a couple of years' work if they simply offer a HyperTransport-enabled version of Conroe or Woodcrest, with or without that integrated memory controller, at least for the next couple of years. Not to do so would involve too much risk.
At the end of the day, AMD will probably heed the warnings of the industry and deliver a new core on time. If and when it does, it could have better features than Conroe/Woodcrest, with even more instructions per cycle and better FP performance and so on. And if Intel hasn't solved the interconnect problem by then, they'll be in trouble once more. Why let that happen? Those of us who don't work for either AMD or Intel want to see these companies in competitive balance since that's the situation in which the performance race can best continue.

Nebojsa Novakovic is a Singapore-based consultant for high-end computing as well as maglev transportation systems. He has been active in various projects throughout the Asia Pacific for over 10 years, the most recent being high-end technical computing clusters using top-end Intel and AMD platforms in combination with a Quadrics high-speed interconnect. His IT commentaries, covering high-end computing issues in particular, have appeared in numerous publications, including Singapore's The Straits Times, and he is a frequent contributor to the well-known www.theinquirer.net website.
Article edited by Chris Hall