Blue-collar processing: Q&A with Tensilica founder Chris Rowen

Based in Santa Clara, Tensilica has been in the semiconductor industry for about 15 years, providing customers with what it calls configurable dataplane processors (DPUs). The company has more than 200 licensees worldwide and is approaching 2.5 billion cores in the market. In March 2013, it entered into an agreement to be purchased by EDA provider Cadence for US$380 million.

Just after the purchase, Digitimes had the opportunity to sit down with Chris Rowen, chief technology officer and founder of Tensilica, to talk about the acquisition and what Tensilica brings to the table with its technology.

Q: Very briefly, what is Tensilica's role in the market?

A: What we are able to do is complement standard control CPUs and add significant opportunities for customers to differentiate their products, be more flexible and add value in terms of new algorithms, and we can do that in minutes rather than in months. Our focus is handling computation on the most critical data, whether it is images in a camera, audio in a multimedia device, or wireless communications. If you look at semiconductor companies, a majority of those focusing on smartphones and a great majority of those focusing on digital TVs are likely to be using our technology. As a company we have become one of the major suppliers of processor technology.

Q: How did the interest from Cadence develop?

A: I think it is the combination of two things that has made Tensilica so attractive to Cadence.

First of all, we have been able to develop a very unique processor technology as I previously mentioned. The second area is our deep architectural and market knowledge in key domains, especially around baseband, audio and imaging. When it comes to system architects and the software ecosystem, we are able to get into the dialog and discussion at a very high and very early level to help customers figure out where they are going with their product lines and what kinds of fundamental technologies they are going to use.

That ability to engage with customers high and early changes the nature of the relationship between vendor and supplier and when Cadence looks at that, it recognizes that it really wants to change the way it engages with customers as well. It is not just about better tools but being able to have a seat at the table for key decision making or to be invited to discuss what is going on. So while Tensilica is far smaller than Cadence, a much larger proportion of our activities fit into that strategic discussion about architectures and applications.

Q: Can you talk about technology synergies?

A: Let me explain by giving an example. If you look at an SoC, there are a lot of different things going on. There is a host CPU that runs some high-level applications like the operating system and user interface. But it is not terribly efficient, so it is increasingly common for chip designers to implement other kinds of processing or computation to handle things like voice processing, audio, video, vision, baseband and other customized applications. Then there is a need to interface the device to the outside world, whether it is for flash, DDR, analog front ends for different kinds of wireless interfaces, network interfaces, PCI and USB.

What Tensilica brings is a mastery of processor hardware and software and engagement in key applications. Cadence brings a rich portfolio of complementary IP, particularly in interfaces - the analog and digital interfaces that connect with radios, USB, PCI, flash and DRAM. Those strengths represent all the things that form the boundary of the device, while we are providing more of the guts of the SoC.

And our combined focus has most notably not been on the CPU. We've steered clear of the general purpose CPU market because that is much more about legacy. ARM (in RISC) and Intel (in x86) have taken a strong position in those respective markets and remain the dominant general purpose CPU architectures. However, the key takeaway here is that it is in most of the other areas I mentioned where differentiation takes place.

And it's not just a niche that we've carved, but a broad territory of what you might call blue collar processing - all of the heavy lifting that is at the heart of applications such as imaging, vision, communications, networking, storage, audio and voice and we are a leading supplier in those areas.

Q: Can you talk about your processor compared with a general processor? Is there a lot of overlap in what they do?

A: MIPS and ARM overlap a lot, but with Tensilica there is much less overlap. I guess if you look at it one way, our underlying technology has a strong element of RISC processor in it so theoretically you can use Tensilica processors configured as general processors. But that really underplays our capabilities so we've never particularly emphasized that aspect of our technology. We think it is much more important that our dramatic extensibility and parallelism allow us to do so many things ARM cannot do. It is routine for our high-end processors to do the equivalent of 100, 200 or 300 RISC operations per cycle whereas for ARM it is typically a discussion about whether to do 1, 2, 3 or 4 general purpose RISC operations per cycle.

Now there are some caveats that go with that. While it's true our processors can do more things at a time and more things at less power - because they do them in parallel - many applications don't require hundreds of operations to be done. But in tasks like imaging, where you can work on all the pixels in the image at the same time, it is possible.

So you always need two ingredients - a processor that is capable of doing things in parallel and a problem that naturally exposes a high degree of parallelism in the nature of the task itself. We're masters at finding those applications and then coming up with processors that can exploit the available parallelism in the application. That is why I referred to what we do as blue collar, because the focus is on applications where heavy lifting is involved (multiple operations at the same time), such as in imaging, audio, storage, security and network protocol processing.

That kind of parallelism doesn't really apply when running the sort of general purpose code which ARM focuses on. The applications I just mentioned are much different from what you find inside the code of Angry Birds or for the Android operating system (OS) and one neither expects nor needs an ARM processor to run at that level.
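To make the "two ingredients" point concrete, here is a minimal sketch, not taken from the interview, of a per-pixel task and an idealized cycle count at different issue widths. The function names, the brightness operation, the 1920x1080 frame size and the widths of 4 and 256 are illustrative assumptions; the model ignores memory stalls and control overhead.

```python
import math


def brighten(pixels, gain):
    """Scale each pixel by `gain`, clamping to the 8-bit maximum.

    Each output depends only on its own input pixel, so every element
    could in principle be processed at the same time.
    """
    return [min(255, int(p * gain)) for p in pixels]


def cycles_needed(num_pixels, ops_per_cycle):
    """Idealized cycle count for one independent operation per pixel.

    Assumes perfect parallelism and no memory stalls (hypothetical model).
    """
    return math.ceil(num_pixels / ops_per_cycle)


# Hypothetical frame size and issue widths, chosen only for illustration.
frame_pixels = 1920 * 1080
narrow_cycles = cycles_needed(frame_pixels, 4)    # a few ops per cycle
wide_cycles = cycles_needed(frame_pixels, 256)    # hundreds of ops per cycle

print(narrow_cycles, wide_cycles, narrow_cycles // wide_cycles)
```

The wide machine only wins here because the task offers millions of independent operations; code with long chains of dependent steps would not see the same benefit.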

Q: Can you walk us through an example of how a customer would decide that it should implement Tensilica technology, say for an imaging application?

A: Let's take a case for a hypothetical customer - a chipset maker targeting smartphones. Whether it is making chips for its own smartphones or one for the general public, the same challenges are there - different types of functionality need to be integrated while making sure the product stands out to differentiate it in terms of the features, because it is the features that the end customer is going to be most passionate about.

For example, these days camera functions are critically important for smartphones. Many smartphone makers today recognize that superior imaging in quality, resolution and richness of features - whether it is face detection and tracking, high dynamic range (HDR) photos or handling specialized low-light conditions - is something that will help sell the phone.

Moreover these features are increasingly migrating from a focus on still image functions to continuous video. This is really the key divide, because if it is just a still image function, the application may be run on the main CPU, but to do video requires that the processor run at high rates continuously. Now, while there is a fair amount of computing power coming from a lot of ARM processors on the market, the issue is more of an energy problem. As one leading smartphone company described it to me, a little known fact in the market is that if you took a quad-core 1.5GHz ARM processor and actually ran the cores all together for any period of time the phone would overheat in about 20 seconds, maximum. There is a lot of peak computing power, but it can only be used for a sprint.

But image processing, particularly as it moves to video, is not a sprint. It becomes a marathon. So the question is how long and how fast you can go. The gap between the performance you can afford to power when running on a general purpose processor, and the performance that you really want to have to do these video functions, is a factor of 10x or 20x. You can do a little bit better with a GPU because it is a little bit more efficient than a general purpose CPU for image processing, but it is still probably 3-5x less efficient in terms of ops per watt than an optimized imaging platform.

What you would really like to do is turn off the CPU and GPU and turn on the image processor during those key periods in order to get that high throughput, but you still want to make sure you have the same programming model and the same ease of bringing the applications onboard.

Q: What about hard coding as an alternative?

A: One of the other potential alternatives to do imaging is to have it completely hard wired, meaning each function has a different block of IP and a different bucket of gates on your chip addressing it. That has worked reasonably well for applications like the simple standard imaging signal processing pipeline, which takes a pixel value off of an image sensor and sort of massages it and improves it in order to get sort of an OK image from the crude image that came raw off the device. However, that model works because image sensors are fairly similar and what you want to do with the images is fairly similar.

However, with video there has been so much innovation taking place using more sophisticated processing. For example, using temporal information and spatial information when catching a sequence of frames and using that information from frame to frame to make each frame look better than it would have when viewed in isolation. Another application is one where you want to start extracting and processing image content information, whether it is facial features or gestures. Those examples are not at all suitable for putting into hard wired logic.
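As a rough illustration of using frame-to-frame information, the sketch below blends each frame with a running average of the frames before it, a simple form of temporal noise reduction. This is an assumed stand-in technique, not an algorithm described in the interview; the function name, the `alpha` parameter and the flat-list frame format are hypothetical.

```python
def temporal_smooth(frames, alpha=0.25):
    """Blend each frame with a running average of earlier frames.

    `frames` is a list of equally sized lists of pixel values.
    `alpha` is the weight given to the newest frame (assumed parameter).
    Returns one smoothed frame per input frame.
    """
    if not frames:
        return []
    running = [float(p) for p in frames[0]]
    smoothed = [list(running)]
    for frame in frames[1:]:
        # Each pixel borrows information from the same position in the
        # earlier frames, which damps noise that varies frame to frame.
        running = [(1 - alpha) * r + alpha * p for r, p in zip(running, frame)]
        smoothed.append(list(running))
    return smoothed


# Two pixels whose values flicker around 100 across three frames.
noisy = [[100, 100], [120, 80], [100, 100]]
result = temporal_smooth(noisy, alpha=0.5)
print(result[-1])
```

The blend weight and whether to compensate for motion are exactly the kind of choices that keep changing between product generations, which is the argument the answer above makes against freezing such logic in hardware.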

Q: The reason being?

A: Even if it can be expressed in hard wiring you probably don't want to go that way. Doing so would typically have you freezing the definition of the algorithm one to two years before you want that product in the marketplace. So your ability to anticipate and know what is the best possible algorithm, and what is the best and right set of features deployed in the phone, is extremely limited.

There are some functions that are governed by standards where you can do that. The H.264 standard has been around for about ten years and it is going to be around for another ten years. It is not a moving target and people doing completely or partially hard wired implementations have done a pretty good job with H.264.

But when it comes to other functions in gesture, image improvement, and other vision applications, these areas are not governed by standards but by competition in the marketplace. Dozens of independent software houses as well as in-house imaging teams at the smartphone companies are constantly competing and coming up with the next new version of the application, so they need a platform that is good for imaging but is flexible enough to accommodate all of these different kinds of algorithms.
