# DTMOS Based Low Power High Speed Interconnects for FPGA

Kureshi A.K. and Mohd. Hasan, Deptt. of Electronics Engineering, Aligarh Muslim University, Aligarh, India; akkureshi@rediffmail.com, m\_hasan786@rediffmail.com

Abstract—This paper presents new energy-efficient methods of designing switches and routing interconnects inside an FPGA using novel variants of Dynamic Threshold MOS (DTMOS) instead of traditional NMOS pass-transistor-based switches and interconnects. In the multiplexer-based routing architecture of an FPGA, the extra transistors needed by DTMOS can easily be shared, keeping the area overhead minimal. Extensive transistor-level HSPICE simulations using the Berkeley Predictive Technology Model (BPTM) for 65nm devices at an operating frequency of 300MHz show an average 23.35% improvement in the power delay product (PDP) of a simple switch (NMOS pass transistor) and an average 32.83% improvement in the PDP of Virtex-II FPGA routing interconnects over conventional approaches. Since an FPGA contains thousands of multiplexer-based routing interconnects, the overall improvement in PDP is significant.

*Index Terms--* Low power, high speed, dynamic threshold CMOS, FPGA switches, interconnects.

## I. INTRODUCTION

In modern Field Programmable Gate Arrays (FPGAs), power consumption has become an important design consideration. Increasing performance and complexity have raised the dynamic power consumption per chip, while in deep sub-micron processes, shrinking channel length, thinner gate oxide and lower threshold voltage have contributed to a rapid increase in gate and sub-threshold leakage [1]. High power consumption requires expensive packaging and cooling solutions, and in battery-powered applications it may prohibit the use of FPGAs altogether. Solutions for reducing FPGA power are therefore needed. Because dynamic power depends quadratically on the supply voltage, reducing the supply voltage to ultra-low levels yields a dramatic reduction in both power and energy consumption [2, 3]. However, a lower supply voltage also degrades circuit performance, so a trade-off must be made. A study of a state-of-the-art 90nm Virtex-II FPGA family from Xilinx shows that the interconnect is the dominant energy consumer, with hex lines, double lines and long lines consuming most of the power [4]. This dominant role of the interconnect in FPGA power consumption makes it a high-leverage target for power optimization: reducing the energy consumed by the interconnect contributes greatly to reducing the overall energy consumed by the FPGA.
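The quadratic dependence can be illustrated with the standard switching-power expression $P_{dyn} = \alpha C V_{dd}^2 f$. The sketch below is only illustrative: the activity factor, the switched capacitance and the 1.2V reference supply are hypothetical placeholders, while 0.9V and 300MHz are the operating point used in this paper. Leakage power is not included.

```python
def dynamic_power(alpha: float, c_farads: float, vdd: float, f_hz: float) -> float:
    """Switching power P = alpha * C * Vdd^2 * f (leakage is not modelled)."""
    return alpha * c_farads * vdd ** 2 * f_hz

# Hypothetical node: activity factor 0.1, 20 fF switched capacitance, 300 MHz clock.
p_nominal = dynamic_power(0.1, 20e-15, 1.2, 300e6)  # hypothetical 1.2 V reference
p_scaled = dynamic_power(0.1, 20e-15, 0.9, 300e6)   # 0.9 V, as in this paper
ratio = p_scaled / p_nominal

print(f"P at 1.2 V: {p_nominal * 1e6:.3f} uW")
print(f"P at 0.9 V: {p_scaled * 1e6:.3f} uW ({ratio:.4f} of the 1.2 V value)")
```

Because every other factor cancels, the ratio depends only on $(0.9/1.2)^2 = 0.5625$, independent of the placeholder values.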

Low power consumption is a vital design objective for FPGAs in portable devices, such as mobile communication and bio-medical applications, where low power dissipation is as important as performance [5]. The basic switching element in most FPGAs is the NMOS pass transistor, but it suffers from a threshold voltage drop, which causes high DC power dissipation in the level-restoring buffers. To eliminate this static power dissipation, recent architectures from Xilinx and the Altera Stratix FPGAs use buffers, as suggested in [6, 7]. However, this approach has a drawback: replacing all pass-transistor switches with tri-state buffers yields a significant increase in area and power consumption [8]. This paper instead proposes methods that reduce energy consumption without replacing the pass transistors by buffers. The proposed novel configurations of dynamic threshold MOS (DTMOS) based switches overcome the above disadvantage at a minimal increase in area. In DTMOS, any variation of the gate potential induces the same variation in the body potential, dynamically changing the threshold voltage. The main advantage of DTMOS over conventional MOS is its higher drive current at lower bias conditions.

This paper illustrates which switch configurations should be chosen to meet different performance constraints. For example, assigning a high $V_t$ to the main transistor (MT) of a DTMOS switch reduces its leakage current, while in the ON state the forward body-source bias lowers the threshold voltage of the main transistor, so performance is not affected much [9, 10]. Using this variation, standby leakage can be traded off against active power delay product. All simulations are performed for the 65nm BPTM technology [11], at 300MHz with Vdd = 0.9V. The rest of this paper is organized as follows: Section II describes different schemes of DTMOS. Section III explains the simulation set-up. Section IV describes the island-style routing architecture and simulates the pass transistor switch. Section V describes the implementation of the proposed routing in the Virtex-II FPGA architecture, and Section VI concludes the paper.
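The effect of forward body bias on the threshold voltage can be sketched with the textbook long-channel body-effect expression $V_t = V_{t0} + \gamma\left(\sqrt{2\phi_F - V_{BS}} - \sqrt{2\phi_F}\right)$, where $V_{BS} > 0$ denotes forward body-source bias. The values of $V_{t0}$, $\gamma$ and $2\phi_F$ below are hypothetical and are not extracted from the 65nm BPTM model; short-channel devices deviate from this expression, so only the trend should be read from it.

```python
import math

def threshold_voltage(v_bs: float, vt0: float = 0.30, gamma: float = 0.2,
                      two_phi_f: float = 0.8) -> float:
    """NMOS threshold voltage versus body-source bias (long-channel body-effect model).

    v_bs > 0 is forward body bias (lowers Vt, as in an ON DTMOS switch);
    v_bs < 0 is reverse body bias (raises Vt). Parameter defaults are hypothetical.
    """
    if v_bs >= two_phi_f:
        raise ValueError("model is not valid for v_bs >= 2*phi_F")
    return vt0 + gamma * (math.sqrt(two_phi_f - v_bs) - math.sqrt(two_phi_f))

for v_bs in (-0.3, 0.0, 0.2, 0.4):
    print(f"V_BS = {v_bs:+.1f} V -> Vt = {threshold_voltage(v_bs):.3f} V")
```

A high-$V_t$ main transistor thus keeps its large threshold while the switch is OFF (zero body bias), and gets a lower threshold only while its body is forward biased in the ON state.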

## II. DIFFERENT SCHEMES OF DTMOS

Fig.1 shows different configurations of DTMOS, which work as follows:

(a) *Basic DTMOS*: It consists of a single NMOS transistor whose body is connected to its gate terminal. The gate voltage swing cannot exceed the cut-in voltage of the body junction diodes; otherwise a large current flows through the forward-biased body-source and body-drain junctions. To overcome this limitation, several variations of the basic circuit have been proposed [12]–[15], as follows.
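Why the gate swing of basic DTMOS is limited can be seen from the ideal (Shockley) diode law $I = I_S\left(e^{V/(nV_T)} - 1\right)$ applied to the body-source junction, whose forward voltage equals the gate voltage in this configuration. In the sketch below, the saturation current $I_S$ and ideality factor $n$ are hypothetical; series resistance and high-injection effects, which flatten the curve at large bias, are ignored.

```python
import math

THERMAL_VOLTAGE = 0.02585  # kT/q at about 300 K, in volts

def junction_current(v_forward: float, i_s: float = 1e-15, n: float = 1.0) -> float:
    """Ideal diode current of a forward-biased body-source junction (hypothetical I_S)."""
    return i_s * (math.exp(v_forward / (n * THERMAL_VOLTAGE)) - 1.0)

# In basic DTMOS the body follows the gate, so the junction sees the full gate swing.
for v_body in (0.3, 0.5, 0.7, 0.9):
    print(f"V_BS = {v_body:.1f} V -> I = {junction_current(v_body):.3e} A")
```

Each additional 0.4V of forward bias multiplies the ideal junction current by roughly $e^{0.4/0.02585} \approx 5 \times 10^6$, which is why a full 0.9V gate swing cannot be applied directly to the body.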



Fig.1 Configurations of DTMOS

(b) *DTMOS with augmenting transistor*: It consists of a main transistor (MT) and an augmenting transistor (AT). Because the drain and gate terminals of MT and AT are shorted to each other, the augmenting transistor cannot be shared with other main transistors.

(c) *DTMOS with limiting transistor*: It consists of a main transistor (MT) and a limiting transistor (LT) whose gate is connected to a reference voltage ($V_{ref}$). In the active mode, the body of MT is raised by at most ($V_{ref} - V_t$), which limits the reduction of its threshold voltage. The disadvantage of this scheme is that LT is always ON because of $V_{ref}$ at its gate, which increases gate-oxide tunneling while the switch is inactive; for this reason, the standby leakage of this scheme is the highest among all the DTMOS schemes.

(d) *DTMOS with augmenting fixed-reference-voltage transistor*: It consists of a main transistor (MT) and an augmenting transistor (AT), with a fixed reference voltage ($V_{ref}$) connected to the drain of AT. When MT is on, a fixed body bias of ($V_{ref} - V_t$) is applied to it. Since only the gate terminals of MT and AT are shorted, a single AT can be shared among many main transistors. This is useful in multiplexer-based routing switches, where the same select line drives a large tree of NMOS pass transistors, and it reduces the area penalty by a large margin. This DTMOS scheme is used throughout this paper and can be further configured as follows.
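The area argument for scheme (d) over scheme (b) can be sketched as a simple device count for $N$ main transistors driven by one select line. This assumes one AT can supply the body current of every MT on its select line; in a real layout, AT sizing, body wiring and well spacing also affect area, and none of these are captured by the count.

```python
def dtmos_transistor_count(num_main: int, shared_at: bool) -> int:
    """Transistors for num_main DTMOS switches driven by one select line.

    shared_at=False: scheme (b), one augmenting transistor (AT) per main transistor (MT).
    shared_at=True:  scheme (d), a single AT shared by every MT on the select line.
    """
    if num_main < 1:
        raise ValueError("need at least one main transistor")
    return num_main + (1 if shared_at else num_main)

for n in (1, 4, 8, 16):
    per_mt = dtmos_transistor_count(n, shared_at=False)
    shared = dtmos_transistor_count(n, shared_at=True)
    print(f"{n:2d} MTs: scheme (b) {per_mt:2d} devices, scheme (d) {shared:2d} devices")
```

The overhead of scheme (b) stays at 100% regardless of $N$, whereas the overhead of scheme (d) falls as $1/N$, which is what makes it attractive for wide multiplexers.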

*SMSA* (both MT and AT have the standard threshold voltage $SV_t$): this switch consumes less power and its delay is also low, but because both transistors use $SV_t$, its leakage is the highest among all the switches.

*SMHA* (MT has the standard threshold voltage $SV_t$ and AT has the high threshold voltage $HV_t$): its power consumption is slightly higher, and its leakage slightly lower, than that of the SMSA switch. Because of the $HV_t$ of AT, the body of MT is biased with some delay, so the propagation delay of SMHA is slightly higher than that of SMSA.

*HMSA* (MT has the high threshold voltage $HV_t$ and AT has the standard threshold voltage $SV_t$): because of the $HV_t$ of MT, the propagation delay of this switch is higher than that of SMHA, but its power delay product is comparable to that of the SMHA scheme.

*HMHA* (both MT and AT have the high threshold voltage $HV_t$): the power delay product and delay of this switch are the highest among all the DTMOS switch configurations. To resolve the trade-off between power and delay, we use the power delay product (PDP) as the criterion for choosing the best scheme among all the proposed DTMOS switches.
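The four configurations are simply the combinations of a standard (S) or high (H) threshold for MT and for AT, and the selection rule picks the one with the smallest power × delay. The sketch below encodes that naming and selection; the helper names are hypothetical, and the measurement values in any call are inputs the user supplies (for example from HSPICE), not results from this paper.

```python
from itertools import product

def scheme_name(mt_vt: str, at_vt: str) -> str:
    """Build the SMSA/SMHA/HMSA/HMHA label from the MT and AT threshold flavours."""
    for vt in (mt_vt, at_vt):
        if vt not in ("S", "H"):
            raise ValueError("threshold flavour must be 'S' (standard) or 'H' (high)")
    return f"{mt_vt}M{at_vt}A"

def best_by_pdp(measurements: dict[str, tuple[float, float]]) -> str:
    """Return the scheme with the smallest power-delay product.

    measurements maps scheme name -> (average power in W, delay in s).
    """
    return min(measurements, key=lambda name: measurements[name][0] * measurements[name][1])

# All MT/AT threshold combinations, in the order they are described above.
ALL_SCHEMES = [scheme_name(mt, at) for mt, at in product("SH", repeat=2)]
print(ALL_SCHEMES)
```

Using the product rather than power or delay alone lets a scheme that is slightly slower but much less power-hungry (or vice versa) still win.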

## III. SIMULATION SET-UP

Fig.2 shows the simulation set-up, which models the environment in which the switch operates in typical FPGAs. The circuit consists of the switch under test followed by a wire modelled as a distributed RLC line; the RLC values of the wire are taken from [16]. A buffer placed at the end of the wire drives a 20fF capacitive load. The length of the wire between the switch and the buffer is varied from $10\mu m$ to $100\mu m$. An input is applied to the switch, and the power and delay of the different switches are measured with the transistor-level HSPICE simulator.
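A first-order feel for how delay grows with wire length in this set-up can be obtained from the Elmore delay of a switch with on-resistance $R_s$ driving a distributed RC wire of length $L$ into the 20fF load: $\tau = R_s(cL + C_L) + rL\left(\tfrac{cL}{2} + C_L\right)$. This is a sketch only: it drops the wire inductance and the buffer, and the switch resistance and per-micron wire parasitics are hypothetical placeholders, not the values from [16].

```python
def elmore_delay(length_um: float, r_switch: float = 5e3, r_per_um: float = 0.5,
                 c_per_um: float = 0.2e-15, c_load: float = 20e-15) -> float:
    """Elmore delay (s) of a resistive switch driving a distributed RC wire and a load.

    r_switch, r_per_um and c_per_um are hypothetical placeholders; c_load is the 20 fF load.
    """
    r_wire = r_per_um * length_um
    c_wire = c_per_um * length_um
    # Switch resistance sees all downstream capacitance; the distributed wire sees
    # half of its own capacitance plus the full load.
    return r_switch * (c_wire + c_load) + r_wire * (c_wire / 2 + c_load)

for length in range(10, 101, 10):  # 10 um to 100 um, as in the set-up
    print(f"{length:3d} um: {elmore_delay(length) * 1e12:.1f} ps")
```

For short wires like these the switch term dominates, which is why the choice of switch matters more than the wire itself in Fig.4.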



Fig.2 Simulation set-up



Fig. 3 Island-style FPGA routing architecture

## IV. ISLAND-STYLE FPGA ROUTING ARCHITECTURE

Fig. 3 shows an island-style FPGA routing architecture; Xilinx FPGAs fall into this category. It consists of logic blocks and routing switches. The basic logic element (BLE) of a logic block consists of one K-input lookup table (K-LUT) and one flip-flop. A group of BLEs forms a cluster, called a configurable logic block (CLB). Logic blocks are connected to each other through wire segments and programmable switches, which take the form of NMOS pass transistors and tri-state buffers. Short connections are implemented using pass transistors, whereas tri-state buffers are suitable for longer connections [17, 18].
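The K-LUT inside a BLE is essentially a $2^K$-bit memory addressed by its K inputs. A minimal sketch, assuming `inputs[0]` is the least significant address bit (an illustrative convention, not one fixed by the text):

```python
from typing import Sequence

def lut_eval(truth_table: Sequence[int], inputs: Sequence[int]) -> int:
    """Evaluate a K-input lookup table.

    truth_table holds 2**K configuration bits; inputs[0] is the least
    significant address bit (an illustrative convention).
    """
    k = len(inputs)
    if len(truth_table) != 2 ** k:
        raise ValueError("truth table must have 2**K entries")
    address = sum(bit << i for i, bit in enumerate(inputs))
    return truth_table[address]

# A 2-input LUT configured as XOR: entries for addresses 0b00, 0b01, 0b10, 0b11.
XOR2 = [0, 1, 1, 0]
print([lut_eval(XOR2, [a, b]) for a in (0, 1) for b in (0, 1)])
```

Any K-input Boolean function is obtained just by changing the configuration bits, which is why the routing between LUTs, rather than the LUTs themselves, dominates the area and energy of the fabric.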

### A. PASS TRANSISTOR SWITCH

Fig.4 (a), (b) and (c) show the delay, switching power and power delay product of the conventional and proposed DTMOS-based pass switches as a function of wire length. Because of the high threshold voltage ($HV_t$) of its MT and AT transistors, the PHMHA pass switch has the highest delay and the lowest leakage, while its switching power is slightly lower. Similarly, because of the standard threshold voltage ($SV_t$) of the MT and AT transistors in the PSMSA pass switch, its delay is the lowest and its leakage the highest among the schemes, with comparable switching power. Compared with the conventional pass switch, most of the DTMOS-based pass switches discussed here offer lower delay but, in some cases, higher switching power; we therefore use the power delay product (PDP) as the performance metric. A lower PDP indicates a more energy-efficient design.

With standard threshold voltage ($SV_t$) MT and AT transistors, the PSMSA switch scheme provides a 23.35% improvement in PDP compared with the conventional switch.

Similarly, with high threshold voltage ($HV_t$) MT and AT transistors, the PHMHA switch scheme provides a 17.65% improvement in PDP compared with the conventional switch. As shown in Fig. 4(c), the PDP of the remaining pass switches lies between those of PSMSA and PHMHA, so they are also more energy efficient than the conventional switch.
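The PDP comparison reduces to a percentage reduction relative to the conventional switch. The sketch below shows that calculation; the helper names are hypothetical, and it assumes (the text does not say so explicitly) that the quoted averages are plain means of the per-length improvements over the simulated wire lengths.

```python
def pdp(power_w: float, delay_s: float) -> float:
    """Power-delay product in joules."""
    return power_w * delay_s

def pdp_improvement_percent(conventional: tuple[float, float],
                            proposed: tuple[float, float]) -> float:
    """Percentage PDP reduction of `proposed` relative to `conventional`.

    Each argument is (power in W, delay in s); a positive result means the
    proposed switch is more energy efficient.
    """
    pdp_conv = pdp(*conventional)
    return 100.0 * (pdp_conv - pdp(*proposed)) / pdp_conv

def average_improvement(conventional: list[tuple[float, float]],
                        proposed: list[tuple[float, float]]) -> float:
    """Unweighted mean improvement over matching wire lengths (an assumed averaging rule)."""
    if len(conventional) != len(proposed) or not conventional:
        raise ValueError("need one (power, delay) pair per wire length for both switches")
    gains = [pdp_improvement_percent(c, p) for c, p in zip(conventional, proposed)]
    return sum(gains) / len(gains)
```

For example, a proposed switch with the same delay and 25% lower power at one wire length gives a 25% PDP improvement at that length.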



Fig.4 (a) Delay of various pass switches vs. length (um)





Fig.4 (b) Switching power of various pass switches vs. length (um)

Fig.4 (c) Power delay product of various pass switches vs. length (um)



## V. VIRTEX-II FPGA ARCHITECTURE

Fig.5 Virtex-II FPGA Architecture

TABLE I MAJOR INTERCONNECTS PRESENT IN THE SWITCH MATRIX

| Circuit block | Details                        |
|---------------|--------------------------------|
| IMUX          | 30-to-1 multiplexer and buffer |
| OMUX          | 24-to-1 multiplexer and buffer |
| DOUBLE        | 16-to-1 multiplexer and buffer |
| HEX           | 12-to-1 multiplexer and buffer |
| LONGH/LONGV   | n-to-1 multiplexer and buffer  |

We have chosen the island-style SRAM-based FPGA architecture for our study; Xilinx and Altera FPGAs fall into this category [6, 7]. Our experiments were performed on the Xilinx Virtex-II FPGA shown in Fig 5. The basic logic element in a Virtex-II is called a slice. A slice consists of 2 LUTs (lookup tables), 2 flip-flops, fast carry logic and some wide multiplexers. A CLB in turn consists of 4 slices and an interconnect switch matrix. The interconnect switch matrix consists of variable-length wire segments that connect to one another through programmable buffered switches. The IMUX (input multiplexer) selects and routes a signal to a slice input pin; the OMUX (output multiplexer) selects and routes a signal from a slice output pin to a neighboring logic block. DOUBLE blocks drive wire segments that span 2 CLB tiles, HEX blocks drive wires that span 6 CLB tiles, and the long horizontal (LONGH) and vertical (LONGV) resources span the entire width or height of the FPGA [19, 20]. Table I lists the major interconnects of the Virtex-II FPGA family.

Every interconnect consists of a wide-input NMOS-transistor-based multiplexer and a level-restoring buffer. An NMOS pass transistor loses one threshold voltage ($V_t$) when it pulls its output high; this degraded level slows the pull-down transistor of the output buffer and causes high leakage because the buffer's pull-up transistor is not fully off. Since DTMOS can lower the on-state threshold voltage well below $V_t$, potentially close to zero, DTMOS-based interconnects benefit directly from this design.
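The size of this threshold drop can be estimated with the long-channel body-effect model: for a conventional NMOS pass transistor with its body grounded, the output settles where $V_{out} = V_{dd} - V_t(V_{SB} = V_{out})$. The sketch below solves this by fixed-point iteration using $V_{dd} = 0.9$V from this paper and hypothetical values of $V_{t0}$, $\gamma$ and $2\phi_F$. It ignores subthreshold conduction, which in practice slowly pulls the output somewhat higher.

```python
import math

def vt_body(v_sb: float, vt0: float = 0.30, gamma: float = 0.2, two_phi_f: float = 0.8) -> float:
    """NMOS threshold with source-body reverse bias v_sb >= 0 (hypothetical parameters)."""
    return vt0 + gamma * (math.sqrt(two_phi_f + v_sb) - math.sqrt(two_phi_f))

def pass_gate_high_level(vdd: float = 0.9, tol: float = 1e-12, max_iter: int = 100) -> float:
    """Solve V_out = Vdd - Vt(V_out) for a conventional NMOS pass transistor."""
    v_out = vdd - vt_body(0.0)  # start from the zero-body-bias estimate
    for _ in range(max_iter):
        v_next = vdd - vt_body(v_out)
        if abs(v_next - v_out) < tol:
            return v_next
        v_out = v_next
    raise RuntimeError("fixed-point iteration did not converge")

v_high = pass_gate_high_level()
print(f"pass-transistor high level: {v_high:.3f} V out of 0.9 V")
```

The iteration converges quickly because the body-effect slope $\gamma / (2\sqrt{2\phi_F + V_{SB}})$ is well below one. Under these assumed parameters, the buffer input reaches only about 0.55V of the 0.9V swing, which is the degraded level that the DTMOS switches aim to improve.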

Fig.6 shows the transistor-level view of a 4-input multiplexer switch with a level-restoring buffer; all higher-order interconnects in the switch matrix are implemented in the same way. Figs. 7 to 10 show the power delay products of the HEX, IMUX, DOUBLE and OMUX interconnects of the Virtex-II FPGA, respectively. All the proposed DTMOS-based interconnects outperform the conventional ones in terms of PDP and are therefore more energy efficient. Among them, the schemes with standard threshold voltage ($SV_t$) MT and AT transistors give the lowest PDP, whereas the schemes with high threshold voltage ($HV_t$) MT and AT transistors give the highest.
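How far a shared augmenting transistor (scheme (d)) is amortized in these wide multiplexers can be sketched with a device count. The count below assumes, hypothetically, that each N-to-1 multiplexer is a balanced binary tree of NMOS 2:1 stages with a true and a complement select line per level, and one shared AT per select line; the actual Virtex-II multiplexer topology is not given here and may differ, so the numbers only indicate the trend.

```python
import math

def mux_tree_counts(n_inputs: int) -> dict[str, int]:
    """Device counts for a hypothetical N-to-1 binary tree of NMOS 2:1 stages.

    Assumes a true and a complement select line per tree level and exactly one
    shared augmenting transistor (AT) per select line, as in scheme (d).
    """
    if n_inputs < 2:
        raise ValueError("a multiplexer needs at least two inputs")
    levels = math.ceil(math.log2(n_inputs))
    pass_transistors = 2 * (n_inputs - 1)  # N-1 two-input stages, 2 devices each
    shared_ats = 2 * levels                # one AT per select line
    return {"levels": levels, "pass_transistors": pass_transistors, "shared_ats": shared_ats}

# Multiplexer widths from Table I.
for name, n in (("IMUX", 30), ("OMUX", 24), ("DOUBLE", 16), ("HEX", 12)):
    c = mux_tree_counts(n)
    print(name, c, f"AT overhead = {c['shared_ats'] / c['pass_transistors']:.1%}")
```

Under this assumption the AT overhead shrinks as the multiplexer widens, since the number of select lines grows only logarithmically with the number of inputs.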



Fig.6 Transistor-level view of a 4-input multiplexer switch

Fig.7 Power delay product of HEX switches vs. length (um)





Fig.8 Power delay product of IMUX switches vs. length (um)



Fig.9 Power delay product of DOUBLE switches vs. length (um)



Fig.10 Power delay product of OMUX switches vs. length (um)

The average improvements of the proposed schemes over conventional interconnects are 33.27%, 31.80%, 32.88% and 33.38% for the HEX, IMUX, DOUBLE and OMUX interconnects respectively, an overall average of 32.83%.
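As an arithmetic check, assuming the overall figure is the unweighted mean of the four per-interconnect averages (the weighting is not stated), the values above reproduce the 32.83% quoted in the abstract:

$$
\bar{I} = \frac{33.27 + 31.80 + 32.88 + 33.38}{4} = \frac{131.33}{4} = 32.8325\% \approx 32.83\%
$$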

## VI. CONCLUSION

In this paper, we demonstrated various DTMOS techniques that can be used over a broad range of supply voltages. DTMOS delay and efficiency become superior to those of the traditional design as the supply voltage is reduced and the loading is increased. The proposed DTMOS-based switches result in an energy-efficient FPGA architecture. Simulation results at a supply voltage of Vdd = 0.9V and an operating frequency of f = 300MHz show a 23.35% improvement in the power delay product (PDP) of the PSMSA pass switch and an average 32.83% improvement in the PDP of Virtex-II interconnects.

Since the interconnect fabric of an FPGA contains thousands of switches (inside the multiplexers and switch boxes), the overall improvement in PDP for the whole FPGA can be significant, making the proposed switches a useful contribution to low-energy FPGA design. The area overhead of the proposed switches and interconnects is small if the extra transistors needed for DTMOS-based switches are shared, which is easily possible in all multiplexer-based interconnects.

A final complication with DTMOS-based switches and interconnects is process complexity. Isolating the body contact requires an additional masking step, and on bulk silicon wafers DTMOS can only be implemented in a triple-well process technology; isolation comes naturally when DTMOS is implemented on SOI wafers. The additional area and process complexity of DTMOS are compensated by its higher operating frequency and higher drive capability compared with the conventional CMOS circuit topology.

## REFERENCES

- [1] F. Li, D. Chen, L. He, and J. Cong, "Architecture evaluation for power-efficient FPGAs," in Proc. ACM Intl. Symp. Field-Programmable Gate Arrays, pp. 17-21, Feb 2003.
- [2] K. Poon, A. Yan, and S. Wilton, "A flexible power model for FPGAs," in Proc. Int. Conf. Field Programmable Logic and Applications, 2002.
- [3] V. George and J. Rabaey, Low-Energy FPGAs: Architecture and Design. Kluwer Academic Publishers, Boston, MA, 2001.
- [4] T. Tuan and B. Lai, "Leakage power analysis of a 90nm FPGA," in Proc. IEEE Custom Integrated Circuits Conf., pp. 57-60, 2003.
- [5] C. H.-I. Kim, H. Soeleman, and K. Roy, "Ultra-low-power DLMS adaptive filter for hearing aid applications," IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 11, no. 6, pp. 1058-1067, 2003.
- [6] Xilinx, Inc., Datasheet: Virtex Configuration Guide, September 2000.
- [7] Altera, Datasheet: Configuration Devices for SRAM-Based LUT Devices, February 2002.
- [8] M. Sheng and J. Rose, "Mixing buffers and pass transistors in FPGA routing architectures," in Proc. ACM/SIGDA Int. Symp. FPGA, pp. 75-84, February 2001.
- [9] A. Wang, A. P. Chandrakasan, and S. V. Kosonocky, "Optimal supply and threshold scaling for subthreshold CMOS circuits," in Proc. IEEE Annu. Symp. VLSI, pp. 5–9, 2002.
- [10] N. Lindert, T. Sugii, S. Tang, and C. Hu, "Dynamic threshold pass-transistor logic for improved delay at lower power supply voltages," IEEE J. Solid-State Circuits, vol. 34, no. 1, pp. 85-90, Jan. 1999.
- [11] http://www-device.eecs.berkeley.edu/~bsim3/bsim4.html
- [12] B. Calhoun and A. Chandrakasan, "Ultra-dynamic voltage scaling using sub-threshold operation and local voltage dithering in 90nm CMOS," ISSCC, pp. 300-301, 2005.
- [13] J. J. Kim and K. Roy, "Double gate-MOSFET subthreshold circuit for ultra-low power applications," IEEE Trans. Electron Devices, vol. 51, no. 9, pp. 1468-1474, Sept. 2004.
- [14] H. Soeleman and K. Roy, "Robust subthreshold logic for ultra-low power operation," IEEE J. VLSI, vol. 9, no. 1, pp. 90-98, 2001.
- [15] J. Kao, M. Miyazaki, and A. Chandrakasan, "A 175-mV multiply-accumulate unit using an adaptive supply voltage and body bias architecture," IEEE J. Solid-State Circuits, vol. 37, no. 11, pp. 1545-1554, Nov. 2002.
- [16] http://www.eas.asu.edu/~ptm
- [17] J. H. Anderson and F. N. Najm, "Power-aware technology mapping for LUT-based FPGAs," in Proc. IEEE Int. Conf. Field-Programmable Technology (FPT '02), pp. 211–218, Hong Kong, December 2002.
- [18] V. Betz, J. Rose, and A. Marquardt, Architecture and CAD for Deep-Submicron FPGAs. Kluwer Academic Publishers, 1999.
- [19] A. Rahman and V. Polavarapuv, "Evaluation of low-leakage design techniques for field programmable gate arrays," in Proc. ACM FPGA, pp. 23-30, 2004.
- [20] L. Shang, A. Kaviani, and K. Bathala, "Dynamic power consumption in Virtex-II FPGA family," in Proc. ACM FPGA, 2002.

**Kureshi Abdul Kadir** was born in Ahmednagar on June 9, 1971. He received his graduate and postgraduate degrees in Electronics Engineering from Aurangabad University. He worked as Assistant Professor at PDVVP College of Engineering, Ahmednagar, and is presently pursuing a Ph.D. at Aligarh Muslim University, Aligarh (India). His special field of interest is low power, high speed FPGA interconnect design.

**Dr. Mohd Hasan** is working as Reader at Aligarh Muslim University, Aligarh, India. He has published eighteen research papers in reputed international journals and fifteen papers in international conferences. His special field of interest is low power FPGA design.