1
SungKyunKwan University
Low-Power Communication SoC Design. Jun-Dong Cho, VADA Lab., SungKyunKwan University, August 2006
2
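Presentation outline: Low-power SoC design fundamentals; Power metric; Basic low-power design techniques; Low-power design using reconfigurable architectures;
Reconfigurable Radio Systems (Software Defined Radio); Low-power design through parallel processing; Network-centric Design; Reliable Design; Deep submicron clock and power management techniques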
3
Low-Power Design Fundamentals
4
Mobile terminal = small size + low power + functionality
Examples: GPS, noise-cancellation headphones, cochlear implant, cellular phone, medical watch, hearing aid, portable audio, digital still camera, digital radio.
* Designers of battery-operated products are concerned with achieving functionality while maximizing battery life and minimizing size.
* Especially in portable consumer electronics, designers want to ensure longer battery life and smaller size to reach a larger customer base. No one likes replacing batteries! The longer you can go on the same batteries, the better off you are. Size is also a key concern: consumers want products they can carry with them, clipped onto a belt, carried in a pocket, etc.
5
Future mobile computing: real-time processing, mobile supercomputing. Speech recognition, cryptography,
augmented reality. Requires 16 Pentium-4s. 2004 Intel: 55M transistors, 122 mm2, 0.09u; GHz; 0.03u. High performance within a low-power budget requires (massive) parallelism: multi-processor systems, subsystem integration. Mudge et al:
6
Emb. Systems Prog. 2005: # of Processors per chip
7
Processor Heterogeneity
8
Parallelism favors lower power solutions
P. G. Paulin et al, “Parallel Programming Models for a Multiprocessor SoC Platform Applied to Networking and Multimedia”, IEEE Transactions on VLSI Systems, Vol. 14, No. 7, July 2006
9
Parallelism Inside the Processor
Chris Rowen, President and CEO, Tensilica, Inc.
10
Multiple concurrent processors much lower energy
Chris Rowen, President and CEO, Tensilica, Inc.
11
Keys to Efficient MP Flexible range of topologies
Chris Rowen, President and CEO, Tensilica, Inc.
12
Two Multi-processor Design Flows
Chris Rowen, President and CEO, Tensilica, Inc.
13
Anatomy of a Cellular Phone
14
Why is SDR Challenging? Scott Mahlke
15
Core Technologies for Future Networks
OFDM: 64 to 2048 point FFT
MIMO: use of multiple antennas for transmission/reception
Low density parity check codes
• Key insight: SDR requires innovation across algorithm, software and hardware
• SDR platforms offer low cost, longevity, and adaptability
16
Parallel Architectures
17
The need for low-power devices
Practical: reducing power requirements of high-throughput portable applications
Economic: reducing packaging costs and achieving memory savings
Technical: excessive heat prevents the realization of high-density chips and limits their functionality
18
Dynamic Power Consumption
[Figure: CMOS gate with PMOS and NMOS networks between VDD and ground, supply current iDD, input Vin, output Vo across load capacitance CL]
Average power consumed by a node that cycles in every period T.
Average power consumed by a node with partial activity (only a fraction of the periods has a transition).
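The expressions for the two cases on this slide are not reproduced in the text. A minimal sketch follows, assuming they are the standard CMOS switching-power formulas: P = CL x VDD^2 / T = CL x VDD^2 x f for a node that switches every period, and P = alpha x CL x VDD^2 x f for partial activity. The symbol alpha (fraction of periods with a transition) and the function name are assumptions, not taken from the slide.

```python
def dynamic_power(c_load, vdd, freq, activity=1.0):
    """Average switching power in watts.

    c_load   : load capacitance CL in farads
    vdd      : supply voltage VDD in volts
    freq     : clock frequency f = 1/T in hertz
    activity : assumed activity factor alpha, the fraction of
               periods in which the node makes a transition
    """
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity must lie in [0, 1]")
    return activity * c_load * vdd ** 2 * freq


if __name__ == "__main__":
    # Illustrative values only: 1 pF load, 2 V supply, 100 MHz clock.
    print(dynamic_power(1e-12, 2.0, 100e6))        # node switching every period
    print(dynamic_power(1e-12, 2.0, 100e6, 0.25))  # node switching in 25% of periods
```

Because power depends on the square of VDD, halving the supply voltage cuts switching power to a quarter at the same frequency, which is why most of the techniques later in the deck aim at lowering VDD.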
19
Static Power Consumption
Pstatic = VCC x Ntr x Ileak
20
SCALING TREND
Keeping pace with Gene's Law: a DSP chip's energy efficiency (MIPS/Watt) doubles every 18 months.
Goals: low cost; high flexibility; reduced power in the idle state.
Gene's Law: Tech & Circ: voltage islands; Arch: MPSoC
Low cost: integrate, but only when cost effective; push towards A & D integration
High flexibility: software radios, reconfigurable architectures
Reduce static power in idle state: variable Vdd, VT
21
A distributed system on a single chip!
[Figure: MPSoC with NoC connecting IO blocks, coprocessors (COPR), SOCBUS, CPU, and memories (MEM), partitioned into supply domains Vdd1, Vdd2, Vdd3]
From single-master CPU to MPSoC
From bus-based interconnect to NoC
Emphasize reuse, flexibility
22
Status of Low-Power Technology Development
Developer | Application product | Features | Notes
IBM, Austin | DPM (PowerPC 405LP), portable processor | Power management, scheduling, OS system (90% power reduction) | DoD DARPA
Philips, STMicroelectronics, Atmel | PCF50606: single-chip power management unit (for smart phone and wireless PDA) | Programmed power management (70% power reduction) |
Atrenta | SpyGlass CAD tool | Generates gated-clock structures from RTL HDL and SystemC |
23
Two Factors for Energy Reduction
C0: redundant h/w extraction; locality of reference; demand-driven / data-driven computation; preservation of data correlations; power-down techniques (clock gating, dynamic power management); all-in-one approach (SOC)
Vdd: dynamic voltage scaling based on workload; 2-D pipelining (systolic arrays); parallel processing
24
Parallel-Pipelined Architectures
Ppar=0.2Pref
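The figure Ppar = 0.2 Pref is consistent with the usual dynamic-power scaling P ∝ C x V^2 x f. The sketch below shows the arithmetic. The ratios used (2.5x switched capacitance, 0.4x supply voltage, half the clock rate) are illustrative assumptions chosen to reproduce 0.2; they are not values stated on the slide.

```python
def relative_power(c_ratio, v_ratio, f_ratio):
    """Power of a transformed design relative to the reference design.

    Each argument is the transformed value divided by the reference
    value (switched capacitance, supply voltage, clock frequency).
    """
    return c_ratio * v_ratio ** 2 * f_ratio


if __name__ == "__main__":
    # Assumed example: duplicating and pipelining the datapath adds
    # capacitance (2.5x), but each unit may run at half the clock rate,
    # and the extra timing slack allows a much lower supply (0.4x).
    print(relative_power(c_ratio=2.5, v_ratio=0.4, f_ratio=0.5))
```

Throughput is unchanged in this scenario because two units at half rate process the same number of samples per second; the saving comes entirely from the quadratic voltage term.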
25
Loop unrolling
Loop unrolling replicates the body of a loop some number of times (the unrolling factor u) and then iterates by step u instead of step 1. This transformation reduces loop overhead, increases instruction parallelism, and improves register, data cache, or TLB locality.
With u = 2, loop overhead is cut in half because two of the original iterations are performed in each iteration of the unrolled loop. If array elements are assigned to registers, register locality improves because A(i) and A(i+1) are used twice in the loop body. Instruction parallelism increases because the second assignment can be performed while the results of the first are being stored and the loop variables are being updated.
By itself, unrolling alters neither the capacitance switched nor the voltage. However, it enables several other transformations (distributivity, constant propagation, and pipelining). After distributivity and constant propagation, the transformation yields a critical path of 3, so the voltage can be dropped.
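A minimal sketch of the transformation, assuming the loop in question is a first-order recursive filter y(n) = x(n) + A*y(n-1), a common textbook example for this technique; the slide text does not show the loop itself, and the function names are hypothetical. Unrolling by 2 and applying distributivity and constant propagation gives y(n) = x(n) + A*x(n-1) + A^2*y(n-2), which no longer depends on y(n-1), so both outputs of an iteration can be computed in parallel.

```python
def iir_reference(x, a, y_prev=0.0):
    """Original loop: one output per iteration, y(n) = x(n) + a*y(n-1)."""
    out = []
    for sample in x:
        y_prev = sample + a * y_prev
        out.append(y_prev)
    return out


def iir_unrolled2(x, a, y_prev=0.0):
    """Loop unrolled by 2 with distributivity and constant propagation."""
    a2 = a * a                      # constant propagation: A*A computed once
    out = []
    even_len = len(x) - len(x) % 2
    for i in range(0, even_len, 2):
        # Both outputs depend only on inputs and the state from two
        # samples ago, so they can be evaluated concurrently.
        y0 = x[i] + a * y_prev
        y1 = x[i + 1] + a * x[i] + a2 * y_prev
        out.extend((y0, y1))
        y_prev = y1
    if len(x) % 2:                  # leftover sample when the length is odd
        out.append(x[-1] + a * y_prev)
    return out


if __name__ == "__main__":
    samples = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(iir_reference(samples, 0.5))
    print(iir_unrolled2(samples, 0.5))
```

The unrolled version performs more arithmetic per output, so the benefit appears only when the shorter critical path per sample is spent on a lower supply voltage.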
26
Loop Unrolling for Low Power
27
Method Using Algebraic Transformation and Constant Propagation
28
Loop Unrolling for Low Power
29
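Low-Power FFT through Number System Conversion: Using the Logarithmic Number System (LNS)
Log number system: look-up table. The base-2 logarithm of the magnitude part is computed. The converted log value is represented as a binary number whose range is limited to n bits.
LNS operations: multiplication becomes addition; addition/subtraction uses a look-up table.
Accuracy: no loss of BER performance when the fractional part has 2 or more bits.
Power consumption: experiments show a reduction of up to about 60% compared with a conventional butterfly FFT (7.8 mW -> 3.1 mW).
Low complexity OFDM Receiver using Log-FFT for Coded OFDM System
[1] Arithmetic modules based on the log-converted number system were used.
- Applied to the FFT, the largest of the arithmetic modules.
- The complexity of the Log-FFT varies with the look-up table.
- The look-up table varies with the logarithmic number system (LNS) representation, that is, with the range and accuracy chosen for the conversion.
[2] LNS
- A number is separated into a sign part and a magnitude part.
- The base-2 logarithm of the magnitude part is computed.
- Because the logarithm tends to negative infinity as the value approaches 0, a limit is imposed below a certain value.
- Representing the converted log value as a binary number with a range limited to n bits gives integer bits + fractional bits + sign bit.
- This value is obtained by quantizing the log value according to the number of fractional bits.
- With LNS, multiplication reduces to simple addition/subtraction, and addition/subtraction is built from a subtraction, a look-up table, and an addition.
[3] Accuracy
- Accuracy can be judged from the BER performance of the FFT block.
- Experiments show that allocating about 2 bits to the fractional part of the log value allows operation without performance loss.
[4] Power consumption
- Experiments show a reduction of up to about 60% compared with a conventional butterfly FFT.
- 7.8 mW -> 3.1 mW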
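The LNS steps listed above can be sketched as follows. This is an illustrative model, not the Log-FFT implementation behind the reported measurements: the constants FRAC_BITS and MIN_LOG, the function names, and the use of math.log2 in place of a hardware look-up table are all assumptions.

```python
import math

FRAC_BITS = 2      # assumed number of fractional bits of the log value
MIN_LOG = -16.0    # assumed floor for magnitudes near zero (log2 -> -infinity)


def quantize(value):
    """Quantize a log2 value to FRAC_BITS fractional bits, clamped at MIN_LOG."""
    scale = 1 << FRAC_BITS
    return max(MIN_LOG, round(value * scale) / scale)


def lns_encode(x):
    """Split x into (sign, quantized log2 of magnitude)."""
    if x == 0:
        return (0, MIN_LOG)
    return (1 if x < 0 else 0, quantize(math.log2(abs(x))))


def lns_decode(value):
    sign, log_mag = value
    if log_mag <= MIN_LOG:          # values at the floor are treated as zero
        return 0.0
    magnitude = 2.0 ** log_mag
    return -magnitude if sign else magnitude


def lns_mul(a, b):
    """Multiplication: XOR the signs, add the logs."""
    if a[1] <= MIN_LOG or b[1] <= MIN_LOG:
        return (0, MIN_LOG)
    return (a[0] ^ b[0], quantize(a[1] + b[1]))


def lns_add(a, b):
    """Addition: one subtraction, one table look-up, one addition."""
    if a[1] < b[1]:
        a, b = b, a                 # make a the larger magnitude
    if b[1] <= MIN_LOG:
        return a
    diff = a[1] - b[1]              # the subtraction
    if a[0] == b[0]:
        correction = math.log2(1.0 + 2.0 ** -diff)   # look-up table in hardware
    elif diff == 0:
        return (0, MIN_LOG)         # equal magnitudes, opposite signs
    else:
        correction = math.log2(1.0 - 2.0 ** -diff)   # look-up table in hardware
    return (a[0], quantize(a[1] + correction))


if __name__ == "__main__":
    print(lns_decode(lns_mul(lns_encode(-4), lns_encode(8))))
    print(lns_decode(lns_add(lns_encode(4), lns_encode(4))))
    print(lns_decode(lns_add(lns_encode(3), lns_encode(5))))  # shows quantization error
```

With few fractional bits the table indexed by the log difference stays small, which is the trade-off between look-up table size and accuracy that the notes describe.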
30
Supplying Appropriate Voltages through Partitioning
[Figure: chip partitioned into blocks, one FAST block at 5 V and four SLOW blocks at 3 V]
Partition the chip into multiple sub-units, each of which is designed to operate at a specific supply voltage.
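A rough estimate of what such partitioning saves, assuming dynamic power scales with the square of the supply voltage and that clock frequency and activity are unchanged. The capacitance shares below are hypothetical, and the sketch ignores level-shifter and extra power-grid overhead.

```python
def partitioned_relative_power(blocks, v_ref):
    """Dynamic power of a partitioned chip relative to running every block at v_ref.

    blocks : list of (capacitance_share, vdd) pairs, one per sub-unit
    v_ref  : supply voltage of the unpartitioned reference design
    """
    total_share = sum(share for share, _ in blocks)
    scaled = sum(share * (vdd / v_ref) ** 2 for share, vdd in blocks)
    return scaled / total_share


if __name__ == "__main__":
    # Hypothetical split: one FAST block at 5 V and four SLOW blocks at 3 V,
    # each assumed to account for an equal share of the switched capacitance.
    blocks = [(0.2, 5.0)] + [(0.2, 3.0)] * 4
    print(partitioned_relative_power(blocks, v_ref=5.0))
```

The larger the share of capacitance that can tolerate the lower supply, the closer the result gets to the (3/5)^2 = 0.36 bound for this pair of voltages.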
31
Using Vdd programmability Wayne Burleson
• High Vdd to devices on critical path
• Low Vdd to devices on non-critical paths
• Vdd Off for inactive paths
[Figure: A: Baseline Fabric; B: Fabric with Vdd Configurable Interconnect]
This work builds on a similar idea for FPGAs described in: Fei Li, Yan Lin and Lei He. Vdd Programmability to Reduce FPGA Interconnect Power, IEEE/ACM International Conference on Computer-Aided Design, Nov. 2004
32
DIGLOG Multiplier
 | 1st Iter | 2nd Iter | 3rd Iter
Worst-case error | -25% | -6% | -1.6%
Prob. of Error<1% | 10% | % | %
With an 8 by 8 multiplier, the exact result can be obtained at a maximum of seven iteration steps (worst case)
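A minimal sketch of an iterative DIGLOG-style multiplier, assuming the commonly described decomposition A = 2^j + AR and B = 2^k + BR, so that A x B = 2^(j+k) + 2^j x BR + 2^k x AR + AR x BR. Each step keeps the first three terms (shifts and adds only) and passes the residual product AR x BR to the next step. The function name is hypothetical, and the step counting here (the first approximation counts as step 1) may differ from the convention used on the slide.

```python
def diglog_multiply(a, b, iterations):
    """Approximate a*b for non-negative integers using shifts and adds.

    Stops early once a residue becomes zero, at which point the
    accumulated result is exact.
    """
    total = 0
    for _ in range(iterations):
        if a == 0 or b == 0:
            break
        j = a.bit_length() - 1          # position of the leading one of a
        k = b.bit_length() - 1          # position of the leading one of b
        ar = a - (1 << j)               # residue of a below its leading one
        br = b - (1 << k)               # residue of b below its leading one
        total += (1 << (j + k)) + (br << j) + (ar << k)
        a, b = ar, br                   # remaining error term is ar * br
    return total


if __name__ == "__main__":
    exact = 255 * 255
    for steps in (1, 2, 3):
        approx = diglog_multiply(255, 255, steps)
        print(steps, approx, f"{100.0 * (approx - exact) / exact:.2f}%")
```

Every truncated result underestimates the product, because the dropped term AR x BR is never negative; for the all-ones 8-bit operands used above, the error after one, two, and three steps is close to the worst-case figures in the table.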
33
Low-Power Design Using Reconfigurable Architectures
34
Reconfigurable Hardware
Improving energy efficiency through reconfiguration: Doing More by Doing Less
Flexibility as algorithms evolve; support for diverse standards; dynamic QoS; power reduction; design cost reduction (fewer platforms to develop and maintain); use of embedded processors.
[Figure: Reconfigurable hardware on which Task 1 (blocks A, B, C, D, E) through Task N (blocks W, X, Y, Z and H, I, J) are mapped]
The way out is energy efficiency: doing more work with the same amount of energy. Traditionally, designers have focused on low-power techniques for VLSI design. However, the key to energy efficiency in future mobile multimedia devices will be at the higher levels: energy-efficient system architectures, energy-efficient communication protocols, energy-cognisant operating systems and applications, and a well-designed partitioning of functions between the wireless device and services on the network.
Mobile computers must remain usable in a variety of environments. They will require a large number of circuits that can be customized for specific applications to stay versatile and competitive. Reconfigurability is thus an important requirement for mobile systems, since the mobiles must be flexible enough to accommodate a variety of multimedia services and communication capabilities and adapt to various operating conditions in an (energy) efficient way.
Reconfigurability also has another, more economic motivation: it will be important to have a fast track from sparkling ideas to the final design. If the design process takes too long, the return on investment will be less. It would further be desirable for a wireless terminal to have architectural reconfigurability whereby its capabilities may be modified by downloading new functions from network servers. Such reconfigurability would also help in field upgrading as new communication protocols or standards are deployed, and in implementing bug fixes [3].
One of the key issues in the design of portable multimedia systems is to find a good balance between flexibility and high processing power on one side, and area and energy efficiency of the implementation on the other side.
35
Radio Systems: Different Power Constraints
[Figure: power (1 mW to 1 W) versus frequency (0 GHz to 6 GHz) for 3G, 802.11b/g, Bluetooth, UWB, and ZigBee]
WiFi: Mbits/sec, unlicensed band; OFDM, M-ary coding
3G: .1-2 Mbits/sec, wide area cellular; CDMA, GMSK
Bluetooth: .8 Mbit/sec, cable replacement; 40-100 mW for RX or TX; frequency hopping spread spectrum
ZigBee: Kbits/sec, low power, low cost; QPSK, direct sequence spread spectrum
UWB: recently allowed by FCC; short pulses (no carrier), bi-phase or PPM
36
Technology Evolution
37
SDR = Reconfigurable Radios
38
SDR Configuration Soft Radio Digital Signal Processing Engine
[Figure: Soft Radio Digital Signal Processing Engine surrounded by its configurable functions]
Modulation Format: QPSK, DQPSK, π/4 DQPSK, {16,64,256,1024} QAM, OFDM, OFDM CDMA
Digital Down/Up Conversion (DDC): channel center, decimation/interpolation rates, compensation filters, matched filter α = {0.25,0.35,...}
Channel Access: CDMA, TDMA
FEC: Convolutional, Reed-Solomon, Concatenated Coding, Turbo CC/PC, (De-)Interleave
DSSS: Rake, track, acquire; Multi User Detect. (MUD); ICU
Network Interface Definition
Beam Forming
Security
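One way to picture the configuration space of such a soft radio is as a validated parameter record. The sketch below is purely illustrative: the class name, field names, defaults, and string spellings are hypothetical and do not correspond to any real SDR framework; only the option values come from the slide.

```python
from dataclasses import dataclass

# Option sets taken from the slide; the string spellings are assumptions.
MODULATIONS = {"QPSK", "DQPSK", "pi/4-DQPSK", "16QAM", "64QAM",
               "256QAM", "1024QAM", "OFDM", "OFDM-CDMA"}
CHANNEL_ACCESS = {"CDMA", "TDMA"}
FEC_SCHEMES = {"convolutional", "reed-solomon", "concatenated", "turbo"}


@dataclass(frozen=True)
class SoftRadioConfig:
    """Hypothetical parameter set loaded into the DSP engine for one waveform."""
    modulation: str
    channel_access: str
    fec: str
    rolloff: float = 0.35      # matched-filter roll-off factor (alpha)
    decimation: int = 1        # digital down-conversion decimation rate

    def __post_init__(self):
        if self.modulation not in MODULATIONS:
            raise ValueError(f"unsupported modulation: {self.modulation}")
        if self.channel_access not in CHANNEL_ACCESS:
            raise ValueError(f"unsupported channel access: {self.channel_access}")
        if self.fec not in FEC_SCHEMES:
            raise ValueError(f"unsupported FEC scheme: {self.fec}")
        if not 0.0 <= self.rolloff <= 1.0:
            raise ValueError("rolloff must lie in [0, 1]")
        if self.decimation < 1:
            raise ValueError("decimation must be at least 1")


if __name__ == "__main__":
    config = SoftRadioConfig("QPSK", "CDMA", "turbo", rolloff=0.25, decimation=8)
    print(config)
```

Switching standards then amounts to loading a different record into the same processing engine, which is the sense in which the radio is reconfigurable.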
39
Reconfigurable HW/SW Architecture
40
Case Study Using Reconfigurable DSPs: DMB Modulation/Demodulation Section
Vendor | Product and key features
TI (USA) | DRE200: COFDM / audio / FEC decoding on a general-purpose DSP, 160 mW
ATMEL (Germany) | U2739M: COFDM demodulation on an Oak DSP, hardware audio / FEC decoding, 860 mW
Panasonic (Japan) | MN66720UC: SDSP for COFDM, MDSP for audio
Frontier Silicon (UK) | Chorus FS1010: special DSP for COFDM / audio, 100 mW