When Poll is Better than Interrupt
Jisoo Yang, Dave B. Minturn, and Frank Hady (Intel Corporation)
10th USENIX Conference on File and Storage Technologies (FAST '12)
May 23, 2013
Presenter: Jeong Su Park (jspark@archi.snu.ac.kr)
Asynchronous I/O model vs. synchronous I/O model
[Figure: I/O path comparison. Asynchronous path: the application sleeps and context-switches out (A), the I/O scheduler coalesces and reorders requests, the device driver's ISR handles the completion interrupt, and the context is resumed (B); the CPU can do other work meanwhile. Synchronous path: the application spin-waits (polls) on the device through the file system and page cache (A' + B'), with device latency L.]
When device latency is very low, A + B > A' + B' + L, so the synchronous (polling) path wins.
Test environment
[Figure: host connected over PCIe Gen 2 x8 to a DRAM-based DMA prototype device emulating a future NVMe-based SSD; random I/O issued with both the async. and sync. I/O models.]
- CPU: 2x Intel Xeon (quad core, 2.93GHz) with 256KB L2 cache, 8MB L3 cache
- Main memory size: 12GB
- Kernel: Linux 2.6.33
- Latency measurement: CPU timestamp counter
- IOPS measurement: FIO benchmark
* Theoretical PCIe max. bandwidth ≈ 3.98GB/s (for 4KB payload)
* Theoretical PCIe min. latency ≈ 1us
Experimental results: random read test
The work the CPU performs in the async. path (6.31 us) is greater than the spin-waiting time of the sync. path (4.38 us).
[Figure: per-request latency breakdown (9.01 us, 4.1 us, 2.7 us, 1.4 us components). C-state: the CPU enters a power-saving mode during I/O, adding wake-up latency to the async. path.]
Experimental results: 512KB random read test
For the sync. I/O model, only one thread runs on each CPU. For the async. I/O model, I/O threads are added until the utilization of each CPU reaches 100%.
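The IOPS numbers come from the FIO benchmark. A hypothetical FIO job file resembling this slide's setup (the target device path, job size, and thread counts are assumptions, not the authors' published configuration):

```ini
; Sketch of a random-read job; not the paper's actual config.
[randread-512k]
rw=randread
bs=512k
direct=1
filename=/dev/nvme0n1   ; assumed target device
ioengine=libaio          ; async path; use ioengine=psync for the sync model
iodepth=32
numjobs=8                ; async: add threads until each CPU hits 100%
runtime=30
time_based
```

Switching `ioengine` between an asynchronous engine (`libaio`) and a synchronous one (`psync`) is how the two models on this slide can be compared under the same workload.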
Conclusion
When storage device latency becomes sufficiently small, using the synchronous I/O model is beneficial.
- The kernel's I/O handling becomes simpler: there is no need to coalesce small requests into larger ones or to reorder requests; in fact, issuing small requests directly in sync. mode may perform better.
- Ordering is easy to guarantee.
- There is less need for speculation-based optimizations such as buffering and pre-fetching.