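NDN for Large Scale Scientific Data
Dark Matter Search Research Convergence Cluster Seminar
임헌국, Korea Institute of Science and Technology Information (KISTI)
Convergence: a return to philosophical thinking
Philosophy => humanities, military science, medicine, political science, diplomacy, natural sciences (particle physics, astronomy, biology, chemistry, etc.), engineering (computer/mechanical/electronic/communications, etc.)
In the real world, convergence becomes possible through cooperation.
Convergence SW fields (intelligent information platforms, data-centric networking application SW, etc.) are a key lever that will drive the Fourth Industrial Revolution.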
Contents NDN Overview NDN Platform and Applications Background on NDN for Big Science NDN Application SW for Climate Science/HEP Summary
Structural limitations of the current Internet
• Security
  – Cybercrime costs $445B/yr (DDoS attacks, etc.)*
• Mobility
  – Rapid shift from PCs to mobile devices (share of global Internet traffic, 2014 -> 2019: PC 78% -> 36%, mobile devices 14% -> 52%)
• QoS: inefficient delay due to end-to-end transport control
  – The current Internet is basically unicast
• Scalability
  – Increasing scale of the network in terms of users, devices, and traffic
    • IoT: 10B (2013) -> 50B (2020) connected devices
    • Global Internet traffic: 0.7 ZB (2014) -> 2 ZB (2019)
DDoS: Distributed Denial of Service. 1 ZB = 10^21 bytes.
* Source: CSIS
Named Data Networking (NDN): current Internet vs NDN (1)
Named Data Networking (NDN) is a future Internet architecture technology for realizing Information Centric Networking (ICN).
[Figure: example of NDN communication using the Firefox web browser]
Internet Protocol
• Host-centric communication model
• Source/destination IP addresses needed
• Focus on delivering packets from source to destination
• Structural problems: security, mobility, QoS, scalability, etc.
Named Data Networking (NDN)
• Data-centric communication model
• Unique data names needed
• Focus on the what, not the where
• Addresses at the root the inefficiency caused by the structural problems of the current Internet
Current Internet vs NDN (2)
NDN design philosophy and characteristics
Hosts communicate using only unique data names, without IP addresses.
NDN design philosophy
• Focus on data (i.e., content name), not host (i.e., location: IP address)
• Redesign the Internet with a clean-slate approach
NDN architecture characteristics
• Unique and hierarchical names
• Connectionless communication model
• Name-based forwarding
• Mobility and multicasting are designed naturally into the NDN architecture
• Secures the content itself rather than the communication channel, as IP does
• In-network caching reduces traffic: multiple duplicate data requests can be satisfied from a nearby NDN router (cache: CS)
[Figure: NDN router; lookup hit / lookup miss]
OK, so first, let me explain what Named Data Networking is. Named Data Networking, NDN for short, is a protocol redesigned with a clean-slate approach to solve the data explosion problem of the conventional host-centric architecture. NDN focuses on the data itself instead of relying on IP addresses for communication, so the content name is the most important element of the architecture. Here are the distinctive NDN characteristics. The first is in-network caching. Unlike current IP routers, NDN routers can store some portion of the content and serve it on behalf of the content server. Content is also retrieved by prefix matching on the content name instead of matching on IP addresses, so users can fetch the content they are interested in from NDN routers instead of the original content server. Because of these features, consumers can fetch content from nearby NDN routers, and we can expect reduced traffic and lower latency. Compared to the conventional host-centric networking architecture, we can therefore expect NDN to reduce traffic redundancy and content retrieval latency, and so improve users' quality of service and quality of experience.
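The lookup hit / lookup miss behaviour of an NDN router's cache (CS) can be illustrated with a small sketch. This is a toy model under stated assumptions, not the NFD implementation: the class name `NdnRouter`, the `upstream` callable standing in for the original content server, and the LRU eviction policy are all hypothetical choices made for illustration.

```python
from collections import OrderedDict


class NdnRouter:
    """Toy router with a Content Store (CS); hypothetical, not NFD code."""

    def __init__(self, upstream, capacity=2):
        self.upstream = upstream      # assumed callable: name -> bytes
        self.capacity = capacity      # max number of cached Data items
        self.cs = OrderedDict()       # name -> data, kept in LRU order
        self.hits = 0
        self.misses = 0

    def on_interest(self, name):
        """Answer an Interest from the CS if possible, else go upstream."""
        if name in self.cs:           # lookup hit: serve from the cache
            self.hits += 1
            self.cs.move_to_end(name)
            return self.cs[name]
        self.misses += 1              # lookup miss: forward upstream
        data = self.upstream(name)
        self.cs[name] = data          # cache the returned Data
        if len(self.cs) > self.capacity:
            self.cs.popitem(last=False)   # evict least recently used
        return data


calls = []


def producer(name):
    """Stand-in for the original content server."""
    calls.append(name)
    return ("payload for " + name).encode()


router = NdnRouter(producer, capacity=2)
first = router.on_interest("/CMIP5/a/b.nc/0")    # miss, reaches producer
second = router.on_interest("/CMIP5/a/b.nc/0")   # hit, served by router
print(router.hits, router.misses, len(calls))    # prints: 1 1 1
```

The second request never reaches the producer, which is the traffic and latency reduction the slide describes. A real forwarder also keeps a table of pending Interests and a forwarding table, which this sketch leaves out.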
Two types of NDN packets
Interest packet
• Content Name: identifies the data I want to receive
• Selector: publisher identifier, etc.
• Nonce
Data packet
• Content Name: identifies the data in this packet
• Signature: required for all Data packets
• Data
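The two packet types can be modelled as plain records to make the point that neither carries a source or destination address. This is a minimal sketch: the field names follow the slide, while the Python types, the 4-byte random nonce, and the helper `satisfies` are assumptions for illustration rather than the NDN wire format.

```python
import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Interest:
    name: str                                     # data I want to receive
    selector: dict = field(default_factory=dict)  # e.g. publisher filter
    nonce: bytes = field(default_factory=lambda: os.urandom(4))


@dataclass(frozen=True)
class Data:
    name: str          # identifies the data in this packet
    content: bytes     # the payload itself
    signature: bytes   # every Data packet is signed


def components(name):
    """Split '/a/b/c' into ['a', 'b', 'c']."""
    return [part for part in name.split("/") if part]


def satisfies(data, interest):
    """True when the Interest name is a component-wise prefix of the Data name."""
    wanted = components(interest.name)
    return components(data.name)[:len(wanted)] == wanted


interest = Interest("/CMIP5/a")
data = Data("/CMIP5/a/b.nc/0", b"payload", signature=b"placeholder")
print(satisfies(data, interest))
```

Matching is done per name component, so `/CMIP5/a` matches `/CMIP5/a/b.nc/0` but not `/CMIP5/ab`.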
No source/destination information is needed
[Figure: IPv4 datagram header layout, 32 bits wide: ver, head. len, type of service, length, 16-bit identifier, flgs, fragment offset, time to live, upper layer, Internet checksum, 32-bit source IP address, 32-bit destination IP address, options (if any), data (variable length)]
Annotations on the figure:
• Delete the source. Named Data Networking does not have sources.
• Delete the destination. Named Data Networking does not have destinations.
• IPv6 killed these already
NDN - Security
• Content-based security in NDN
  – Security is built into the content itself
  – Each Data packet carries a digital signature made using PKI (Public Key Infrastructure)
    • The signature securely binds together the tuple <name, data, publisher’s key>
  – By contrast, current IP networks secure the channel between two endpoints
• Verifies data integrity and authenticity
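The idea that one signature binds the name and the data together can be sketched as follows. Note the substitution: the slide describes public-key signatures issued under a PKI, whereas this sketch uses HMAC-SHA256 from the Python standard library as a stand-in, with a shared secret playing the role of the publisher's key. The function names `sign_data` and `verify_data` are hypothetical.

```python
import hashlib
import hmac


def sign_data(name: str, content: bytes, key: bytes) -> bytes:
    """Sign name and content together so neither can be swapped alone."""
    encoded_name = name.encode("utf-8")
    # Length-prefix the name so (name, content) has one unambiguous encoding.
    message = len(encoded_name).to_bytes(4, "big") + encoded_name + content
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_data(name: str, content: bytes, signature: bytes, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign_data(name, content, key), signature)


key = b"publisher-secret"            # stand-in for the publisher's key
name = "/CMIP5/a/b.nc/0"
signature = sign_data(name, b"payload", key)

print(verify_data(name, b"payload", signature, key))             # genuine
print(verify_data(name, b"tampered", signature, key))            # content changed
print(verify_data("/CMIP5/a/c.nc/0", b"payload", signature, key))  # name changed
```

Because the check depends only on the packet contents and the key, the Data can be verified no matter which router or cache delivered it, which is the contrast with securing a channel between two endpoints.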
NDN - Naming
/parc.com/videos/widgetA.mpg/_v2/_s0
• Hierarchical
• Unique
• Human-readable
The naming scheme is the most important piece of the NDN architecture and is still under research.
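Hierarchical names lend themselves to component-wise longest-prefix matching, which is how name-based forwarding can pick a next hop. The sketch below is illustrative only: the forwarding table contents and the face labels `face1` and `face2` are hypothetical, and a real forwarder would use a more efficient structure than a dictionary scan.

```python
def components(name):
    """Split '/parc.com/videos' into ['parc.com', 'videos']."""
    return [part for part in name.split("/") if part]


def longest_prefix_match(table, name):
    """Return (prefix, next_hop) for the longest registered prefix of name."""
    parts = components(name)
    for length in range(len(parts), 0, -1):
        prefix = "/" + "/".join(parts[:length])
        if prefix in table:
            return prefix, table[prefix]
    return None                       # no route for this name


# Hypothetical forwarding table: name prefix -> outgoing face
table = {
    "/parc.com": "face1",
    "/parc.com/videos": "face2",
}

match = longest_prefix_match(table, "/parc.com/videos/widgetA.mpg/_v2/_s0")
print(match)
```

The more specific prefix `/parc.com/videos` wins over `/parc.com`, in the same way that longest-prefix matching on IP addresses prefers the more specific route.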
TCP/IP vs NDN/CCN
Released NDN Platform
NDN Platform ver. 0.1.0~0.5.0 (open source)
• NDN-cxx v0.5 – software router and C++ library implementation (released Nov. 2016)
• NFD v0.5 – NDN Forwarding Daemon
NDN Common Client Libraries with TLV support
• Python – PyNDN – now fully implemented in Python, with a preliminary feature set
• JavaScript – NDN-JS – with TLV support by default and user-selectable ndnb support
• C++ – NDN-CPP – with TLV support by default and user-selectable ndnb support
NDN platform architecture
[Figure: layered NDN platform architecture, top to bottom]
• NDN applications, routing, repository
• NDN Common Client Libraries (CCL), ndn-cxx
• NDN Forwarding Daemon (NFD)
• Links and tunnels (TCP, UDP, IP, ...)
Feature labels in the figure (grouped under "Current features" and "Improved NFD/ndn-cxx"): name-based forwarding, caching, security, management, multicasting, producer mobility, consumer mobility, efficient caching policy
Global NDN testbed status (1) http://named-data.net/ndn-testbed/ Topology map. Used for experimentation with the evolving NDN platform.
Global NDN testbed status (2) http://named-data.net/ndn-testbed/ NDN Testbed Status: 2016/11/05 08:41:04 CDT
Applying NDN technology: how?
NDN application areas
• Content delivery applications (streaming video, etc.)
• Web conferencing, chat
• IoT
• Healthcare
• Building management systems
• Multiplayer online games
• Big science: climate science, HEP, geology, astronomy
NDN webpage: http://named-data.net/
Applying an application: one application deployed into an NDN testbed
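Why NDN is needed for Big Science
• Climate science research currently uses ESGF, an IP-based P2P system, to receive big data from the United States, Europe, and elsewhere, but it has many drawbacks, including long transfer delays, corrupted data, and security issues.
• There is a trend toward using NDN to provide the transfer, management, and security of content data in an innovative way.
• Researching and developing NDN application platform technology for data-intensive science contributes to shorter research time and more reliable research.
[Figure: in-network caching, with User 1 and User 2]
• Interest packets for Jan 30-31 go to the data source server1; Interest packets for Feb 01-02 go to the data source server2.
• When User 2 requests the same data (Jan 30-31, Feb 01-02), the NDN caching node delivers it, so User 2 can significantly reduce the data transfer time.
Target user groups
1. Climate modeling
2. LHC (CMS, ATLAS)
3. Astronomy (LIGO)
4. Future Internet technology developer groups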
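Status of NDN R&D for Big Science
International
• Colorado State University was the first to apply NDN technology to climate modeling.
• Caltech, Fermilab, Northeastern Univ.: designing an NDN application system for searching/fetching HEP big data.
• In Europe, NDN application SW (HEP) is being designed, led by Imperial College London.
[Figure: ESnet/Internet2; ESnet's NDN network for climate modeling in the US]
Korea
• KISTI has designed and implemented a differentiated NDN application SW platform that combines NDN with climate science.
[Figure: global NDN testbed using the NDN application for climate science]
• Samsung Advanced Institute of Technology, Seoul National University, Ajou University, Soongsil University, and others lead the standardization and R&D of core ICN technologies.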
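Design and implementation of the NDN consumer/producer platform and global testbed for Big Science
First design and implementation of the platform and an international testbed
NDN producer platform
◈ Converts original climate data file names into NDN names with the hierarchical DRS structure, and manages metadata
◈ NDN repository
◈ Processes and responds to search and fetching Interest packets from consumers
NDN consumer platform
◈ Searches, by NDN name, for the climate data files (CMIP5 files) and metadata the user is looking for
◈ Fetches, by NDN name, the files the user found
International testbed
◈ First design/implementation of an international distributed NDN testbed network for climate science, using the NDN consumer/producer platform
◈ Korea-US intercontinental climate data file fetching experiments on the international distributed NDN testbed: demonstrated the usefulness of NDN for big science
[Figure: NDN testbed established between Korea and US]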
NDN based Climate Science Application Workflow
ESGF: a P2P file fetching system for climate science that uses distributed data centers in IP networks
• ESGF builds climate modeling data centers globally in an IP-based network environment and fetches and manages climate data using HTTP/FTP
• High latency and a high rate of corrupted data
• Duplicate requests for big data cause a data explosion problem
The NDN workflow follows the ESGF workflow, based on NDN names:
0. Publish climate files in the NDN repository
1. The consumer requests the file list of interest by entering a keyword via the user interface
2. The producer parses the query and sends it to the NDN name and metadata container (DB)
3. The climate file name list and metadata are returned
4. The consumer requests climate data from the NDN repository using the data name
5. The actual climate data are transferred from the data container (DB) in the NDN repository
[Figure: climate data file searching and fetching workflow using NDN names; components shown: Front-end NDN Engine (FNE), NDN name translator/metadata manager, NDN producer application, NDN name and metadata container, real data container]
Climate science NDN consumer/producer platform architecture
Consumer platform
• NFD/NDN-cxx (NDN platform: open source), NDN-JS, WebSocket, HTTP server
• Front-end system (User Interface, Front-end NDN Engine (FNE))
NDN router platform
• NFD/NDN-cxx (NDN platform: open source)
Producer platform
• NFD, NDN-cxx (NDN platform: open source)
• Back-end system (NDN repository, NDN name translator/metadata manager, NDN producer application)
Implementation details (languages, SW, tools, etc.)
Common environment
• OS: Fedora 21
• NDN platform: open source NDN Forwarding Daemon (NFD) v0.3.2, NDN-CXX v0.3.2
Back-end system
• NDN repository: Repo-ng (data container) (https://github.com/remap/ndnfs-port), MySQL DB (metadata container)
• NDN producer app.: Python, netCDF
Front-end system
• Web UI: HTML, JavaScript, NDN-JS (https://github.com/named-data/ndn-js)
• Web browser: Firefox (with an add-on installed for NDN-based requests)
Front-end system diagram: User Interface (web browser) and Front-end NDN Engine (FNE)
User Interface (web browser)
• User-friendly functions; general file storage
• Climate data searching: send query ‘/CMIP5’
• Search results and metadata browsing: ‘/CMIP5/a/….’ & metadata-a, ‘/CMIP5/b/….’ & metadata-b, ‘/CMIP5/c/….’ & metadata-c
• Using it, a consumer searches for candidate climate data files (CMIP5 files).
Front-end NDN Engine (FNE)
• Query results are returned: ‘/CMIP5/a/….’ & metadata-a, ‘/CMIP5/b/….’ & metadata-b, ‘/CMIP5/c/….’ & metadata-c, ‘/CMIP5/d/….’ & metadata-d, ‘/CMIP5/e/….’ & metadata-e, ‘/CMIP5/f/….’ & metadata-f
• Fetching, send Interests: ‘/CMIP5/a/b.nc/0’, ‘/CMIP5/a/b.nc/1’, ‘/CMIP5/a/b.nc/2’, ‘/CMIP5/a/b.nc/3’, ‘/CMIP5/a/b.nc/4’
• Receive Data (order shown in the figure): ‘/CMIP5/a/b.nc/1’, ‘/CMIP5/a/b.nc/0’, ‘/CMIP5/a/b.nc/4’, ‘/CMIP5/a/b.nc/2’, ‘/CMIP5/a/b.nc/3’
• A consumer fetches the desired file with a target data name from the NDN network.
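The fetching step on this slide, where one Interest is sent per segment and Data may come back in a different order, can be sketched as below. This is an assumption-laden illustration, not the FNE code: the functions `segment_names` and `reassemble` are hypothetical, the segment count is taken as known in advance, and the payloads are made-up placeholders.

```python
def segment_names(file_name, count):
    """Names of the Interests to send: <file_name>/0 ... <file_name>/(count-1)."""
    return [f"{file_name}/{index}" for index in range(count)]


def reassemble(file_name, count, received):
    """Rebuild the file from (name, payload) pairs arriving in any order."""
    prefix = file_name + "/"
    segments = {}
    for name, payload in received:
        if not name.startswith(prefix):
            continue                          # Data for some other file
        suffix = name[len(prefix):]
        if suffix.isdigit():
            segments[int(suffix)] = payload   # index by segment number
    missing = [i for i in range(count) if i not in segments]
    if missing:
        raise ValueError(f"missing segments: {missing}")
    return b"".join(segments[i] for i in range(count))


names = segment_names("/CMIP5/a/b.nc", 5)
# Arrival order taken from the slide: segments 1, 0, 4, 2, 3
arrivals = [(names[i], f"chunk{i};".encode()) for i in (1, 0, 4, 2, 3)]
print(reassemble("/CMIP5/a/b.nc", 5, arrivals))
```

Because each segment is identified by its own name, the consumer does not care which node answered or in what order the answers arrived.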
UI & Front-end NDN Engine (FNE) in the front-end system
Searching
• User interface: create a search Interest packet and send it to the producer; visualize the search results
• Send a search Interest packet
• Receive search results: 1. separate the NDN name and metadata; 2. create the CMOR name and attach it to the NDN name
Fetching (data-centric file fetching)
• Create an Interest packet with the file's NDN name for fetching
• Send an Interest packet
The user interface (UI) allows web-based access to the NDN testbed and supports searching and fetching of climate modeling data based on an NDN name.
The FNE processes the search results returned from the NDN producer application after sending an Interest packet to search for candidate CMIP5 files.
[Figure: screenshots of the user interface: a) climate data (CMIP5) file searching and metadata browsing; b) climate data (CMIP5) file fetching]
Back-end system diagram: NDN name translator/metadata manager and NDN producer application
[Figure: original climate data files (e.g., psl_6hrPlev_.._.nc), NDN name translator/metadata manager (name conversion, metadata manager), NDN repository (metadata container, data container), NDN producer application]
Using it, producers make NDN names by first converting a flat CMIP5 file (climate data file) name to a hierarchical NDN name format. They then establish a repository to store and manage the original CMIP5 files and their NDN name lists/metadata sets.
Components in the back-end system
Name translator
• Converts the original flat climate data file name to a hierarchical NDN name format (using one of the DRS rules)
• Climate data complies with the global naming standard Data Reference Syntax (DRS), which uses a controlled vocabulary
• Original file name: flat naming (“a_b_c_d_e.”)
• [Figure: name conversion procedure]
Metadata manager
• Extracts the metadata sets from each climate data file and manages them
• Provides detailed information about climate data files to consumers
NDN repository
• Data container: stores CMIP5 files and supports data fetching
• Name/metadata container: stores the converted NDN names and their metadata sets separately
• Allows data-centric access and file fetching
• The back-end system returns the corresponding names and their metadata sets as the result of a search request from a consumer
• The corresponding Data packets are returned to the user when the repo receives download requests
NDN producer application
• For a search Interest packet, it looks up the data name carried in the Interest packet in the name/metadata container
• It sends the matching NDN names and their metadata sets to the requesting consumer
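How the producer application might answer a search Interest from the name/metadata container can be sketched with an in-memory database. Several things here are assumptions: SQLite stands in for the MySQL metadata container named in the implementation details, the table layout and the functions `open_container`, `publish`, and `handle_search_interest` are hypothetical, and the metadata values are invented examples.

```python
import json
import sqlite3


def open_container():
    """Create an empty name/metadata container (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE names (name TEXT PRIMARY KEY, metadata TEXT NOT NULL)"
    )
    return db


def publish(db, name, metadata):
    """Store one converted NDN name with its metadata set."""
    db.execute("INSERT INTO names VALUES (?, ?)", (name, json.dumps(metadata)))


def handle_search_interest(db, prefix):
    """Return (name, metadata) for every stored name under the given prefix."""
    prefix = prefix.rstrip("/")
    rows = db.execute(
        "SELECT name, metadata FROM names "
        "WHERE name = :p OR substr(name, 1, length(:p) + 1) = :p || '/' "
        "ORDER BY name",
        {"p": prefix},
    )
    return [(name, json.loads(meta)) for name, meta in rows]


db = open_container()
base = "/CMIP5/output1/MIROC/MIROC5/historical/6hr/atmos/6hrPlev/r1i1p1"
publish(db, base + "/psl", {"variable": "psl"})   # invented metadata
publish(db, base + "/ta", {"variable": "ta"})     # invented metadata
publish(db, "/CMIP5/output1/OTHER/m1/historical", {"variable": "x"})

results = handle_search_interest(db, "/CMIP5/output1/MIROC")
for name, metadata in results:
    print(name, metadata)
```

The query compares on a trailing `/` so that a prefix only matches at a name-component boundary, and it avoids SQL `LIKE` because `_` in climate file names would act as a wildcard there.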
Name conversion rule
The name conversion rule is based on the Data Reference Syntax (DRS): original climate file names (flat names) are converted to NDN climate data names (hierarchical names).
(a) Original CMIP5 data file name format (CMOR name)
<variable name>_<MIP table>_<model>_<experiment>_<ensemble member>[_<temporal subset>][_<geographical info>].nc
Example: psl_6hrPlev_MIROC5_historical_r1i1p1_1950010100-1950123118.nc
(b) Converted NDN name format
/<activity>/<product>/<institute>/<model>/<experiment>/<frequency>/<modeling realm>/<MIP table>/<ensemble member>/<variable name>[/<CMOR name>.nc]
Example: /CMIP5/output1/MIROC/MIROC5/historical/6hr/atmos/6hrPlev/r1i1p1/psl[/<CMOR name>.nc]
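The conversion from format (a) to format (b) can be sketched as a small function. This is not the presenter's translator: the function `cmor_to_ndn_name` and the regular expression are hypothetical, and because institute, frequency, and modeling realm do not appear in the flat file name, the two lookup tables below are hard-coded assumptions populated only with the values from the slide's example.

```python
import re

# Fields of the flat CMOR file name, format (a) on the slide
CMOR_PATTERN = re.compile(
    r"^(?P<variable>[^_]+)_(?P<mip_table>[^_]+)_(?P<model>[^_]+)_"
    r"(?P<experiment>[^_]+)_(?P<ensemble>[^_]+)"
    r"(?:_(?P<temporal>[^_]+))?(?:_(?P<geo>[^_]+))?\.nc$"
)

# Assumed lookup tables; only the slide's example values are filled in
MODEL_INFO = {"MIROC5": {"institute": "MIROC"}}
MIP_TABLE_INFO = {"6hrPlev": {"frequency": "6hr", "realm": "atmos"}}


def cmor_to_ndn_name(filename, activity="CMIP5", product="output1",
                     include_file=True):
    """Convert a flat CMOR file name to a hierarchical NDN name."""
    match = CMOR_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a CMOR file name: {filename!r}")
    model = match["model"]
    table = match["mip_table"]
    parts = [
        activity,
        product,
        MODEL_INFO[model]["institute"],
        model,
        match["experiment"],
        MIP_TABLE_INFO[table]["frequency"],
        MIP_TABLE_INFO[table]["realm"],
        table,
        match["ensemble"],
        match["variable"],
    ]
    if include_file:
        parts.append(filename)        # optional trailing <CMOR name>.nc
    return "/" + "/".join(parts)


flat = "psl_6hrPlev_MIROC5_historical_r1i1p1_1950010100-1950123118.nc"
print(cmor_to_ndn_name(flat, include_file=False))
print(cmor_to_ndn_name(flat))
```

With `include_file=False` the result is the directory-like name from the slide's example; with the default, the original file name is appended as the last component.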
Climate data file searching/fetching over the US-Korea NDN testbed
Advantages of NDN for Big Science
• Named data-driven: only the data name is needed, without location information
• In-network caching: a high caching ratio for static scientific data improves throughput and user latency
• Security: secures the scientific data itself through the publisher's signature
• Symmetrical forwarding: allows multicasting and removes redundant traffic across the whole network
• Mobility: supported in the architecture itself
• No perceptible transport: the Interest rate is controlled between NDN routers, with no end-to-end transport control
Current IP based Xrootd VS NDN based Xrootd
Summary
NDN architecture
• A representative ICN technology that redesigns the Internet architecture on a clean-slate basis (using Interest/Data packets)
• The core of NDN is its focus on WHAT (content data) rather than WHERE (host location)
• Caching, security, multicasting, and mobility are designed into the architecture itself
• Research issues: naming, access control, congestion control, scalability
NDN platform
• NDN Platform (ver. 0.1~0.5): NDN-cxx, NFD, Node.js, NDN-ccl
• Global NDN testbed
• Streaming video, IoT, healthcare, data-intensive science
Design and implementation of an NDN application platform for climate science
• Designed and implemented differentiated NDN application SW for climate science that communicates using names only (front-end system/back-end system)
• Used this platform to establish an intercontinental, international distributed NDN testbed network for climate science
• Provides the overall insight needed to apply NDN technology to big science
To do: R&D of NDN application SW for HEP (dark matter search research data)