CMS-HI Computing in Korea
Inkyu PARK, Dept. of Physics, University of Seoul
2007 HIM @ APCTP, Dec. 14, 2007
With: Prof. H.S. Min, Prof. B.D. Yu, Prof. D.S. Park, Prof. J.D. Noh, … S.G. Seo, J.W. Park, G.R. Han, M.K. Choi, S.M. Han, Y.S. Kim, …
Contents
1. CMS computing: Why GRID? (11 pages)
2. CMS computing: Tier structure (10 pages)
3. WLCG: EGEE & OSG (5 pages)
4. OSG-based CMS Tier2 @ SSCC (8 pages)
5. Network readiness (12 pages)
6. Remarks and summary (4 pages)
CMS Computing Why GRID?
LHC: another kind of Olympic game. For HEP and HI discoveries and more, a few thousand physicists work together: 7000 physicists from 80 countries! They collaborate, but at the same time compete. The LHC Olympic game.
LHC (Large Hadron Collider): 14 TeV for pp, 5.5 TeV/nucleon for AA. Circumference ~ 27 km. A few billion dollars per year. Bunch crossing rate ~ 40 MHz. Starts running this year!!
LHC accelerator schedule
p+p:       2008: 450+450 GeV, 5×10^32    2009: 14 TeV, 0.5×10^33    2010: 14 TeV, 1×10^33    2011: 14 TeV, 1×10^34    …
HI (Pb-Pb): 2008: none    2009: 5.5 TeV, 5×10^26    2010: 5.5 TeV, 1×10^26    2011: 5.5 TeV, 1×10^27    …
CMS detectors: designed for precision measurements in high-luminosity p+p collisions. Muon chambers, ZDC (z = 140 m, |η| > 8.2, neutrals), CASTOR (5.2 < |η| < 6.6), Si tracker including pixels, ECAL, HCAL. In heavy-ion collisions: functional at the highest expected multiplicities; detailed studies at dN_ch/dη ~ 3000, cross-checks up to 7000-8000. Hermetic calorimetry, large-acceptance tracker, excellent muon spectrometer.
Gigantic detectors
Wires everywhere! In principle, # of wires = # of channels: 16M wires, soldering, etc…
CMS raw data size. Event data model (EDM) tiers (Data / MC): full event FEVT / SimFEVT; RAW: digitized detector / generated, simulated; RECO: reconstructed; AOD: physics quantities extracted. 16 million channels, ADC (12-16 bit), zero suppression → ~2 MB of raw data per p+p event. Data containers: run header, event header, RAW data, reconstruction data, AOD, calibration, slow control, etc.
Pb+Pb "hot ball" event (dN/dy = 3500) with Υ → μ+μ−. Pb+Pb event display, produced in the pp software framework (simulation, data structures, visualization).
Not only data but also MC data. Real data: sensor → ADC digitization → trigger → record → event reconstruction → data AOD → physics. MC data: GEANT4 detector simulation → event reconstruction → MC AOD.
Data size estimation:
                          p+p          AA (Pb+Pb)
  Beam time / year (s)    10^7         10^6
  Trigger rate            150 Hz       70 Hz
  # of events             1.5×10^9     0.7×10^8
  Event size              2.5 MB       5 MB
  Data produced / year    3.75 PB      0.35 PB
  10 years of LHC running ~40 PB       ~4 PB
A comparable amount of MC data is required on top of this, so the total is of order ~100 PB, i.e. a yearly computing volume of ~10 PB. Stored on CDs (700 MB, 1 mm thick), ~100 PB would be about 150 million discs, a stack 150 km high; on DVDs ~20 km; on 100 GB HDDs about 1,000,000 drives.
Simulating AA events takes 1-6 hours per event, i.e. ~10^8 CPU hours for the AA MC, so ~10^4 CPUs are needed. Reconstructing data and MC, reprocessing and data analysis need a few tens of MSI2K; with the newest CPU at ~1000 SI2K, pp + AA together require of order 10^5 CPUs.
Total disaster! Who can save us?
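As a quick cross-check of the arithmetic above, a minimal Python sketch (the constants are the ones in the table; the disc-stack comparison assumes 700 MB per CD and a 1 mm disc thickness):

```python
# Back-of-the-envelope estimate of the yearly raw-data volume quoted above
# (beam time x trigger rate x event size). Numbers are taken from the table;
# the CD-stack comparison assumes 700 MB per CD and 1 mm disc thickness.

PB = 1e15  # bytes

def yearly_volume(beam_seconds, trigger_hz, event_mb):
    """Raw data produced per year, in petabytes."""
    return beam_seconds * trigger_hz * event_mb * 1e6 / PB

pp = yearly_volume(1e7, 150, 2.5)   # ~3.75 PB/year
aa = yearly_volume(1e6, 70, 5.0)    # ~0.35 PB/year
total_10y = 10 * (pp + aa)          # ~40 PB of raw data over 10 years

# With a comparable amount of MC, the order of magnitude is ~100 PB.
total_bytes = 100 * PB
cds = total_bytes / 700e6           # ~1.4e8 CDs; at 1 mm each, the stack in km
print(f"pp: {pp:.2f} PB/y, AA: {aa:.2f} PB/y, 10-year raw: {total_10y:.0f} PB")
print(f"100 PB on CDs: {cds/1e6:.0f} million discs, stack ~{cds*1e-6:.0f} km")
```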
Grid computing : E-Science
CMS computing: Tier structure
What happens at Tier0
Tier 0: DATA. Tier 1: major storage. Tier 2: MC production, many CPUs. (L.A.T. Bauerdick, 2006)
Connection topology: Tier-1 centres are interconnected with each other, and each Tier-1 serves several Tier-2 centres.
CMS computing tier structure: Tier0 (CERN) → Tier1 centres worldwide (PIC/IFAE Spain, Italy, UK, USA, France, Germany, Taiwan) → Tier2 centres (e.g. US-CMS).
US CMS Tier2 case: 48 universities in total; 7 host a Tier2, the others have Tier3s. CE: 200-3000 CPUs (400-1000 kSI2K); SE: > 100 TB; network infrastructure: 1-10 Gbps.
  Site        CPU (kSI2K)   Disk (TB)   WAN (Gbit/s)
  Caltech     586           60          10
  Florida     519           104
  MIT         474           157         1
  Nebraska    650           105
  Purdue      743           184
  UCSD        932           188
  Wisconsin   547           110
US Tier2 home pages:
http://www.cacr.caltech.edu/projects/tier2-support/
http://tier2.ihepa.ufl.edu/
http://www.cmsaf.mit.edu/ (MIT)
http://www.hep.wisc.edu/cms/comp/
https://tier2.ucsd.edu/zope/UCSDTier2/
http://t2.unl.edu/cms
http://www.physics.purdue.edu/Tier2/
Manpower at the US Tier2 sites (institution / person, e-mail, background and role):
Caltech (Physics Dept., computing centre):
  Ilya Narsky, narsky@hep.caltech.edu, Ph.D. in physics, Physics Dept.
  Michael Thomas, thomas@hep.caltech.edu, physics, particle physics lab, Physics Dept.
MIT (Physics Dept., LNS, Tier2 centre):
  Boleslaw Wyslouch, wyslouch@mit.edu, nuclear physics, professor of physics, director
  Ilya Kravchenko, Ilya.Kravchenko@cern.ch, Ph.D. in physics, operations manager
  Constantin Loizides, loizides@MIT.EDU, Ph.D. in physics, physics admin
  Maarten Ballintijn, maartenb@mit.edu, Ph.D. in physics, system admin
Purdue (Physics Dept., CMS computing centre):
  Norbert Neumeister, neumeist@purdue.edu, particle physics, professor of physics, director
  Tom Hacker, hacker@cs.purdue.edu, Computer Science Dept., administrator
  Preston Smith, psmith@purdue.edu, Physics Dept., manager
  Michael Shuey, shuey@purdue.edu, Physics Dept., physics support
  David Braun, dbraun@purdue.edu, Physics Dept., software
  Haiying Xu, xu2@purdue.edu, CMS researcher, particle physics
  Fengping Hu, fhu@purdue.edu
Wisconsin (Physics Dept., CMS computing centre):
  Sridhara Dasu, dasu@hep.wisc.edu, physics, professor of physics, director
  Dan Bradley, dan@hep.wisc.edu, physics, particle physics lab, research professor, software
  Will Maier, wcmaier@hep.wisc.edu, physics, researcher in the particle physics lab, admin
  Ajit Mohapatra, ajit@hep.wisc.edu, physics, researcher in the particle physics lab, support
Florida (Physics Dept.):
  Yu Fu, yfu@phys.ufl.edu, Physics Dept., OSG manager
  Bockjoo Kim, bockjoo@phys.ufl.edu, Ph.D. in particle physics, CMS grid computing administrator (Korean)
Nebraska (Physics Dept., Tier2 computing centre):
  Ken Bloom, kenbloom@unl.edu, particle physics, professor of physics
  Carl Lundstedt, clundst@unlserve.unl.edu, Ph.D. in particle physics, research professor
  Brian Bockelman, bbockelm@cse.unl.edu, CMS grid computing
  Aaron Dominguez, aarond@unl.edu, Tier2 operations, Ph.D. in physics
  Mako Furukawa, mako@mako.unl.edu, CMS physics, particle physics
UC San Diego (Physics Dept., Tier2 computing centre):
  Terrence Martin, tmartin@physics.ucsd.edu, Physics Dept. computing centre staff
  James Letts, jletts@ucsd.edu, Ph.D. in particle physics, Physics Dept. researcher
Check points. Centres: 7-8 universities, with 1 or 2 centres. CE: 400 kSI2K; SE: a minimum of 100 TB; network: 1 Gbps minimum, so the national backbones KREONET / KOREN are needed. Staffing: 1 director and 2 physicists who know what to do, plus 3-4 operations staff; they support CMSSW, Condor, dCache, and more.
Korea CMS Tier2 guideline (minimum installed capacity, recommended capacity, and how it is inspected and evaluated):
CE (Computing Element): minimum 400 kSI2K (800 kSI2K recommended). Only machines installed purely for computation are counted, excluding personal PCs; clustering and batch-job execution are verified through Ganglia and Condor monitoring; the SI2K rating of each CPU is checked. Ganglia monitoring and the Condor batch system must be installed and in operation.
SE (Storage Element): minimum 100 TB (200 TB recommended), excluding user disk. dCache monitoring is used to verify that the capacity is usable as storage; a dCache server must be installed and in operation.
Network: minimum 1 Gbps (10 Gbps recommended); connection to KREONET or KOREN is verified.
Location and equipment: a dedicated, air-conditioned room inside the physics department is required (an independent centre is recommended). The space, power supply and temperature/humidity control facilities are checked on site; at least 50 kW of electrical power and at least a 20 RT constant temperature/humidity unit are required.
Human resources: an LHC/CMS particle physicist must participate as the operations director. The director's ability to use CMSSW and familiarity with the LHC/CMS experiment are assessed, together with the composition of the operations team and administrative staff, and the ability to collaborate with domestic and foreign CMS physicists. An operations team and an administrative organisation are required.
WLCG EGEE and OSG
Worldwide LHC Computing Grid (WLCG).
LCG uses three major grid solutions. EGEE: most of the European CMS institutions (largely merged with LCG; LCG ≈ EGEE). OSG: all of the US-CMS institutions. NorduGrid: Northern European countries.
EGEE in Europe: most of the European CMS institutions. OSG in the USA: most of the American CMS institutions.
OSG-EGEE compatibility:
- Common VOMS (Virtual Organization Management System).
- Condor-G interfaces to multiple remote job execution services (GRAM, Condor-C).
- File transfers using GridFTP.
- SRM (Storage Resource Manager) for managed storage access.
- OSG BDII published to the shared BDII so Resource Brokers can route jobs across the two grids (BDII: Berkeley Database Information Index; cf. GIIS, GRIS).
- Active joint security groups, leading to common policies and procedures.
- Automated ticket routing between GOCs.
Software in OSG (installed by VDT):
Job management: Condor (including Condor-G & Condor-C), Globus GRAM.
Data management: GridFTP (data transfer), RLS (replica location), DRM (storage management), Globus RFT.
Information services: Globus MDS, GLUE schema & providers.
Security: VOMS (VO membership), GUMS (local authorization), mkgridmap (local authorization), MyProxy (proxy management), GSI SSH, CA CRL updater.
Accounting: OSG Gratia.
Monitoring: MonALISA, gLite CEMon.
Client tools: Virtual Data System, SRM clients (V1 and V2), UberFTP (GridFTP client).
Developer tools: PyGlobus, PyGridWare.
Testing: NMI Build & Test, VDT tests.
Support: Apache, Tomcat, MySQL (with MyODBC), non-standard Perl modules, Wget, Squid, Logrotate, configuration scripts.
OSG based CMS-Tier2 @ Seoul Supercomputer Center (SSCC)
CMS Tier 2 requirements (OSG): network 2-10 Gbps (Gbps intranet, 2 Gbps outbound); CPU ~1 MSI2K (~1000 CPUs); storage 200 TB with a dCache system; OSG middleware (CE, SE); batch system (Condor + PBS); CMS software (CMSSW et al. installed at $OSG_APP). No Korean institution has this amount of resources for a CMS Tier2. (cf. the KISTI ALICE Tier 2)
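For illustration only, a minimal sketch of how a job could be handed to the Condor batch system mentioned above; the executable name, file names and job count are hypothetical placeholders, while the submit-description keywords are the standard Condor ones:

```python
# Minimal sketch: write a Condor submit description and hand it to
# condor_submit. The executable (run_cmssw.sh) and log locations are
# placeholders, not part of the actual SSCC configuration.
import subprocess
import textwrap

submit_description = textwrap.dedent("""\
    universe   = vanilla
    executable = run_cmssw.sh
    arguments  = $(Process)
    output     = job_$(Process).out
    error      = job_$(Process).err
    log        = job.log
    queue 10
""")

with open("cmssw_jobs.sub", "w") as f:
    f.write(submit_description)

# Submit the 10 jobs to the local Condor scheduler.
subprocess.run(["condor_submit", "cmssw_jobs.sub"], check=True)
```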
Seoul Supercomputer Center (SSCC), established in 2003 with funding of ~$1M; 2007 upgrade with funding of ~$0.2M. In total 256 CPUs + gigabit switches + KOREN2. The 2007 upgrade adds a 10 Gbps switch, an SE of 120 TB (~400 HDDs of 300 GB), a CE of 128 CPUs for MC generation plus a new 64-bit HPC cluster, KREONET connectivity, and OSG operation. Fig. 1: 256-PC cluster and 64 TB storage for the CMS Tier2 at the University of Seoul.
Center organization: spokesperson/director, 3 Ph.D. researchers, 4 admins/operators, 2 application managers, 2 staff. Director and deputy spokesperson: Prof. Hyunsoo Min, Prof. Inkyu Park. System, software, web and user support: J.W. Park, G.R. Hahn, M.K. Choi, Y.S. Kim.
CMS Tier2 / Tier3 setup (SSCC, SPCC). CMS-HI Tier 2: Condor computing pool (+120 CPUs), 64-bit cluster (+100 CPUs; 64 machines with 64-bit 3 GHz CPUs), dCache pool (200 TB), 120 TB dCache storage, ~0.1 MSI2K, gate/web/Condor-G and dCache/gridFTP/Ganglia servers, OSG, 2 Gbps network (up to 20 Gbps internally). Analysis Tier 3: 32 machines with 32-bit 2 GHz CPUs, 8 TB storage. Switches: Nortel Passport 8800 (Gb) ×2, Extreme BlackDiamond 8810 (10 Gb/Gb) ×2, Foundry BigIron 16 (Gb) ×2, D-Link L3 switch (Gb). External links: KREONET (GLORIAD) and KOREN (APII, TEIN) at 1-2 Gbps.
Tier 2 connection: Tier0 (CERN) → Tier1 (worldwide: PIC/IFAE Spain, Italy, UK, USA, France, Germany, Taiwan) → Tier2. Option 1: a Korean Tier1 — what we hope for, but we need the Tier1 first. Option 2: plug the SSCC (Seoul) Tier2 and the SPCC (physics department) cluster into the US-CMS Tier1 — the current approach; geographical distance doesn't really matter.
Current Tier2 status
CE and SE status. CE: currently 102 CPUs; SE: currently 12 TB.
Documentation by Twiki
Network readiness
Thanks to this project… (J-PARC E391a and CMS-HI): Yamanaka (KEK, Japan), Seogon Kang (UoS), Inkyu Park, Jinwoo Park (UoS), David d'Enterria (CERN, Switzerland), Garam Han (UoS), Bolek Wyslouch (MIT, USA).
Traceroute example
Between UoS and KEK. Existing route (KEK → AD.JP → KDDNET → UoS): 20 hops; the hop between 9 and 10 takes 40 ms. KOREN route (KEK → APII → KOREN → UoS): 14 hops; the hop between 4 and 5 takes 30 ms, which is 90% of the total delay time.
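A small sketch of the kind of per-hop check used here, wrapping the system traceroute from Python; the target host is just an example, and the parsing assumes the usual traceroute output format:

```python
# Run traceroute to a remote host and print the round-trip time reported for
# each hop, to spot the single hop that dominates the total delay.
import re
import subprocess

def hop_latencies(host):
    out = subprocess.run(["traceroute", "-n", host],
                         capture_output=True, text=True, check=True).stdout
    hops = []
    for line in out.splitlines()[1:]:            # skip the header line
        times = [float(t) for t in re.findall(r"([\d.]+) ms", line)]
        if times:
            hops.append(sum(times) / len(times)) # average over the probes
        else:
            hops.append(None)                    # hop did not answer
    return hops

for i, rtt in enumerate(hop_latencies("www.kek.jp"), start=1):
    print(f"hop {i:2d}: {rtt:.1f} ms" if rtt is not None else f"hop {i:2d}: *")
```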
Bandwidth test between UoS and KEK: the KEK end has a 100 Mbps link while UoS has 1 Gbps. About a factor 1.3 gain so far, but the KOREN path must still be used correctly. More information and work are needed.
Between UoS and CERN: ~170 ms delay in both cases. We did not have time to correct this problem by the time of this review.
Between UoS and CERN: the status is still unclear; somehow the traffic does not appear to go through TEIN2.
National KOREN bandwidth: bandwidth between SSCC and KNU, KU and SKKU, measured with iperf (TCP/UDP performance); a sketch of the iperf scan follows the detailed table below.
  Link (via KOREN)                               Measured bandwidth
  University of Seoul - Korea University         99 Mbps
  University of Seoul - Kyungpook Nat. Univ.     520 Mbps
  University of Seoul - Sungkyunkwan Univ.       100 Mbps
Detailed iperf results (throughput in Mbps versus the number of simultaneous connections/threads; row labels encode destination - TCP window size - test duration; thread counts tested: 1, 10, 20, 30, 40, 50, 60, 70):
  KNU-128k-10s:  53.9, 506.0, 520.0
  KNU-128k-60s:  51.8, 510.0
  KNU-512k-10s:  58.6, 515.0, 521.0
  KNU-512k-60s:  52.3, 514.0, 522.0
  KNU-2m-10s:    60.4, 503.0, 528.0
  KNU-2m-60s:    52.4, 511.0, 523.0
  KNU-8m-10s:    59.9, 399.0, 490.0
  KNU-8m-60s:    53.6, 367.0
  KNU-16m-10s:   42.6, 218.0
  KNU-16m-60s:   36.4, 232.0
  KU-8m-10s:     88.5, 97.4, 87.4, 88.0, 87.7
  KU-8m-60s:     87.0, 87.9, 88.1, 82.2
  KU-16m-10s:    29.7, 87.8, 87.2
  KU-16m-60s:    76.6
  SKKU-512k-10s: 94.1, 95.6, 96.1, 98.3, 98.9, 98.1, 98.7, 97.6
  SKKU-512k-60s: 94.3, 94.7, 94.9
  SKKU-8m-10s:   97.3, 117.0, 111.0, 138.0, 144.0, 137.0, 251.0
  SKKU-8m-60s:   96.5, 97.9, 102.0, 106.0, 109.0
  SKKU-16m-10s:  100.0, 130.0, 147.0, 146.0, 155.0, 324.0
  SKKU-16m-60s:  95.2, 108.0, 103.0
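The throughput numbers above come from iperf runs with varying TCP window sizes and numbers of parallel streams; a minimal sketch of such a scan is shown below (the server host name is a placeholder, and an iperf server is assumed to be already running at the far end):

```python
# Scan iperf TCP throughput versus window size and number of parallel streams.
# Assumes an iperf (v2) server is running on the remote host: "iperf -s".
import subprocess

SERVER = "koren-test.example.ac.kr"    # placeholder host name

def measure(window, streams, seconds=10):
    """Return iperf's raw report for one configuration."""
    cmd = ["iperf", "-c", SERVER,
           "-w", window,               # TCP window size, e.g. "8M"
           "-P", str(streams),         # parallel client threads
           "-t", str(seconds),         # test duration in seconds
           "-f", "m"]                  # report in Mbits/sec
    return subprocess.run(cmd, capture_output=True, text=True).stdout

for window in ["512K", "8M", "16M"]:
    for streams in [1, 10, 20, 30]:
        report = measure(window, streams)
        print(f"window={window:>4}  streams={streams:2d}")
        print(report.splitlines()[-1])   # the summary / [SUM] line
```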
Bandwidth results: SSCC-KNU shows a 500 Mbps connection; 500 Mbps is our test machine's maximum.
Optimized APII and TEIN2. The maximum TEIN2 connection is 622 Mbps; the path goes through AS559 (SWITCH, Swiss Education and Research Network), AS20965 (GEANT IP Service), AS24490 (TEIN2, Trans-Eurasia Information Network), AS9270 (APAN-KR, Asia Pacific Advanced Network Korea). The APII connection is 10 Gbps (uraken3.kek.jp has a 1 Gbps interface). iperf results (throughput in Mbps versus number of simultaneous threads; row labels encode destination - TCP window size - test duration; thread counts tested: 1, 10, 20, 30, 40, 50, 60, 70):
  CERN-512k-10s:  7.9, 30.0, 32.2, 39.9
  CERN-512k-60s:  7.7, 57.4, 79.1, 83.6, 77.2, 67.8, 70.8, 62.8
  CERN-8m-10s:    5.9, 78.8, 112.0, 119.0, 92.5
  CERN-8m-60s:    47.5, 88.2, 95.0, 101.0, 98.8, 103.0, 91.4
  CERN-16m-10s:   20.0, 96.9, 130.0
  CERN-16m-60s:   69.8, 92.0, 109.0, 106.0, 118.0
  CERNNF-8m-Hs:   141.0, 431.0, 429.0, 446.0
  CERNNF-512k-Hs: 113.0, 193.0, 340.0, 442.0
  KEK-512k-10s:   42.6, 274.0, 346.0, 356.0
  KEK-512k-60s:   43.6, 398.0, 478.0, 495.0, 473.0
Results: the network to both institutions has been optimized and shows ~500 Mbps.
Final network map
Remarks & Summary
Brief history — so far, now, and tomorrow. Summer 2006: visited CERN, worked with CMSSW 0.7.0 to 0.8.0, implemented libraries; also worked with HIROOT. Fall 2006: the CMS-KR heavy-ion team was formed, working mainly on reconstruction software (jets, muons). Winter 2007: our team visited MIT; OSG installed, dCache tested, monitoring system tested. Spring 2007: upgrade of SSCC, ~$0.2M — not enough for a standard CMS Tier2, but good for one physics program, CMS-HI. Summer 2007: Tier2 in test operation, visit to CERN; one graduate student will stay at CERN. Winter 2007: a full-size CMS-HI Tier2 is being built. Starting from 2008, MOST will support a Tier2 center.
Remarks. The only solution for LHC/CMS computing is the Grid; HEP again leads the next computing technology, as it did with the WWW. LCG (EGEE) and OSG will be the ones, and we expect many industrial by-products. SSCC at the University of Seoul is starting a CMS Tier2 based on OSG; because of its limited resources, we only run a CMS-HI Tier2 for now, plugged into the US-CMS Tier1. We should not lose this opportunity if we want to lead IT & science: we need to build a Korean Tier2, or Tier1, now.
Summary. The Seoul Supercomputer Center (SSCC) is becoming an OSG-based CMS Tier2 center: CE from 102 CPUs to 200 CPUs, SE from 12 TB to 140 TB. The network to CERN and KEK via APII and TEIN2 has been optimized: UoS-KEK 500 Mbps, UoS-CERN 500 Mbps. Everything went smoothly, but further upgrades are needed soon: an OSG/LCG Tier2 center needs a connection of 2-10 Gbps, so further KOREN/KREONET support is important. An official launch of the CMS Tier2 is coming: MOST will start a program to support a CMS Tier2 center. Many thanks to our HEP and HIP communities.
Finale! OLYMPIC 2008
Supplementary Slides
Korea CMS-HI uses the Open Science Grid (OSG) to provide a shared infrastructure in Korea to contribute to the WLCG. The US Tier-1 and all US Tier-2s are part of the OSG. Integration with and interfacing to the WLCG is achieved through participation in many management, operational and technical activities. In 2006 OSG effectively contributed to CSA06 and CMS simulation production. In 2007 OSG plans to improve the reliability and scalability of the infrastructure to meet LHC needs, as well as to add and support additional services, sites and users.
Web-Based Monitoring
Web-Based Monitoring: home. Tools for remote status display; easy to use, flexible, interactive; work through firewalls and with security.
Web-Based Monitoring : page1 Run info and overall detector status can be seen
Web-Based Monitoring: run summary. Query: simple queries and sophisticated queries.
Web-Based Monitoring: by clicking a specific link, you can access more detailed information.
CMS computing bottom line: fast reconstruction codes; streamed primary datasets; distribution of raw and reconstructed data; compact data formats; effective and efficient production, reprocessing and bookkeeping systems.
The event display and data quality monitoring visualisation systems are especially crucial for commissioning CMS in the imminent CMS physics run at the LHC. They have already proved invaluable for the CMS magnet test and cosmic challenge. We describe how these systems are used to navigate and filter the immense amounts of complex event data from the CMS detector and prepare clear and flexible views of the salient features for the shift crews and offline users. These allow shift staff and experts to navigate from a top-level general view to very specific monitoring elements in real time, to help validate data quality and ascertain causes of problems. We describe how events may be accessed in the higher level trigger filter farm, at the CERN Tier-0 centre, and in offsite centres to help ensure good data quality at all points in the data processing workflow. Emphasis has been placed on deployment issues in order to ensure that experts and general users may use the visualisation systems at CERN, in remote operations and monitoring centres offsite, and from their own desktops.
The CMS offline software suite uses a layered approach to provide several different environments suitable for a wide range of analysis styles. At the heart of all the environments is the ROOT-based event data model file format. The simplest environment uses "bare" ROOT to read files directly, without the use of any CMS-specific supporting libraries; this is useful for performing simple checks on a file or plotting simple distributions (such as the momentum distribution of tracks). The second environment supports use of the CMS framework's smart pointers that read data on demand, as well as automatic loading of the libraries holding the object interfaces; this environment fully supports interactive ROOT sessions in either CINT or PyROOT. The third environment combines ROOT's TSelector with the data access API of the full CMS framework, facilitating sharing of code between the ROOT environment and the full framework. The final environment is the full CMS framework that is used for all data production activities as well as full access to all data available on the Grid. By providing a layered approach to analysis environments, physicists can choose the environment that most closely matches their individual work style.
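As an illustration of the lightest of these layers, a hedged PyROOT sketch that inspects an EDM file with bare ROOT; the file name and the branch expression are hypothetical examples rather than values taken from a specific CMS release:

```python
# "Bare ROOT" inspection of a CMS EDM file from PyROOT: no CMS libraries are
# loaded; we only look at the Events tree and draw one branch. The file name
# (reco.root) and the branch expression below are placeholders.
import ROOT

f = ROOT.TFile.Open("reco.root")
events = f.Get("Events")                 # the EDM event tree
print("number of events:", events.GetEntries())

# List the stored products (branch names encode module/product labels).
for branch in events.GetListOfBranches():
    print(branch.GetName())

# Plot a simple distribution directly from the tree; the branch expression is
# only an example and depends on the actual content of the file.
canvas = ROOT.TCanvas("c1", "tracks", 800, 600)
events.Draw("recoTracks_generalTracks__RECO.obj.pt()")
canvas.SaveAs("track_pt.png")
```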