1
CMS-HI computing in Korea Dept. of Physics, University of Seoul
Dec. 14, 2007. Inkyu PARK, Dept. of Physics, University of Seoul. With Prof. H.S. Min, Prof. B.D. Yu, Prof. D.S. Park, Prof. J.D. Noh, …; S.G. Seo, J.W. Park, G.R. Han, M.K. Choi, S.M. Han, Y.S. Kim, …
2
Contents
1. CMS computing: Why GRID? (11 pages)
2. CMS computing: Tier structure (10 pages)
3. WLCG: EGEE & OSG (5 pages)
4. OSG based SSCC (8 pages)
5. Network readiness (12 pages)
6. Remarks and Summary (4 pages)
3
CMS Computing Why GRID?
4
LHC: Another kind of Olympic game
For HEP and HI discoveries and more, a few thousand physicists work together: 7000 physicists from 80 countries! They collaborate, but at the same time compete: the LHC Olympic game.
5
LHC (Large Hadron Collider)
14 TeV for pp, 5.5 TeV/nucleon for AA; circumference ~27 km; a few billion dollars per year; bunch crossing rate ~40 MHz; starts running this year!!
6
LHC accelerator schedule
p+p:
- 2008: … GeV, 5x10^32
- 2009: 14 TeV, 0.5x10^33
- 2010: 14 TeV, 1x10^33
- 2011: 14 TeV, 1x10^34
HI (Pb-Pb):
- 2008: none
- 2009: 5.5 TeV, 5x10^26
- 2010: 5.5 TeV, 1x10^26
- 2011: 5.5 TeV, 1x10^27
7
CMS Detectors: designed for precision measurements in high-luminosity p+p collisions. μ chambers; ZDC (z = 140 m, |η| > 8.2, neutrals); CASTOR (5.2 < |η| < 6.6); Si tracker including pixels; ECAL; HCAL. In heavy-ion collisions: functional at the highest expected multiplicities; detailed studies at dNch/dη ~ 3000, cross-checks up to …; hermetic calorimetry; large-acceptance tracker; excellent muon spectrometer.
8
Gigantic detectors October 24, 2006
9
Wires everywhere! Theoretically, # of wires = # of channels
16M wires, soldering, etc…
10
CMS raw data size. Event data structure (EDM):
- FEVT (data) / SimFEVT (MC)
- RAW: digitized detector (data); generated, simulated (MC)
- RECO: reconstructed
- AOD: physics extracted
16 million channels, ADC (12-16 bit), zero suppression: ~2 MBytes of raw data per event (p+p).
Data containers: run header, event header, RAW data, reconstruction data, AOD, calibration, slow control, etc.
11
AA hot ball + Υ: Pb+Pb event (dN/dy = 3500) with Υ → μ+μ-
Pb+Pb event display: Produced in pp software framework (simulation, data structures, visualization)
12
Not only data but also MC data
Real data: sensor → ADC digitization → trigger → record → data → event reconstruction → data AOD → physics.
MC data: GEANT4 detector simulation → MC data → event reconstruction → MC AOD.
13
Total disaster! Who can save us?
Data size estimation:
- pp: beam time/year 10^7 s, trigger rate 150 Hz, # of events 1.5x10^9, event size 2.5 MB: 3.75 PB produced per year; 10 years of LHC running: ~40 PB.
- AA: beam time/year 10^6 s, trigger rate 70 Hz, # of events 0.7x10^8, event size 5 MB: 0.35 PB per year; 10 years: ~4 PB.
- MC data required: … PB; order of magnitude ~100 PB in total.
Yearly computing size ~10 PB. On Compact Discs (700 MB each, 1 mm thick): ~150 million CDs, a stack ~150 km high!! ~20 km with DVDs; ~1,000,000 drives with 100 GB HDDs.
To simulate AA: 1-6 hours/event, i.e. ~10^8 hours to create the AA MC, so ~10^4 CPUs needed.
To reconstruct data & MC, reprocess, analyze, etc.: needs a few tens of MSI2K; the newest CPU is ~1000 SI2K, so pp + AA need on the order of ~10^5 CPUs.
Total disaster! Who can save us?
(dca: distance of closest approach)
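A minimal back-of-the-envelope sketch (Python) reproducing the figures above; all inputs are the slide's own numbers, and the CD-stack line corresponds to the ~100 PB order-of-magnitude total rather than the yearly volume:

```python
# A rough cross-check of the data-volume and CPU estimates quoted on this slide.
# Inputs (beam time, trigger rate, event size) are the slide's own numbers.

def yearly_volume_pb(beam_time_s, trigger_rate_hz, event_size_mb):
    """Raw-data volume per year in petabytes (1 PB = 10^9 MB)."""
    return beam_time_s * trigger_rate_hz * event_size_mb / 1e9

pp_pb = yearly_volume_pb(1e7, 150, 2.5)   # -> 3.75 PB/year for p+p
aa_pb = yearly_volume_pb(1e6, 70, 5.0)    # -> 0.35 PB/year for Pb+Pb

# CD-stack picture: ~150 million 700 MB CDs, 1 mm thick, match the ~100 PB total
n_cds = 100e15 / 700e6                    # ~1.4x10^8 discs
stack_km = n_cds * 1e-3 / 1e3             # ~140 km of discs

# CPUs needed for the AA Monte Carlo: ~10^8 CPU-hours spread over one year
aa_mc_cpus = 1e8 / (365 * 24)             # ~10^4 CPUs running continuously

print(f"pp: {pp_pb:.2f} PB/year, AA: {aa_pb:.2f} PB/year")
print(f"CD stack: {n_cds:.1e} discs, ~{stack_km:.0f} km high")
print(f"AA MC: ~{aa_mc_cpus:.0f} CPUs busy for a full year")
```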
14
Grid computing : E-Science
15
CMS computing: Tier structure
16
What happens at Tier0 October 24, 2006
17
Tier 0 / Tier 1 / Tier 2: DATA and major storage at Tier 0 and Tier 1; MC and many CPUs at Tier 2.
(L.A.T. Bauerdick, 2006)
18
Connection topology: Tier-1 centers interconnected with each other, each serving several Tier-2s.
19
CMS Computing Tier structure
Tier0 (CERN) → Tier1 (worldwide: PIC/IFAE Spain, Italy, UK, USA, France, Germany, Taiwan) → Tier2 (e.g. USCMS).
20
US CMS tier2 case Total 48 universities
7 have a Tier2, the others have Tier3s. CE: CPUs (kSI2K); SE: > 100 TB; network infra: 1-10 Gbps.
USCMS Tier2 sites: Caltech, Florida, MIT, Nebraska, Purdue, UC San Diego, Wisconsin.

Site       CPU (kSI2K)  Disk (TB)  WAN (Gbit/s)
Caltech    586          60         10
Florida    519          104
MIT        474          157        1
Nebraska   650          105
Purdue     743          184
UCSD       932          188
Wisconsin  547          110
21
http://www.cmsaf.mit.edu/ (MIT)
US-Tier2 home pages (MIT)
22
https://tier2.ucsd.edu/zope/UCSDTier2/
23
Manpower (per Tier2 institution: type of institution; directors and operators; e-mail; degree, major, and current position)
- Caltech (Dept. of Physics, computing center): Ilya Narsky, Ph.D. in physics, Dept. of Physics; Michael Thomas, physics, particle physics lab, Dept. of Physics.
- MIT (Dept. of Physics, LNS laboratory, Tier2 center): Boleslaw Wyslouch, nuclear physics, professor in the Dept. of Physics, director; Ilya Kravchenko, Ph.D. in physics, operations manager; Constantin Loizides, Ph.D. in physics, physics admin; Maarten Ballintijn, Ph.D. in physics, system admin.
- Purdue (Dept. of Physics, CMS computing center): Norbert Neumeister, particle physics, professor in the Dept. of Physics, director; Tom Hacker, computer engineering, administrator; Preston Smith, Dept. of Physics, manager; Michael Shuey, Dept. of Physics, physics support; David Braun, Dept. of Physics, software; Haiying Xu, CMS researcher, particle physics; Fengping Hu.
- Wisconsin (Dept. of Physics, CMS computing center): Sridhara Dasu, physics, director, professor in the Dept. of Physics; Dan Bradley, physics, particle physics lab, research professor, software; Will Maier, physics, researcher in the particle physics lab, admin; Ajit Mohapatra, physics, researcher in the particle physics lab, support.
- Florida (Dept. of Physics): Yu Fu, Dept. of Physics, OSG manager; Bockjoo Kim, Ph.D. in particle physics, CMS grid computing administrator (Korean).
- Nebraska (Dept. of Physics, Tier2 computing center): Ken Bloom, particle physics, professor in the Dept. of Physics; Carl Lundstedt, Ph.D. in particle physics, research professor; Brian Bockelman, CMS grid computing; Aaron Dominguez, Tier2 operations, Ph.D. in physics; Mako Furukawa, CMS physics, particle physics.
- UC San Diego (Dept. of Physics, Tier2 computing center): Terrence Martin, Dept. of Physics computing center staff; James Letts, Ph.D. in particle physics, researcher in the Dept. of Physics.
24
Check points
- Centers: 7-8 universities, 1 or 2 centers
- CE: 400 kSI2K
- SE: minimum of 100 TB
- Network infra: 1 Gbps minimum; need national highways, KREONET / KOREN
- 1 director, 2 physicists who know what to do, plus 3-4 operational staff
- Support CMSSW, Condor, dCache, and more
25
Korea CMS Tier2 guideline
Item: minimum installed capacity (recommended capacity); on-site inspection and evaluation method.
- CE (Computing Element): minimum 400 kSI2K (800 kSI2K recommended); ganglia monitoring and a Condor batch system must be installed and operating. Inspection: count only machines installed purely for computation, excluding personal PCs; verify clustering and batch-job execution through ganglia and Condor monitoring; verify the SI2K rating of each CPU.
- SE (Storage Element): minimum 100 TB (200 TB recommended); a dCache server must be installed and operating. Inspection: user disks are excluded; verify through dCache monitoring that the capacity is usable as storage.
- Network: minimum 1 Gbps (10 Gbps recommended). Inspection: verify KREONET or KOREN connectivity.
- Location and equipment: a dedicated, air-conditioned room within the physics department is required (a dedicated center recommended); at least 50 kW of power supply and at least a 20 RT constant temperature/humidity unit are required. Inspection: verify the space, the power supply, and the temperature/humidity facilities on site.
- Human resources: an LHC/CMS particle physicist must participate as operations director; an operations team and administrative organization are required. Inspection: verify the director's ability to use CMSSW and understanding of the LHC/CMS experiment; verify the composition of the operations team and administrative staff; verify the ability to collaborate with domestic and foreign CMS physicists.
26
WLCG EGEE and OSG
27
Worldwide LHC Computing Grid
28
LCG uses three major grid solutions
EGEE: most of the European CMS institutions; often mixed up with LCG… (LCG ~ EGEE). OSG: all of the US-CMS institutions. NorduGrid: Northern European countries.
29
OSG in the USA [map]: most of the European CMS institutions are on EGEE in Europe; most of the American CMS institutions are on OSG in the USA.
30
OSG-EGEE compatibility
- Common VOMS (Virtual Organization Management System).
- Condor-G interfaces to multiple remote job-execution services (GRAM, Condor-C).
- File transfers using GridFTP.
- SRM (Storage Resource Manager) for managed storage access.
- The OSG BDII (Berkeley Database Information Index; cf. GIIS, GRIS) is published to a shared BDII so that Resource Brokers can route jobs across the two grids.
- Active joint security groups, leading to common policies and procedures.
- Automated ticket routing between GOCs.
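To make the BDII point concrete, here is a minimal sketch of how a broker-style client could query a BDII over LDAP for GLUE computing-element records; the BDII host name is a placeholder and the python-ldap usage is illustrative, not a prescribed OSG or EGEE tool:

```python
# Sketch: query a (hypothetical) BDII for GLUE CE records over LDAP.
# Requires the python-ldap package; the host below is a placeholder.
import ldap

BDII_URL = "ldap://bdii.example.org:2170"   # 2170 is the conventional BDII port
BASE_DN = "o=grid"                          # root of the GLUE information tree

con = ldap.initialize(BDII_URL)
results = con.search_s(
    BASE_DN,
    ldap.SCOPE_SUBTREE,
    "(objectClass=GlueCE)",                 # computing-element entries
    ["GlueCEUniqueID", "GlueCEStateFreeCPUs"],
)

for dn, attrs in results:
    ce = attrs.get("GlueCEUniqueID", [b"?"])[0].decode()
    free = attrs.get("GlueCEStateFreeCPUs", [b"?"])[0].decode()
    print(f"{ce}: {free} free CPUs")
```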
31
Software in OSG (installed by VDT)
- Job management: Condor (including Condor-G & Condor-C), Globus GRAM
- Data management: GridFTP (data transfer), RLS (replica location), DRM (storage management), Globus RFT
- Information services: Globus MDS, GLUE schema & providers
- Security: VOMS (VO membership), GUMS (local authorization), mkgridmap (local authorization), MyProxy (proxy management), GSI SSH, CA CRL updater
- Accounting: OSG Gratia
- Monitoring: MonALISA, gLite CEMon
- Client tools: Virtual Data System, SRM clients (V1 and V2), UberFTP (GridFTP client)
- Developer tools: PyGlobus, PyGridWare
- Testing: NMI Build & Test, VDT tests
- Support: Apache, Tomcat, MySQL (with MyODBC), non-standard Perl modules, Wget, Squid, Logrotate, configuration scripts
32
OSG based CMS-Tier2 @ Seoul Supercomputer Center (SSCC)
33
CMS Tier 2 requirement (OSG)
- Network: 2-10 Gbps intranet, 2 Gbps outbound
- CPU: 1 MSI2K (~1000 CPUs)
- Storage: 200 TB dCache system
- OSG middleware: CE, SE
- Batch system: Condor + PBS
- CMS software: CMSSW et al. installed at $OSG_APP
None of the Korean institutions has this amount of facilities for a CMS Tier2 (cf. the KISTI ALICE Tier 2).
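A small illustrative sketch (Python) comparing a site's installed capacity with these Tier2 targets; the SSCC figures plugged in below are the ones quoted later in this talk and serve only as an example:

```python
# Compare a site's resources with the OSG CMS Tier2 targets listed above.
TIER2_TARGETS = {
    "cpu_ksi2k": 1000,   # ~1 MSI2K (~1000 CPUs)
    "storage_tb": 200,   # dCache-managed storage
    "wan_gbps": 2,       # outbound bandwidth
}

def check_site(name, cpu_ksi2k, storage_tb, wan_gbps):
    """Print which Tier2 targets a site meets or misses."""
    site = {"cpu_ksi2k": cpu_ksi2k, "storage_tb": storage_tb, "wan_gbps": wan_gbps}
    for key, target in TIER2_TARGETS.items():
        status = "OK " if site[key] >= target else "LOW"
        print(f"{name:6s} {key:12s} {site[key]:8.1f} / {target:6.1f}  {status}")

# Example: SSCC figures quoted later in this talk (~0.1 MSI2K, 120 TB, 2 Gbps)
check_site("SSCC", cpu_ksi2k=100, storage_tb=120, wan_gbps=2)
```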
34
Seoul SuperComputer Center
[Figure: PC cluster and 64 TB storage for the CMS Tier2 at the University of Seoul]
SSCC (Seoul Supercomputer Center), established in 2003 with funding of ~$1M: a total of 256 CPUs + gigabit switches + KOREN2.
2007 upgrade, funding of ~$0.2M: 10 Gbps switch; SE: 120 TB of storage (~400 HDDs of 300 GB); CE: 128 CPUs for MC generation + a new 64-bit HPC cluster + KREONET; operates OSG.
35
Center organization: spokesperson/director, 3 Ph.D. researchers, 4 admins/operators, 2 application managers, 2 staff.
- Deputy spokesperson: Prof. Hyunsoo Min; Director: Prof. Inkyu Park
- System: J.W. Park; Software: G.R. Hahn; Web: M.K. Choi; User support: Y.S. Kim
36
CMS Tier2 / Tier3 setup: SSCC and SPCC [network diagram]
- SSCC (CMS-HI Tier 2): dCache pool (200 TB), 120 TB storage with dCache, Condor computing pool (+120 CPUs), 64-bit cluster (+100 CPUs), ~0.1 MSI2K, 2 Gbps network, OSG; service nodes for gate, web, Condor-G, dCache/gFTP, and Ganglia.
- SPCC (analysis Tier 3): 64-bit 3 GHz CPUs (64 machines), 32-bit 2 GHz CPUs (32 machines), 8 TB of storage.
- Network: KREONET (GLORIAD) and KOREN (APII, TEIN), 1-2 Gbps uplinks, 20 Gbps internally; switches: Extreme BlackDiamond 8810 (10Gb/Gb) x2, Nortel Passport 8800 (Gb) x2, Foundry BigIron16 (Gb) x2, D-Link L3 switch (Gb).
37
Tier 2 connection [diagram]: Tier0 (CERN); Tier1s worldwide (PIC/IFAE Spain, Italy, UK, USA, France, Germany, Taiwan); Tier2 (USCMS); SSCC Seoul (Tier2) and SPCC (physics). Connecting through a Korean Tier1 is what we hope for, but we need a Tier1 first; connecting through the US-CMS Tier1 is the current approach, since geographical distance doesn't really matter.
38
Current Tier2 status
39
CE and SE status: SE currently 12 TB; CE currently 102 CPUs.
40
Documentation by Twiki
41
Network readiness
42
Thanks to this project…
JPARC E391a: Yamanaka (KEK, Japan), Seogon Kang (UoS), Inkyu Park, Jinwoo Park (UoS). CMS-HI: David d'Enterria (CERN, Switzerland), Garam Han (UoS), Bolek Wyslouch (MIT, USA).
43
Traceroute example
44
Between UoS and KEK:
- Existing: KEK → AD.JP → KDDNET → UoS; 20 hops, and the hop between 9 and 10 takes 40 ms.
- KEK → APII → KOREN → UoS: 14 hops, and the hop between 4 and 5 takes 30 ms, which is 90% of the total delay time.
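A small sketch of the kind of check used here: parse traceroute output and report the hop pair with the largest latency jump. The parsing assumes the common "N  host  t1 ms  t2 ms  t3 ms" line format and is illustrative only; the example target host is commented out:

```python
# Sketch: find the hop pair with the largest latency increase in traceroute output.
import re
import subprocess

def hop_latencies(target):
    """Return a list of (hop_number, average_rtt_ms) from `traceroute -n target`."""
    out = subprocess.run(["traceroute", "-n", target],
                         capture_output=True, text=True).stdout
    hops = []
    for line in out.splitlines():
        m = re.match(r"\s*(\d+)\s", line)
        times = [float(t) for t in re.findall(r"([\d.]+)\s*ms", line)]
        if m and times:
            hops.append((int(m.group(1)), sum(times) / len(times)))
    return hops

def worst_hop(hops):
    """Return (previous_hop, next_hop, delta_ms) for the largest latency jump."""
    jumps = [(a[0], b[0], b[1] - a[1]) for a, b in zip(hops, hops[1:])]
    return max(jumps, key=lambda j: j[2])

# Example usage (target taken from the APII discussion later in this talk):
# print(worst_hop(hop_latencies("uraken3.kek.jp")))
```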
45
Bandwidth test between UoS and KEK
100 Mbps at KEK, while 1 Gbps at UoS. About a gain of 1.3, but we still need to use KOREN correctly. More information and further work are needed.
46
Between UoS and CERN: 170 ms delay in both cases
We didn’t have time to correct this problem by the time of this review.
47
Between UoS and CERN Still unclear status
Somehow we couldn’t see TEIN2
48
National KOREN Bandwidth
Bandwidth between SSCC and KNU, between SSCC and KU, and between SSCC and SKKU; iperf was used to check TCP/UDP performance.
Network benchmark (institution pair: KOREN link speed):
- University of Seoul - Korea University: 99 Mbps
- University of Seoul - Kyungpook National University: 520 Mbps
- University of Seoul - Sungkyunkwan University: 100 Mbps
49
National KOREN Bandwidth
iperf throughput (Mbps) vs. number of simultaneous connections (threads: 1, 10, 20, 30, 40, 50, 60, 70), by TCP window size and test duration (10 s / 60 s); values are listed in order of increasing thread count:
- KNU, 128k, 10 s: 53.9, 506.0, 520.0
- KNU, 128k, 60 s: 51.8, 510.0
- KNU, 512k, 10 s: 58.6, 515.0, 521.0
- KNU, 512k, 60 s: 52.3, 514.0, 522.0
- KNU, 2M, 10 s: 60.4, 503.0, 528.0
- KNU, 2M, 60 s: 52.4, 511.0, 523.0
- KNU, 8M, 10 s: 59.9, 399.0, 490.0
- KNU, 8M, 60 s: 53.6, 367.0
- KNU, 16M, 10 s: 42.6, 218.0
- KNU, 16M, 60 s: 36.4, 232.0
- KU, 8M, 10 s: 88.5, 97.4, 87.4, 88.0, 87.7
- KU, 8M, 60 s: 87.0, 87.9, 88.1, 82.2
- KU, 16M, 10 s: 29.7, 87.8, 87.2
- KU, 16M, 60 s: 76.6
- SKKU, 512k, 10 s: 94.1, 95.6, 96.1, 98.3, 98.9, 98.1, 98.7, 97.6
- SKKU, 512k, 60 s: 94.3, 94.7, 94.9
- SKKU, 8M, 10 s: 97.3, 117.0, 111.0, 138.0, 144.0, 137.0, 251.0
- SKKU, 8M, 60 s: 96.5, 97.9, 102.0, 106.0, 109.0
- SKKU, 16M, 10 s: 100.0, 130.0, 147.0, 146.0, 155.0, 324.0
- SKKU, 16M, 60 s: 95.2, 108.0, 103.0
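A sketch of how such a window-size / thread-count scan could be scripted around the iperf client; the server host is a placeholder, and only standard iperf options (-c, -w, -P, -t) are used:

```python
# Sketch: scan iperf TCP throughput over window sizes and parallel-stream counts.
# Host name is a placeholder; assumes an iperf server is already running there.
import itertools
import subprocess

SERVER = "iperf.example.ac.kr"          # placeholder iperf server
WINDOWS = ["128K", "512K", "2M", "8M", "16M"]
THREADS = [1, 10, 20, 30, 40, 50, 60, 70]
DURATION = 10                            # seconds per test (the tables also used 60)

for window, threads in itertools.product(WINDOWS, THREADS):
    cmd = ["iperf", "-c", SERVER,
           "-w", window,                 # TCP window size
           "-P", str(threads),           # number of parallel connections
           "-t", str(DURATION)]          # test duration in seconds
    result = subprocess.run(cmd, capture_output=True, text=True)
    # The aggregate bandwidth is reported on the last summary line of iperf's output
    print(f"w={window:>5} P={threads:>2}:", result.stdout.strip().splitlines()[-1])
```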
50
Bandwidth results SSCC-KNU shows 500Mbps connection
500Mbps is our test machine maximum
51
Optimized APII and TEIN2
Maximum TEIN2 connection is 622 Mbps. Route: AS559 SWITCH (Swiss Education and Research Network), AS GEANT IP Service, AS TEIN2 (Trans-Eurasia Information Network), AS Asia Pacific Advanced Network Korea (APAN-KR). The APII connection is 10 Gbps (uraken3.kek.jp = 1 Gbps).
iperf throughput (Mbps) vs. number of simultaneous connections, by TCP window size and test duration:
- CERN, 512k, 10 s: 7.9, 30.0, 32.2, 39.9
- CERN, 512k, 60 s: 7.7, 57.4, 79.1, 83.6, 77.2, 67.8, 70.8, 62.8
- CERN, 8M, 10 s: 5.9, 78.8, 112.0, 119.0, 92.5
- CERN, 8M, 60 s: 47.5, 88.2, 95.0, 101.0, 98.8, 103.0, 91.4
- CERN, 16M, 10 s: 20.0, 96.9, 130.0
- CERN, 16M, 60 s: 69.8, 92.0, 109.0, 106.0, 118.0
- CERNNF, 8M, H s: 141.0, 431.0, 429.0, 446.0
- CERNNF, 512k, H s: 113.0, 193.0, 340.0, 442.0
- KEK, 512k, 10 s: 42.6, 274.0, 346.0, 356.0
- KEK, 512k, 60 s: 43.6, 398.0, 478.0, 495.0, 473.0
52
Results: the network to both institutions has been optimized and shows ~500 Mbps.
53
Final network map
54
Remarks & Summary
55
Brief history so far, now, and tomorrow
- 2006 summer: visited CERN, worked with CMSSW up to 0.8.0, implemented libraries; also worked with HIROOT.
- 2006 fall: the CMS-KR heavy-ion team was formed; mainly working on reconstruction software (jets, muons).
- 2007 winter: our team visited MIT; OSG installed, dCache tested, monitoring system tested.
- 2007 spring: upgrade of SSCC, ~$0.2M; not enough for a standard CMS Tier2, but good for one physics program, CMS-HI.
- 2007 summer: Tier2 in test operation; visit to CERN; one graduate student will stay at CERN.
- 2007 winter: a full-size CMS-HI Tier2 is being built; starting from 2008, MOST will support a Tier2 center.
56
Remarks The only solution for LHC/CMS Computing is Grid.
HEP again leads the next computing technology, as it did with the WWW. LCG (EGEE) and OSG will be the ones! Expect lots of industrial by-products. SSCC at the Univ. of Seoul starts a CMS Tier2 based on OSG; due to its limited resources, we run only a CMS-HI Tier2 for now, plugged into the US-CMS Tier1. We should not lose this opportunity if we want to lead IT & science. We need to build a Korean Tier2 or Tier1, now.
57
Summary
- The Seoul SuperComputing Centre (SSCC) becomes an OSG-based CMS Tier2 centre: CE: 102 CPUs → 200 CPUs; SE: 12 TB → 140 TB.
- The network to CERN and KEK via APII and TEIN2 has been optimized: UoS-KEK 500 Mbps, UoS-CERN 500 Mbps. Everything went smoothly; further upgrades are needed soon.
- An OSG/LCG Tier2 center needs a connection of 2-10 Gbps, so further KOREN/KREONET support is important.
- An official launch of the CMS Tier2 is coming: MOST will launch a program to support a CMS Tier2 center.
Many thanks to our HEP and HIP communities.
58
Finale! OLYMPIC 2008
59
Supplementary Slides
60
BC 5c: Atom
Korea CMS-HI uses the Open Science Grid (OSG) to provide a shared infrastructure in Korea that contributes to the WLCG. The US Tier-1 and all US Tier-2s are part of the OSG. Integration with and interfacing to the WLCG is achieved through participation in many management, operational and technical activities. In 2006 OSG contributed effectively to CSA06 and CMS simulation production. In 2007 OSG plans to improve the reliability and scalability of the infrastructure to meet LHC needs, as well as to add and support additional services, sites and users.
61
Web-Based Monitoring
62
Web-Based Monitoring: home
Tools for remote status display: easy to use, flexible, interactive; they work with firewalls and with security in place.
63
Web-Based Monitoring : page1
Run info and overall detector status can be seen
64
Web-Based Monitoring : Run summary
Query: simple queries and sophisticated queries.
65
Web-Based Monitoring: by clicking a specific link, you can access more detailed information.
66
CMS computing bottom line
- Fast reconstruction codes
- Streamed primary datasets
- Distribution of raw and reconstructed data
- Compact data formats
- Effective and efficient production, reprocessing and bookkeeping systems
67
Discovery of the neutron
The event display and data quality monitoring visualisation systems are especially crucial for commissioning CMS in the imminent CMS physics run at the LHC. They have already proved invaluable for the CMS magnet test and cosmic challenge. We describe how these systems are used to navigate and filter the immense amounts of complex event data from the CMS detector and prepare clear and flexible views of the salient features for the shift crews and offline users. These allow shift staff and experts to navigate from a top-level general view to very specific monitoring elements in real time, to help validate data quality and ascertain the causes of problems. We describe how events may be accessed in the higher-level trigger filter farm, at the CERN Tier-0 centre, and in offsite centres to help ensure good data quality at all points in the data-processing workflow. Emphasis has been placed on deployment issues in order to ensure that experts and general users may use the visualisation systems at CERN, in remote operations and monitoring centres offsite, and from their own desktops. (Instructor: Inkyu Park)
68
Quark model and quantum chromodynamics
The CMS offline software suite uses a layered approach to provide several different environments suitable for a wide range of analysis styles. At the heart of all the environments is the ROOT-based event data model file format. The simplest environment uses "bare" ROOT to read files directly, without the use of any CMS-specific supporting libraries. This is useful for performing simple checks on a file or plotting simple distributions (such as the momentum distribution of tracks). The second environment supports use of the CMS framework's smart pointers that read data on demand, as well as automatic loading of the libraries holding the object interfaces. This environment fully supports interactive ROOT sessions in either CINT or PyROOT. The third environment combines ROOT's TSelector with the data access API of the full CMS framework, facilitating sharing of code between the ROOT environment and the full framework. The final environment is the full CMS framework that is used for all data production activities as well as full access to all data available on the Grid. By providing a layered approach to analysis environments, physicists can choose the environment that most closely matches their individual work style.
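To illustrate the "bare ROOT" and PyROOT layers described above, here is a minimal sketch that opens a CMS EDM file interactively and plots a simple distribution; the file name and the branch expression are placeholders, since the exact branch names depend on the release and data tier:

```python
# Sketch: inspect a CMS EDM ROOT file with "bare" PyROOT (no CMS framework).
# File name and branch expression below are placeholders for illustration.
import ROOT

f = ROOT.TFile.Open("reco_sample.root")        # hypothetical EDM file
events = f.Get("Events")                       # EDM files store events in the "Events" tree
print("Number of events:", events.GetEntries())

# List a few branches to see what the file actually contains
for branch in list(events.GetListOfBranches())[:10]:
    print(" ", branch.GetName())

# Plot a simple distribution, e.g. a track-pT-like quantity; the expression
# must be adapted to a real branch name found in the listing above.
canvas = ROOT.TCanvas("c1", "track pT", 800, 600)
events.Draw("recoTracks_generalTracks__RECO.obj.pt()")  # placeholder expression
canvas.SaveAs("track_pt.png")
```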