Message Driven Architecture for Massive Services: Elastic Scalability, High Availability
2011.11.18, 박혜웅
Massive Service: Think Different
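No good solution for all cases
The design is pretty, but it is heavy on the ears. It is comfortable because there is no cord, but it sometimes cuts out. It keeps your ears warm in winter, but your ears sweat in summer.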
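Cloud Architecture
- Elastic Scalability: the system must scale out and in quickly as its load changes, and the architecture must allow scaling according to the type of load.
- High Availability: 99% and 99.999% availability are very different.
  - Availability = service uptime / total time
  - 99.999% (a non-stop system): downtime of about 26 seconds per month (about 5 minutes per year), e.g. nuclear power plant services
  - Scheduled maintenance also counts as downtime.
  - Removing every Single Point Of Failure (SPOF) is essential.
- Automatic Resource Management: resources such as CPU, memory, disk, ...
- Self-healing (automatic recovery)
(Excerpted from 클라우드 컴퓨팅 구현 기술 [Cloud Computing Implementation Technology], 김형준 et al., p. 66.)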
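The downtime figures above follow directly from the availability formula: downtime = (1 - availability) × total time. A minimal sketch, assuming a 30-day month and a 365-day year (the class name is hypothetical):

```java
// Converts an availability target (e.g. 0.99999 for "five nines") into allowed downtime,
// using Availability = uptime / total time, so downtime = (1 - availability) * total time.
class AvailabilityBudget {
    static final long SECONDS_PER_MONTH = 30L * 24 * 60 * 60; // assumes a 30-day month
    static final long SECONDS_PER_YEAR = 365L * 24 * 60 * 60; // assumes a 365-day year

    static double downtimeSeconds(double availability, long periodSeconds) {
        if (availability < 0.0 || availability > 1.0) {
            throw new IllegalArgumentException("availability must be in [0, 1]");
        }
        return (1.0 - availability) * periodSeconds;
    }

    public static void main(String[] args) {
        for (double a : new double[] {0.99, 0.999, 0.9999, 0.99999}) {
            double perMonth = downtimeSeconds(a, SECONDS_PER_MONTH);
            double perYear = downtimeSeconds(a, SECONDS_PER_YEAR);
            System.out.printf("%.3f%%: %.1f s/month, %.1f min/year%n", a * 100, perMonth, perYear / 60);
        }
    }
}
```

At 99.999% this gives about 25.9 s per month and about 5.3 minutes per year, matching the slide; at 99% the same month allows 25,920 s (7.2 hours) of downtime.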
What do we need for massive services?
- Coupled vs. decoupled architecture → decoupled architecture
  - distributed data cache
  - distributed message queue
- Systems for removing SPOFs
  - for all systems: a distributed coordinator
  - for the load balancer: a health-checking script
  - for RDBMS/NoSQL:
    - Hadoop/HBase: dual NameNode (next version, 0.23)
    - MySQL Cluster, MySQL replication (+ heartbeat), or MySQL multi-master
- Blocking vs. non-blocking (synchronous vs. asynchronous)
  - blocking (synchronous): easy to code, but needs many resources
  - non-blocking (asynchronous): hard to code, but needs few resources
- Multi-thread (single port) vs. single-thread (multiple ports)
  - advantages of a single thread: cheap servers, no locking, no synchronization, easy to code
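The "no locking, no synchronization" advantage of a single thread can be sketched with a single-thread executor acting as an event loop: every state change runs on one thread, so many producer threads can submit work without any lock on the state. This is plain java.util.concurrent (similar in spirit to a Netty event loop, but not Netty's API), and the class name is hypothetical:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// A single-threaded "event loop": the state is touched only by the loop's one thread,
// so it needs no locks or synchronized blocks even when many threads submit work.
class SingleThreadCounter {
    private final ExecutorService loop = Executors.newSingleThreadExecutor();
    private long count = 0; // confined to the loop thread; never read or written elsewhere

    // Any thread may call this; the increment itself runs on the loop thread.
    void increment() {
        loop.execute(() -> count++);
    }

    // Reads the value on the loop thread, after every previously submitted task has run.
    long get() {
        try {
            return loop.submit(() -> count).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    void shutdown() {
        loop.shutdown();
        try {
            loop.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SingleThreadCounter counter = new SingleThreadCounter();
        Thread[] producers = new Thread[4];
        for (int i = 0; i < producers.length; i++) {
            producers[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) counter.increment();
            });
            producers[i].start();
        }
        for (Thread t : producers) t.join();
        System.out.println(counter.get()); // 40000: no lost updates, yet no lock on count
        counter.shutdown();
    }
}
```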
What do we need for massive services? (cont.)
- Low cost
  - money: hardware-based vs. software-based; commercial vs. free software
  - time: development & debugging, management
  - human resources
- Performance tuning
  - Linux options: ulimit, ...
  - JVM options: -Xms, -Xmx, GC options
  - the number of processes and threads (for each system)
  - stress tests
  - socket options: TCP_NODELAY, SO_SNDBUF/SO_RCVBUF (send/receive buffer sizes), ...
  - RDBMS/NoSQL options
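A minimal sketch of applying the socket options above with java.net.Socket. The buffer sizes are illustrative assumptions (the OS treats them as hints and may round them), and heap flags would be passed on the command line, e.g. `java -Xms1g -Xmx1g SocketTuning` (sizes hypothetical):

```java
import java.io.IOException;
import java.net.Socket;

// Applies the socket options named on the slide to a client socket before it connects.
class SocketTuning {
    static Socket tunedSocket(int sendBufBytes, int recvBufBytes) throws IOException {
        Socket s = new Socket();              // unconnected: options are best set before connect()
        s.setTcpNoDelay(true);                // TCP_NODELAY: disable Nagle's algorithm for small messages
        s.setSendBufferSize(sendBufBytes);    // SO_SNDBUF (a hint to the OS)
        s.setReceiveBufferSize(recvBufBytes); // SO_RCVBUF (set before connect so it can affect the TCP window)
        return s;
    }

    public static void main(String[] args) throws IOException {
        try (Socket s = tunedSocket(64 * 1024, 64 * 1024)) {
            System.out.println("TCP_NODELAY=" + s.getTcpNoDelay()
                    + " SO_SNDBUF=" + s.getSendBufferSize()
                    + " SO_RCVBUF=" + s.getReceiveBufferSize());
            // s.connect(new InetSocketAddress(host, port)); // target host/port would come from configuration
        }
        // Roughly reflects the -Xmx setting the JVM was started with.
        System.out.println("max heap = " + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
    }
}
```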
What do we need to leave work on time?
- Experts for each technical area = DRI (Directly Responsible Individual, as at Apple Inc.)
  - coding & interfaces: code conventions, design patterns, UML
  - DB & storage: RDBMS (MySQL, MyBatis), NoSQL (HBase), storage (DAS, NAS, HDFS, Haystack, ...)
  - network & threading: Java NIO, Netty
  - data analysis: MapReduce, machine learning
  - distributed-system software: coordinator (ZooKeeper), cache server (Redis, Memcached, Ehcache), queue server (RabbitMQ, ZeroMQ)
  - utility software: Google Protocol Buffers, Guice, Log4j, SLF4J, XStream, Jackson, JavaMail, ...
  - system management: Linux, monitoring tools, JMX
  - hardware: L4 switch
What do we need to leave work on time? (cont.)
- Fast & easy development/debugging
  - good architecture: system architecture, design patterns, code conventions
  - common utility classes: Apache Commons, Google Guava, ...
  - Test-Driven Development (TDD): JUnit
  - well-known systems or not? RDBMS vs. NoSQL, JSON vs. Google Protocol Buffers, JUnit vs. Guice
- Easy management
  - logging system: logging, collecting, parsing, log visualization
  - JMX
  - admin/monitoring tools or web pages
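JMX lets admin/monitoring tools (e.g. JConsole) read counters from a running process. A minimal Standard MBean sketch, assuming hypothetical counter names and a hypothetical ObjectName domain `example.mda`:

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Exposes simple server counters over JMX so monitoring tools can read them remotely.
class JmxStats {
    // Standard MBean convention: interface name = implementation class name + "MBean";
    // the interface must be public.
    public interface ServerStatsMBean {
        long getProcessedMessages(); // exposed as attribute "ProcessedMessages"
        long getFailedMessages();    // exposed as attribute "FailedMessages"
    }

    public static class ServerStats implements ServerStatsMBean {
        private final AtomicLong processed = new AtomicLong();
        private final AtomicLong failed = new AtomicLong();

        public void onProcessed() { processed.incrementAndGet(); }
        public void onFailed() { failed.incrementAndGet(); }

        @Override public long getProcessedMessages() { return processed.get(); }
        @Override public long getFailedMessages() { return failed.get(); }
    }

    // Registers a new counter set with the JVM's platform MBean server.
    static ServerStats register(String name) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ServerStats stats = new ServerStats();
        server.registerMBean(stats, new ObjectName("example.mda:type=ServerStats,name=" + name));
        return stats;
    }

    public static void main(String[] args) throws Exception {
        ServerStats stats = register("main");
        stats.onProcessed();
        stats.onProcessed();
        stats.onFailed();
        // Read back through JMX, as a monitoring tool would.
        Object value = ManagementFactory.getPlatformMBeanServer().getAttribute(
                new ObjectName("example.mda:type=ServerStats,name=main"), "ProcessedMessages");
        System.out.println("ProcessedMessages = " + value);
    }
}
```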
Many kinds of decoupling
- Decoupling (removing) the SPOF from our system → Distributed Coordinator
- Decoupling business logic from data → Distributed Cache
- Decoupling function from control (messages) → Message Queue
[Figure: before/after for each kind: processes sharing a SPOF → processes using a Coordinator; processes holding both logic and data → logic-only processes sharing a Cache in front of the DB; function processes connected directly → function processes exchanging messages through a Queue.]
The steps of decoupling (step 1): Distributed Coordinator
- Registry: important data (small in size)
  - server status
  - server configuration
  - common data
- Removes the SPOF from our system.
[Figure: processes (function + data) each keeping their own registry next to the DB → processes reading one shared registry in the Coordinator.]
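To illustrate what the registry does, here is a toy in-memory stand-in. It is not the ZooKeeper API; it only mimics the idea (like ZooKeeper's ephemeral nodes) that a server-status entry disappears when its server stops sending heartbeats, while configuration entries persist. Paths, addresses, and the timeout are hypothetical, and time is passed in explicitly to keep the example deterministic:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy coordinator registry: small entries under paths; ephemeral entries expire without heartbeats.
class ToyRegistry {
    private static final class Entry {
        final String data;
        final boolean ephemeral; // true: tied to a live server; false: persistent config/common data
        long lastHeartbeatMillis;
        Entry(String data, boolean ephemeral, long now) {
            this.data = data;
            this.ephemeral = ephemeral;
            this.lastHeartbeatMillis = now;
        }
    }

    private final long sessionTimeoutMillis;
    private final Map<String, Entry> entries = new TreeMap<>(); // sorted paths give a stable listing order

    ToyRegistry(long sessionTimeoutMillis) { this.sessionTimeoutMillis = sessionTimeoutMillis; }

    synchronized void register(String path, String data, boolean ephemeral, long nowMillis) {
        entries.put(path, new Entry(data, ephemeral, nowMillis));
    }

    synchronized void heartbeat(String path, long nowMillis) {
        Entry e = entries.get(path);
        if (e != null) e.lastHeartbeatMillis = nowMillis;
    }

    // Drops ephemeral entries whose server missed the session timeout.
    private void expire(long nowMillis) {
        entries.values().removeIf(e -> e.ephemeral && nowMillis - e.lastHeartbeatMillis > sessionTimeoutMillis);
    }

    synchronized List<String> live(String prefix, long nowMillis) {
        expire(nowMillis);
        List<String> result = new ArrayList<>();
        for (String path : entries.keySet()) {
            if (path.startsWith(prefix)) result.add(path);
        }
        return result;
    }

    synchronized String get(String path, long nowMillis) {
        expire(nowMillis);
        Entry e = entries.get(path);
        return e == null ? null : e.data;
    }

    public static void main(String[] args) {
        ToyRegistry registry = new ToyRegistry(5_000);
        registry.register("/config/queue", "queue-host:5672", false, 0); // server configuration
        registry.register("/servers/chat-1", "10.0.0.1:9000", true, 0);  // server status
        registry.register("/servers/chat-2", "10.0.0.2:9000", true, 0);
        registry.heartbeat("/servers/chat-1", 4_000);                      // chat-2 has stopped heartbeating
        System.out.println(registry.live("/servers/", 6_000));             // [/servers/chat-1]
        System.out.println(registry.get("/config/queue", 6_000));          // queue-host:5672
    }
}
```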
The steps of decoupling (step 2): Distributed Data Cache
- Fast in-memory reads/writes: roughly 10~100 times faster than a DB query
- Alleviates DB overload
  - read queries: read from the cache instead of the DB
  - write queries: update the DB lazily through a write queue (write-behind)
- Removes duplicated data: no overhead of synchronizing data among processes
- Fault-tolerant system: the data survives no matter which process in the same cluster terminates
[Figure: processes keep only their function; their data moves into a Cache cluster in front of the DB; the Coordinator registry is unchanged from step 1.]
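A minimal sketch of the read and write paths described above. In-memory maps stand in for the cache cluster and the DB (an assumption so the example runs on its own); a real deployment would use a cache server such as Redis or Memcached in front of MySQL:

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Reads: cache first, DB only on a miss. Writes: cache now, DB later via a queue (write-behind).
class CachedStore {
    private final Map<String, String> cache = new ConcurrentHashMap<>(); // stands in for the cache cluster
    private final Map<String, String> db;                                // stands in for the DB (hypothetical)
    private final LinkedBlockingQueue<Map.Entry<String, String>> pendingWrites = new LinkedBlockingQueue<>();
    private final AtomicInteger dbReadCount = new AtomicInteger();       // shows how much DB load the cache absorbs

    CachedStore(Map<String, String> db) { this.db = db; }

    // Read path: on a miss, load from the DB once and keep the value in the cache.
    String read(String key) {
        String value = cache.get(key);
        if (value == null) {
            dbReadCount.incrementAndGet();
            value = db.get(key);
            if (value != null) cache.put(key, value);
        }
        return value;
    }

    // Write path: the cache is updated immediately; the DB update is only queued.
    void write(String key, String value) {
        cache.put(key, value);
        pendingWrites.add(new AbstractMap.SimpleImmutableEntry<>(key, value));
    }

    // Applies queued writes to the DB in order; a background worker would call this periodically.
    int flush() {
        int applied = 0;
        Map.Entry<String, String> w;
        while ((w = pendingWrites.poll()) != null) {
            db.put(w.getKey(), w.getValue());
            applied++;
        }
        return applied;
    }

    int dbReads() { return dbReadCount.get(); }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        db.put("user:1", "alice");
        CachedStore store = new CachedStore(db);
        store.read("user:1");           // miss: loads from the DB
        store.read("user:1");           // hit: served from the cache
        store.write("user:2", "bob");   // DB not touched yet
        System.out.println("DB reads: " + store.dbReads());                          // 1
        System.out.println("user:2 in DB before flush: " + db.containsKey("user:2")); // false
        store.flush();
        System.out.println("user:2 in DB after flush: " + db.containsKey("user:2"));  // true
    }
}
```

The trade-off of write-behind is that writes still sitting in the queue are not yet in the DB, which is why the queue itself should be durable in a real system.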
The steps of decoupling (step 3): Distributed Message Queue
- Scale out (elastic scalability)
  - auto-scaling through fan-out exchange rules
  - light-weight processes (daemons)
- Fault-tolerant system
  - even when every process has terminated, the message queue server preserves the messages
  - prevents server overload and failure, at the cost of delayed (lazy) processing
- System monitoring: just monitor the queue status
[Figure: function processes exchange messages through a Queue, share data through the Cache cluster and DB, and use the Coordinator registry as before.]
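The buffering and scale-out behaviour can be sketched with an in-process BlockingQueue standing in for the queue server (an assumption; a real deployment would use a broker such as RabbitMQ). Messages wait while no worker runs, and workers compete for them, so adding workers adds throughput. The stop marker is a hypothetical shutdown convention:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Competing consumers on one work queue: each message is taken by exactly one worker.
class WorkQueueDemo {
    static final String POISON = "__stop__"; // hypothetical marker telling one worker to exit

    static int drainWithWorkers(BlockingQueue<String> queue, int workers) throws InterruptedException {
        AtomicInteger processed = new AtomicInteger();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    while (true) {
                        String msg = queue.take();       // blocks until a message is available
                        if (POISON.equals(msg)) return;  // this worker is done
                        processed.incrementAndGet();     // the "function" of this light-weight process
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        // One stop marker per worker, queued after the backlog (FIFO), so every real message is processed first.
        for (int i = 0; i < workers; i++) queue.put(POISON);
        for (Thread t : threads) t.join();
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < 100; i++) queue.put("task-" + i); // published while no worker is running
        System.out.println("backlog = " + queue.size());      // monitoring: just watch the queue depth
        System.out.println("processed = " + drainWithWorkers(queue, 3));
    }
}
```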
Scale Out
[Figure: clusters of nodes hosting the Coordinator registry, the Cache, function processes, the Queue, and the DB; task #1 and task #2 each have their own work queue, and many processes consume each work queue over n connections.]
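The per-task work queues in this diagram can be fed by a fan-out rule: each published message is copied to every bound queue, and each queue is then consumed by as many workers as that task needs. A toy sketch of the routing (named after the RabbitMQ concept; it is not the RabbitMQ client API):

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// Toy fan-out exchange: every published message goes to every bound queue (e.g. one per task type).
class FanoutExchange {
    private final List<BlockingQueue<String>> bound = new CopyOnWriteArrayList<>();

    // Creates a work queue and binds it to this exchange.
    BlockingQueue<String> bindNewQueue() {
        BlockingQueue<String> q = new LinkedBlockingQueue<>();
        bound.add(q);
        return q;
    }

    // Copies the message into each bound queue; workers on each queue compete for their copy.
    void publish(String message) {
        for (BlockingQueue<String> q : bound) q.add(message);
    }

    public static void main(String[] args) {
        FanoutExchange exchange = new FanoutExchange();
        BlockingQueue<String> task1 = exchange.bindNewQueue(); // work queue for task #1
        BlockingQueue<String> task2 = exchange.bindNewQueue(); // work queue for task #2
        exchange.publish("msg-1");
        System.out.println(task1.poll() + " / " + task2.poll()); // msg-1 / msg-1
    }
}
```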
SEDA vs. Message Driven Architecture
- SEDA (Staged Event-Driven Architecture): inside a single process, threads run the functions and pass events through in-process queues; they share data in the heap (global variables) and store it in a DB.
- MDA: a service spans many nodes; processes run the functions and pass messages through a queue, share data through the cache, use the coordinator registry, and store data in a DB.
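For contrast with MDA, a SEDA-style pipeline keeps every stage inside one process: each stage has its own thread pool and in-memory event queue (here, each executor's internal queue). A minimal sketch with hypothetical stage logic; because these queues live on the process heap, queued events are lost if the process dies, which is what the external queue server avoids in MDA:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Two SEDA stages in one process: parse "user:text" events, then format them.
class SedaPipeline {
    private final ExecutorService parseStage = Executors.newFixedThreadPool(2);    // stage 1: 2 threads
    private final ExecutorService formatStage = Executors.newSingleThreadExecutor(); // stage 2: 1 thread
    final BlockingQueue<String> output = new LinkedBlockingQueue<>();

    // Assumes well-formed input containing a ':' separator.
    void submit(String raw) {
        parseStage.execute(() -> {
            String[] parts = raw.split(":", 2);   // stage 1 work
            formatStage.execute(() ->             // hand the event to stage 2's queue
                    output.add("[" + parts[0] + "] " + parts[1]));
        });
    }

    // Stops stage 1 first so every event it produced is queued in stage 2 before stage 2 stops.
    void shutdown() throws InterruptedException {
        parseStage.shutdown();
        parseStage.awaitTermination(5, TimeUnit.SECONDS);
        formatStage.shutdown();
        formatStage.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        SedaPipeline p = new SedaPipeline();
        p.submit("alice:hi");
        p.submit("bob:hello");
        p.shutdown();
        System.out.println(p.output); // both events; order is not guaranteed with 2 parse threads
    }
}
```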
Code of Message Driven Architecture
A simple chat service (simple client-server model vs. MDA)

    /** Simple Client-Server Model **/
    /* Send Thread */
    myInfo = xml.getInfo(xmlFile);         // server info from a local file
    db.setAlive(myInfo);                   // updates server status in the DB
    servers = connectAll(relayServers);    // inter-server networking (p2p): connects to every other server
    while ((input = client.getInput()) != null) {
        roomInfo = localData.getRoomInfo(client.userId);
        for (userId : roomInfo.getUserIds()) {
            for (server : servers) {
                // send (sender, receiver, text) to the server that holds the receiver
                if (server.hasUser(userId)) server.send(client.userId, userId, input);
            }
        }
    }
    /* Receive Thread */
    while (true) {
        message = socket.receive();                     // from another server
        sender = localData.getUser(message.senderId);   // from local data
        client = getClient(message.receiverId);
        client.send(sender.name + ":" + message.input);
    }

    /** Message Driven Architecture **/
    /* Send Thread (Process) */
    myInfo = Zookeeper.getInfo(zookeeperList, myIp, myPort);  // server info from the coordinator
    Zookeeper.setAlive(myInfo);                               // registers server status in the coordinator
    queue = Queue.getQueue(myInfo.queue);
    cache = Cache.getCache(myInfo.cache);                     // shared data schema (cache)
    while ((input = client.getInput()) != null) {
        roomInfo = cache.getRoomInfo(client.userId);
        for (userId : cache.getUserIds(roomInfo.no)) {
            queue.publish(new Message(client.userId, userId, input));  // queueing (work queue)
        }
    }
    /* Receive Thread (Process) */
    while (true) {
        message = queue.consume();                      // dequeuing (work queue)
        sender = cache.getUser(message.senderId);       // from the cache
        client = getClient(message.receiverId);
        client.send(sender.name + ":" + message.input);
    }
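The MDA pseudocode above depends on external ZooKeeper, queue, and cache servers. The following self-contained sketch runs the same send/receive flow inside one JVM, with a BlockingQueue standing in for the queue server and ConcurrentHashMaps for the cache (both assumptions). The room and user data are hypothetical, and `delivered` stands in for `client.send()`:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// In-JVM sketch of the MDA chat flow: send process -> queue -> receive process, shared data in a cache.
class MdaChat {
    static final class Message {
        final String senderId, receiverId, input; // sender id lets the receiver show who wrote it
        Message(String senderId, String receiverId, String input) {
            this.senderId = senderId;
            this.receiverId = receiverId;
            this.input = input;
        }
    }

    final Map<String, String> userNames = new ConcurrentHashMap<>();         // cache: userId -> name
    final Map<String, List<String>> roomMembers = new ConcurrentHashMap<>(); // cache: roomNo -> userIds
    final Map<String, String> roomOfUser = new ConcurrentHashMap<>();        // cache: userId -> roomNo
    final BlockingQueue<Message> queue = new LinkedBlockingQueue<>();        // queue server stand-in
    final Map<String, List<String>> delivered = new ConcurrentHashMap<>();   // receiverId -> lines sent

    // Send process: look up the room in the cache and publish one message per room member.
    void onClientInput(String senderId, String input) {
        String roomNo = roomOfUser.get(senderId);
        for (String receiverId : roomMembers.get(roomNo)) {
            queue.add(new Message(senderId, receiverId, input));
        }
    }

    // Receive process: consume one message, resolve the sender's name from the cache, deliver it.
    void consumeOne() throws InterruptedException {
        Message m = queue.take();
        String senderName = userNames.get(m.senderId);
        delivered.computeIfAbsent(m.receiverId, k -> new CopyOnWriteArrayList<>())
                 .add(senderName + ":" + m.input);
    }

    public static void main(String[] args) throws InterruptedException {
        MdaChat chat = new MdaChat();
        chat.userNames.put("u1", "alice");
        chat.userNames.put("u2", "bob");
        chat.roomMembers.put("r1", Arrays.asList("u1", "u2"));
        chat.roomOfUser.put("u1", "r1");
        chat.roomOfUser.put("u2", "r1");
        chat.onClientInput("u1", "hi");                  // publishes one message per room member
        while (!chat.queue.isEmpty()) chat.consumeOne(); // drains the queue
        System.out.println(chat.delivered);              // each member received "alice:hi"
    }
}
```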
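Summary: developer's perspective
Aspect | Client-Server Based | Message Driven
System/role division | by service | by function (e.g. API, file, DB, logging, ...)
Individual expertise | business logic (service flow) | technical knowledge
Service development | individual | collaborative
Up-front development documents (required) | development can start without them | documents for inter-process integration are needed: the inter-process interface (queue) and the shared data schema (cache)
Communication within the team | weak (each person runs their own project) | close (most members must agree on each service)
Coordination with other departments (PM) | every developer | only a few people in charge
[Figure: the planning/marketing, design, and client teams work with a PM (service manager); the API Part, Logic Part, and DB Part are connected by inter-process interfaces (queue) and share a data schema (cache).]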
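Summary: system perspective
Aspect | Client-Server Based | Message Driven
Inter-server complexity | very complex (every server must connect to every other server) | less complex (servers connect only to the coordinator, cache, and queue)
Scalability/efficiency | low (unneeded logic also runs) | high (only processes with simple logic run)
Server updates | difficult (only a full patch is possible) | easy (the queue server can hold tasks temporarily, and processes for the newer version can be started in advance)
Service failures | the whole service fails | partial failure (depends on the size of the logic)
Protocol changes | easy (redefine functions) | difficult (the message schema must be shared)
Server status/logging | per service (per developer) | centralized (only the queue server needs monitoring/logging)
Business logic | any business logic is possible | logic that needs loops or rollbacks is difficult
Code complexity | complex | simple (small units of logic)
Coding style | differs by service | differs by function
Thread/worker model | various models coexist within a process | a different thread model for each kind of process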
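One way to live with the "message schema must be shared" cost, and to start processes for a newer version in advance, is to put an explicit version field in every message so consumers for old and new versions can coexist while older messages drain from the queue. This is a hypothetical design sketch, not something the slides prescribe:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Routes each message to the consumer logic registered for its schema version.
class VersionedDispatcher {
    static final class Message {
        final int version;  // shared schema version, agreed between producers and consumers
        final String body;
        Message(int version, String body) {
            this.version = version;
            this.body = body;
        }
    }

    private final Map<Integer, Function<String, String>> handlers = new HashMap<>();

    void register(int version, Function<String, String> handler) {
        handlers.put(version, handler);
    }

    String dispatch(Message m) {
        Function<String, String> h = handlers.get(m.version);
        if (h == null) throw new IllegalStateException("no consumer for message version " + m.version);
        return h.apply(m.body);
    }

    public static void main(String[] args) {
        VersionedDispatcher d = new VersionedDispatcher();
        d.register(1, body -> "v1 handled " + body);               // old consumer, still running
        d.register(2, body -> "v2 handled " + body.toUpperCase()); // new consumer, started in advance
        System.out.println(d.dispatch(new Message(1, "hi")));      // v1 handled hi
        System.out.println(d.dispatch(new Message(2, "hi")));      // v2 handled HI
    }
}
```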
Appendix: Think Deeply
Single-thread vs. Multi-thread
- I/O-intensive tasks (blocking tasks): multi-threaded; one process runs many threads that wait on the DB.
- CPU/memory-intensive tasks (non-blocking tasks): single-threaded; several single-threaded processes, each holding its own data in memory.
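A commonly cited thread-pool sizing heuristic (from Goetz et al., Java Concurrency in Practice, simplified here to full CPU utilization) fits this split: threads ≈ cores × (1 + wait time / compute time). CPU-bound work (almost no waiting) gets about one thread per core, i.e. one single-threaded process per core, while blocking I/O work gets many more threads. The wait/compute ratios below are illustrative assumptions:

```java
// threads ≈ cores * (1 + waitTime / computeTime)
class PoolSizing {
    static int threads(int cores, double waitTime, double computeTime) {
        if (cores < 1 || waitTime < 0 || computeTime <= 0) {
            throw new IllegalArgumentException("need cores >= 1, waitTime >= 0, computeTime > 0");
        }
        return (int) Math.ceil(cores * (1 + waitTime / computeTime));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("CPU/memory-bound: " + threads(cores, 0, 1) + " threads");
        System.out.println("I/O-bound (90% of time waiting on the DB): " + threads(cores, 9, 1) + " threads");
    }
}
```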