Hongik Univ. Software Engineering Laboratory Jin Hyub Lee

Presentation transcript:

Analysis of Integrated Renewable Energy Monitoring System Data using KNN for Pre-Processing
Hongik Univ. Software Engineering Laboratory
Jin Hyub Lee
Professor: Robert Youngchul Kim
2017.08.19

Hello, I'm Jin Hyub Lee from the Software Engineering Laboratory at Hongik University. My topic is "Analysis of Integrated Renewable Energy Monitoring System Data using KNN for Pre-Processing". Here, pre-processing means taking action before an economic loss occurs.

Contents
1 Introduction
2 Renewable Energy Integrated Monitoring System
3 K-Nearest Neighbor Classification
4 Application of K-Nearest Neighbor with TensorFlow
5 Conclusion and Future Works

The talk proceeds in this order.

01 Introduction

Introduction

Most companies have a solar power generation monitoring system. Power generation and other sensor readings can be checked in a variety of forms, such as numbers, graphs, and gauges. However, these systems cannot predict faults. Fault prediction is very important: if you take action only after a system fault, the economic damage has already been done. So we need to take action before the damage occurs.

Motivation
- We are software engineers.
- We spent 2 years developing a total monitoring system for the HS Solar Energy company.
- We tested this system to find and remove errors.
- This time, we address the problem of predicting, within the total monitoring system, which device will go out of order before the failure happens.

02 Renewable Energy Integrated Monitoring System

Renewable Energy Integrated Monitoring System

These figures show the Renewable Energy Integrated Monitoring System that we built for the HS Solar Energy company. It displays data such as today's power generation, monthly power generation, module temperature, and graphs. We used this system to accumulate these data in a BigData system (the slide says over 2 years; the speaker notes say one year). One thing sets it apart from other monitoring systems: our system was created using a metamodel mechanism.

03 K-Nearest Neighbor Classification

Why K-Nearest Neighbor?

The K-Nearest Neighbor (KNN) algorithm first learns labeled data as its reference. After that, when new unlabeled data arrive, KNN determines which label to assign to them. In this paper, there are 3 labels:
- Normal: data close to the y = x line.
- Abnormal: data in which power generation relative to slope solar radiation, or slope solar radiation relative to power generation, deviates too far from the y = x line.
- Fault: data in which power generation relative to slope solar radiation, or slope solar radiation relative to power generation, is close to 0, that is, data close to the x intercept or y intercept.

Why 3 labels? The number of data points per label is in the order Normal > Abnormal > Fault. We applied the KNN algorithm from the viewpoint that when the number of Abnormal data points keeps increasing, the probability of failure can increase.

Why K-Nearest Neighbor?

Why do we regard data close to the y = x line as normal data? The reason lies in data preprocessing. The initial data have two problems:
1. The units of measurement differ.
2. The magnitudes differ so much that the numbers are difficult to compare.

(Figure: scatter plot with the y = x line; axes: Power generation, Slope solar radiation.)
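As a small illustration of what "close to the y = x line" can mean numerically, the sketch below uses the perpendicular distance from a point (x, y) to that line, |x - y| / sqrt(2). This measure is an assumption for illustration; the talk does not say how closeness was measured, and the function name `distance_to_diagonal` is hypothetical. It only makes sense once both values are on the same scale.

```python
import math

def distance_to_diagonal(x: float, y: float) -> float:
    """Perpendicular distance from the point (x, y) to the line y = x."""
    return abs(x - y) / math.sqrt(2)

# Illustrative points only (not data from the talk).
print(distance_to_diagonal(0.5, 0.5))  # a point on the line
print(distance_to_diagonal(0.1, 0.4))  # a point off the line
```

A point exactly on the line has distance 0, and the distance grows as the two values diverge.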

Why K-Nearest Neighbor?

Data are preprocessed into values between 0 and 1 by normalization.

Data                     Before Normalization   After Normalization
Power Generation         0~80                   0~1
Slope Solar Radiation    0~650                  0~1

After preprocessing, slope solar radiation increases at a 1:1 ratio as power generation increases. In other words, the closer a data point is to the y = x line, the more accurate it is.
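The normalization step can be sketched as follows. The talk only says "Normalization" into 0~1, so min-max scaling is an assumption here, and the function name `min_max_normalize` and the sample readings are hypothetical. The ranges 0~80 and 0~650 come from the slide.

```python
def min_max_normalize(values, lower, upper):
    """Scale each value from the range [lower, upper] into [0, 1]."""
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    span = upper - lower
    return [(v - lower) / span for v in values]

# Ranges from the slide: power generation 0~80, slope solar radiation 0~650.
# The sample readings themselves are made up for illustration.
power = min_max_normalize([0, 20, 80], lower=0, upper=80)
radiation = min_max_normalize([0, 162.5, 650], lower=0, upper=650)
print(power)      # [0.0, 0.25, 1.0]
print(radiation)  # [0.0, 0.25, 1.0]
```

After scaling, a reading at a quarter of its range maps to 0.25 on both axes, which is why matching pairs fall on the y = x line.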

K-Nearest Neighbor

To take action before damage occurs, we used the KNN algorithm. This is the overall process:
1st, input the labeled data as a training set.
2nd, learn the input data.
3rd, from the big data accumulated over a year, take the power generation and sensor data as the test set and input one row of it.
4th, select the k closest data points through Euclidean distance calculations.
5th, assign the most numerous label among those k data points as the label of the test data.
6th, learn the newly labeled test data.
Finally, repeat these steps for every row in the test set.

(Flowchart labels: BigData, stored data for a year; Power Generation; Sensor Data; after preprocessing; Test set (Unlabeled Data), one row at a time; Training set (Labeled Data); Learning; New data by Euclidean Distance; k ≥ the number of data [yes] / [no]; k data's labels; the most numerous label; new data's label.)

In this paper, we used 58,000 test data points.
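The steps above can be sketched in plain Python. This is a minimal illustration and not the TensorFlow implementation used in the talk: the function names `knn_label` and `classify_rows`, the choice k=3, and the tiny training set are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_label(training, point, k):
    """Return the majority label among the k training points nearest to `point`.

    `training` is a list of ((x, y), label) pairs.
    """
    nearest = sorted(training, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def classify_rows(training, test_rows, k=3):
    """Label test rows one at a time, adding each labeled row to the training set."""
    training = list(training)  # copy so the caller's list is not modified
    labeled = []
    for row in test_rows:
        label = knn_label(training, row, k)   # steps 4 and 5
        labeled.append((row, label))
        training.append((row, label))         # step 6: learn the labeled row
    return labeled

# Tiny made-up training set (not the data from the talk).
training_set = [
    ((0.3, 0.3), "Normal"), ((0.5, 0.5), "Normal"), ((0.7, 0.7), "Normal"),
    ((0.2, 0.8), "Abnormal"), ((0.8, 0.2), "Abnormal"), ((0.1, 0.6), "Abnormal"),
    ((0.0, 0.5), "Fault"), ((0.5, 0.0), "Fault"), ((0.0, 0.9), "Fault"),
]
for row, label in classify_rows(training_set, [(0.5, 0.55), (0.02, 0.7)], k=3):
    print(row, label)
```

Because every labeled test row is added back to the training set, an early misclassification can influence later rows, so the order of the test rows matters in this scheme. Ties in the vote are resolved here in favor of the label of the nearest neighbor among the tied labels, which is one possible choice among several.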

Training Data

(Figure: the training data table on the left and, on the right, the correlation graph between Power Generation and Slope Solar Radiation, with the abnormal data marked at 1:3 and 3:1.)

This is the first training set we tried. To classify new data as normal, abnormal, or fault, we needed training data that serve as a clear reference for KNN classification, so we generated the training data ourselves. In this training set, the ratio of power generation to slope solar radiation for the abnormal data is 1 to 3.

Training Data

(Figure: correlation graph between Power Generation and Slope Solar Radiation, with the abnormal data marked at 4:1 and 1:4.)

This is the second training set we tried. The ratio of power generation to slope solar radiation for the abnormal data is 1 to 4. Unlike the previous set, the points (0.025, 0.1) and (0.1, 0.025) were excluded, because they are so close to the normal data that they make it hard to distinguish normal data from abnormal data.
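One way to picture the two hand-made training sets described above is the sketch below. It is a guess at the construction, not the authors' procedure: it assumes points are (power, radiation) pairs, that normal points lie on y = x, abnormal points on the lines with ratio 1:ratio and ratio:1, and fault points on the axes, with a grid spacing of 0.1. The function name `make_training_set` and its parameters are hypothetical.

```python
def make_training_set(ratio, step=0.1, exclude=()):
    """Build labeled (power, radiation) points in the 0~1 range.

    Normal points lie on y = x, Abnormal points on y = ratio * x and
    y = x / ratio, and Fault points on the two axes (all assumptions).
    """
    excluded = {(round(x, 6), round(y, 6)) for x, y in exclude}
    points = []
    n = round(1 / step)
    for i in range(1, n + 1):
        v = round(i * step, 6)
        low = round(v / ratio, 6)
        candidates = [
            ((v, v), "Normal"),
            ((low, v), "Abnormal"),  # power : radiation = 1 : ratio
            ((v, low), "Abnormal"),  # power : radiation = ratio : 1
            ((0.0, v), "Fault"),     # power at 0 while radiation is present
            ((v, 0.0), "Fault"),     # radiation at 0 while power is reported
        ]
        points.extend(c for c in candidates if c[0] not in excluded)
    return points

first = make_training_set(ratio=3)
second = make_training_set(ratio=4, exclude=[(0.025, 0.1), (0.1, 0.025)])
print(len(first), len(second))
```

With a ratio of 4, the abnormal points nearest the origin are exactly (0.025, 0.1) and (0.1, 0.025), the two points the second training set leaves out.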

04 Application of K-Nearest Neighbor Algorithm with TensorFlow

Application of K-Nearest Neighbor Algorithm with TensorFlow

This graph is the result of learning the first training set and then inputting and learning the test data. Overall, the classification was good, but data between (0, 0) and (0.3, 0.3) that should have been classified as normal were classified as abnormal. So we tried a training set with a different ratio.

Application of K-Nearest Neighbor Algorithm with TensorFlow

So we created a second training set with a different ratio. This graph is the result of applying it. The overall classification is better than before, and normal data between (0, 0) and (0.3, 0.3) are also mostly well classified. However, there are still somewhat too many abnormal data around (0.2, 0.2). Red points are fault data, yellow points are abnormal data, and green points are normal data.

05 Conclusion and Future Works

Conclusion and Future Works

Many solar power generation companies operate a monitoring system, but fault prediction is not performed. In this paper, we applied the KNN algorithm to fault prediction in solar power generation monitoring and proposed a prediction algorithm based on it. Our analysis of the data stored over the past 2 years showed that if the number of abnormal data points increases continuously, the probability of a fault becomes higher. With this in mind, we used the K-Nearest Neighbor algorithm to classify the power generation and slope solar radiation data into Normal, Abnormal, and Fault. To create an accurate classification standard, we generated the training data ourselves and trained on them. As a result, all unlabeled data were classified automatically as normal, abnormal, or fault.

In the classification results, detailed classification was lacking around (0.2, 0.2) in the result graph. We think the reason is that the classification references around (0, 0) are so close to each other that classification was difficult.

Future Works
In the future, we will study how to classify more exactly, and use the classified data to develop a system that can predict faults and actually take action before they occur.

Q & A

Thank you