Hongik Univ. Software Engineering Laboratory Jin Hyub Lee

Analysis of Integrated Renewable Energy Monitoring System Data using KNN for Pre-Processing
Professor: Robert Youngchul Kim
Hello, I'm Jin Hyub Lee. I study software engineering at Hongik University. My topic is "Analysis of Integrated Renewable Energy Monitoring System Data using KNN for Pre-Processing". Here, pre-processing means taking action before an economic loss occurs.

Contents
1 Introduction
2 Renewable Energy Integrated Monitoring System
3 K-Nearest Neighbor Classification
4 Application of K-Nearest Neighbor with TensorFlow
5 Conclusion and Future Works
The talk follows this order.

01 Introduction

Introduction
Most companies have a solar power generation monitoring system. Power generation and other sensor readings can be checked in a variety of forms, such as numbers, graphs, and gauges. However, these systems cannot predict faults. Fault prediction is very important: if you take action after a system fault, the economic damage has already been done. So we need to take action before the damage occurs.

Motivation
- We are software engineers.
- We developed a total monitoring system over two years for the HS Solar Energy company.
- We tested this system to find and remove errors.
- This time, we extend the total monitoring system to predict which device will go out of order before it happens.

02 Renewable Energy Integrated Monitoring System

Renewable Energy Integrated Monitoring System
This is our Renewable Energy Integrated Monitoring System for the HS Solar Energy company. It shows many kinds of data, such as today's power generation, monthly power generation, module temperature, and graphs. We used this system to store these data in a big data system over two years (the spoken notes say one year). Unlike other monitoring systems, our system was created using a metamodel mechanism.

03 K-Nearest Neighbor Classification

Why K-Nearest Neighbor?
The K-Nearest Neighbor (KNN) algorithm learns labeled data as its reference. Then, when new unlabeled data arrives, KNN determines which label to assign to it. In this paper, there are 3 labels.
- Normal: data close to the y = x line.
- Abnormal: data where power generation relative to slope solar radiation, or slope solar radiation relative to power generation, lies too far from the y = x line.
- Fault: data where one of the two values is close to 0, that is, data close to the x-axis or the y-axis.
Why 3 labels? The number of data for each label is in the order Normal > Abnormal > Fault. We applied the KNN algorithm from the viewpoint that when the number of Abnormal data keeps increasing, the probability of a fault can increase.

y = x
- If data is close to the y = x line, we regard it as normal data. This is a consequence of data preprocessing.
- Initial data: the two measures have different units, and their magnitudes differ so much that the numbers are difficult to compare.
(Graph axes: Power generation, Slope solar radiation.)

Data preprocessing to the range 0~1 by normalization

Data                     Before Normalization   After Normalization
Power Generation         0~80                   0~1
Slope Solar Radiation    0~650                  0~1

Through normalization, the data is preprocessed to values between 0 and 1. Therefore, after preprocessing, slope solar radiation increases at a 1:1 ratio as power generation increases. That is, the closer the data is to the y = x line, the more accurate it is.
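The normalization step above can be sketched as a plain min-max rescaling. This is only an illustration: the function name `min_max_normalize` and the sample values are hypothetical, and the bounds 0~80 and 0~650 are taken from the table. The slides do not say which formula or library was actually used.

```python
import numpy as np

def min_max_normalize(values, lower, upper):
    """Rescale values from the range [lower, upper] to [0, 1]."""
    values = np.asarray(values, dtype=float)
    return (values - lower) / (upper - lower)

# Hypothetical sample readings, not data from the monitoring system.
power = min_max_normalize([0.0, 40.0, 80.0], lower=0.0, upper=80.0)
radiation = min_max_normalize([0.0, 325.0, 650.0], lower=0.0, upper=650.0)

# After rescaling, matching readings fall on the y = x line.
print(np.column_stack([power, radiation]))
```

With both measures on the same 0~1 scale, the distance of a point from the y = x line becomes comparable across the whole range.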

K-Nearest Neighbor
To deal with problems before damage occurs, we used the KNN algorithm. This is the total process.
1st, input the labeled data as a training set.
2nd, learn the input data.
3rd, from the big data accumulated for a year, take the preprocessed power generation and sensor data as the test set, and input one row of it at a time.
4th, select the closest k data through Euclidean distance calculations.
5th, assign the label that is most numerous among the k data as the label of the test data.
6th, train on the newly labeled test data.
Finally, repeat these steps for every row in the test set.
In this paper, we used 58,000 test data.
(Figure: flowchart of this process, from the training set of labeled data and the test set of unlabeled power generation and sensor data, through the Euclidean distance to the k nearest data and their most numerous label, with a decision node "k ≥ the number of data" [yes/no].)
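The steps above can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' TensorFlow implementation: the function names, the sample points, and the choice k = 3 are all hypothetical. Clamping k to the training set size is an assumption about what the flowchart's "k ≥ the number of data" check is for.

```python
from collections import Counter
import numpy as np

def knn_label(train_x, train_y, row, k):
    """Return the most numerous label among the k nearest training rows."""
    k = min(k, len(train_x))  # assumption: never ask for more neighbors than exist
    distances = np.linalg.norm(train_x - row, axis=1)  # Euclidean distance
    nearest = np.argsort(distances, kind="stable")[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def classify_incrementally(train_x, train_y, test_x, k=3):
    """Label test rows one at a time, adding each labeled row to the training set."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = list(train_y)
    labels = []
    for row in np.asarray(test_x, dtype=float):
        label = knn_label(train_x, train_y, row, k)
        labels.append(label)
        train_x = np.vstack([train_x, row])  # step 6: learn the labeled test row
        train_y.append(label)
    return labels

# Hypothetical normalized points: (power generation, slope solar radiation).
train_x = [(0.5, 0.5), (0.6, 0.6), (0.2, 0.8), (0.8, 0.2),
           (0.0, 0.5), (0.0, 0.6), (0.5, 0.0)]
train_y = ["Normal", "Normal", "Abnormal", "Abnormal",
           "Fault", "Fault", "Fault"]
result = classify_incrementally(train_x, train_y, [(0.55, 0.55), (0.02, 0.5)])
print(result)
```

Because each newly labeled row joins the training set, an early misclassification can influence later rows, which is one reason the choice of the initial training data matters.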


Training Data
(Figure: training data table on the left and, on the right, the correlation graph between Power Generation and Slope Solar Radiation. Abnormal data lie at the ratios 1:3 and 3:1.)
This is the first training set we tried. We needed training data that gives a clear reference for classifying new data as normal, abnormal, or fault, so we generated the training data ourselves. In this training set, the ratio between Power Generation and Slope Solar Radiation for abnormal data is 1 to 3.

Training Data
(Figure: correlation graph between Power Generation and Slope Solar Radiation. Abnormal data lie at the ratios 1:4 and 4:1.)
This is the second training set we tried. The ratio between Power Generation and Slope Solar Radiation for abnormal data is 1 to 4. Unlike the previous set, the points (0.025, 0.1) and (0.1, 0.025) were excluded, because they are so close to the normal data that normal and abnormal data become hard to distinguish.
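A training set of this kind could be generated as in the sketch below. Everything here is an assumption made for illustration: the function `make_training_set`, the sampling grid, placing fault points exactly on the axes, and the `min_gap` threshold used to drop abnormal points that sit too close to the y = x line. The slides give only the ratios and the two excluded points.

```python
import numpy as np

def make_training_set(ratio, n=10, min_gap=0.0):
    """Build labeled points in the normalized [0, 1] x [0, 1] plane."""
    rows, labels = [], []
    for v in np.linspace(0.1, 1.0, n):
        rows.append((v, v))  # normal: on the y = x line
        labels.append("Normal")
        # abnormal: on the 1:ratio and ratio:1 lines
        for point in ((v / ratio, v), (v, v / ratio)):
            if abs(point[0] - point[1]) >= min_gap:  # skip points near y = x
                rows.append(point)
                labels.append("Abnormal")
        rows.append((v, 0.0))  # fault: one of the two values is 0
        labels.append("Fault")
        rows.append((0.0, v))
        labels.append("Fault")
    return np.array(rows), labels

first_x, first_y = make_training_set(ratio=3)
second_x, second_y = make_training_set(ratio=4, min_gap=0.1)
print(len(first_x), len(second_x))
```

With `ratio=4` and this hypothetical threshold, the points dropped are (0.025, 0.1) and (0.1, 0.025), the same two points the second training set excludes.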

04 Application of K-Nearest Neighbor Algorithm with TensorFlow

Application of K-Nearest Neighbor Algorithm with TensorFlow
This graph shows the result of learning the first training set and then inputting and learning the test data. Overall, the classification was good, but the points between (0, 0) and (0.3, 0.3) that should have been classified as normal were classified as abnormal. So we tried a training set with a different ratio.

Application of K-Nearest Neighbor Algorithm with TensorFlow
So we created a second training set with a different ratio, and this graph shows its result. The classification is better overall than before. Normal data between (0, 0) and (0.3, 0.3) is also mostly classified well, but there is still somewhat too much abnormal data around (0.2, 0.2). Red points are fault data, yellow points are abnormal data, and green points are normal data.

05 Conclusion and Future Works

Conclusion and Future Works
Many solar power generation companies operate a monitoring system, but fault prediction is not performed. In this paper, we applied the KNN algorithm to fault prediction and proposed a prediction algorithm that uses KNN in a solar generation monitoring system. Our analysis of the data stored over two years showed that if the number of abnormal data keeps increasing, the probability of a fault gets higher. With this in mind, we used the K-Nearest Neighbor algorithm to classify the power generation and slope solar radiation data into Normal, Abnormal, and Fault. To create an accurate classification standard, we generated the training data ourselves and trained on it. As a result, all unlabeled data was automatically classified as normal, abnormal, or fault. In the result graph, however, fine-grained classification was lacking around (0.2, 0.2). We think this is because the classification references around (0, 0) are too close to each other, which makes classification difficult.

Future Works
In the future, we will study how to classify more exactly, and use the classified data to develop a system that predicts faults and actually takes action before they occur.

Thank you
