
Scene Recognition with OpenCV

CaCu999 2023-06-01 16:00:02


I. Requirements

1. Current situation

There is an existing PPT-insertion tool that inserts pictures into a PowerPoint deck according to the pictures' file names.

However, the photos still have to be picked through and renamed after shooting, which is tedious, fiddly work, so I want to simplify this step as well.

2. Idea

Classify the photos after shooting and rename them by category.

II. Using Models

1. What OpenCV DNN supports

OpenCV DNN can run deep-learning-based computer-vision inference on images and video. Features supported by OpenCV DNN:

  • Image classification
  • Object detection
  • Image segmentation
  • Text detection and recognition
  • Pose estimation
  • Depth estimation
  • Face verification and detection
  • Person re-identification

2. ANN_MLP basics

  1. Introduction

    Artificial neural network: the multi-layer perceptron (MLP).

    First a model is created, with all weights set to zero.

    The network is then trained on a set of input and output vectors. Training can be repeated more than once, i.e. the weights can be adjusted based on new training data.

  2. Key functions

    • create(): creates a model

    • setLayerSizes(): specifies the number of neurons in each layer, including the input and output layers. The first element is the number of elements in the input layer; the last element is the number of elements in the output layer.

      The number of training samples must equal the number of labels. The first entry is the number of pixels per image, and the last entry is the number of classes to train:

       Mat layerSizes = (Mat_<int>(1, 4) << image_rows*image_cols, int(image_rows*image_cols / 2), int(image_rows*image_cols / 2), class_num);
       bp->setLayerSizes(layerSizes);
      
    • setActivationFunction(): sets the activation function

      setActivationFunction(int type, double param1 = 0, double param2 = 0)

      type         f(x)
      IDENTITY     f(x) = x
      SIGMOID_SYM  f(x) = β · (1 − e^(−αx)) / (1 + e^(−αx))
      GAUSSIAN     f(x) = β · e^(−α·x²)
      RELU         f(x) = max(0, x)
      LEAKYRELU    f(x) = x for x > 0; f(x) = αx for x < 0

      bp->setActivationFunction(ANN_MLP::SIGMOID_SYM, 1, 1);
      
    • setTrainMethod()

      Sets the training/propagation method:

      cv::ml::ANN_MLP::setTrainMethod(int method, double param1 = 0, double param2 = 0)

      method    meaning
      BACKPROP  back-propagation
      RPROP     resilient back-propagation (the default)
      ANNEAL    simulated annealing
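
      A minimal sketch of a call (for BACKPROP, param1 sets the weight gradient scale and param2 the momentum scale):

      bp->setTrainMethod(ANN_MLP::BACKPROP, 0.1, 0.1);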
    • setTermCriteria

      Sets the termination criteria of the iterative training algorithm.

      TermCriteria: describes when an iterative algorithm should stop:

      class CV_EXPORTS TermCriteria {
      public:
          enum Type {
              COUNT = 1,
              MAX_ITER = COUNT,
              EPS = 2
          };
          TermCriteria(int type, int maxCount, double epsilon);
          int type;
          int maxCount;
          double epsilon;
      };
      

      The class carries three fields: the criteria type, the maximum number of iterations, and the accuracy threshold (epsilon).
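
      For example, to stop after 300 iterations or once the change falls below the threshold (the same call used in the training code below):

      model->setTermCriteria(TermCriteria(TermCriteria::MAX_ITER + TermCriteria::EPS, 300, 0.95));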

    • Loading a saved classifier

      Ptr<ANN_MLP> bp = StatModel::load<ANN_MLP>("*.xml");
      Ptr<ANN_MLP> bp = ANN_MLP::load("*.xml");
      Ptr<ANN_MLP> bp = Algorithm::load<ANN_MLP>("*.xml");
      
    • The predict function

      float predict( InputArray samples, OutputArray results=noArray(), int flags=0 )

      The final prediction comes back through the second parameter, OutputArray results.
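
      A minimal usage sketch (sample stands for one preprocessed input row; the names are illustrative):

      Mat results;
      bp->predict(sample, results);          // one row of output-layer responses per input row
      Point maxLoc;
      minMaxLoc(results, 0, 0, 0, &maxLoc);  // column of the strongest output = predicted class index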

3. Training an image-classification model

I had never trained such a model before, so this is a first attempt, following: https://blog.csdn.net/qq_15985873/article/details/125087166

#include <iostream>
#include <opencv2/opencv.hpp>
#include <vector>
using namespace std;
using namespace cv;
using namespace ml;

#define ROW 45
#define COL 30
#define TYPE_NUM 5
#define SUM 100
// string fruits[5] = {"apple", "cabbage", "carrot", "cucumber", "pear"};
string fruits[5] = {"apple", "cabbage", "carrot", "banana", "pear"};
// one-hot labels: row i is the label vector for class i
Mat Label = Mat::eye(5, 5, CV_32FC1);

// convert to float, resize to a fixed size, and append as one flattened row
void fruitAnnPrePro(Mat &src, Mat &dst) {
    src.convertTo(src, CV_32FC1);
    resize(src, src, Size(ROW, COL));
    dst.push_back(src.reshape(0, 1));
}

void trainChar(string &root_path, string &model_path) {
    vector<string> dir_path;
    for(string s : fruits)
        dir_path.push_back(root_path + s + "\\");
    Mat trainDataMat, trainLabelMat;

    vector<vector<string>> files(TYPE_NUM, vector<string>());
    for(int i = 0; i < dir_path.size(); i++) {
        glob(dir_path[i], files[i], false);
    }
    for(int i = 0; i < dir_path.size(); i++) {
        printf("%d  size   %d\n", i, (int)files[i].size());
    }
    for(int i = 0; i < files.size(); i++) {
        for(int j = 0; j < files[i].size(); j++) {
            Mat src = imread(files[i][j], IMREAD_GRAYSCALE);
            if(src.empty()) {
                break;
            }
            fruitAnnPrePro(src, trainDataMat);
            trainLabelMat.push_back(Label.row(i));
        }
    }
    imshow("trainData", trainDataMat);
    waitKey(0);
    cout << trainDataMat.size() << endl;
    cout << trainLabelMat.size() << endl;
    Ptr<ANN_MLP> model = ANN_MLP::create();

    // input layer = ROW*COL pixels (1350), three hidden layers, output = class count
    Mat layerSizes = (Mat_<int>(1, 5) << ROW * COL, ROW * COL, ROW * COL, 10, TYPE_NUM);
    model->setLayerSizes(layerSizes);
    model->setActivationFunction(ANN_MLP::SIGMOID_SYM, 1, 1);
    model->setTermCriteria(TermCriteria(TermCriteria::MAX_ITER + TermCriteria::EPS, 300, 0.95));
    model->setTrainMethod(ANN_MLP::BACKPROP, 0, 1);

    Ptr<TrainData> trainData = TrainData::create(trainDataMat, ROW_SAMPLE, trainLabelMat);
    printf("Start training!\n");
    model->train(trainData);

    // save the model
    model->save(model_path);
    printf("+++++++++++++++++++++++++++ training finished +++++++++++++++++++++++++++\n\n");
}

void useModel(string imgPath, string &model_path) {
    Mat src = imread(imgPath, IMREAD_GRAYSCALE);
    Mat testDataMat;
    fruitAnnPrePro(src, testDataMat);

    Ptr<ANN_MLP> model = Algorithm::load<ANN_MLP>(model_path);
    Mat res;
    double maxVal;
    Point maxLoc;
    for(int i = 0; i < testDataMat.rows; i++) {
        model->predict(testDataMat.row(i), res);
        cout << format(res, Formatter::FMT_NUMPY) << endl;
        minMaxLoc(res, NULL, &maxVal, NULL, &maxLoc);
        printf("Prediction: %s  confidence: %f%%\n", fruits[maxLoc.x].c_str(), maxVal * 100);
    }
}

int main() {
    // string rootpath = "cpp\\opencv\\fruitClassify\\img\\";
    string rootpath = "C:\\Users\\CaCu999\\Pictures\\img\\";
    string modelpath = "cpp\\opencv\\fruitClassify\\model\\fruitModel.xml";
    trainChar(rootpath, modelpath);
    // string imgpath = "cpp\\opencv\\fruitClassify\\img\\apple\\r1_304.jpg";
    string imgpath = "C:\\Users\\CaCu999\\Pictures\\img\\banana\\10.jpg";
    useModel(imgpath, modelpath);
    return 0;
}

4. Object-detection models

  1. Background

    On reflection, what I need is not whole-image classification but recognition of the objects inside the image, i.e. object detection.

    https://blog.csdn.net/virobotics/article/details/124008160

  2. The object-detection problem

    The problem to solve is where the target is and what it is.

    The detectors in common use are mainly R-CNN, SPP-Net, and Fast R-CNN; the YOLO family (YOLOv3 and YOLOv4); plus SSD, ResNet-based networks, and so on.

  3. YOLO overview

    The input image is divided into an m×m grid. If an object's center falls inside a grid cell, that cell is responsible for predicting the object. Each cell predicts n candidate boxes, each with a confidence score. Finally, the boxes with the highest confidence are kept as the result.
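
    A minimal C++ sketch (illustrative, not from the reference) of which grid cell "owns" an object center (cx, cy) given in normalized [0, 1] coordinates:

    #include <algorithm>

    // flat index of the m×m grid cell responsible for the object center
    int responsibleCell(double cx, double cy, int m) {
        int col = std::min(int(cx * m), m - 1);  // grid column of the center
        int row = std::min(int(cy * m), m - 1);  // grid row of the center
        return row * m + col;
    }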

5. Calling a Darknet detection model from OpenCV

  1. Getting a Darknet model

    • Files

      • cfg file: network architecture description
      • weights file: trained model weights
    • yolov3

      https://github.com/pjreddie/darknet/blob/master/cfg/yolov3.cfg
      https://pjreddie.com/media/files/yolov3.weights

      https://github.com/pjreddie/darknet/blob/master/data/coco.names

    • yolov4

      https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.cfg
      https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights

  2. Python code

    Pasted straight from the reference:

    import cv2
    cv = cv2  # the snippet below uses both aliases
    import numpy as np
    import time
    
    # load the model (forward slashes keep the paths portable and unescaped)
    net = cv2.dnn.readNetFromDarknet("cpp/opencv/ObjectDetection/model/yolov3.cfg",
                                     "cpp/opencv/ObjectDetection/model/yolov3.weights")
    # optional GPU backend/target
    # net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
    # net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
    # thresholds
    confThreshold = 0.5  # confidence threshold
    nmsThreshold = 0.4   # non-maximum suppression threshold
    # read the image
    frame = cv2.imread("cpp/opencv/ObjectDetection/img/20221122_110416.jpg")
    # shrink it proportionally
    frame = cv2.resize(frame, (int(frame.shape[1] / 10), int(frame.shape[0] / 10)))
    print(frame.shape[0] * frame.shape[1] / 100)
    # read all the class names from the label file
    classesFile = "cpp/opencv/ObjectDetection/model/coco.names"
    classes = None
    with open(classesFile, 'rt') as f:
        classes = f.read().rstrip('\n').split('\n')
    
    # get the names of the network's output layers
    def getOutputsNames(net):
        # Get the names of all the layers in the network
        layersNames = net.getLayerNames()
        # Get the names of the output layers, i.e. the layers with unconnected outputs
        return [layersNames[i - 1] for i in net.getUnconnectedOutLayers()]
    print(getOutputsNames(net))
    # Remove the bounding boxes with low confidence using non-maxima suppression
    
    def postprocess(frame, outs):
        frameHeight = frame.shape[0]
        frameWidth = frame.shape[1]
        # Scan through all the bounding boxes output from the network and keep only the
        # ones with high confidence scores. Assign the box's class label as the class with the highest score.
        classIds = []
        confidences = []
        boxes = []
        # parse the detections
        for out in outs:
            print(out.shape)
            for detection in out:
                scores = detection[5:]
                classId = np.argmax(scores)
                confidence = scores[classId]
                if confidence > confThreshold:
                    center_x = int(detection[0] * frameWidth)
                    center_y = int(detection[1] * frameHeight)
                    width = int(detection[2] * frameWidth)
                    height = int(detection[3] * frameHeight)
                    left = int(center_x - width / 2)
                    top = int(center_y - height / 2)
                    classIds.append(classId)
                    confidences.append(float(confidence))
                    boxes.append([left, top, width, height])
     
        # Perform non maximum suppression to eliminate redundant overlapping boxes with
        # lower confidences.
        print(boxes)
        print(confidences)  
        indices = cv2.dnn.NMSBoxes(boxes, confidences, confThreshold, nmsThreshold) 
        for i in indices:
            #print(i)
            #i = i[0]
            box = boxes[i]
            left = box[0]
            top = box[1]
            width = box[2]
            height = box[3]
            drawPred(classIds[i], confidences[i], left, top, left + width, top + height)
    
    # Draw the predicted bounding box
    def drawPred(classId, conf, left, top, right, bottom):
        # Draw a bounding box.
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 0, 255))
        label = '%.2f' % conf    
        # Get the label for the class name and its confidence
        if classes:
            assert(classId < len(classes))
            label = '%s:%s' % (classes[classId], label)
        #Display the label at the top of the bounding box
        labelSize, baseLine = cv.getTextSize(label, cv.FONT_HERSHEY_SIMPLEX, 0.5, 1)
        top = max(top, labelSize[1])
        cv2.putText(frame, label, (left, top), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0,0,255))
    # build a 416×416 input blob from the image (pixel scale 1/255, swap BGR→RGB)
    blob = cv2.dnn.blobFromImage(frame, 1/255, (416, 416), [0,0,0], 1, crop=False)
    t1=time.time()
    # hand the blob to the network
    net.setInput(blob)
    # run the forward pass over the output layers
    outs = net.forward(getOutputsNames(net))
    print(time.time()-t1)
    postprocess(frame, outs)
    t, _ = net.getPerfProfile()
    label = 'Inference time: %.2f ms' % (t * 1000.0 / cv.getTickFrequency())
    cv2.putText(frame, label, (0, 15), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255))
    print(frame.shape)
    cv2.imshow("result",frame)
    cv2.waitKey(0)
    
  3. Issues encountered

    • blobFromImage(frame, (double)1/255, Size(416, 416), Scalar(0,0,0), true); note the scale factor here: in C++ it must be computed as a double, because plain 1/255 is integer division.
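
      A two-line illustration of the pitfall (plain C++, for demonstration only):

      double wrong = 1 / 255;    // integer division happens first: wrong == 0, every pixel scaled to 0
      double right = 1.0 / 255;  // ≈ 0.0039, the intended scale factor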

    • The output of net.forward

      With 80 labels, the output is an n×85 matrix. Columns 1 and 2 hold the box center (x, y) as fractions of the image size; columns 3 and 4 hold the width and height fractions; column 5 is the objectness (box confidence) score; the remaining 80 are the per-class scores for that region.

      So for each row of the result: take the maximum of the last 80 scores and its index. That index is the label index. If the maximum exceeds the confidence threshold, read the box region from the first columns.

  4. C++ code

    #include <iostream>
    #include <opencv2/opencv.hpp>
    #include <fstream>
    
    using namespace std;
    using namespace cv;
    using namespace dnn;
    
    void readClasses(vector<string> &classes, string classFile) {
        ifstream readFile;
        readFile.open(classFile);
        if(readFile.is_open()) {
            // printf("File opened, reading labels...\n");
            string label;
            while(getline(readFile, label)) {
                // printf("%s\n", label.c_str());
                classes.push_back(label);
            }
            // printf("Done: %d labels in total\n", (int)classes.size());
        } else {
            cerr << "failed to open the class file" << endl;
        }
        readFile.close();
    }
    
    vector<string> getOutputsNames(Net net) {
        vector<string> layers = net.getLayerNames();
        cout << layers.size()<< endl;
        // names of the unconnected (output) layers, used to pick forward()'s output layers
        auto idx = net.getUnconnectedOutLayers();
        vector<string> res;
        for(int i = 0; i < idx.size(); i++) {
            res.push_back(layers[idx[i] - 1]);
        }
        for(int i = 0; i < res.size(); i++) {
            cout << res[i] << endl;
        }
        return res;
    }
    
    void readAndGetRes(string imgPath,Mat &frame, vector<Mat> &outs) {
        Net net = readNetFromDarknet("cpp/opencv/ObjectDetection/model/yolov3.cfg",
                                     "cpp/opencv/ObjectDetection/model/yolov3.weights");
        frame = imread(imgPath);
        resize(frame, frame, Size(frame.cols / 10, frame.rows / 10));
        string classFile = "cpp/opencv/ObjectDetection/model/coco.names";
    
        Mat blob = blobFromImage(frame, (double)1/255, Size(416, 416),Scalar(0,0,0), true);
        time_t startTime;
        time(&startTime);
        net.setInput(blob);
        net.forward(outs, getOutputsNames(net));
    }
    
    void postProcess(Mat &frame,vector<Mat> probs) {
        vector<string> classes;
        readClasses(classes, "cpp/opencv/ObjectDetection/model/coco.names");
        int w = frame.cols;
        int h = frame.rows;
        float confThreshold = 0.5;
        float nmxThreshold = 0.4;
        vector<int> classIds;
        vector<float> confidences;
        vector<Rect> boxes;
        for(auto prob: probs) {
            cout << prob.size() << endl;
            for(int i = 0; i < prob.rows; i++) {
                // per-class scores of this candidate box
                Mat scores = prob.row(i).colRange(5, prob.cols);
                // box geometry: center x, center y, width, height, objectness (5 values)
                vector<float> detect = prob.row(i).colRange(0, 5);
                Point maxLoc;
                double confidence;
                // best class score and its index
                minMaxLoc(scores, 0, &confidence, 0, &maxLoc);
                if(confidence > confThreshold) {
                    printf("confidence %0.2f, label %s, %0.2f %0.2f %0.2f %0.2f %0.2f\n",
                           confidence, classes[maxLoc.x].c_str(), detect[0], detect[1], detect[2], detect[3], detect[4]);
                    int center_x = detect[0] * w;
                    int center_y = detect[1] * h;
                    int width = detect[2] * w;
                    int height = detect[3] * h;
                    Rect box = Rect(center_x - width / 2, center_y - height / 2, width, height);
                    boxes.push_back(box);
                    confidences.push_back(confidence);
                    classIds.push_back(maxLoc.x);
                }
            }
        }
        vector<int> indices;
        NMSBoxes(boxes, confidences, confThreshold, nmxThreshold, indices);
        cout << "the res " <<  indices.size() << endl;
        for(int i:indices) {
            cout << i << endl;
            rectangle(frame, boxes[i], Scalar(0, 0, 255));
            string label = format("%s:%0.2f%%", classes[classIds[i]].c_str(), confidences[i]*100);
            cout << label << endl;
            putText(frame, label, Point(boxes[i].x,boxes[i].y), FONT_HERSHEY_SIMPLEX,0.5,Scalar(255,255,255));
        }
        cout << "this is end" << endl;
        imshow("label frame",frame);
        waitKey(0);
    }
    
    int main() {
        vector<Mat> res;
        Mat img;
        readAndGetRes("cpp/opencv/ObjectDetection/img/20221122_110416.jpg", img, res);
        postProcess(img, res);
        return 0;
    }
    

    The code above is adapted from: https://blog.csdn.net/virobotics/article/details/124008160

III. Model Training

1. Current situation

I tried YOLOv3 for recognition above, but its labels don't match what I need, so I am now trying to train a model for my own classes.

2. Steps: building Darknet

  1. Download the code

    git clone https://github.com/pjreddie/darknet.git

  2. Building from source: environment setup

    https://blog.csdn.net/m0_52571715/article/details/109713440

    • Environment setup

      1. Running make in Git Bash reports bash: make: command not found

        This means make.exe is not on my PATH.

        My environment already has a C++ toolchain, but checking the mingw/bin directory confirmed there is indeed no make.exe.

        So I downloaded a make.exe from http://www.equation.com/servlet/equation.cmd?fa=make and copied it into that directory.

      2. Trying to compile revealed that the CUDA libraries were missing

        https://blog.csdn.net/weixin_43848614/article/details/117221384

        • Download: https://developer.nvidia.com/cuda-toolkit-archive

        • Install

        • Environment variables

          Add the newly installed NVIDIA GPU Computing Toolkit location to PATH.

        • Test

          Open cmd:

          > nvcc --version
          nvcc: NVIDIA (R) Cuda compiler driver
          Copyright (c) 2005-2020 NVIDIA Corporation
          Built on Tue_Sep_15_19:12:04_Pacific_Daylight_Time_2020
          Cuda compilation tools, release 11.1, V11.1.74
          Build cuda_11.1.relgpu_drvr455TC455_06.29069683_0
          
      3. Building on Windows, and the build problems (this attempt failed, so I started over)

        • Makefile: set GPU=1

          GPU=1
          CUDNN=0
          OPENCV=0
          OPENMP=0
          DEBUG=0
          
        • Makefile: change /usr/local/cuda/include/ to your own CUDA path

        • include/darknet.h:780:11: error: unknown type name ‘clock_t’;

          1. Change it to clockid_t

            include/darknet.h:780:11: error: unknown type name 'clock_t'; did you mean 'clockid_t'?
             float sec(clock_t clocks);
                       ^~~~~~~
                       clockid_t
            compilation terminated due to -Wfatal-errors.
            
          2. Just comment it out

            After changing the type, the declaration conflicted elsewhere, so I commented it out in the header to see what happens:

            ./src/utils.h:48:7: error: conflicting types for 'sec'
             float sec(clock_t clocks);
                   ^~~
            compilation terminated due to -Wfatal-errors.
            make: *** [makefile:89: obj/gemm.o] Error 1
            
        • nvcc fatal : Unsupported gpu architecture ‘compute_30’

          compute_30 is no longer supported by CUDA 11, so edit the Makefile:

          ARCH= -gencode arch=compute_30,code=sm_30 \
                -gencode arch=compute_35,code=sm_35 \
                -gencode arch=compute_50,code=[sm_50,compute_50] \
                -gencode arch=compute_52,code=[sm_52,compute_52]

          change to:

          ARCH= -gencode arch=compute_35,code=sm_35 \
                -gencode arch=compute_50,code=[sm_50,compute_50] \
                -gencode arch=compute_52,code=[sm_52,compute_52]
          
        • nvcc fatal : Cannot find compiler ‘cl.exe’ in PATH

          I didn't feel like installing it again; I remembered it was already installed somewhere. A search turned it up at ./Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.35.32215/bin/Hostx86/x86/cl.exe.

          As a first attempt I copied that file into the CUDA directory, and that seemed to work.

        • nvcc fatal : Failed to preprocess host compiler properties

          Configure cuDNN.

      4. Uninstall and reinstall Visual Studio

        While experimenting, a notice appeared saying my Visual Studio version was unsuitable and 2015-2019 was recommended. I ignored it at first, but later the build failed because of it.

        • Uninstall VS

        • Download VS 2019

          https://visualstudio.microsoft.com/zh-hans/vs/older-downloads/


        • Install

      5. Re-download the code and build

        • Download: git clone https://github.com/AlexeyAB/darknet

        • Open PowerShell as administrator and run Set-ExecutionPolicy RemoteSigned

        • Enter the darknet directory and run: .\build.ps1

          From the build output you can see that it:

          • downloaded vcpkg

          • detected Visual Studio

          • detected CUDA

          • ran the build command and checked versions

          • found the CMake version wrong and downloaded a new one

          • found the PowerShell version wrong and downloaded that too

            ...after that there was too much output to go through item by item.

3. Steps: training the model

https://blog.csdn.net/qq_38915710/article/details/97112788

  1. Prepare the dataset

    Create a myData folder in the repo root, and inside it three folders: Annotations, ImageSets, and JPEGImages. Annotations holds the XML label files. Under ImageSets create a main folder containing a train.txt that lists the image names (names only, without the .jpg extension). JPEGImages holds the images to train on.
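
    Laid out as a tree (restating the description above):

    darknet/
    └── myData/
        ├── Annotations/      # xml label files
        ├── ImageSets/
        │   └── main/
        │       └── train.txt # image names, one per line, without extension
        └── JPEGImages/       # training images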

    • Read all the image file names

      #include <iostream>
      #include <fstream>
      #include <io.h>
      #include <string>
      #include <vector>
      using namespace std;

      // filePath is a wildcard pattern, e.g. "...\\JPEGImages\\*"
      void readImageFile(string filePath) {
          struct _finddata_t fileinfo;
          intptr_t file_ptr;
          file_ptr = _findfirst(filePath.c_str(), &fileinfo);
          vector<string> files;
          if(file_ptr != -1) {
              do {
                  cout << fileinfo.name << "   ";
                  string name = fileinfo.name;
                  int idx = name.find_last_of('.');
                  if (idx == -1) continue;
                  string type = name.substr(idx);
                  if(type == ".jpg" || type == ".jpeg" || type == ".png") {
                      cout << name.substr(0, idx) << endl;
                      files.push_back(name.substr(0, idx));
                  }
              } while ((_findnext(file_ptr, &fileinfo)) == 0);
              _findclose(file_ptr);
          }
          ofstream outfile;
          outfile.open("D:\\Project\\OpencvProject\\darknet\\myData\\ImageSets\\main\\train.txt");
          if (!outfile.is_open()) {
              cerr << "open file failed" << endl;
          }
          for (string name : files) {
              outfile << name << endl;
          }
          outfile.close();
      }
      
      


    • Prepare the labels

      • Install labelImg: pip install labelimg

        Later I downloaded the labelImg source and packed it into an exe: pyinstaller.exe --hidden-import=xml --hidden-import=xml.etree --hidden-import=xml.etree.ElementTree --hidden-import=lxml.etree -D -F -n labelImg -c "../labelImg.py" -p ../libs -p ../

      • Launch it

      • Add the labels

      • Convert the labels

        Python version (copied):

        import xml.etree.ElementTree as ET
        import pickle
        import os
        from os import listdir, getcwd
        from os.path import join
         
        sets = [('myData', 'train')]  # list of (prefix, split) tuples

        classes = ['screen']  # each category's name
         
         
        def convert(size, box):
            dw = 1./(size[0])
            dh = 1./(size[1])
            x = (box[0] + box[1])/2.0 - 1
            y = (box[2] + box[3])/2.0 - 1
            w = box[1] - box[0]
            h = box[3] - box[2]
            x = x*dw
            w = w*dw
            y = y*dh
            h = h*dh
            return (x,y,w,h)
         
        def convert_annotation(wd, year, image_id):
            # read one xml annotation and write the yolo-format txt label
            in_file = open(wd + './Annotations/%s.xml'%(image_id))
            out_file = open(wd + './labels/%s.txt'%(image_id), 'w')
            # parse the xml content
            tree = ET.parse(in_file)
            root = tree.getroot()
            size = root.find('size')            # image size element
            w = int(size.find('width').text)    # image width
            h = int(size.find('height').text)   # image height
         
            for obj in root.iter('object'):
                difficult = obj.find('difficult').text
                cls = obj.find('name').text
                cls_id = classes.index(cls)
                xmlbox = obj.find('bndbox')
                b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
                try:
                    bb = convert((w,h), b)
                except:
                    print(image_id)
                    continue  # skip boxes that fail to convert instead of writing stale data
                out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
         
        if __name__ == '__main__':
         
            wd = getcwd()              # current working directory
            wd += '/darknet/myData/'   # locate the data directory
            classes = open(wd + './Annotations/classes.txt').read().strip().split()
            print(classes)
            for year, image_set in sets:
                # create the labels directory if it does not exist
                if not os.path.exists(wd + './labels/'):
                    os.makedirs(wd + './labels/')
                # read all the image names listed in train.txt
                image_ids = open(wd + './ImageSets/main/%s.txt'%(image_set)).read().strip().split()
                print(wd + './%s_%s.txt'%(year, image_set))
                # create the myData_train list file under myData
                list_file = open(wd + './%s_%s.txt'%(year, image_set), 'w')
                for image_id in image_ids:
                    # write the image's absolute path (caveat: the extension may not be .jpg)
                    list_file.write('%s/JPEGImages/%s.jpg\n'%(wd, image_id))
                    convert_annotation(wd, year, image_id)
                list_file.close()
        

        C++

        I originally planned to port this to C++, but then noticed that the converted output is identical to what labelImg produces if you simply select the YOLO format while labeling (only the floating-point precision differs slightly). So the C++ version just copies the files:

        #include <iostream>
        #include <io.h>
        #include <string>
        #include <vector>
        #include <fstream>
        using namespace std;
        
        void readImageFile(string root) {
            string filePath = root + ".\\JPEGImages\\*";
            struct _finddata_t fileinfo;
            intptr_t file_ptr;
            file_ptr = _findfirst(filePath.c_str(), &fileinfo);
            filePath = filePath.substr(0, filePath.size() - 1);  // drop the trailing '*'
            vector<string> files;
            vector<string> absfiles;
            if(file_ptr != -1) {
                do {
                    // cout << filePath + fileinfo.name << "   ";
                    string name = fileinfo.name;
                    int idx = name.find_last_of('.');
                    if (idx == -1) continue;
                    string type = name.substr(idx);
                    if(type == ".jpg" || type == ".jpeg" || type == ".png") {
                        // save both the bare name and the absolute path
                        files.push_back(name.substr(0, idx));
                        absfiles.push_back(filePath + name);
                    }
                } while ((_findnext(file_ptr, &fileinfo)) == 0);
                _findclose(file_ptr);
            }
            // write out the names and absolute paths
            ofstream outfile;
            ofstream trainfile;
            outfile.open(root + ".\\ImageSets\\main\\train.txt");
            trainfile.open(root + ".\\myData_train.txt");
            if (!outfile.is_open() || !trainfile.is_open()) {
                cerr << "open file failed" << endl;
            }
            for (int i = 0; i < files.size(); i++) {
                outfile << files[i] << endl;
                trainfile << absfiles[i] << endl;
            }
            outfile.close();
            trainfile.close();
            // use cmd commands to create the labels folder and copy the txt labels over
            string labelspath = root + "labels\\";
            string cmd = "mkdir " + labelspath;
            system(cmd.c_str());
            cmd = "copy " + root + ".\\Annotations\\*.txt " + labelspath;
            system(cmd.c_str());
        }
        
        int main() {
            string root = "D:\\Project\\OpencvProject\\darknet\\myData\\";
            readImageFile(root);
            return 0;
        }
        


  2. Train the model

    • Prepare the files

      • ImgDetect.data

        # number of classes
        classes = 23
        # training set
        train = myData/myData_train.txt
        # validation set
        valid = myData/myData_train.txt
        # class names
        names = myData/labels/classes.txt
        # where trained weights are written
        backup = myData/
        
      • yolo.cfg

        [net]

        batch is the number of images trained in one batch.
        subdivisions splits one batch into smaller chunks (so machines with weaker GPUs can still use a large batch); the relationship is batch ÷ subdivisions = the number of images actually processed at once.
        width and height are the network input size; they need not change, as long as they are multiples of 16.
        max_batches is the total number of training iterations (the official recommendation is classes × 2000, but at least 6000: for one class use 6000, for four classes use 8000).
        steps should be set to 0.6 × max_batches and 0.8 × max_batches.

        https://blog.csdn.net/weixin_52939176/article/details/122554179


        [convolutional] and [yolo]

        Mainly modify the [convolutional]/[yolo] pairs (two pairs in the yolo-tiny cfg, three in the full YOLO cfg):
        set filters to (classes + 5) × 3, e.g. with 8 classes, (8 + 5) × 3 = 39;
        set classes to the number of classes. A worked example for this project is sketched below.
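
        For example, with the 23 classes declared in ImgDetect.data above, the numbers work out as follows (a sketch of just the edited lines, not a complete cfg):

        # [net] section
        max_batches = 46000      # 23 classes × 2000
        steps = 27600,36800      # 0.6 × max_batches, 0.8 × max_batches

        # in the [convolutional] directly before each [yolo] section
        filters = 84             # (23 + 5) × 3

        # in each [yolo] section
        classes = 23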


      • yolov3.weights

        Must match the .cfg file; find the corresponding pretrained weights online.

    • Start training

      darknet.exe detector train cfg_myData/ImgDetect.data cfg_myData/yolov3.cfg cfg_myData/yolov3.weights


      ! Probably because the dataset is too small, the results are poor. Time to redo it QAQ
      (Uploading images to CSDN is painful too: they have to be pasted in one at a time.)
