Ukrainian startup creates a neural network that detects people wearing masks in a crowd

Ukrainian tech company Fulcrum has created a neural network that can detect people without medical face masks in a crowd. The company told AIN.UA about it.

The company uploaded the project code to GitHub and described how it was built and which technologies were used. According to backend developer Serhii Kalachnikov, who worked on the project, the idea was to check whether faces without masks can be detected using only webcams. It is a non-commercial project. According to the developer, the team was motivated by curiosity.

It took the team two weeks to train the neural network. The developers described how it was built on their blog. In brief, the process looked like this:

  • In the final version of the neural network, the team used TensorFlow 2 Nightly, OpenCV 2, Keras, and Yolov3. OpenCV handles image processing and draws the ‘squares’ (bounding boxes) around detected masks. Yolov3 is the ‘brain’ of the neural network.
  • The team started with a simple task: train the network to find masks in images, then move on to video processing. Two applications were created in the process. The first one is written in Node.js and is used to create labels. It helps to compose datasets and converts the coordinates of objects in an image from Labelbox JSON to the XML format expected by Yolov3.
  • First, it was necessary to identify the precise location of the mask (or any other object). For this purpose, the team used the Labelbox website. It is convenient because it generates a file with the necessary parameters: mask location, image dimensions, time spent on the image, etc. These files are later passed to one of the applications mentioned above.
  • The team wrote code that parses all the Labelbox data. This data is then distributed across other files in the format the neural network requires. The program also creates anchors based on this data. The anchors are used to define the height and width of the mask, and how to scale it. The result is a final dataset with images and annotations.
  • The second app is written in Python. It includes Yolov3 and trains the neural network. With this application, the developers built their own model for recognizing objects in images.
  • The maximum size of the recognized image region was set to 288 px. A larger value is possible, but a small one was chosen for faster processing.
  • num.epoch sets the number of training steps (epochs). It took the team 12 hours to complete 30 of them (with an image size of 288 px).
  • The developers had to write a separate script for video. However, it works on the same principles that were used to analyze images and is also based on Yolov3. The team set up OpenCV to load the video and sample frames at a set fps. The app works as follows: you upload a video file to a certain folder, and the app starts processing the video frame by frame.
  • Webcams usually record short videos of 10-15 minutes. These videos could be sent to a server where they would be processed by similar software. This could be useful if a company or organization, for example, wants to make sure that all its employees wear masks.
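
To make two of the steps above more concrete, here is a minimal Python sketch. It is not the Fulcrum team's code: the function names, the simplified box dictionaries (label, left, top, width, height), and the Pascal VOC-style XML layout are assumptions chosen for illustration, and the team's actual label converter was written in Node.js. The first function turns box coordinates into an XML annotation; the second picks which frames of a video to analyze when sampling at a lower fps.

```python
import xml.etree.ElementTree as ET


def bbox_to_voc_xml(filename, img_w, img_h, boxes):
    """Build a Pascal VOC-style XML annotation string (assumed target layout).

    `boxes` is a list of dicts with keys label, left, top, width, height:
    a hypothetical, simplified stand-in for a labeling-tool export.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(img_w)
    ET.SubElement(size, "height").text = str(img_h)
    for box in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = box["label"]
        bnd = ET.SubElement(obj, "bndbox")
        # Convert left/top/width/height to corner coordinates,
        # clamped so the box never leaves the image.
        corners = (
            ("xmin", max(0, box["left"])),
            ("ymin", max(0, box["top"])),
            ("xmax", min(img_w, box["left"] + box["width"])),
            ("ymax", min(img_h, box["top"] + box["height"])),
        )
        for tag, value in corners:
            ET.SubElement(bnd, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


def frames_to_process(total_frames, source_fps, target_fps):
    """Return the indices of frames to analyze when sampling at target_fps."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("fps values must be positive")
    # Take every `step`-th frame; never skip below one frame at a time.
    step = max(1, round(source_fps / target_fps))
    return list(range(0, total_frames, step))
```

For example, frames_to_process(10, 30, 10) returns [0, 3, 6, 9]: a 30 fps clip sampled at 10 fps keeps every third frame. In a real pipeline each selected frame would then be read with OpenCV and passed to the detector.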

You can see the neural network's results in the following video: