The goal of the MLCamera demo is to demonstrate using the Vision and Core ML frameworks to process AVCaptureVideoDataOutput and perform image classification. The complete working demo can be downloaded from GitHub. Relevant WWDC sessions are Introduction to Core ML and Vision Framework: Building on Core ML. Apple provides a few models already converted to its Core ML format here, and also provides tools to convert the most popular machine learning model types to the Core ML format. With that out of the way, let's get started.
Setting up AVCaptureSession
Let's start by setting up an AVCaptureSession and the sample buffer delegate.
guard let device = AVCaptureDevice.default(for: .video) else {
    fatalError("Capture device not available")
}
guard let input = try? AVCaptureDeviceInput(device: device) else {
    fatalError("Capture input not available")
}
let output = AVCaptureVideoDataOutput()
let session = AVCaptureSession()
session.addInput(input)
session.addOutput(output)
output.setSampleBufferDelegate(self, queue: DispatchQueue.global(qos: .userInteractive))
// Setup preview layer
let previewLayer = AVCaptureVideoPreviewLayer(session: session)
cameraPreview.addCaptureVideoPreviewLayer(previewLayer)
self.session = session
session.startRunning()
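Note that the session will silently produce no frames if the app lacks camera permission (and the app must also declare an NSCameraUsageDescription in Info.plist). A minimal sketch of gating the setup on authorization — the helper name here is our own, not part of the demo — might look like:

```swift
import AVFoundation
import Foundation

// Sketch: run the session setup only once camera access is granted.
// configureSessionIfAuthorized is a hypothetical helper, not part of the demo.
func configureSessionIfAuthorized(_ configure: @escaping () -> Void) {
    switch AVCaptureDevice.authorizationStatus(for: .video) {
    case .authorized:
        configure()
    case .notDetermined:
        AVCaptureDevice.requestAccess(for: .video) { granted in
            if granted {
                DispatchQueue.main.async(execute: configure)
            }
        }
    default:
        break // .denied / .restricted: point the user to Settings
    }
}
```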
Next we need to handle orientation for the AVCaptureConnection. Setting videoOrientation based on the device orientation results in a rotated CVImageBuffer. Even though rotation of the pixel buffer is hardware accelerated, this is not the most efficient way of handling it: the preferred approach is to derive an EXIF orientation from the device orientation and pass it to Vision instead of rotating the buffer. I was not able to test that path yet, so for now the buffer is rotated; I will update the post in the near future. More info.
let orientation = UIDevice.current.videoOrientation
previewLayer?.connection?.videoOrientation = orientation
session?.outputs.forEach {
    $0.connections.forEach {
        $0.videoOrientation = orientation
    }
}
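UIDevice has no videoOrientation property of its own; the snippet above assumes a small extension along these lines (the exact mapping is our assumption, not code from the demo):

```swift
import UIKit
import AVFoundation

extension UIDevice {
    // Assumed helper: maps device orientation to a capture orientation.
    // The landscape cases are swapped because AVCaptureVideoOrientation
    // is defined relative to the camera, not the device.
    var videoOrientation: AVCaptureVideoOrientation {
        switch orientation {
        case .portraitUpsideDown: return .portraitUpsideDown
        case .landscapeLeft:      return .landscapeRight
        case .landscapeRight:     return .landscapeLeft
        default:                  return .portrait
        }
    }
}
```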
Setting up VNImageRequestHandler and Core ML model
Time to get a model. Drag and drop the model into the Xcode project; you'll have to tick the checkbox to add target membership. Xcode auto-generates source code for the model. One of the goals of Vision & Core ML is to provide a unified interface for working with a variety of models.
The next step is to create a classification request using the Xcode-generated class Inceptionv3. To try different models, just drag them into Xcode (removing the previous one) and change the class name in classificationRequest.
lazy var classificationRequest: VNCoreMLRequest = {
    // Load the ML model through its generated class and create a Vision request for it.
    do {
        let model = try VNCoreMLModel(for: Inceptionv3().model)
        let request = VNCoreMLRequest(model: model, completionHandler: self.handleClassification)
        request.imageCropAndScaleOption = .centerCrop
        return request
    } catch {
        fatalError("Can't load Vision ML model: \(error)")
    }
}()
Set up a classification request handler that simply shows the results in a UILabel:
func handleClassification(request: VNRequest, error: Error?) {
    guard let observations = request.results as? [VNClassificationObservation] else {
        fatalError("Unexpected result type from VNCoreMLRequest")
    }
    // Keep the top results with reasonable confidence
    // (prefix(10) avoids a crash when fewer than 10 observations are returned)
    let filteredObservations = observations.prefix(10).filter { $0.confidence > 0.1 }
    // Update UI on the main queue
    DispatchQueue.main.async { [weak self] in
        guard let best = filteredObservations.first else { return }
        self?.label.text = "\(best.identifier) \(best.confidence)"
    }
}
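The handler above shows only the top observation. If you want the label to list every filtered result instead, a small formatting helper could do it — this is our own addition, operating on plain tuples so it stays independent of Vision types:

```swift
import Foundation

// Hypothetical helper: turns (identifier, confidence) pairs into label text,
// dropping low-confidence entries and printing confidence as a percentage.
func formatClassifications(_ results: [(identifier: String, confidence: Float)],
                           minimumConfidence: Float = 0.1) -> String {
    return results
        .filter { $0.confidence > minimumConfidence }
        .map { "\($0.identifier) " + String(format: "%.0f%%", Double($0.confidence) * 100) }
        .joined(separator: "\n")
}
```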
Lastly we need to process the samples: get the CVImageBuffer and create a VNImageRequestHandler. Note that the more appropriate thing to do is to create the VNImageRequestHandler with an EXIF orientation parameter; since we set videoOrientation earlier on, the pixel buffer is already correctly oriented.
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    // Note: Pixel buffer is already correctly rotated based on device rotation
    // See: deviceOrientationDidChange(_:) comment
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
        return
    }
    var requestOptions: [VNImageOption: Any] = [:]
    if let cameraIntrinsicData = CMGetAttachment(sampleBuffer, kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, nil) {
        requestOptions = [.cameraIntrinsics: cameraIntrinsicData]
    }
    // Run the Core ML classifier - results arrive in handleClassification
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: requestOptions)
    do {
        try handler.perform([classificationRequest])
    } catch {
        print(error)
    }
}
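For reference, the untested EXIF path mentioned earlier might look roughly like this. The mapping below is our assumption for the back camera (whose sensor is natively landscape-right) and would need verification on a device:

```swift
import UIKit
import ImageIO

// Assumed mapping from device orientation to EXIF orientation for the
// back camera; the front camera would need mirrored values.
func exifOrientation(for deviceOrientation: UIDeviceOrientation) -> CGImagePropertyOrientation {
    switch deviceOrientation {
    case .portrait:           return .right
    case .portraitUpsideDown: return .left
    case .landscapeLeft:      return .up
    case .landscapeRight:     return .down
    default:                  return .right
    }
}
```

With this in place the handler would be created as VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: exifOrientation(for: UIDevice.current.orientation), options: requestOptions), and the connection's videoOrientation would be left untouched, avoiding the buffer rotation entirely.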
That’s all there is to it. The complete working demo can be downloaded from here. Feedback, questions, and suggestions are welcome; hit me up on Twitter. Thank you for reading.