MLCamera – Vision & Core ML with AVCaptureSession Inceptionv3 model

June 13, 2017

The goal of the MLCamera demo is to demonstrate using the Vision and Core ML frameworks to process AVCaptureVideoDataOutput and perform image classification. The complete working demo can be downloaded from GitHub. The relevant WWDC sessions are Introduction to Core ML and Vision Framework: Building on Core ML. Apple provides a few models already converted to its Core ML format here, and also provides tools to convert the most popular machine learning model types to the Core ML format. With that out of the way, let's get started.

Setting up AVCaptureSession

Let's start by setting up an AVCaptureSession and its sample buffer delegate.

    guard let device = AVCaptureDevice.default(for: .video) else {
        fatalError("Capture device not available")
    }
    guard let input = try? AVCaptureDeviceInput(device: device) else {
        fatalError("Capture input not available")
    }
    let session = AVCaptureSession()
    session.addInput(input)

    let output = AVCaptureVideoDataOutput()
    // Deliver sample buffers on a dedicated serial queue
    output.setSampleBufferDelegate(self, queue: DispatchQueue(label: "MLCamera.videoQueue"))
    session.addOutput(output)

    // Setup preview layer
    let previewLayer = AVCaptureVideoPreviewLayer(session: session)
    self.session = session

Next we need to handle orientation for the AVCaptureConnection. Setting videoOrientation based on the device orientation results in a rotated CVImageBuffer. Even though rotating the pixel buffer is hardware accelerated, this is not the most efficient way of handling it. I was not able to test deriving an EXIF orientation from the device rotation instead of rotating the buffer, so for now the buffer is rotated; I will update this in the near future. More info.

    let orientation = UIDevice.current.videoOrientation
    previewLayer?.connection?.videoOrientation = orientation
    session?.outputs.forEach {
        $0.connections.forEach {
            $0.videoOrientation = orientation
        }
    }
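Note that videoOrientation is not a built-in UIDevice property; it comes from a small helper extension. A minimal sketch of such an extension might look like this (the exact mapping is an assumption, written for the back camera):

    import UIKit
    import AVFoundation

    extension UIDevice {
        /// Maps the physical device orientation to a capture orientation.
        /// landscapeLeft/landscapeRight are intentionally swapped because
        /// the camera sensor rotates opposite to the device.
        var videoOrientation: AVCaptureVideoOrientation {
            switch orientation {
            case .portraitUpsideDown: return .portraitUpsideDown
            case .landscapeLeft: return .landscapeRight
            case .landscapeRight: return .landscapeLeft
            default: return .portrait
            }
        }
    }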

Setting up VNImageRequestHandler and Core ML model

Time to get a model. Drag and drop the model into the Xcode project; you'll have to tick the checkbox to add target membership. Xcode auto-generates source code for the model. One of the goals of Vision & Core ML is to provide a unified interface for working with a variety of models.
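For reference, the Xcode-generated interface looks roughly like this (a simplified sketch, not the exact generated source):

    // Simplified sketch of the class Xcode generates for Inceptionv3.mlmodel:
    //
    // class Inceptionv3 {
    //     let model: MLModel
    //     init()
    //     func prediction(image: CVPixelBuffer) throws -> Inceptionv3Output
    // }
    //
    // class Inceptionv3Output {
    //     let classLabel: String                // most likely label
    //     let classLabelProbs: [String: Double] // label -> probability
    // }

Vision wraps this generated model via VNCoreMLModel, so we never call prediction(image:) directly in this demo.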

The next step is to create the classification request using the Xcode-generated class Inceptionv3. To try different models, just drag them into Xcode (removing the previous one) and change the class name in classificationRequest.

    lazy var classificationRequest: VNCoreMLRequest = {
        // Load the ML model through its generated class and create a Vision request for it.
        do {
            let model = try VNCoreMLModel(for: Inceptionv3().model)
            let request = VNCoreMLRequest(model: model, completionHandler: self.handleClassification)
            request.imageCropAndScaleOption = .centerCrop
            return request
        } catch {
            fatalError("Can't load Vision ML model: \(error)")
        }
    }()

Set up the classification request handler, which simply shows the results in a UILabel:

    func handleClassification(request: VNRequest, error: Error?) {
        guard let observations = request.results as? [VNClassificationObservation] else {
            fatalError("Unexpected result type from VNCoreMLRequest")
        }
        // Keep only the top observations with reasonable confidence
        let filteredObservations = observations.prefix(10).filter { $0.confidence > 0.1 }
        // Update UI
        DispatchQueue.main.async { [weak self] in
            guard let best = filteredObservations.first else { return }
            self?.label.text = "\(best.identifier) \(best.confidence)"
        }
    }

Lastly, we need to process the samples: extract the CVImageBuffer and create a VNImageRequestHandler. Note that the more appropriate approach is to create the VNImageRequestHandler with an EXIF orientation parameter; since we set videoOrientation earlier, the pixel buffer here is already correctly oriented.

    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        // Note: Pixel buffer is already correctly rotated based on device rotation
        // See: deviceOrientationDidChange(_:) comment
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
            return
        }
        var requestOptions: [VNImageOption: Any] = [:]
        if let cameraIntrinsicData = CMGetAttachment(sampleBuffer, kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, nil) {
            requestOptions = [.cameraIntrinsics: cameraIntrinsicData]
        }
        // Run the Core ML classifier - results are delivered to handleClassification
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: requestOptions)
        do {
            try handler.perform([classificationRequest])
        } catch {
            print(error)
        }
    }
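As mentioned above, instead of rotating the buffer we could hand Vision an EXIF orientation directly. A sketch of that alternative (untested, as noted earlier; the orientation mapping is an assumption for the back camera):

    // Untested sketch: pass an EXIF orientation instead of rotating the buffer.
    // CGImagePropertyOrientation raw values follow the EXIF specification.
    func exifOrientation(for deviceOrientation: UIDeviceOrientation) -> CGImagePropertyOrientation {
        switch deviceOrientation {
        case .portraitUpsideDown: return .left
        case .landscapeLeft: return .up
        case .landscapeRight: return .down
        default: return .right
        }
    }

    let orientation = exifOrientation(for: UIDevice.current.orientation)
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                        orientation: orientation,
                                        options: requestOptions)

This avoids the hardware-accelerated but still unnecessary rotation of every frame, since Vision only needs to know how to interpret the buffer.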

That’s all there is to it. A more complete working demo can be downloaded from here. Feedback, questions, and suggestions are welcome; hit me up on Twitter. Thank you for reading.

#AI, #Blog posts, #Coding, #iOS Development, #Swift, #Technology