How to See Like a Machine?

Enos Jeba
Published in The Deep Hub
10 min read · May 27, 2023


A Guide to Computer Vision Tools

Hello and welcome to my blog on computer vision tools! If you are like me, you love playing with images and videos and making them do cool things. Maybe you want to detect faces, recognize objects, apply filters, or create deepfakes. Whatever your goal is, there is a tool for you. In this blog, I will introduce you to some of the most popular and powerful computer vision tools that you can use to unleash your creativity and have fun. Let’s get started!

Note: This blog leans towards Python, as it is the language most developers use to get started in computer vision.

Hardware tools

Camera

This could be a simple USB camera, a webcam, a depth-sensing camera, a CCTV camera, or anything else your task requires.

Edge Device

An edge device such as an NVIDIA Jetson, if one is available or your task requires it.

Workflow tools

To make sure you finish your task smoothly, it’s good to make your goal clear. This essentially means writing down the steps you are going to take and the deadlines associated with them, so that you are not wasting your time.

Doing this keeps your mind free of clutter, so you can focus only on the task at hand.

Notepad

A physical notepad for drawing a rough outline before you start. There is something about sketching an idea out and starting from there.

Excel

To keep track of time, deadlines, task status, and other workflow details. You can skip Excel if you use Notion, as the same work can be folded into a single page.

Notion

Access here

It is a good tool for quickly typing up notes or random ideas as they occur. It also keeps them structured, offering multiple styles and formats for the information you want to add, including pictures, boards, and more.

The most important feature I use is tables. I use them to keep track of my model training (an advanced step, covered below). Since I experiment with multiple datasets, it’s easy to lose track of which model performed how.

Here is a template you can use as a reference
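If you would rather keep the same kind of tracking table next to your code, a plain CSV file works too. The sketch below is only an illustration: the column names and the two runs are made up, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# Hypothetical columns for a training log; rename them to suit your project.
FIELDS = ["run", "dataset", "epochs", "val_accuracy", "notes"]

# Made-up example runs, not real results.
runs = [
    {"run": "exp-01", "dataset": "set-a", "epochs": 20, "val_accuracy": 0.81, "notes": "baseline"},
    {"run": "exp-02", "dataset": "set-b", "epochs": 20, "val_accuracy": 0.86, "notes": "more augmentation"},
]

# In-memory buffer; use open("experiments.csv", "w", newline="") for a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(runs)

# Quickly answer "which model performed best?"
best = max(runs, key=lambda row: row["val_accuracy"])
print(buffer.getvalue())
print("Best run:", best["run"])
```

One row per training run is usually enough to avoid losing track of which dataset produced which result.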

Draw.io

Access here

The most important tool. I have never started a project without mind-mapping the ideas. You can hold 100 layers of thought in your head about how your solution might work, but putting them into a diagram of interconnected boxes makes a big difference.

Alternatives include Miro and Excalidraw.

Stepping in tools

Ubuntu

Ubuntu has several advantages over other operating systems for computer vision, such as:

  1. Ubuntu is free and customizable. Unlike proprietary operating systems such as Windows or macOS, Linux does not require a license fee or activation code.
  2. Linux users can also modify the source code of the OS to suit their needs and preferences (highly useful for deployments).
  3. Linux can run on various types of computers, from desktops and laptops to servers and embedded systems.
  4. Linux also supports many types of cameras, sensors, and GPUs that are essential for computer vision tasks.

Python / C++

The programming languages we use to compose our solution and make it work.

Why Python?

  1. Easy to Use: Python is easy to read and write, which makes it suitable for beginners and experts alike.
  2. Many Libraries: Python has many libraries and frameworks (we will look at some of them below) that provide ready-made solutions for common computer vision tasks, such as image processing, face detection, object recognition, and deep learning.

Why C++?

  1. Performance: C++ is well-known for its rapid execution speed and resource efficiency. This is critical for computer vision jobs that need real-time processing and precision. C++ also provides direct access to low-level features like pointers and bitwise operations, which can improve the efficiency of algorithms and data structures.
  2. Portability: C++ is a cross-platform language that can run on different operating systems and architectures. This makes it easier to deploy computer vision applications on various devices and platforms, such as desktops, mobiles, embedded systems, and cloud servers.

If you are new to programming, you can get started with Python, as it is easy to learn. However, since C++ offers more control over resource management, you might need to pick it up later in your career when a project calls for it.

Text Editors or IDEs

A simple tool for writing code.

VS Code

Download here

Visual Studio Code is a lightweight yet capable source code editor for Windows, macOS, and Linux that runs on your desktop.

It has built-in support for JavaScript, TypeScript, and Node.js, as well as a robust ecosystem of extensions for additional languages and runtimes (including C++, C#, Java, Python, PHP, Go, and .NET).

Sublime Text

Download here

Sublime Text is a powerful text editor with many features to help make coding easier. It has a clean and easy-to-use interface that makes it an excellent choice for both beginners and experienced developers. One of the best things about Sublime Text is its vast ecosystem of plugins and themes that can be used to customize the editor to your liking.

You can check out more options here

The choice of code editor comes down entirely to personal preference. You can try different tools and see which one works best for you.

Moving along

Let’s explore libraries which can get the job done.

OpenCV

OpenCV is a powerful open-source library for computer vision and image processing. It can be used to perform various tasks such as face detection, object recognition, image stitching, video analysis and more. OpenCV has interfaces for C++ and Python, and supports multiple platforms such as Windows, Linux, Android and iOS.

OpenCV makes it possible to take a camera feed as input and perform further analysis on the video or image. It also exposes a number of camera settings that can be used to adjust the color parameters of the incoming image.

Pillow

Pillow is a Python library that allows you to manipulate and process images in various ways. It is a fork of the Python Imaging Library (PIL), which was discontinued in 2011. Pillow supports many image formats, such as PNG, JPEG, GIF, TIFF, and BMP. It also provides features such as cropping, resizing, rotating, filtering, drawing, and adding text to images.
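Here is a minimal sketch of the cropping, resizing, and rotating mentioned above. A solid-color image created in memory stands in for a photo you would normally load with `Image.open`.

```python
from PIL import Image

# A solid-color 200x100 image stands in for a photo loaded from disk.
image = Image.new("RGB", (200, 100), color=(30, 144, 255))

# crop takes a (left, upper, right, lower) box in pixels.
cropped = image.crop((50, 25, 150, 75))

# resize takes the new (width, height).
resized = cropped.resize((50, 25))

# expand=True grows the canvas so the rotated image is not clipped.
rotated = resized.rotate(90, expand=True)

print(cropped.size, resized.size, rotated.size)
```

Note that Pillow reports sizes as (width, height), the reverse of the (height, width) order used by NumPy array shapes.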

NumPy

NumPy is a Python library that provides tools for working with numerical arrays, including images. In computer vision, images are often represented as NumPy arrays, where each pixel corresponds to a value or a vector of values. For example, a grayscale image can be stored as a 2D array of integers from 0 to 255, where 0 is black and 255 is white. A color image can be stored as a 3D array of integers from 0 to 255, where each pixel has three values for the red, green and blue channels.

By using NumPy arrays to store images, we can perform a wide range of operations on them, such as resizing, cropping, filtering, transforming, and more. NumPy also offers various functions and methods for manipulating arrays, such as slicing, indexing, broadcasting, and linear algebra. These features make NumPy a powerful and versatile tool for computer vision applications.
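To make the array picture concrete, here is a small sketch using a tiny synthetic image. The channel-averaging grayscale is just one simple choice for illustration, not the weighted formula that imaging libraries typically use.

```python
import numpy as np

# A synthetic color image: height 4, width 6, 3 channels, values 0 to 255.
image = np.zeros((4, 6, 3), dtype=np.uint8)
image[:, :, 0] = 200  # fill the first channel (red in RGB order)

# Cropping is just slicing: rows 1-2 and columns 2-4.
crop = image[1:3, 2:5]

# A simple grayscale: average the three channel values of each pixel.
gray = image.mean(axis=2).astype(np.uint8)

# A horizontal flip reverses the column axis.
flipped = image[:, ::-1]

print(image.shape, crop.shape, gray.shape)
```

Keep in mind that OpenCV stores channels in BGR order, so the first channel of an image loaded with OpenCV is blue rather than red.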

Advancing tools

Deep learning frameworks are software libraries that provide tools and functionalities for developing and deploying deep learning models. Deep learning frameworks are widely used in computer vision, which is the field of artificial intelligence that deals with understanding and analyzing visual data such as images and videos.

PyTorch

PyTorch is an open-source framework developed by Facebook (now Meta) that is based on the Torch library. PyTorch is known for its dynamic computational graph, which allows users to modify the network structure at runtime. PyTorch also supports distributed training, automatic differentiation, and various pre-trained models.
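The dynamic graph idea can be sketched with a toy model whose depth is decided at call time by an ordinary Python loop. `TinyNet` and its layer sizes are made up for this illustration.

```python
import torch


class TinyNet(torch.nn.Module):
    """Toy model: the number of times the hidden layer runs is chosen per call."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 4)
        self.head = torch.nn.Linear(4, 1)

    def forward(self, x, repeats):
        # A plain Python loop: the graph is built while this code runs,
        # so its structure can differ from one call to the next.
        for _ in range(repeats):
            x = torch.relu(self.layer(x))
        return self.head(x)


torch.manual_seed(0)
model = TinyNet()
batch = torch.randn(8, 4)  # 8 random samples with 4 features each

output = model(batch, repeats=3)
loss = output.pow(2).mean()
loss.backward()  # automatic differentiation fills in the gradients

print(output.shape, model.layer.weight.grad.shape)
```

Calling `model(batch, repeats=1)` afterwards would build a shallower graph from the same weights, with no recompilation step.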

TensorFlow

TensorFlow is an open-source framework for machine learning and deep learning. It was developed by Google and released in 2015. TensorFlow allows users to create, train, and deploy neural networks and other models on various platforms, such as CPUs, GPUs, TPUs, and mobile devices. TensorFlow supports a variety of programming languages, such as Python, C++, Java, and JavaScript. TensorFlow also provides tools and libraries for data processing, visualization, debugging, and optimization.

These two are the most popular frameworks as of mid-2023.

Nearer to goal

Matplotlib

Matplotlib is a popular Python library for creating and customizing various types of plots and graphs. It can also be used for visualizing deep learning training processes, such as loss curves, accuracy curves, confusion matrices, and more.
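A loss curve plot might look roughly like the sketch below. The loss values are invented for illustration; in practice they would come from your training loop. The output file name is also just an example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # draw off-screen so no display is needed
import matplotlib.pyplot as plt

# Made-up numbers standing in for values logged during training.
epochs = [1, 2, 3, 4, 5]
train_loss = [0.90, 0.62, 0.45, 0.36, 0.31]
val_loss = [0.95, 0.70, 0.58, 0.55, 0.56]

fig, ax = plt.subplots()
ax.plot(epochs, train_loss, label="train loss")
ax.plot(epochs, val_loss, label="validation loss")
ax.set_xlabel("epoch")
ax.set_ylabel("loss")
ax.legend()

# Save into the system temp folder; change the path to suit your project.
out_path = os.path.join(tempfile.gettempdir(), "loss_curves.png")
fig.savefig(out_path)
print("Saved plot to", out_path)
```

A validation curve that flattens or rises while the training curve keeps falling, as in these made-up numbers, is the usual visual hint of overfitting.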

Netron

This is another tool you might use when you are almost done with training. Netron is a tool that allows you to visualize and explore neural network models. It supports many popular formats, such as TensorFlow, PyTorch, ONNX, Keras, and Caffe. With Netron, you can inspect the structure, parameters, and metadata of your models, as well as export the graph view as an image.

Netron is open source and available for Windows, macOS, Linux, and web browsers.

Out of the way tools

OBS

OBS (Open Broadcaster Software) Studio is used for recording and live streaming video to online services like YouTube. OBS has some features that are helpful for computer vision tasks:

  • Create a dataset: OBS has a pretty good recording engine, which can help us collect data once we connect it to a camera.
  • Test camera inputs: OBS can also take camera streams from your NVR cameras, as well as other inputs such as USB. Since it is plug and view, testing cameras becomes easy.

Ubuntu Image Viewer

This is a minor one. The Ubuntu image viewer displays the coordinates of the pixel under the mouse pointer, which is useful in situations such as placing text on a specific part of an image or cropping an exact section of it.

VLC effects panel

We are all familiar with VLC, often the first media player we install on a new PC. VLC is packed with plenty of hidden features. Its control panel provides features such as:

  • Making short clips: You can make a short version of a long video by recording just a part of it.
  • Image effects: You can change contrast, brightness, and so on to see whether your data becomes clearer, or for some other purpose. Because you can try these effects with just a few button clicks, it saves you a lot of time compared with trying each one in OpenCV. Once you are happy with an effect, replicate it in OpenCV.
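Once you have settled on a look in VLC, a contrast and brightness change can be reproduced in code as a scale and shift of pixel values. The sketch below uses plain NumPy; the `adjust` helper and its parameter values are made up for illustration and will not map one-to-one onto VLC’s sliders.

```python
import numpy as np


def adjust(image: np.ndarray, contrast: float = 1.0, brightness: float = 0.0) -> np.ndarray:
    """Return contrast * image + brightness, rounded and clipped to 0-255."""
    scaled = image.astype(np.float32) * contrast + brightness
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)


# Three sample pixel values: dark, mid, and bright.
image = np.array([[0, 100, 200]], dtype=np.uint8)
adjusted = adjust(image, contrast=1.5, brightness=20)
print(adjusted)  # [[ 20 170 255]]
```

The brightest pixel saturates at 255 instead of wrapping around, which is why the values are clipped before converting back to 8-bit. In OpenCV, `cv2.convertScaleAbs(image, alpha=1.5, beta=20)` performs a similar scale and shift with saturation.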

Video Editors

Video editors offer a good user experience when it comes to cutting out the important part of a video. This is helpful when we have a long video and just want to extract the essential part to train our model.

Task Dependent tools

Tools which are used for specific tasks.

LabelImg

LabelImg is a popular annotation tool.

Image annotation is a process of labeling images with relevant information, such as objects, regions, or landmarks. Image annotation is essential for training computer vision models, which can perform tasks like image classification, object detection, and image segmentation. Image annotation can be done manually by human annotators, or with the help of automated tools that use computer vision algorithms.
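To give a feel for what an annotation looks like on disk, LabelImg can save bounding boxes in the YOLO text format: one line per object, holding a class index and a box center and size normalized by the image dimensions. The sketch below converts a pixel box into that form. The `to_yolo` helper, the box, and the image size are made-up examples.

```python
def to_yolo(box, image_width, image_height):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to normalized YOLO values."""
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return x_center, y_center, width, height


# Hypothetical object box inside a 640x480 image, with class index 0.
class_index = 0
values = to_yolo((160, 120, 480, 360), image_width=640, image_height=480)

# One line of a YOLO label file: class x_center y_center width height
line = f"{class_index} " + " ".join(f"{v:.6f}" for v in values)
print(line)  # 0 0.500000 0.500000 0.500000 0.500000
```

Because the values are normalized, the same label stays valid if the image is later resized.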

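CUDA

CUDA is NVIDIA’s parallel computing platform and programming model for running general-purpose code on its GPUs, and it is what deep learning frameworks like PyTorch and TensorFlow rely on for GPU acceleration. For computer vision specifically, NVIDIA also maintains CV-CUDA, an open-source project that enables GPU-accelerated processing pipelines for image pre- and post-processing. CV-CUDA provides high-performance computer vision and image processing operators that can help you achieve higher throughput. It also offers easy integration with deep learning frameworks like PyTorch and TensorFlow, as well as examples of end-to-end object detection, segmentation, and classification applications.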

Conclusion

Computer vision is a fascinating and powerful field that can help you solve many real-world problems.

But before you go and start building your own computer vision applications, there are a few things you should keep in mind. Here are some tips and best practices for using computer vision tools effectively:

Choose the right tool for the right task. Different tools have different strengths and weaknesses, so you should always evaluate your requirements and goals before picking a tool.

Learn the basics of computer vision theory. While computer vision tools can make your life easier, they are not magic. You still need to understand how they work and what they can and cannot do.

Experiment with different parameters and settings. Computer vision tools often have many options and parameters that can affect the performance and accuracy of your results. You should always try different combinations and see how they affect your output.

Evaluate your results objectively. Computer vision tools can produce impressive results, but they are not perfect. You should always test your results on different datasets and scenarios, and compare them with other methods or benchmarks. You should also use appropriate metrics and criteria to measure the quality and accuracy of your results.
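As one example of an appropriate metric, object detection results are commonly scored with intersection over union (IoU), which measures how much a predicted box overlaps the ground-truth box. The sketch below is a minimal version with made-up boxes.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero when the boxes do not touch.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Made-up boxes: the prediction is shifted 5 pixels to the right of the truth.
truth = (0, 0, 10, 10)
predicted = (5, 0, 15, 10)
score = iou(predicted, truth)
print(round(score, 3))  # 0.333
```

An IoU of 1.0 means a perfect match and 0.0 means no overlap. The threshold that counts as a correct detection is a choice you make per task.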

I hope you enjoyed this blog on computer vision tools and learned something useful along the way. Computer vision is a fun and rewarding field that can open up many possibilities for your projects and career. So don’t be afraid to dive in and start creating amazing computer vision applications!

Happy coding! 😎
