Published: 2019-05-11

by gk_

Image Recognition Demystified

Nothing in machine learning captivates the imagination quite like the ability to recognize images. Identifying imagery must connote “intelligence,” right? Let’s demystify.

The ability to “see,” when it comes to software, begins with the ability to classify. Classification is pattern matching with data. Images are data in the form of 2-dimensional matrices.

Image recognition is classifying data into one bucket out of many. This is useful work: you can classify an entire image or things within an image.

One of the classic and quite useful applications for image classification is optical character recognition (OCR): going from images of written language to structured text.

This can be done for any alphabet and a wide variety of writing styles.

Steps in the process

We’ll build code to recognize numerical digits in images and show how this works. This will take 3 steps:

  1. gather and organize data to work with (85% of the effort)

  2. build and test a predictive model (10% of the effort)

  3. use the model to recognize images (5% of the effort)

Preparing the data is by far the largest part of our work; this is true of most data science work. There’s a reason it’s called DATA science!

The building of our predictive model and its use in predicting values is all math. We’re using software to iterate through data, , and to work with data structures. The software isn’t “intelligent”; it evaluates mathematical equations to do narrow knowledge work, in this case recognizing images of digits.

In practice, most of what people label “AI” is really just software .

Our predictive model and data

We’ll be using one of the simplest predictive models: the “k-nearest neighbors” or “kNN” classifier, first published by E. Fix and J.L. Hodges in 1952.

A simple explanation of this algorithm is and a video of its math . And also for those that want to build the algorithm from scratch.

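Here’s how it works: imagine a graph of data points. To classify a new point, draw a circle around it that captures its k nearest neighbors, and give it the label that most of those neighbors share. Each candidate value of k is validated against your data.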
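
The voting step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library code the article relies on. The helper names `euclidean` and `knn_predict` and the toy points are hypothetical, distance is Euclidean, and ties are settled by whichever label `Counter.most_common` lists first.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, point, k=3):
    """Label `point` by majority vote among its k nearest training points."""
    # Sort training indices by distance to the query point and keep the k closest.
    nearest = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], point))[:k]
    # Count the labels of those neighbors and return the most common one.
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D data: two clusters labeled "a" and "b".
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (2, 2)))  # a
print(knn_predict(train_X, train_y, (7, 8)))  # b
```

There is no real "training" step in this sketch: the model is the stored data, and all the work happens at prediction time.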

Plotted against k, the validation error has a minimum, which you can find by testing each candidate value.

Given the ‘best’ value for k you can classify other points with some measure of precision.
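
Picking the ‘best’ k can be sketched as a loop over candidate values. For each one, classify every point in a held-out validation set, measure the error rate, and keep the k with the lowest error. A tiny kNN is defined inside the block so it runs on its own. `validation_error`, `best_k`, the toy 1-D data (with one deliberately mislabeled training point) and the candidate list are all hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, point, k):
    """Majority vote among the k training points closest to `point`."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], point))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def validation_error(train_X, train_y, val_X, val_y, k):
    """Fraction of validation points that kNN with this k gets wrong."""
    wrong = sum(knn_predict(train_X, train_y, x, k) != y for x, y in zip(val_X, val_y))
    return wrong / len(val_X)

def best_k(train_X, train_y, val_X, val_y, candidates):
    """Candidate k with the smallest validation error (the first one wins ties)."""
    return min(candidates, key=lambda k: validation_error(train_X, train_y, val_X, val_y, k))

# Hypothetical data: two 1-D clusters; the point at 2 is mislabeled "b".
train_X = [(0,), (1,), (2,), (3,), (10,), (11,), (12,), (13,)]
train_y = ["a", "a", "b", "a", "b", "b", "b", "b"]
val_X = [(2.2,), (11.5,)]
val_y = ["a", "b"]
for k in (1, 3, 5):
    print(k, validation_error(train_X, train_y, val_X, val_y, k))
# 1 0.5
# 3 0.0
# 5 0.0
print("best k:", best_k(train_X, train_y, val_X, val_y, [1, 3, 5]))  # best k: 3
```

With k=1 the mislabeled point alone decides the first validation example, so a slightly larger k wins here. On real data the error curve is noisier, which is why it is measured rather than assumed.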

We’ll use to avoid building the math ourselves. Conveniently, this library will also provide us our .

Let’s begin.

The code is , we’re using which is a productive way of working on data science projects. The code syntax is Python and our example is borrowed .

Start by importing the necessary libraries:

Next we organize our data:

training images: 1527, test images: 269

You can change the fraction, which controls how the images are split between training and test sets, to get more or less test data. We’ll see shortly how this affects our model’s accuracy.
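
The article’s split code isn’t shown, so here is a hedged sketch of a fraction-based split. The function `split_data`, the shuffle with a fixed seed, and the default fraction of 0.85 are assumptions. The 1797 placeholder items stand in for a digit dataset (scikit-learn’s digits set has 1797 images), and with that count a fraction of 0.85 gives 1527 training images.

```python
import random

def split_data(images, labels, fraction=0.85, seed=0):
    """Shuffle (image, label) pairs, then put `fraction` of them in the training set."""
    pairs = list(zip(images, labels))
    random.Random(seed).shuffle(pairs)  # fixed seed so the split is reproducible
    cut = int(fraction * len(pairs))    # number of training examples
    return pairs[:cut], pairs[cut:]

# Hypothetical placeholder data: 1797 flat 64-value "images" with digit labels.
images = [[i] * 64 for i in range(1797)]
labels = [i % 10 for i in range(1797)]
for fraction in (0.85, 0.5):
    train, test = split_data(images, labels, fraction)
    print(f"fraction={fraction}: training images: {len(train)}, test images: {len(test)}")
```

Shuffling before cutting matters when the data is stored in some order (for example, grouped by label), since a plain slice would then give the test set a different mix of digits than the training set.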

By now you’re probably wondering: how are the digit images organized? They are arrays of values, one for each pixel in an 8x8 image. Let’s inspect one.

# one-dimension
[ 0.  1. 13. 16. 15.  5.  0.  0.  0.  4. 16.  7. 14. 12.  0.  0.
  0.  3. 12.  2. 11. 10.  0.  0.  0.  0.  0.  0. 14.  8.  0.  0.
  0.  0.  0.  3. 16.  4.  0.  0.  0.  0.  1. 11. 13.  0.  0.  0.
  0.  0.  9. 16. 14. 16.  7.  0.  0.  1. 16. 16. 15. 12.  5.  0.]
# two-dimensions
[[ 0.  1. 13. 16. 15.  5.  0.  0.]
 [ 0.  4. 16.  7. 14. 12.  0.  0.]
 [ 0.  3. 12.  2. 11. 10.  0.  0.]
 [ 0.  0.  0.  0. 14.  8.  0.  0.]
 [ 0.  0.  0.  3. 16.  4.  0.  0.]
 [ 0.  0.  1. 11. 13.  0.  0.  0.]
 [ 0.  0.  9. 16. 14. 16.  7.  0.]
 [ 0.  1. 16. 16. 15. 12.  5.  0.]]

The same image data is shown as a flat (one-dimensional) array and again as an 8x8 array of arrays (two-dimensional). Think of each row of the image as an array of 8 pixels; there are 8 rows. The values are gray-scale intensities, ranging from 0 (background) to 16 in this image. We could ignore the gray-scale and work with 0s and 1s, which would simplify the math a bit.
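
Converting between the two layouts, and dropping the gray-scale, takes only a few lines. Here is a sketch in plain Python. `to_rows` and `binarize` are hypothetical helpers, and the cut-off of 8 (half of the maximum value 16 seen in this image) is an assumption chosen for illustration.

```python
# The flat 64-value image shown above, row after row.
flat = [0, 1, 13, 16, 15, 5, 0, 0, 0, 4, 16, 7, 14, 12, 0, 0,
        0, 3, 12, 2, 11, 10, 0, 0, 0, 0, 0, 0, 14, 8, 0, 0,
        0, 0, 0, 3, 16, 4, 0, 0, 0, 0, 1, 11, 13, 0, 0, 0,
        0, 0, 9, 16, 14, 16, 7, 0, 0, 1, 16, 16, 15, 12, 5, 0]

def to_rows(pixels, width=8):
    """Split a flat pixel list into rows of `width` pixels (row-major order)."""
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]

def binarize(pixels, threshold=8):
    """Drop the gray-scale: 1 where a pixel is at least `threshold`, else 0."""
    return [1 if p >= threshold else 0 for p in pixels]

grid = to_rows(flat)
print(len(grid), len(grid[0]))      # 8 8
print(grid[0])                      # [0, 1, 13, 16, 15, 5, 0, 0]
print(to_rows(binarize(flat))[0])   # [0, 0, 1, 1, 1, 0, 0, 0]
```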

We can ‘plot’ this to see this array in its ‘pixelated’ form.
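
The plot itself isn’t reproduced here, but the same ‘pixelated’ view can be approximated in a terminal. A hypothetical `render` helper maps each value to a character from a light-to-dark ramp. The ramp string and the scaling against the image’s maximum value are illustrative choices, not the article’s plotting code.

```python
flat = [0, 1, 13, 16, 15, 5, 0, 0, 0, 4, 16, 7, 14, 12, 0, 0,
        0, 3, 12, 2, 11, 10, 0, 0, 0, 0, 0, 0, 14, 8, 0, 0,
        0, 0, 0, 3, 16, 4, 0, 0, 0, 0, 1, 11, 13, 0, 0, 0,
        0, 0, 9, 16, 14, 16, 7, 0, 0, 1, 16, 16, 15, 12, 5, 0]

def render(pixels, width=8, shades=" .:-=+*#%@"):
    """Return a text picture: one character per pixel, darker for higher values."""
    top = max(pixels) or 1  # avoid dividing by zero on an all-blank image
    chars = [shades[int(p * (len(shades) - 1) // top)] for p in pixels]
    return "\n".join("".join(chars[i:i + width]) for i in range(0, len(chars), width))

print(render(flat))
```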

What digit is this? Let’s ask our model, but first we need to build it.

KNN score: 0.951852

On our test data, the nearest-neighbor model scored about 95% accuracy. Not bad. Go back and change the ‘fraction’ value to see how this affects the score.
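
An accuracy score is the share of test images whose predicted label matches the true label. This is a minimal sketch. It assumes the score reported above is plain accuracy, which is the metric scikit-learn classifiers return from their `score` method. The `accuracy` helper and the five labels are hypothetical.

```python
def accuracy(predicted, actual):
    """Share of predictions that match the true labels (1.0 means all correct)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical predictions for five test images; the last one is wrong.
print(accuracy([2, 7, 1, 0, 4], [2, 7, 1, 0, 9]))  # 0.8
```

Because the denominator is the number of test images, a smaller test set makes each mistake count for more, which is one reason the score moves when the fraction changes.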

array([2])

The model predicts that the array shown above is a ‘2’, which looks correct.

Let’s try a few more. Remember, these are digits from our test data: we did not use these images to build our model (very important).

Not bad.

We can create a fictional digit and see what our model thinks about it.

If we had a collection of nonsensical digit images, we could add those to our training data with a non-numeric label; it’s just another classification.
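
Both ideas, probing the model with a made-up digit and adding a non-numeric class, can be sketched with a toy kNN. Everything here is hypothetical: 3x3 images instead of 8x8, a handful of hand-drawn training examples, the label `"not a digit"`, and k=3.

```python
import math
from collections import Counter

# Hypothetical 3x3 training "images" (flattened, 1 = ink). The non-numeric
# label is simply one more class alongside the digits.
TRAINING = [
    ([0, 1, 0, 0, 1, 0, 0, 1, 0], "1"),
    ([0, 1, 0, 1, 1, 0, 0, 1, 0], "1"),
    ([1, 1, 1, 1, 0, 1, 1, 1, 1], "0"),
    ([0, 1, 1, 1, 0, 1, 1, 1, 1], "0"),
    ([1, 0, 1, 0, 1, 0, 1, 0, 1], "not a digit"),
    ([1, 0, 0, 0, 1, 0, 0, 0, 1], "not a digit"),
]

def classify(image, k=3):
    """Majority label among the k training images closest to `image`."""
    nearest = sorted(TRAINING, key=lambda pair: math.dist(pair[0], image))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

fictional_one = [0, 1, 0, 0, 1, 0, 0, 1, 1]  # a "1" with a foot
four_corners = [1, 0, 1, 0, 0, 0, 1, 0, 1]   # no digit shape at all
print(classify(fictional_one))  # 1
print(classify(four_corners))   # not a digit
```

Without the "not a digit" examples, the four-corner image would be forced into one of the digit classes, because kNN can only answer with labels it has seen.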

So how does image recognition work?

  • image data is organized: both training and test, with labels (X, y)

Training data is kept separate from test data, which also means we remove duplicates (or near-duplicates) between them.
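
Removing exact and near-duplicate overlap can be sketched by comparing each test image against every training image. In this minimal sketch, `remove_overlap` and its `tolerance` parameter are hypothetical, and closeness is measured as Euclidean distance. The pairwise loop is fine for small sets but slow for large ones.

```python
import math

def remove_overlap(train, test, tolerance=0.0):
    """Drop test examples whose image is within `tolerance` of any training image.

    tolerance=0.0 removes exact duplicates only; a larger value also removes
    near-duplicates (Euclidean distance at or below the tolerance)."""
    train_images = [image for image, _ in train]
    return [
        (image, label) for image, label in test
        if all(math.dist(image, t) > tolerance for t in train_images)
    ]

# Hypothetical 4-pixel images with labels.
train = [([0, 1, 1, 0], 1), ([1, 1, 1, 1], 8)]
test = [([0, 1, 1, 0], 1), ([0, 1, 1, 1], 1), ([1, 0, 0, 1], 0)]
print(remove_overlap(train, test))                 # [([0, 1, 1, 1], 1), ([1, 0, 0, 1], 0)]
print(remove_overlap(train, test, tolerance=1.0))  # [([1, 0, 0, 1], 0)]
```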

  • a model is built using one of several mathematical models (, , , etc.)

Which type of model you choose depends on your data and the type and complexity of the classification work.

  • new data is put into the model to generate a prediction

This is lightning fast: the result of a single mathematical calculation.

If you have a collection of pictures with and without cats, you can build a model to classify whether a picture contains a cat. Notice that you need training images without any cats for this to work.

Of course you can apply multiple models to a picture and identify several things.

Large Data

A significant challenge in all of this is image size. 8x8 is not a reasonable size for anything but small digits, and it’s not uncommon to deal with images of 500x500 pixels or larger. That’s 250,000 pixels per image, so 10,000 training images means doing math on 2.5 billion values to build a model. And the math isn’t just addition or multiplication: we’re multiplying matrices, multiplying by floating-point weights, and calculating derivatives. This is why processing power (and memory) is key in certain machine learning applications.
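
The same arithmetic can be extended to memory. This assumes one value per pixel (gray-scale; color images need three) and no compression. `training_set_size` is a hypothetical helper; the byte widths are the standard sizes of these numeric types.

```python
def training_set_size(width, height, n_images, bytes_per_value):
    """Return (number of pixel values, bytes needed) for a single-channel training set."""
    values = width * height * n_images
    return values, values * bytes_per_value

for name, nbytes in (("uint8", 1), ("float32", 4), ("float64", 8)):
    values, total = training_set_size(500, 500, 10_000, nbytes)
    print(f"{name}: {values:,} values, {total / 1e9:.1f} GB")
# uint8: 2,500,000,000 values, 2.5 GB
# float32: 2,500,000,000 values, 10.0 GB
# float64: 2,500,000,000 values, 20.0 GB
```

Storing the same pixels as 64-bit floats takes eight times the memory of 8-bit integers, which is one reason the choice of numeric type matters at this scale.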

There are strategies to deal with this image size problem:

  • use hardware graphics processing units (GPUs) to speed up the math

  • reduce images to smaller dimensions, without losing clarity

  • reduce colors to gray-scale and gradients (you can still see the cat)

  • look at sections of an image to find what you’re looking for
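
Two of those strategies, reducing color to gray-scale and shrinking dimensions, can be sketched in plain Python. The sketch makes several assumptions. Images are lists of rows. Gray-scale uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114). Shrinking averages each square block of pixels, which keeps coarse shapes but does discard fine detail. The helper names are hypothetical.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (R, G, B) tuples to luma values using BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb_rows]

def downsample(rows, factor):
    """Shrink an image by averaging each factor x factor block of pixels.

    Assumes both dimensions are divisible by `factor`."""
    height, width = len(rows), len(rows[0])
    return [
        [
            sum(rows[y + dy][x + dx] for dy in range(factor) for dx in range(factor)) / factor ** 2
            for x in range(0, width, factor)
        ]
        for y in range(0, height, factor)
    ]

# Hypothetical 4x4 color image: top half white, bottom half black.
white, black = (255, 255, 255), (0, 0, 0)
image = [[white] * 4, [white] * 4, [black] * 4, [black] * 4]
small = downsample(to_grayscale(image), 2)
print(small)  # a 2x2 image: top row about 255, bottom row 0
```

Each step cuts the data a model has to process: gray-scale turns three values per pixel into one, and a factor-2 downsample keeps a quarter of the pixels.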

The good news is once a model is built, no matter how laborious that was, the prediction is fast. Image processing is used in applications ranging from facial recognition to OCR to self-driving cars.

Now you understand the basics of how this works.
