AI-900 PREP: Understanding essential AI concepts

What is AI?

Artificial Intelligence is the name of one of my favourite movies, and it left a lasting impression on me. I remember feeling deeply moved by the story of the robot boy: how lonely he was in a world that didn’t fully accept him, even as he developed what seemed like real emotions. Despite his human-like behaviour, everyone still saw him as just a machine, just software mimicking feelings without truly feeling them. Interestingly, that idea aligns with how AI is defined today: a system designed to simulate human intelligence, without consciousness or identity. But it also made me wonder: could that definition evolve in the future? Could AI one day have something more, like its own ‘personality’ or emotional depth?

Every day, we humans interact with AI in more ways than we realise. One of the most common is simply having a conversation. For example, you might ask a generative AI for travel advice or recommendations on what movies to watch over the weekend. Whether we notice it or not, we’re becoming more attached to AI in our daily lives than we might expect—almost like a virtual friend. This article will help you understand more about AI, the different types of AI services and how they are integrated to innovate and simplify everyday tasks.

✍️ AI That Creates: Generative AI (GenAI)

ChatGPT, ClaudeAI, and PerplexityAI, the tools we often enjoy chatting with, are great examples of generative AI. This type of AI is capable of creating brand-new content such as text, images, code, video, and even music. For instance, you can ask it to suggest a movie or a place, and then ask it to generate an image of that suggestion as well.

Generative AI is built on powerful language models. These models may be large, trained on vast datasets (LLMs—Large Language Models), or smaller and more focused (SLMs—Small Language Models). They learn the relationships between words, phrases, and concepts, enabling them to understand context and generate coherent, relevant responses. But generative AI today is more than just a single language model: it often works as a multi-modal system, integrating other AI services like computer vision, natural language processing (NLP), AI search, and even speech recognition.
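
To make that concrete, here is a minimal sketch of a chat-style call to a generative AI model, using the OpenAI Python SDK as one example provider. The API key and model name are placeholders, not a recommendation.

```python
# A minimal sketch of a chat completion with the OpenAI Python SDK.
# The API key and model name are placeholders.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")  # placeholder credential

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any available chat model works here
    messages=[
        {"role": "system", "content": "You are a helpful travel assistant."},
        {"role": "user", "content": "Suggest a movie to watch this weekend."},
    ],
)

print(response.choices[0].message.content)
```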

In the sections that follow, we’ll explore each of these capabilities in more detail.

👁️ AI That Sees: Computer Vision

Let’s say you have a picture of a place you want to visit but don’t know where it is. AI with computer vision capabilities can recognise the location and even help guide you there.

Computer vision refers to AI’s ability to see, interpret, and understand images or videos. Behind this capability is a machine learning model specifically trained for visual tasks. It can identify distinct features in images, differentiate one object from another, and even detect multiple items within a single image, a process known as object detection. Even better, it can work in sync with language models to create a multi-modal AI system. This means AI can not only see what is in the photo, but also describe it and answer questions about it, for example when you are planning a trip and need more details about a location.
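
As an illustration, here is a rough sketch of what calling a computer vision service might look like, using the Azure AI Vision image analysis SDK. The endpoint, key, and image URL are placeholders.

```python
# A minimal sketch of image analysis with the Azure AI Vision SDK
# (azure-ai-vision-imageanalysis). Endpoint, key, and image URL are
# placeholders; the service returns a caption plus detected objects.
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

client = ImageAnalysisClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com",
    credential=AzureKeyCredential("YOUR_KEY"),
)

result = client.analyze_from_url(
    image_url="https://example.com/landmark.jpg",
    visual_features=[VisualFeatures.CAPTION, VisualFeatures.OBJECTS],
)

print("Caption:", result.caption.text)  # e.g. "a tower by a river"
if result.objects:
    for obj in result.objects.list:     # each detected object
        print(obj.tags[0].name, obj.bounding_box)
```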

🧏‍♂️ AI That Listens and Talks: Speech Recognition & Synthesis

We do not need to text or upload images all the time—we can have a verbal conversation with AI as well. How cool is that! This is made possible by capabilities like speech recognition (speech-to-text) and speech synthesis (text-to-speech).

  • Speech Recognition

Unlike humans—who understand speech directly through tone, context, and meaning—AI must first convert spoken language into text in order to understand it. This process, known as speech recognition, allows AI to “read” our words before generating a response. So, every time we interact verbally with AI, there’s a hidden step where it transcribes our voice into text to understand what we’re saying.

When we speak, AI picks up the sound as best it can, then breaks it into smaller sound patterns, called phonemes. Behind the scenes, those phonemes are fed into a language ML model that has been trained to predict the words. This is how voice assistants like Siri or Google Assistant know exactly what words you are saying, even before they process the meaning.
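
For example, a minimal speech-to-text sketch with the Azure Speech SDK might look like this. The key and region are placeholders.

```python
# A minimal sketch of speech-to-text with the Azure Speech SDK
# (azure-cognitiveservices-speech). Key and region are placeholders;
# recognize_once() listens to the default microphone for one utterance.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

result = recognizer.recognize_once()  # audio -> phonemes -> predicted words
print("You said:", result.text)
```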

  • Speech Synthesis

Just like it can listen, AI can also talk. Speech synthesis, or text-to-speech, allows AI to turn written text into spoken words. For example, navigation apps typically generate spoken directions from pre-written or dynamically generated text (like “Turn left in 200 meters”). That text may not be visible to you, but it exists behind the scenes as part of the app’s logic. This is how your navigation apps or digital assistants respond with a voice that feels natural and helpful.
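
A matching text-to-speech sketch, again assuming the Azure Speech SDK with placeholder credentials:

```python
# A minimal sketch of text-to-speech with the Azure Speech SDK.
# The text here stands in for dynamically generated navigation output.
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config)

# Speaks through the default speaker; .get() waits for synthesis to finish.
synthesizer.speak_text_async("Turn left in 200 meters").get()
```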

  • Translation

This is an extra helpful layer that allows AI to hold conversations not just in one language, but across multiple languages. Behind the scenes, the ML model is trained to understand the structure of different languages. You can say something in Japanese, and it replies in English in real time, with a pretty decent accent too.
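
A rough sketch of how this might look with the Azure Speech SDK’s translation support, recognising Japanese speech and returning English text. Credentials are placeholders.

```python
# A minimal sketch of real-time speech translation with the Azure
# Speech SDK: recognise Japanese speech and translate it to English.
import azure.cognitiveservices.speech as speechsdk

translation_config = speechsdk.translation.SpeechTranslationConfig(
    subscription="YOUR_KEY", region="YOUR_REGION"
)
translation_config.speech_recognition_language = "ja-JP"  # source language
translation_config.add_target_language("en")              # target language

recognizer = speechsdk.translation.TranslationRecognizer(
    translation_config=translation_config
)
result = recognizer.recognize_once()
print("Japanese:", result.text)
print("English:", result.translations["en"])
```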

But wait—in the sections above, we’ve seen that AI can process speech or text, and even translate them into multiple languages. However, what we haven’t yet explored is whether AI can truly understand what we mean. We’ll answer that in the next section.

🧠 AI That Understands: Natural Language Processing (NLP)

This gives AI the ability to reason through various types of input: text, images, video, code, and more. Natural Language Processing (NLP) is also the power behind many of generative AI’s capabilities. In the context of a chat, GenAI relies on NLP to understand prompts, and then uses a specialised language model called a Generative Pretrained Transformer (GPT) to generate a response. Without this understanding capability, AI could generate nonsense or miss the intent of your questions. In fact, you could say that NLP is at the heart of AI development, shaping how intelligent AI systems can become.

Today, NLP enables AI to perform a range of language-related tasks (sketched in code after this list), including:

  • Entity Extraction: Identifying specific names, places, and organisations
  • Text Classification: Sorting documents or messages by topic or category
  • Sentiment Analysis: Detecting the emotional tone (positive, negative, neutral)
  • Language Detection: Determining what language a piece of text is written in
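
As a concrete illustration, here is a minimal sketch of a few of these tasks using the Azure AI Language SDK (azure-ai-textanalytics). The endpoint, key, and sample text are placeholders.

```python
# A minimal sketch of several NLP tasks with the Azure AI Language SDK
# (azure-ai-textanalytics). Endpoint and key are placeholders.
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com",
    credential=AzureKeyCredential("YOUR_KEY"),
)

docs = ["I loved the food at Contoso Cafe in Sydney!"]

# Entity extraction: names, places, organisations
for entity in client.recognize_entities(docs)[0].entities:
    print(entity.text, "->", entity.category)

# Sentiment analysis: positive / negative / neutral
print(client.analyze_sentiment(docs)[0].sentiment)

# Language detection
print(client.detect_language(docs)[0].primary_language.name)
```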

Behind the scenes, NLP models are typically trained using labeled data (supervised learning) or unlabeled data (unsupervised learning), and then used to analyse new content.

  • Supervised learning

In this approach, a large dataset that includes both input data and the correct output is used to teach the model how to map inputs to the right outputs. This enables the model to learn the association between specific inputs and their corresponding outputs. Supervised learning is particularly effective for tasks that require high precision. In fact, many real-world applications still rely on labeled data. For example, in medical image analysis or financial fraud detection, labeled datasets are essential for training accurate models. Additionally, supervised learning is often used to fine-tune models initially trained with unsupervised learning techniques. It is widely applied in fields such as computer vision, speech recognition, natural language processing (NLP), and recommendation systems.
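
To ground this, here is a toy supervised-learning example in scikit-learn, where a handful of labeled texts (an assumed, made-up dataset) teach a classifier to map inputs to topic labels.

```python
# A toy supervised-learning example: labeled texts teach a model to map
# inputs to outputs (here, topic labels), using scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["cheap flights to Tokyo", "transfer money to my account",
         "book a hotel in Paris", "my card was charged twice"]
labels = ["travel", "banking", "travel", "banking"]  # the "correct outputs"

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)  # learn the input-output association

print(model.predict(["refund my credit card payment"]))  # likely ['banking']
```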

  • Unsupervised learning

The dataset does not have pre-determined labels. However, in natural language processing (NLP), a specific approach known as self-supervised learning allows the model to generate its own pseudo-labels during training by using parts of the text to predict other parts. One common method is predicting the next word in a sequence, similar to an auto-complete feature, based on the surrounding context. This allows the model to learn the patterns, relationships, and structures of language (also called semantic relationships between elements) from the unlabeled data. For instance, it can recognise that the word “dog” often appears with words like “bark” in sentences. As the model encounters more data over time, it adjusts its internal parameters, learns from its errors, and improves its accuracy. This self-supervised technique underlies many state-of-the-art language models such as GPT and BERT.
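
Here is a toy, purely illustrative version of that idea: a bigram counter that treats each next word in the text as its own label. Real models like GPT do this with neural networks at a vastly larger scale.

```python
# A toy illustration of self-supervised learning: the text itself supplies
# the labels, because each next word is the "label" for the word before it.
from collections import Counter, defaultdict

corpus = "the dog barks loudly . my dog barks at night . the cat sleeps".split()

# Count which word tends to follow each word (a bigram model)
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

# "Predict" the next word as the most frequent follower
print(following["dog"].most_common(1))  # [('barks', 2)]
```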

📄 AI That Processes Documents: Document Intelligence

AI doesn’t just read text. It can understand documents and extract useful information.

Document intelligence uses a technology called Optical Character Recognition behind the scenes to read text from images or scanned documents. This enables AI to capture the entire text or content from the input, and also recognise document structure such as key-value pairs, tables, and more. From there, AI can go further by classifying document types (such as invoices, contracts, etc.), extracting key data, and understanding the context.

One core component of document intelligence services is Form Recognition – a technology designed to process semi-structured or structured data that contains defined fields, rows, and columns. Behind the scenes, form recognition involves a pipeline that begins with Optical Character Recognition (OCR) to extract text, followed by advanced machine learning models that analyse the form layout using techniques like LayoutLM and Graph Convolutional Networks (GCNs). Natural Language Processing (NLP) is then applied to efficiently understand the context. This multimodal approach is crucial because understanding the spatial arrangement helps disambiguate fields and interpret context in forms. It enables AI to accurately extract structured data from forms and images, facilitating automated processing and deeper document understanding.
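
As a sketch of what this looks like in practice, here is a minimal example with the Azure Document Intelligence SDK (azure-ai-formrecognizer) and its prebuilt invoice model. The endpoint, key, and document URL are placeholders.

```python
# A minimal sketch of form analysis with the Azure Document Intelligence
# SDK (azure-ai-formrecognizer), using the prebuilt invoice model.
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

client = DocumentAnalysisClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com",
    credential=AzureKeyCredential("YOUR_KEY"),
)

poller = client.begin_analyze_document_from_url(
    "prebuilt-invoice", "https://example.com/invoice.pdf"
)
result = poller.result()

# Extracted key-value pairs: field name, value, and confidence
for doc in result.documents:
    for name, field in doc.fields.items():
        print(name, "=", field.content, f"(confidence {field.confidence})")
```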

For unstructured documents, such as free-form contracts or lengthy texts that consist of large blocks of continuous text with minimal reliance on spatial structure, OCR is still used to extract the text. NLP techniques are then applied to interpret and understand the content without depending on predefined layouts or fields. However, the use of models like LayoutLM and Graph Convolutional Networks (GCNs) is less common in this context. Instead, the focus is primarily on the semantic understanding of the text rather than on spatial relationships, which requires more advanced NLP models.

🕵️‍♀️ AI That Searches: Cognitive Search (AI Search)

Azure AI Search (formerly known as Azure Cognitive Search) was renamed in October 2023 to better reflect its enhanced AI enrichment capabilities that transform raw data into actionable insights. At its core, it is a cloud-based, enterprise-ready search service designed to provide fast and intelligent retrieval of information from both individual documents and large, diverse data sources. Azure AI Search acts as the backbone of knowledge mining solutions, enabling AI to “think”. It integrates seamlessly with advanced analytics and visualisation tools such as Power BI, Azure Machine Learning, and Azure Databricks, empowering organisations to build end-to-end AI-powered applications.

This service is essential for building modern AI applications—such as generative AI—that require intelligent search capabilities, including enterprise document search, conversational AI, and retrieval-augmented generation (RAG) scenarios, where relevant information is dynamically retrieved to support the generative model’s output.

Behind the scenes, AI search is built on a combination of traditional search functionality and advanced AI techniques like NLP, semantic ranking, and vector search. Instead of matching exact words, vector search finds items that are semantically similar by measuring the distance between vectors in a high-dimensional space using nearest neighbour algorithms. This enables more context-aware search results, including documents that are related in meaning even if they don’t share exact keywords.
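
To see the idea behind vector search, here is a toy illustration with made-up embedding vectors. A real system would generate these with an embedding model and use approximate nearest-neighbour indexes rather than a brute-force sort.

```python
# A toy illustration of vector search: documents and the query become
# vectors, and the nearest vectors by cosine similarity are returned,
# even when no exact keywords match. The vectors below are made up.
import numpy as np

doc_vectors = {
    "pet care tips":        np.array([0.9, 0.1, 0.0]),
    "dog training guide":   np.array([0.8, 0.2, 0.1]),
    "quarterly tax report": np.array([0.0, 0.1, 0.9]),
}
query_vector = np.array([0.85, 0.15, 0.05])  # e.g. "how to look after a puppy"

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Rank documents by semantic closeness to the query
ranked = sorted(doc_vectors.items(),
                key=lambda kv: cosine_similarity(query_vector, kv[1]),
                reverse=True)
for title, _ in ranked:
    print(title)
```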

It also supports rich query capabilities such as fuzzy matching, autocomplete, geo-spatial search, and hybrid queries that combine vector and lexical search to improve relevance and recall. Fuzzy matching enables the engine to return results even when there are typos, misspellings, or slight variations in the search term. Geo-spatial search allows queries based on geographic coordinates like latitude and longitude. For example, it can identify all points of interest within a specific radius (“find restaurants within 5 kilometers”). Azure AI Search supports geo-spatial queries by indexing location data and enabling filters or ranking based on distance or defined geographic areas. This is especially useful for map-based interfaces and location-aware applications.

💬 AI That Thinks: Knowledge Mining

This capability often goes hand in hand with document intelligence and Azure AI Search, empowering AI to dig into unstructured information and extract patterns, keywords, and insights that would take a human hours to uncover. It is especially valuable for researchers, paralegals, and analysts, as it helps them find the signal in the noise across all sources of information, whether audio, images, text, or videos.

Behind the scenes, Knowledge Mining leverages Azure AI Search, which can include Document Intelligence as part of its enrichment pipeline. This means there’s no need to implement those services separately when using a skillset-based approach. Additional Azure AI services, such as Computer Vision, Language Understanding, or custom AI models, can be integrated to enhance understanding of handwriting, images, or domain-specific content.

The indexed and enriched data is stored within Azure, enabling fast and intelligent retrieval. For further analysis or visualisation, tools like Power BI, Azure Machine Learning, and Azure Databricks can be seamlessly integrated into the solution.

Furthermore, Azure AI Search can be connected to Azure OpenAI, enabling users to interact with the indexed content through natural language conversations, essentially allowing them to “chat” with their own data. This technique is known as Retrieval-Augmented Generation (RAG), where relevant content is first retrieved from the search index and then passed as context to a large language model (like GPT) to generate accurate and contextual responses.
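
Here is a minimal sketch of that RAG flow: retrieve relevant passages from an Azure AI Search index, then pass them as context to a chat model. The index name, the field name "content", the endpoints, keys, and model deployment are all assumptions for illustration.

```python
# A minimal sketch of Retrieval-Augmented Generation: fetch relevant
# passages from an Azure AI Search index, then ground a chat model's
# answer in them. All names and credentials below are placeholders.
from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential
from openai import AzureOpenAI

search = SearchClient(
    endpoint="https://<your-search>.search.windows.net",
    index_name="my-docs",                       # assumed index name
    credential=AzureKeyCredential("SEARCH_KEY"),
)

question = "What is our refund policy?"
# Step 1: retrieve the most relevant chunks from the index
# (assumes each indexed document has a "content" field)
context = "\n".join(doc["content"] for doc in search.search(question, top=3))

# Step 2: generate an answer grounded in the retrieved context
llm = AzureOpenAI(api_key="OPENAI_KEY",
                  api_version="2024-02-01",
                  azure_endpoint="https://<your-openai>.openai.azure.com")
answer = llm.chat.completions.create(
    model="gpt-4o-mini",  # your deployment name
    messages=[{"role": "system",
               "content": f"Answer using this context:\n{context}"},
              {"role": "user", "content": question}],
)
print(answer.choices[0].message.content)
```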

Final Thoughts

From talking to AI like a friend, to asking it to draw, write, translate, or even help us find hidden insights in documents, AI is no longer just a futuristic concept. It’s already deeply woven into our daily lives. Whether we realise it or not, we increasingly rely on it for entertainment, productivity, decision-making, and discovery. While AI still lacks consciousness, identity, or true emotion, it has reached a point where it can collaborate with us, simplify our work, and even spark creativity in us. As AI continues to evolve, so will our relationship with it. The more we understand how it works, the better we can use it to solve real problems and explore new possibilities. Who knows? Maybe one day, like the robot boy from my favourite movie, AI might feel even more “real” than we ever thought possible.

By Kat
