Chapter 10: Artificial Intelligence

NLP — Natural Language Processing

Understanding and Processing Text

Natural Language Processing (NLP) is about getting computers to understand human language: analyzing text, detecting sentiment, translation, summarization, and chatbots. ChatGPT and Google Translate are both NLP applications.

Text Preprocessing

Basic NLP pipeline

python

import re
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download('punkt')
nltk.download('punkt_tab')  # newer NLTK releases load the tokenizer data from this package
nltk.download('stopwords')

text = "Machine Learning is transforming the world! AI models are amazing."

# 1. Lowercase
text = text.lower()

# 2. Remove punctuation
text = re.sub(r'[^a-zA-Z\s]', '', text)  # keep letters and whitespace only

# 3. Tokenize: split the text into words
tokens = word_tokenize(text)
print(tokens)
# ['machine', 'learning', 'is', 'transforming', ...]

# 4. Remove stop words (is, the, are, a...)
stop_words = set(stopwords.words('english'))
tokens = [t for t in tokens if t not in stop_words]
print(tokens)
# ['machine', 'learning', 'transforming', 'world', 'ai', 'models', 'amazing']

# 5. Stemming: reduce each word to its root form
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in tokens]
print(stems)  # ['machin', 'learn', 'transform', ...]
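
After preprocessing, the tokens still have to become numbers before a model can use them. The following is a minimal bag-of-words sketch that uses only the standard library. The sample stem lists and the `build_vocab` and `vectorize` helper names are illustrative assumptions, not part of NLTK.

```python
from collections import Counter

def build_vocab(docs):
    """Map every distinct token to a column index, in sorted order."""
    vocab = sorted({tok for doc in docs for tok in doc})
    return {tok: i for i, tok in enumerate(vocab)}

def vectorize(tokens, vocab):
    """Return a count vector with one slot per vocabulary word."""
    counts = Counter(tokens)
    return [counts.get(tok, 0) for tok in vocab]

# Hypothetical, already-preprocessed documents (lists of stems)
docs = [
    ["machin", "learn", "transform", "world"],
    ["ai", "model", "amaz", "learn", "learn"],
]

vocab = build_vocab(docs)
vectors = [vectorize(doc, vocab) for doc in docs]

print(list(vocab))  # column order of the vectors
for vec in vectors:
    print(vec)
```

Each document becomes a fixed-length list of counts, which is the simplest form of the "vectorize" step. Libraries such as scikit-learn provide more complete versions of the same idea.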

Sentiment Analysis

Sentiment analysis with transformers

python

from transformers import pipeline

# Load a pre-trained sentiment model (the library's default checkpoint is trained on English text)
sentiment = pipeline("sentiment-analysis")

texts = [
    "This tutorial is really great! I loved it.",
    "The service is bad, I would not recommend it at all.",
    "The product is okay, not great but not bad either."
]

for text in texts:
    result = sentiment(text)
    print(f"Text: {text[:40]}...")
    print(f"Sentiment: {result[0]['label']} ({result[0]['score']:.2%})")
    print()
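
The default `sentiment-analysis` checkpoint predicts only two labels, POSITIVE and NEGATIVE, so a mixed review still receives one of them. One possible workaround, shown here as a rough sketch, is to treat low-confidence predictions as neutral. The `to_three_way` helper and the 0.75 threshold are assumptions made for illustration, and the sample dictionaries are hand-written in the shape the pipeline returns, not real model output.

```python
def to_three_way(result, threshold=0.75):
    """Map a {'label', 'score'} dict to POSITIVE / NEGATIVE / NEUTRAL."""
    if result["score"] < threshold:
        return "NEUTRAL"  # the model was not confident either way
    return result["label"]

# Hand-written examples in the pipeline's output shape (not real output)
sample_results = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.97},
    {"label": "POSITIVE", "score": 0.61},
]

labels = [to_three_way(r) for r in sample_results]
print(labels)
```

Thresholding is only a heuristic. A model that was trained with an explicit neutral class is usually the more reliable choice when three-way sentiment matters.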

Word Embeddings

Word embeddings convert words into numbers (vectors). The classic example is vector arithmetic: "King" - "Man" + "Woman" ≈ "Queen". Words with similar meanings end up with similar vectors.
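
The analogy can be tried out with a toy example. The sketch below uses hand-made 3-dimensional vectors (the dimensions loosely stand for royalty, maleness, and femaleness); these values are assumptions chosen for illustration, whereas real embeddings are learned from data and have hundreds of dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy vectors: (royalty, maleness, femaleness). Not learned values.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.1, 0.1],
}

# king - man + woman, computed element by element
target = [k - m + w for k, m, w in
          zip(vectors["king"], vectors["man"], vectors["woman"])]

# Rank the remaining words by similarity to the target vector
candidates = {w: v for w, v in vectors.items()
              if w not in {"king", "man", "woman"}}
best = max(candidates, key=lambda w: cosine(target, candidates[w]))
print(best)  # queen
```

The input words are excluded from the candidates, which is the usual convention when evaluating analogies.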

▸Word2Vec — shallow neural network, word vectors
▸GloVe — co-occurrence statistics based
▸BERT Embeddings — context-aware, same word different context = different vector
▸Sentence Transformers — entire sentence embedding
▸Embeddings are very useful for semantic search and recommendation systems
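
To show how embeddings support semantic search, here is a minimal sketch that ranks documents by cosine similarity to a query vector. The 3-dimensional vectors, the document titles, and the `search` helper are hypothetical; in a real system the vectors would come from an embedding model such as a Sentence Transformers checkpoint.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def search(query_vec, index, top_k=2):
    """Return the top_k (title, score) pairs most similar to the query."""
    scored = [(title, cosine(query_vec, vec)) for title, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Toy "sentence embeddings": (sports, food, technology). Hypothetical values.
index = {
    "Cricket match highlights": [0.9, 0.1, 0.0],
    "Easy pasta recipe":        [0.0, 0.9, 0.1],
    "New smartphone review":    [0.1, 0.0, 0.9],
    "Football transfer news":   [0.8, 0.0, 0.2],
}

# Pretend this vector came from embedding a sports-related query
query_vec = [1.0, 0.1, 0.0]

results = search(query_vec, index)
for title, score in results:
    print(f"{score:.2f}  {title}")
```

The two sports documents rank first even though the query shares no keywords with them, which is the point of searching by meaning rather than by exact words.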

✅ Key Points to Remember

▸NLP pipeline: tokenize → clean → vectorize → model
▸Sentiment Analysis: positive/negative/neutral
▸Embeddings: words → numbers (vectors)
▸Transformers library (HuggingFace): pre-trained models
▸spaCy: fast, production NLP library
