Chapter 10 · Artificial Intelligence

NLP — Natural Language Processing

Understanding and Processing Text

Natural Language Processing (NLP) is about teaching computers to understand human language: analyzing text, detecting sentiment, translation, summarization, and chatbots. ChatGPT and Google Translate are both NLP applications.

Text Preprocessing

Basic NLP pipeline

python
import re
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download('punkt')
nltk.download('punkt_tab')  # tokenizer data that newer NLTK releases look for
nltk.download('stopwords')

text = "Machine Learning is transforming the world! AI models are amazing."

# 1. Lowercase
text = text.lower()

# 2. Remove punctuation (keep letters and whitespace; \s must be escaped)
text = re.sub(r'[^a-zA-Z\s]', '', text)

# 3. Tokenize: split the text into words
tokens = word_tokenize(text)
print(tokens)
# ['machine', 'learning', 'is', 'transforming', ...]

# 4. Remove stop words (is, the, are, a, ...)
stop_words = set(stopwords.words('english'))
tokens = [t for t in tokens if t not in stop_words]
print(tokens)
# ['machine', 'learning', 'transforming', 'world', 'ai', 'models', 'amazing']

# 5. Stemming: reduce each word to its root form
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in tokens]
print(stems)  # ['machin', 'learn', 'transform', ...]
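
After preprocessing, the tokens still have to be turned into numbers before a model can use them (the "vectorize" step of the pipeline). The sketch below shows one simple option, a bag-of-words count vector, using only the standard library. The two token lists and the helper name `bag_of_words` are made up for illustration; real projects usually rely on a library vectorizer instead.

```python
from collections import Counter

# Token lists as a preprocessing step might produce them
# (these two short documents are made up for illustration).
docs = [
    ["machine", "learning", "transforming", "world"],
    ["ai", "models", "learning", "learning"],
]

# Build a sorted vocabulary so every document maps to the same column order.
vocab = sorted({token for doc in docs for token in doc})

def bag_of_words(tokens, vocab):
    """Return a list of counts, one entry per vocabulary word."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]

vectors = [bag_of_words(doc, vocab) for doc in docs]
print(vocab)    # ['ai', 'learning', 'machine', 'models', 'transforming', 'world']
print(vectors)  # [[0, 1, 1, 0, 1, 1], [1, 2, 0, 1, 0, 0]]
```

Each document becomes a fixed-length vector, so documents of different lengths can be fed to the same model. Word order is lost, which is one reason embeddings are often preferred.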

Sentiment Analysis

Sentiment analysis with transformers

python
from transformers import pipeline

# Load a pre-trained sentiment model (the default checkpoint is English-only)
sentiment = pipeline("sentiment-analysis")

texts = [
    "This tutorial is really great! I loved it.",
    "The service is bad, I do not recommend it at all.",
    "The product is okay, not very good but not bad either."
]

for text in texts:
    result = sentiment(text)
    print(f"Text: {text[:40]}...")
    print(f"Sentiment: {result[0]['label']} ({result[0]['score']:.2%})")
    print()
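
Each item returned by the pipeline is a dict with a `label` and a `score`. The default English checkpoint typically predicts only POSITIVE or NEGATIVE, so a mixed review still gets one of those two labels. The sketch below shows one rough way to get a neutral bucket: treat low-confidence predictions as NEUTRAL. The helper `to_three_way`, the 0.75 threshold, and the sample dicts are all assumptions for illustration, not part of the transformers library and not real model output.

```python
def to_three_way(result, threshold=0.75):
    """Map a binary prediction to POSITIVE / NEGATIVE / NEUTRAL.

    `result` is a dict shaped like one item of the pipeline output,
    e.g. {"label": "POSITIVE", "score": 0.98}. Predictions whose score
    falls below `threshold` are treated as NEUTRAL (an assumption).
    """
    if result["score"] < threshold:
        return "NEUTRAL"
    return result["label"]

# Hand-written stand-ins for pipeline output, not real model scores.
examples = [
    {"label": "POSITIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.97},
    {"label": "POSITIVE", "score": 0.58},
]

labels = [to_three_way(r) for r in examples]
print(labels)  # ['POSITIVE', 'NEGATIVE', 'NEUTRAL']
```

Thresholding is only a heuristic. If neutral reviews matter, a model trained with an explicit neutral class is likely the better choice.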

Word Embeddings

Converting words into numbers (vectors) is called word embedding. The classic example is vector arithmetic: "King" - "Man" + "Woman" ≈ "Queen". Words with similar meanings end up with similar vectors.
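
The King/Queen arithmetic can be sketched with a few hand-made vectors. The 4-dimensional values below are invented for illustration (roughly: royalty, male, female, food); real embeddings such as Word2Vec are learned from text and have hundreds of dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors, hand-written for this example (not from a trained model).
vectors = {
    "king":  [0.9, 0.8, 0.1, 0.0],
    "queen": [0.9, 0.1, 0.8, 0.0],
    "man":   [0.1, 0.9, 0.1, 0.0],
    "woman": [0.1, 0.1, 0.9, 0.0],
    "apple": [0.0, 0.1, 0.1, 0.9],
}

# king - man + woman, computed dimension by dimension.
target = [
    k - m + w
    for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])
]

# Exclude the three input words, then pick the most similar remaining word.
candidates = {w: v for w, v in vectors.items() if w not in {"king", "man", "woman"}}
best = max(candidates, key=lambda w: cosine(target, candidates[w]))
print(best)  # queen
```

The result is the nearest vector rather than an exact match, which is why the relation is usually written with "≈".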

  • Word2Vec — shallow neural network, word vectors
  • GloVe — co-occurrence statistics based
  • BERT Embeddings — context-aware, same word different context = different vector
  • Sentence Transformers — entire sentence embedding
  • Embeddings are very useful for semantic search and recommendation systems
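
To make the semantic search idea concrete, here is a minimal sketch that embeds a sentence by averaging its word vectors and ranks documents by cosine similarity to a query. Averaging is a simple baseline used here for illustration only; it is not how Sentence Transformers models work internally. The 3-dimensional word vectors (roughly: tech, food, sports), the document names, and the helper `embed` are all made up.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy word vectors, hand-written for this example (not from a trained model).
word_vectors = {
    "python": [0.9, 0.0, 0.1], "code": [0.8, 0.1, 0.0],
    "pizza": [0.0, 0.9, 0.1], "recipe": [0.1, 0.8, 0.0],
    "football": [0.0, 0.1, 0.9], "match": [0.1, 0.0, 0.8],
    "programming": [0.9, 0.1, 0.0], "tutorial": [0.6, 0.2, 0.1],
}

def embed(tokens):
    """Sentence vector = average of the word vectors (a simple baseline)."""
    rows = [word_vectors[t] for t in tokens]
    return [sum(dim) / len(rows) for dim in zip(*rows)]

documents = {
    "doc_code": ["python", "code"],
    "doc_food": ["pizza", "recipe"],
    "doc_sport": ["football", "match"],
}

# The query shares no words with any document; only the vectors connect them.
query_vec = embed(["programming", "tutorial"])

ranked = sorted(
    documents,
    key=lambda name: cosine(query_vec, embed(documents[name])),
    reverse=True,
)
print(ranked)  # ['doc_code', 'doc_food', 'doc_sport']
```

A keyword search would return nothing for this query, while the embedding-based ranking still puts the programming document first.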

Key Points to Remember

  • NLP pipeline: tokenize → clean → vectorize → model
  • Sentiment Analysis: positive/negative/neutral
  • Embeddings: words → numbers (vectors)
  • Transformers library (HuggingFace): pre-trained models
  • spaCy: fast, production NLP library