Chapter 10: Artificial Intelligence
NLP — Natural Language Processing
Understanding and Processing Text
Natural Language Processing (NLP) means enabling computers to understand human language: analyzing text, detecting sentiment, translating, summarizing, and powering chatbots. ChatGPT and Google Translate are both NLP applications.
Text Preprocessing
Basic NLP pipeline
```python
import re
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

# Download tokenizer and stop-word data (only needed once)
nltk.download('punkt')
nltk.download('punkt_tab')  # newer NLTK releases load this resource for word_tokenize
nltk.download('stopwords')

text = "Machine Learning is transforming the world! AI models are amazing."

# 1. Lowercase
text = text.lower()

# 2. Remove punctuation: keep only letters and whitespace
text = re.sub(r'[^a-zA-Z\s]', '', text)

# 3. Tokenize: split the text into words
tokens = word_tokenize(text)
print(tokens)
# ['machine', 'learning', 'is', 'transforming', ...]

# 4. Remove stop words (is, the, are, a, ...)
stop_words = set(stopwords.words('english'))
tokens = [t for t in tokens if t not in stop_words]
print(tokens)
# ['machine', 'learning', 'transforming', 'world', 'ai', 'models', 'amazing']

# 5. Stemming: reduce each word to its root form
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in tokens]
print(stems)  # ['machin', 'learn', 'transform', ...]
```

Sentiment Analysis
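Sentiment analysis assigns a label such as positive, negative, or neutral to a piece of text. The core idea can be sketched with a tiny hand-written word lexicon before moving to a pretrained model. This is a swapped-in, rule-based technique, not how the transformer model works, and the word lists (`POSITIVE`, `NEGATIVE`, `NEGATORS`) and the function `lexicon_sentiment` are hypothetical names chosen for illustration.

```python
import re

# Hypothetical, deliberately tiny word lists (illustration only).
POSITIVE = {"great", "good", "love", "loved", "amazing", "excellent", "nice"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "awful", "worst"}
NEGATORS = {"not", "no", "never"}  # flip the polarity of the following word


def lexicon_sentiment(text: str) -> str:
    """Return 'positive', 'negative' or 'neutral' by summing word polarities."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        # +1 for a positive word, -1 for a negative word, 0 otherwise
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        # A negator directly before a sentiment word flips it ("not good")
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for sentence in [
    "This tutorial is really great! I loved it.",
    "The service is bad, I do not recommend it at all.",
    "The product is okay, not great but not bad either.",
]:
    print(lexicon_sentiment(sentence), "|", sentence)
```

Rules like these break quickly on sarcasm, slang, or words missing from the lists, which is why pretrained models learned from labeled data are usually preferred.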
Sentiment analysis with transformers
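```python
from transformers import pipeline

# Load a pre-trained sentiment model. Without a model argument, the pipeline
# downloads a default English model that outputs only POSITIVE or NEGATIVE,
# so the neutral-sounding third sentence is still forced into one of the two.
sentiment = pipeline("sentiment-analysis")

texts = [
    "This tutorial is really good! I liked it a lot.",
    "The service is bad, I wouldn't recommend it at all.",
    "The product is okay, not very good but not bad either.",
]

for text in texts:
    result = sentiment(text)
    print(f"Text: {text[:40]}...")
    print(f"Sentiment: {result[0]['label']} ({result[0]['score']:.2%})")
    print()
```

Word Embeddings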
Converting words into numbers (vectors) is called word embedding. Words with similar meanings get vectors that lie close together, so vector arithmetic can capture relationships: "King" - "Man" + "Woman" produces a vector whose nearest word is "Queen".
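The analogy arithmetic can be checked with a small sketch. The 3-dimensional vectors below are hand-made and hypothetical (real Word2Vec or GloVe vectors are learned from large corpora and usually have 100 to 300 dimensions); `cosine` and `analogy` are illustrative helpers, and the three input words are excluded from the candidates, as is common practice.

```python
import math

# Hypothetical toy vectors; the dimensions loosely mean (royalty, male, female).
vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.8, 0.1],
    "woman": [0.1, 0.1, 0.8],
    "prince": [0.8, 0.7, 0.1],
    "apple": [0.0, 0.1, 0.1],
}


def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c):
    """Find the word closest to vec(a) - vec(b) + vec(c), skipping a, b, c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("king", "man", "woman"))  # queen
```

With real trained vectors, gensim's `KeyedVectors.most_similar(positive=["king", "woman"], negative=["man"])` runs a similar nearest-neighbor search.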
- ▸Word2Vec: a shallow neural network that learns one fixed vector per word
- ▸GloVe: learns word vectors from global word co-occurrence statistics
- ▸BERT embeddings: context-aware, so the same word in a different context gets a different vector
- ▸Sentence Transformers: produce a single embedding for an entire sentence
- ▸Embeddings are very useful for semantic search and recommendation systems
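The co-occurrence idea behind GloVe can be illustrated by counting which words appear together and comparing those count vectors. This sketch covers only the counting step plus cosine similarity, not GloVe's weighted least-squares training; the four-sentence corpus and the helper names `cooccurrence_vectors` and `cosine` are hypothetical, and "co-occurrence" here simply means "in the same sentence".

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus; real embeddings are trained on billions of tokens.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
    "the car needs fuel",
]


def cooccurrence_vectors(sentences):
    """For each word, count the other words that appear in the same sentence."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j:
                    counts[word][other] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


vecs = cooccurrence_vectors(corpus)
print("cat~dog:", round(cosine(vecs["cat"], vecs["dog"]), 3))  # cat~dog: 0.846
print("cat~car:", round(cosine(vecs["cat"], vecs["car"]), 3))  # cat~car: 0.48
```

"cat" and "dog" share contexts ("drinks", "chases"), so their vectors end up closer than "cat" and "car". Frequent words like "the" inflate every similarity here, which is one reason real methods reweight raw counts.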
✅ Key Points to Remember
- ▸NLP pipeline: tokenize → clean → vectorize → model
- ▸Sentiment Analysis: classify text as positive or negative (some models also output neutral)
- ▸Embeddings: words → numbers (vectors)
- ▸Transformers library (Hugging Face): pre-trained models
- ▸spaCy: fast, production NLP library
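The four pipeline stages (tokenize → clean → vectorize → model) can be tied together in a dependency-free sketch. It swaps in a hand-written multinomial Naive Bayes classifier as the "model" step, a classic text-classification baseline rather than something this chapter prescribes, and the stop-word list, training sentences, and helper names (`tokenize`, `clean`, `vectorize`, `preprocess`, `NaiveBayes`) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list and toy training data (illustration only).
STOP_WORDS = {"the", "is", "a", "it", "this", "was", "i"}
TRAIN = [
    ("I love this movie, it is amazing", "positive"),
    ("What a great and enjoyable film", "positive"),
    ("This film was terrible and boring", "negative"),
    ("I hate it, the plot is awful", "negative"),
]


def tokenize(text):
    """Step 1: lowercase and split into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def clean(tokens):
    """Step 2: drop stop words."""
    return [t for t in tokens if t not in STOP_WORDS]


def vectorize(tokens):
    """Step 3: bag-of-words vector as a word -> count mapping."""
    return Counter(tokens)


def preprocess(text):
    """Run steps 1-3 on one document."""
    return vectorize(clean(tokenize(text)))


class NaiveBayes:
    """Step 4: multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, vectors, labels):
        self.labels = sorted(set(labels))
        self.vocab = set().union(*vectors)
        self.log_prior, self.counts, self.totals = {}, {}, {}
        for label in self.labels:
            docs = [v for v, y in zip(vectors, labels) if y == label]
            self.log_prior[label] = math.log(len(docs) / len(vectors))
            self.counts[label] = sum(docs, Counter())  # merge word counts per class
            self.totals[label] = sum(self.counts[label].values())
        return self

    def predict(self, vector):
        def log_score(label):
            score = self.log_prior[label]
            denom = self.totals[label] + len(self.vocab)
            for word, n in vector.items():
                if word in self.vocab:  # words unseen in training are ignored
                    score += n * math.log((self.counts[label][word] + 1) / denom)
            return score

        return max(self.labels, key=log_score)


model = NaiveBayes().fit([preprocess(t) for t, _ in TRAIN], [y for _, y in TRAIN])
print(model.predict(preprocess("An amazing, enjoyable movie")))   # positive
print(model.predict(preprocess("Boring plot and awful acting")))  # negative
```

In practice each stage maps onto library tools: NLTK or spaCy for tokenizing and cleaning, scikit-learn's `CountVectorizer` and `MultinomialNB` for the vectorize and model steps, or a pretrained Transformers model replacing the hand-written classifier entirely.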