Are you excited to understand Machine Leaning(ML) basic and do some experiment. This blog is best fit for you. This is ML With BigQuery just Trailer.
BigQuery is Google cloud based Big Data Data warehouse it provides machine learning capability as well.
BigQuery also provides Public dataset where you can play around, so no need to waste time on populate huge test data. Here we will try to trained a Model and then will give some sample and test model will classify the text source.
Step-1 : Create Dataset for Input
WITH
extracted AS (
SELECT
source,
REGEXP_REPLACE(title, '[^a-zA-Z0-9 $.-]', ' ') AS title
FROM (
SELECT
ARRAY_REVERSE( SPLIT(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'),'.'))[
OFFSET
(1)]AS source,
url,
title
FROM
`bigquery-public-data.hacker_news.stories`
WHERE
REGEXP_CONTAINS(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'),'.com$')
AND LENGTH(title )>10 )T ),
DS AS (
SELECT
ARRAY_CONCAT(SPLIT(title," "),['NULL','NULL','NULL','NULL','NULL']) AS word,
source
FROM
extracted
WHERE
(source ='github'
OR source='nytimes'
OR source='techcrunch' ) )
SELECT
source,
word[
OFFSET
(0)] AS word1,
word[
OFFSET
(1)] AS word2,
word[
OFFSET
(2)] AS word3,
word[
OFFSET
(3)] AS word4,
word[
OFFSET
(4)] AS word5,
FROM
ds
Step-2 : Create a Model and Assign your Dataset to Train Model
CREATE OR REPLACE MODEL
`total-pillar-275405.VIKAS_ML.textclass` OPTIONS (model_type='logistic_reg',
input_label_cols=['source']) AS
WITH
extracted AS (
SELECT
source,
REGEXP_REPLACE(title, '[^a-zA-Z0-9 $.-]', ' ') AS title
FROM (
SELECT
ARRAY_REVERSE( SPLIT(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'),'.'))[
OFFSET
(1)]AS source,
url,
title
FROM
`bigquery-public-data.hacker_news.stories`
WHERE
REGEXP_CONTAINS(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'),'.com$')
AND LENGTH(title )>10 )T ),
DS AS (
SELECT
ARRAY_CONCAT(SPLIT(title," "),['NULL','NULL','NULL','NULL','NULL']) AS word,
source
FROM
extracted
WHERE
(source ='github'
OR source='nytimes'
OR source='techcrunch' ) )
SELECT
source,
word[
OFFSET
(0)] AS word1,
word[
OFFSET
(1)] AS word2,
word[
OFFSET
(2)] AS word3,
word[
OFFSET
(3)] AS word4,
word[
OFFSET
(4)] AS word5,
FROM
ds
Step-3 : Validate Model how accurate it will be
/* Validate accuracy*/
SELECT
*
FROM
ML.EVALUATE(MODEL `total-pillar-275405.VIKAS_ML.textclass`)
--accuracy :0.8148073431510994 means 81%
Step-4 : Test Model with live data
/* Testing Model*/
SELECT
*
FROM
ML.PREDICT(MODEL `total-pillar-275405.VIKAS_ML.textclass`,
(
SELECT
'government'word1,
'shutdown' word2,
'leave' word3,
'workers' word4,
'reeling' word5
UNION ALL
SELECT
'unlikly',
'parternership',
'in',
'house',
'gives'
UNION ALL
SELECT
'downloading',
'the',
'android',
'studio',
'project'
UNION ALL
SELECT
'Facing',
'criticism',
'on',
'the',
'pandemic'
UNION ALL
SELECT
'chekin',
'commit',
null,
null,
null
))
Now here you Go, you can input many different text and try it, I hope you enjoyed.
:)
Reference : https://www.coursera.org/ Training
Thanks for providing such a valuable Knowledge on Data Analysis With Python. I have learned many things with this blog. Keep sharing.Very knowledgeable Blog.
ReplyDeleteThank you
DeleteVery knowledgeable Blog.Thanks for providing such a valuable Knowledge on Data Analysis With Python.
ReplyDeleteThank you
DeleteThe blog is more impressive... I liked it!!!
ReplyDelete.net coaching centre in chennai
dot net online training
dot net course in coimbatore
Thanks Hemapriya
DeleteI really like your blog about Machine Learning and the use of BigQuery your blog post could prove helpful to people who are interested in the field, if you want you could check
ReplyDeletedata science course
https://www.skillslash.com/ they have good information about data science, machine learning, artificial intelligence that could provide some help
DeleteI am really very happy to visit your blog. Directly I am found which I truly need. please visit our website for more information
ReplyDeleteData Scraping Service in India 2022