Data to Analytics: May 2020

Are you excited to understand Machine Leaning(ML) basic and do some experiment. This blog is best fit for you. This is ML With BigQuery just Trailer.

BigQuery is Google cloud based Big Data Data warehouse it provides machine learning capability as well.

BigQuery also provides Public dataset where you can play around, so no need to waste time on populate huge test data. Here we will try to trained a Model and then will give some sample and test model will classify the text source.

Step-1 : Create Dataset for Input

WITH
extracted AS (
SELECT
source,
REGEXP_REPLACE(title, '[^a-zA-Z0-9 $.-]', ' ') AS title
FROM (
SELECT
ARRAY_REVERSE( SPLIT(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'),'.'))[
OFFSET
(1)]AS source,
url,
title
FROM
`bigquery-public-data.hacker_news.stories`
WHERE
REGEXP_CONTAINS(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'),'.com$')
AND LENGTH(title )>10 )T ),
DS AS (
SELECT
ARRAY_CONCAT(SPLIT(title," "),['NULL','NULL','NULL','NULL','NULL']) AS word,
source
FROM
extracted
WHERE
(source ='github'
OR source='nytimes'
OR source='techcrunch' ) )
SELECT
source,
word[
OFFSET
(0)] AS word1,
word[
OFFSET
(1)] AS word2,
word[
OFFSET
(2)] AS word3,
word[
OFFSET
(3)] AS word4,
word[
OFFSET
(4)] AS word5,
FROM
ds

Step-2 : Create a Model and Assign your Dataset to Train Model

CREATE OR REPLACE MODEL
`total-pillar-275405.VIKAS_ML.textclass` OPTIONS (model_type='logistic_reg',
input_label_cols=['source']) AS
WITH
extracted AS (
SELECT
source,
REGEXP_REPLACE(title, '[^a-zA-Z0-9 $.-]', ' ') AS title
FROM (
SELECT
ARRAY_REVERSE( SPLIT(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'),'.'))[
OFFSET
(1)]AS source,
url,
title
FROM
`bigquery-public-data.hacker_news.stories`
WHERE
REGEXP_CONTAINS(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'),'.com$')
AND LENGTH(title )>10 )T ),
DS AS (
SELECT
ARRAY_CONCAT(SPLIT(title," "),['NULL','NULL','NULL','NULL','NULL']) AS word,
source
FROM
extracted
WHERE
(source ='github'
OR source='nytimes'
OR source='techcrunch' ) )
SELECT
source,
word[
OFFSET
(0)] AS word1,
word[
OFFSET
(1)] AS word2,
word[
OFFSET
(2)] AS word3,
word[
OFFSET
(3)] AS word4,
word[
OFFSET
(4)] AS word5,
FROM
ds

Step-3 : Validate Model how accurate it will be

/* Validate accuracy*/
SELECT
*
FROM
ML.EVALUATE(MODEL `total-pillar-275405.VIKAS_ML.textclass`)
--accuracy :0.8148073431510994 means 81%

Step-4 : Test Model with live data

/* Testing Model*/
SELECT
*
FROM
ML.PREDICT(MODEL `total-pillar-275405.VIKAS_ML.textclass`,
(
SELECT
'government'word1,
'shutdown' word2,
'leave' word3,
'workers' word4,
'reeling' word5
UNION ALL
SELECT
'unlikly',
'parternership',
'in',
'house',
'gives'
UNION ALL
SELECT
'downloading',
'the',
'android',
'studio',
'project'
UNION ALL
SELECT
'Facing',
'criticism',
'on',
'the',
'pandemic'
UNION ALL
SELECT
'chekin',
'commit',
null,
null,
null
))

Now here you Go, you can input many different text and try it, I hope you enjoyed.
:)

Reference : https://www.coursera.org/ Training

Data to Analytics

Pages

Friday 8 May 2020

Machine Learning with BigQuery [Trailer]

Data Mesh

Search This Blog