Friday 8 May 2020

Machine Learning with BigQuery [Trailer]




Are you excited to understand Machine Leaning(ML) basic and do some experiment. This blog is best fit for you. This is ML With BigQuery just Trailer.

BigQuery is Google cloud based Big Data Data warehouse it provides machine learning capability as well.

 BigQuery also provides Public dataset where you can play around, so no need to waste time on populate huge test data. Here we will try to trained a Model and then will give some sample and test model will classify the text source.

Step-1 : Create Dataset for Input
WITH
  extracted AS (
  SELECT
    source,
    REGEXP_REPLACE(title, '[^a-zA-Z0-9 $.-]', ' ') AS title
  FROM (
    SELECT
      ARRAY_REVERSE( SPLIT(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'),'.'))[
    OFFSET
      (1)]AS source,
      url,
      title
    FROM
      `bigquery-public-data.hacker_news.stories`
    WHERE
      REGEXP_CONTAINS(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'),'.com$')
      AND LENGTH(title )>10 )T ),
  DS AS (
  SELECT
    ARRAY_CONCAT(SPLIT(title," "),['NULL','NULL','NULL','NULL','NULL']) AS word,
    source
  FROM
    extracted
  WHERE
    (source ='github'
      OR source='nytimes'
      OR source='techcrunch' ) )
SELECT
  source,
  word[
OFFSET
  (0)] AS word1,
  word[
OFFSET
  (1)] AS word2,
  word[
OFFSET
  (2)] AS word3,
  word[
OFFSET
  (3)] AS word4,
  word[
OFFSET
  (4)] AS word5,
FROM
  ds 

Step-2 : Create a Model and Assign your Dataset to Train Model
CREATE OR REPLACE MODEL
  `total-pillar-275405.VIKAS_ML.textclass` OPTIONS (model_type='logistic_reg',
    input_label_cols=['source']) AS
WITH
  extracted AS (
  SELECT
    source,
    REGEXP_REPLACE(title, '[^a-zA-Z0-9 $.-]', ' ') AS title
  FROM (
    SELECT
      ARRAY_REVERSE( SPLIT(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'),'.'))[
    OFFSET
      (1)]AS source,
      url,
      title
    FROM
      `bigquery-public-data.hacker_news.stories`
    WHERE
      REGEXP_CONTAINS(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'),'.com$')
      AND LENGTH(title )>10 )T ),
  DS AS (
  SELECT
    ARRAY_CONCAT(SPLIT(title," "),['NULL','NULL','NULL','NULL','NULL']) AS word,
    source
  FROM
    extracted
  WHERE
    (source ='github'
      OR source='nytimes'
      OR source='techcrunch' ) )
SELECT
  source,
  word[
OFFSET
  (0)] AS word1,
  word[
OFFSET
  (1)] AS word2,
  word[
OFFSET
  (2)] AS word3,
  word[
OFFSET
  (3)] AS word4,
  word[
OFFSET
  (4)] AS word5,
FROM
  ds 


Step-3 : Validate Model how accurate it will be 
/* Validate accuracy*/
SELECT
  *
FROM
  ML.EVALUATE(MODEL `total-pillar-275405.VIKAS_ML.textclass`)
  --accuracy :0.8148073431510994 means 81%


Step-4 : Test Model with live data 
 /* Testing Model*/
SELECT
  *
FROM
  ML.PREDICT(MODEL `total-pillar-275405.VIKAS_ML.textclass`,
    (
    SELECT
      'government'word1,
      'shutdown' word2,
      'leave' word3,
      'workers' word4,
      'reeling' word5
    UNION ALL
    SELECT
      'unlikly',
      'parternership',
      'in',
      'house',
      'gives'
    UNION ALL
    SELECT
      'downloading',
      'the',
      'android',
      'studio',
      'project'
    UNION ALL
    SELECT
      'Facing',
      'criticism',
      'on',
      'the',
      'pandemic'
       UNION ALL
    SELECT
      'chekin',
      'commit',
      null,
      null,
      null
      ))

Now here you Go, you can input many different text and try it, I hope you enjoyed.
:)

Reference : https://www.coursera.org/ Training



Data Mesh

  Data Mesh is a relatively new approach to managing and organizing data within organizations, especially large enterprises. It advocates fo...