Friday 8 May 2020

Machine Learning with BigQuery [Trailer]




Are you excited to understand Machine Leaning(ML) basic and do some experiment. This blog is best fit for you. This is ML With BigQuery just Trailer.

BigQuery is Google cloud based Big Data Data warehouse it provides machine learning capability as well.

 BigQuery also provides Public dataset where you can play around, so no need to waste time on populate huge test data. Here we will try to trained a Model and then will give some sample and test model will classify the text source.

Step-1 : Create Dataset for Input
WITH
  extracted AS (
  SELECT
    source,
    REGEXP_REPLACE(title, '[^a-zA-Z0-9 $.-]', ' ') AS title
  FROM (
    SELECT
      ARRAY_REVERSE( SPLIT(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'),'.'))[
    OFFSET
      (1)]AS source,
      url,
      title
    FROM
      `bigquery-public-data.hacker_news.stories`
    WHERE
      REGEXP_CONTAINS(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'),'.com$')
      AND LENGTH(title )>10 )T ),
  DS AS (
  SELECT
    ARRAY_CONCAT(SPLIT(title," "),['NULL','NULL','NULL','NULL','NULL']) AS word,
    source
  FROM
    extracted
  WHERE
    (source ='github'
      OR source='nytimes'
      OR source='techcrunch' ) )
SELECT
  source,
  word[
OFFSET
  (0)] AS word1,
  word[
OFFSET
  (1)] AS word2,
  word[
OFFSET
  (2)] AS word3,
  word[
OFFSET
  (3)] AS word4,
  word[
OFFSET
  (4)] AS word5,
FROM
  ds 

Step-2 : Create a Model and Assign your Dataset to Train Model
CREATE OR REPLACE MODEL
  `total-pillar-275405.VIKAS_ML.textclass` OPTIONS (model_type='logistic_reg',
    input_label_cols=['source']) AS
WITH
  extracted AS (
  SELECT
    source,
    REGEXP_REPLACE(title, '[^a-zA-Z0-9 $.-]', ' ') AS title
  FROM (
    SELECT
      ARRAY_REVERSE( SPLIT(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'),'.'))[
    OFFSET
      (1)]AS source,
      url,
      title
    FROM
      `bigquery-public-data.hacker_news.stories`
    WHERE
      REGEXP_CONTAINS(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'),'.com$')
      AND LENGTH(title )>10 )T ),
  DS AS (
  SELECT
    ARRAY_CONCAT(SPLIT(title," "),['NULL','NULL','NULL','NULL','NULL']) AS word,
    source
  FROM
    extracted
  WHERE
    (source ='github'
      OR source='nytimes'
      OR source='techcrunch' ) )
SELECT
  source,
  word[
OFFSET
  (0)] AS word1,
  word[
OFFSET
  (1)] AS word2,
  word[
OFFSET
  (2)] AS word3,
  word[
OFFSET
  (3)] AS word4,
  word[
OFFSET
  (4)] AS word5,
FROM
  ds 


Step-3 : Validate Model how accurate it will be 
/* Validate accuracy*/
SELECT
  *
FROM
  ML.EVALUATE(MODEL `total-pillar-275405.VIKAS_ML.textclass`)
  --accuracy :0.8148073431510994 means 81%


Step-4 : Test Model with live data 
 /* Testing Model*/
SELECT
  *
FROM
  ML.PREDICT(MODEL `total-pillar-275405.VIKAS_ML.textclass`,
    (
    SELECT
      'government'word1,
      'shutdown' word2,
      'leave' word3,
      'workers' word4,
      'reeling' word5
    UNION ALL
    SELECT
      'unlikly',
      'parternership',
      'in',
      'house',
      'gives'
    UNION ALL
    SELECT
      'downloading',
      'the',
      'android',
      'studio',
      'project'
    UNION ALL
    SELECT
      'Facing',
      'criticism',
      'on',
      'the',
      'pandemic'
       UNION ALL
    SELECT
      'chekin',
      'commit',
      null,
      null,
      null
      ))

Now here you Go, you can input many different text and try it, I hope you enjoyed.
:)

Reference : https://www.coursera.org/ Training



9 comments:

  1. Thanks for providing such a valuable Knowledge on Data Analysis With Python. I have learned many things with this blog. Keep sharing.Very knowledgeable Blog.

    ReplyDelete
  2. Very knowledgeable Blog.Thanks for providing such a valuable Knowledge on Data Analysis With Python.

    ReplyDelete
  3. I really like your blog about Machine Learning and the use of BigQuery your blog post could prove helpful to people who are interested in the field, if you want you could check
    data science course

    ReplyDelete
    Replies
    1. https://www.skillslash.com/ they have good information about data science, machine learning, artificial intelligence that could provide some help

      Delete
  4. I am really very happy to visit your blog. Directly I am found which I truly need. please visit our website for more information
    Data Scraping Service in India 2022

    ReplyDelete

Data Mesh

  Data Mesh is a relatively new approach to managing and organizing data within organizations, especially large enterprises. It advocates fo...