Skip to content
This repository has been archived by the owner on May 28, 2020. It is now read-only.

Normalize tag data model #57

Open
kousu opened this issue May 6, 2019 · 1 comment
Open

Normalize tag data model #57

kousu opened this issue May 6, 2019 · 1 comment

Comments

@kousu
Copy link

kousu commented May 6, 2019

I took a look at the SQL dump and found that there is a TagCache table listing the tags, but that it is not related to the Stations table.

The current database has tags just stored as a string delimited by commas:

(57613, 104897, 'Radio Progreso y ERIC - El Progreso', 'http://192.99.233.46:8900/;', 'http://radioprogresohn.net/', 'http://radioprogresohn.net/media/com_zo2framework/images/favicon.ico', '2017-12-02 04:26:29', 'Honduras', 'Yoro', 'Spanish', 'derechos humanos,jesuit,el progreso', 35, 0, '24.246.81.175', '9649a4df-0601-11e8-ae97-52543be04c81','9649a4db-0601-11e8-ae97-52543be04c81')

(here the tags are ["derechos humanos", "jesuit", "el progreso"])

and there's a /cache/ which I assume gets rebuilt manually somewhere:

DROP TABLE IF EXISTS `TagCache`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `TagCache` (
  `TagName` varchar(100) COLLATE utf8_bin NOT NULL,
  `StationCount` int(11) DEFAULT '0',
  `StationCountWorking` int(11) DEFAULT '0',
  PRIMARY KEY (`TagName`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
/*!40101 SET character_set_client = @saved_cs_client */;

This situation is a many-to-many relationship and it would be cleaner to use an intermediate "junction" table:

CREATE TABLE `Station` (
  `StationID` int(11) NOT NULL AUTO_INCREMENT,
  -- ...
  -- *no* `Tags` field here
  -- ...
  PRIMARY KEY (`StationID`)
)

CREATE TABLE `Tag` (
     `id` int(11) NOT NULL AUTO_INCREMENT,
     `name` varchar(100) COLLATE utf8_bin NOT NULL,
     PRIMARY KEY (`id`) 
)

CREATE TABLE `Station_Tag` (
    `StationID` int(11) NOT NULL,
    `TagID` int(11) NOT NULL
    FOREIGN KEY (StationID)
        REFERENCES Station(id),
    FOREIGN KEY (TagID)
        REFERENCES Tag(id)
)

I can't find where in the code TagCache gets populated; is there a cronjob that's not checked into the repo or am I missing something obvious? In any case, you could keep the table working by rewriting it as a view, and if performance becomes a problem you could materialize it (unfortunately with MySQL you need to write this manually). I think this should do it, but I haven't tested it, so it's just to show the idea:

CREATE VIEW TagCache AS
 SELECT
   Tag.id as TagID,
   Tag.name as name,
   count(*) as StationCount,
   count(Station.working = 1) as StationCountWorking
 FROM
 -- this inner select deals with Stations having two IDs: a UUID and a SQL ID column
 (SELECT Station.id as id, StationCheck.CheckOK as working
  FROM
   Station
  INNER JOIN
   StationCheck
  WHERE Station.StationUuid = StationCheck.StationUuid) Station
 INNER JOIN Station_Tag where Station.id = Station_Tag.StationID
 INNER JOIN Tag where Tag.id = Station_Tag.TagID
 GROUP BY TagID

I would like to write a migration to implement this change. It would make #56 easier for me.

@kousu
Copy link
Author

kousu commented May 6, 2019

(this should also save a bunch of storage since it compresses tags to 4 byte integers instead of repeating their whole names in each station entry)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant