All we need is an easy explanation of the problem, so here it is.
My purpose is to monitor which tags increase/decrease rapidly last week on different bulletin boards. Or which tag do new appear. The current table schema of articles is as follows:
article_id e.g. 123
tags e.g. apple,orange
article_created_at e.g. 2021/5/7 01:23
borad_name e.g. database
author e.g. some_person
My current idea is to calculate the amount of each tag by board weekly. But I want to survey other solutions to reduce the development time. Is it ok to choose a time-series database engine or other possible solution? Thanks for your time.
How to solve :
I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.
Method 1
Having a comma-separated list of tags would save on the number of rows written to a database, but would make querying far more computationally expensive. Typically this sort of table would have one record per tag, as this would allow for simpler queries to be written. A structure similar to something you’d find in a blogging database would work. For example:
TagHistory
article_id
tag
flat
posted_at
board_name
author
Then you would have records like this
article_id |
tag |
flat |
posted_at |
board_name |
author |
---|---|---|---|---|---|
123 | pokémon | pokemon | 2021-05-01 00:00:00 | games | BillyBob |
129 | PokEmon | pokemon | 2021-05-01 00:07:41 | games | Jackie |
129 | cards | cards | 2021-05-01 00:07:41 | games | Jackie |
129 | Deck | deck | 2021-05-01 00:07:41 | games | Jackie |
147 | MewTwo | mewtwo | 2021-05-01 00:11:12 | games | Archibald |
The flat
field would contain a lower-case tag
value with all accents and other non-alphanumeric components removed so that values like pokémon
, PokEmon
, and Pokemon
do not appear as three separate values.
With this sort of design, you could then begin writing queries that could rank tag usage over a period of time. If you were interested in the five most popular tags from last week, you could write a query like this:
SELECT WEEK(posted_at) as week_no, flat, COUNT(article_id) as usages
FROM TagHistory
WHERE `posted_at` BETWEEN DATE_FORMAT(DATE_SUB(NOW(), INTERVAL DAYOFWEEK(NOW()) + 6 DAY), '%Y-%m-%d 00:00:00')
AND DATE_FORMAT(DATE_SUB(NOW(), INTERVAL DAYOFWEEK(NOW()) -1 DAY), '%Y-%m-%d 23:59:59')
GROUP BY week_no, flat
ORDER BY usages DESC
LIMIT 5;
Alternatively you could do it by board, or by author, or across a period of weeks or months. The sky is the limit and the queries are simpler to write. If there is a regular means of presentation, this data could be collated every week and written to a summary table that would save you the hassle of looking up and calculating tag usage over time.
For example:
TagHistorySummary
year
week_no
flat
usages
Then if you wanted to present a chart showing the popularity of the pokemon
tag for all of 2020 (similar to Google’s Ngram Viewer), you could write a query like this:
SELECT week_no, usages
FROM TagHistorySummary
WHERE flat = 'pokemon' and year = 2020
ORDER BY week_no;
Note: Using
year
as a field name would be a poor choice given it’s a reserved word by most database engines. Do not take these examples as verbatim.
This data could then be used to populate a simple line chart:
Hmm … who knew that Pokemon was a thing before WWI? I sure as heck didn’t.
This isn’t the only way to accomplish your goal, of course. Experience has shown that it is one of the simpler ways, though, particularly when working with data that is supplied by people.
Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0