Feed aggregator
SPM and GTTs
Pickleball drill: Drop Game 7-11
One of the most important shots in pickleball is the 3rd shot drop – the third shot of a rally, with which the serving team tries to move up to the NVZ by playing a short ball.
Unfortunately, it is also a rather difficult shot, which is why it should be practiced frequently.
If you are anything like me, you find games played for points much more exciting than drills. That's why I came up with this game.
It works with four, three, or even just two participants. The description below is for four players.
Players A and B practice the drop shot. They stand where the serving team stands before the third shot in a normal game. Players C and D stand where the receiving team stands after the return in a normal game – namely at the NVZ. In this drill, C and D start every rally. First, D plays a long ball diagonally. A attempts a drop shot. Then A and B move forward:
Depending on how good the drop shot was, they either come straight up or advance gradually through the transition zone.
The game uses rally scoring, so both team AB and team CD can score points at any time. C and D take turns starting the rally. So once the first point has been played out, C starts the next one:
It is somewhat easier for C and D to score points than for A and B. Therefore C and D need 11 points to win, while A and B win with just 7 points.
If only three players are available, one of them practices the drop shot, switching sides after each rally. The opponents may only play to that side.
With two players, you play on just one half of the court.
The game is good practice for both teams, because at the advanced level these shots are typically the critical shots of every rally – and you play/practice exactly those.
The scoring keeps motivation high. In our drop games so far, it has turned out that after a relatively short time the drop team wins more often with its 7 points. I think that's fine, though, because it provides feedback that you are doing the drop shot right.
Number Data type declaration with different length and performance impact
Update Partition table from another Partition table Performance issue
Gather STATS on Partitioned Table and Parallel for Partitioned Table
DR setup involving replicated database
Locate an Error in Wrapped PL/SQL
Posted by Pete on 18/03/24 at 01:00 PM
FastAPI File Upload and Temporary Directory for Stateless API
If you think I’m geeky, you should meet my friend.
I’d like to talk about a very good friend of mine.
Whilst he’s much older than me ( 11 or 12 weeks at least), we do happen to share interests common to programmers of a certain vintage.
About a year ago, he became rather unwell.
Since then, whenever I’ve gone to visit, I’ve taken care to wear something that’s particular to our friendship and/or appropriately geeky.
At one point, when things were looking particularly dicey, I promised him that, whilst “Captain Scarlet” was already taken, if he came through he could pick any other colour he liked.
As a life-long Luton Town fan, his choice was somewhat inevitable.
So then, what follows – through the medium of Geeky T-shirts – is a portrait of my mate Simon The Indestructible Captain Orange…
When we first met, Windows 3.1 was still on everyone’s desktop and somewhat prone to hanging at inopportune moments. Therefore, we are fully aware of both the origins and continuing relevance of this particular pearl of wisdom :
Fortunately, none of the machines Simon was wired up to in the hospital seemed to be running any version of Windows so I thought he’d be reassured by this :
Whilst our first meeting did not take place on a World riding through space on the back of a Giant Turtle ( it was in fact, in Milton Keynes), Simon did earn my eternal gratitude by recommending the book Good Omens – which proved to be my gateway to Discworld.
The relevance of this next item of “Geek Chic” is that, when Simon later set up his own company, he decided that it should have a Latin motto.
In this, he was inspired by the crest of the Ankh-Morpork Assassins’ Guild :
His motto :
Nil codex sine Lucre
…which translates as …
No code without payment
From mottoes to something more akin to a mystic incantation, chanted whenever you’re faced with a seemingly intractable technical issue. Also, Simon likes this design so…
As we both know, there are 10 types of people – those who understand binary and those who don’t…
When confronted by something like this, I am able to recognise that the binary numbers are ASCII codes representing alphanumeric characters. However, I’ve got nothing on Simon, a one-time Assembler Programmer.
Whilst I’m mentally removing my shoes and socks in preparation to translate the message, desperately trying to remember the golden rule of binary maths ( don’t forget to carry the 1), he’ll just come straight out with the answer (“Geek”, in this case).
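For anyone who'd rather not take their shoes and socks off, the decoding trick can be sketched in a few lines of Python. The binary message below is my own assumed example, chosen to spell out the same answer:

```python
# Decode a string of space-separated 8-bit binary numbers as ASCII characters.
# The message here is an assumed example that spells out the same answer.
def decode_binary_ascii(message: str) -> str:
    return "".join(chr(int(bits, 2)) for bits in message.split())

print(decode_binary_ascii("01000111 01100101 01100101 01101011"))  # Geek
```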
Saving the geekiest to last, I’m planning to dazzle with this on my next visit :
Techie nostalgia and a Star Wars reference all on the one t-shirt. I don’t think I can top that. Well, not for now anyway.
1, 2, 3 – Free!
Pickleball drill: 1, 2, 3 – Free!
A nice warm-up drill that is also well suited for beginners:
All four players stand at the NVZ. Serving and scoring are as in a normal game.
The first three balls, including the serve, must land in the NVZ. After that, the ball is free for offensive dinks, speed-ups, and lobs:
Example: player A starts with a diagonal serve, D dinks (not necessarily) to B, and B plays the third ball into the kitchen to C. C then plays a long ball into the gap.
Background: we used to play a similar drill, but with 5 balls that had to be played into the kitchen before the ball was freed up.
In my view, that has two drawbacks:
- It teaches participants the wrong kind of dinks, namely harmless “dead dinks” into the kitchen. In serious play, however, dinking is not primarily about landing the ball in the kitchen at all costs. A dink should be as unattackable as possible, yet as uncomfortable as possible for the opponent, so that they return a high ball which we can then attack. That can very well be a ball that bounces just behind the NVZ. In the old drill format, though, that counts as a mistake. Later on, you often have trouble teaching people the right kind of dinks.
- You have to count the balls up to 5. That often doesn't work well, leaving you in doubt: was that already 5?
With 1, 2, 3 – Free! it is easier to keep track. The server is still at a slight disadvantage (as in the full game), because the receivers get to play an offensive ball first – for example, a forceful dink just behind the NVZ.
How to Create Urdu Hindi AI Model and Dataset from New Dataset
This video is a hands-on, step-by-step tutorial on creating a new dataset, building an AI model, fine-tuning the model on the dataset, and then pushing it to Hugging Face.
Code:
%%capture
import torch
major_version, minor_version = torch.cuda.get_device_capability()
# Must install separately since Colab has torch 2.2.1, which breaks packages
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
if major_version >= 8:
    # Use this for new GPUs like Ampere, Hopper GPUs (RTX 30xx, RTX 40xx, A100, H100, L40)
    !pip install --no-deps packaging ninja flash-attn xformers trl peft accelerate bitsandbytes
else:
    # Use this for older GPUs (V100, Tesla T4, RTX 20xx)
    !pip install --no-deps xformers trl peft accelerate bitsandbytes
pass
!pip install einops
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-7b-bnb-4bit", # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = " ", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none", # Supports any, but = "none" is optimized
    use_gradient_checkpointing = True,
    random_state = 3407,
    use_rslora = False, # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)
alpaca_prompt = """ذیل میں ایک ہدایت ہے جو فلم کے نام کی وضاحت کرتی ہے، اس کے ساتھ ایک ان پٹ بھی ہے جو مزید دستاویزات فراہم کرتا ہے۔ گانے کے بول لکھنے کے لیے ایک لمحہ نکالیں جو فلم کے نام کے معنی سے میل کھاتا ہے۔
### Instruction:
{}
### Input:
{}
### Response:
{}"""
EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["urdu_instruction"]
    inputs = examples["urdu_input"]
    outputs = examples["urdu_output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass
from datasets import load_dataset
dataset = load_dataset("fahdmirzac/urdu_bollywood_songs_dataset", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)
from huggingface_hub import login
access_token = "YOUR_HF_TOKEN" # use your own token; never publish a real Hugging Face access token
login(token=access_token)
from trl import SFTTrainer
from transformers import TrainingArguments
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 100,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)
trainer_stats = trainer.train()
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
    [
        alpaca_prompt.format(
            "دیے گئے فلم کے نام کے بارے میں ایک مختصر گیت کے بول لکھیں۔", # instruction
            "کیوں پیار ہو گیا", # input
            "", # output - leave this blank for generation!
        )
    ], return_tensors = "pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 200, use_cache = True)
tokenizer.batch_decode(outputs)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
inputs = tokenizer(
    [
        alpaca_prompt.format(
            "دیے گئے فلم کے نام کے بارے میں ایک مختصر گیت کے بول لکھیں۔", # instruction
            "رنگ", # input
            "", # output - leave this blank for generation!
        )
    ], return_tensors = "pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens = 200, use_cache = True)
tokenizer.batch_decode(outputs)
model.push_to_hub("fahdmirzac/Gemma_Urdu_Hindi_Bollywood_Songs", token = access_token) # reuse the login token rather than pasting it inline
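The prompt-assembly step of the notebook can be checked in isolation, without loading the model or the real tokenizer. Here is a minimal standalone sketch of formatting_prompts_func: the "</s>" EOS token, the shortened template, and the English toy batch are all stand-ins for the notebook's actual values:

```python
# Standalone sketch of the notebook's formatting_prompts_func.
# "</s>" stands in for tokenizer.eos_token; the template is shortened.
EOS_TOKEN = "</s>"

alpaca_prompt = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

def formatting_prompts_func(examples):
    texts = []
    for instruction, inp, out in zip(
        examples["urdu_instruction"], examples["urdu_input"], examples["urdu_output"]
    ):
        # The EOS token teaches the model where a response ends;
        # without it, generation tends to run on indefinitely.
        texts.append(alpaca_prompt.format(instruction, inp, out) + EOS_TOKEN)
    return {"text": texts}

# Toy batch in the same column layout as the real dataset.
batch = {
    "urdu_instruction": ["Write a short lyric for the film."],
    "urdu_input": ["Rang"],
    "urdu_output": ["..."],
}
result = formatting_prompts_func(batch)
print(result["text"][0])
```

This is the same shape of dict that dataset.map(..., batched=True) passes in, which is why the function iterates over parallel lists rather than a single example.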
Index Usage – 1
In 12.2 Oracle introduced Index Usage Tracking to replace the previous option for “alter index xxx monitoring usage”. A recent post on the Oracle database discussion forum prompted me to look for articles about this “new” feature and what people had to say about it. There didn’t seem to be much information online – just a handful of articles starting with Tim Hall a few years ago and ending with Maria Colgan a few months ago – so I thought I’d update my notes a little and publish them.
Unfortunately, by the time I’d written the first 6 pages it was starting to feel like very heavy going, so I decided to rewrite it as a mini-series. In part one I’ll just give you some descriptions and explanations that are missing from the manuals; in part two I’ll do a bit of a fairly shallow dive to talk about what’s happening behind the scenes and how you can do some experiments; in part three I’ll describe some of the experiments and show the results that justify the descriptions I’ve given here in part one.
History
In the bad old days you could enable “monitoring” on an index to see if it was being used. The command to do this was:
alter index {index name} monitoring usage;
After executing this statement you would wait for a bit then check the view dba_object_usage:
SQL> desc dba_object_usage
Name Null? Type
----------------------------- -------- --------------------
OWNER NOT NULL VARCHAR2(128)
INDEX_NAME NOT NULL VARCHAR2(128)
TABLE_NAME NOT NULL VARCHAR2(128)
MONITORING VARCHAR2(3)
USED VARCHAR2(3)
START_MONITORING VARCHAR2(19)
END_MONITORING VARCHAR2(19)
SQL> select * from dba_object_usage;
OWNER INDEX_NAME TABLE_NAME MON USE START_MONITORING END_MONITORING
--------------- -------------------- ------------------------- --- --- ------------------- -------------------
TEST_USER T2_I1 T2 YES YES 03/12/2024 15:31:35
1 row selected.
As you can see, this didn’t give you much information – just “yes it has been used” or “no it hasn’t been used” since the moment you started monitoring it; and that’s almost totally useless as an aid to measuring or understanding the effectiveness of the index.
Apart from the almost complete absence of information, there were collateral issues: I think that, initially, gathering stats, index rebuilds, and using explain plan would flag an index as used; at the opposite extreme, indexes that were actually used to avoid foreign key locking problems were not flagged as used.
And now for something completely different
The promise of Index Usage Tracking is clearly visible in the description of the view you use to report the details captured:
SQL> desc dba_index_usage
Name Null? Type
----------------------------------- -------- ------------------------
OBJECT_ID NOT NULL NUMBER
NAME NOT NULL VARCHAR2(128)
OWNER NOT NULL VARCHAR2(128)
TOTAL_ACCESS_COUNT NUMBER
TOTAL_EXEC_COUNT NUMBER
TOTAL_ROWS_RETURNED NUMBER
BUCKET_0_ACCESS_COUNT NUMBER
BUCKET_1_ACCESS_COUNT NUMBER
BUCKET_2_10_ACCESS_COUNT NUMBER
BUCKET_2_10_ROWS_RETURNED NUMBER
BUCKET_11_100_ACCESS_COUNT NUMBER
BUCKET_11_100_ROWS_RETURNED NUMBER
BUCKET_101_1000_ACCESS_COUNT NUMBER
BUCKET_101_1000_ROWS_RETURNED NUMBER
BUCKET_1000_PLUS_ACCESS_COUNT NUMBER
BUCKET_1000_PLUS_ROWS_RETURNED NUMBER
LAST_USED DATE
Though the columns are not very well described in the reference manuals you can see very clearly that there’s a lot more detail than just “yes/no” here. The columns clearly carry information about “how many times” and “how much data”, breaking the numbers down across a small range-based histogram. Here’s an example of output (using Tom Kyte’s print_table() routine to turn columns to rows):
SQL> execute print_table('select * from dba_index_usage where name = ''T1_I1''')
OBJECT_ID : 206312
NAME : T1_I1
OWNER : TEST_USER
TOTAL_ACCESS_COUNT : 889
TOTAL_EXEC_COUNT : 45
TOTAL_ROWS_RETURNED : 17850
BUCKET_0_ACCESS_COUNT : 0
BUCKET_1_ACCESS_COUNT : 0
BUCKET_2_10_ACCESS_COUNT : 0
BUCKET_2_10_ROWS_RETURNED : 0
BUCKET_11_100_ACCESS_COUNT : 878
BUCKET_11_100_ROWS_RETURNED : 13200
BUCKET_101_1000_ACCESS_COUNT : 9
BUCKET_101_1000_ROWS_RETURNED : 1650
BUCKET_1000_PLUS_ACCESS_COUNT : 2
BUCKET_1000_PLUS_ROWS_RETURNED : 3000
LAST_USED : 11-mar-2024 20:26:26
The order of the columns is just a little odd (in my opinion) so I’ve switched two of them around in my descriptions below:
- Total_exec_count: is the total number of executions that have been captured for SQL statements using this index.
- Total_access_count: is the total number of scans of this index that have been observed. If you think of a nested loop join you will appreciate that a single execution of an SQL statement could result in many accesses of an index – viz: an index range scan into the inner (second) table may happen many times, once for each row acquired from the outer (first) table.
- Total_rows_returned: carries a little trap in the word rows, and in the word returned. In this context “rows” means “index entries”, and “returned” means “passed to the parent operation”. (To be confirmed / clarified)
- Bucket_0_access_count: how many index accesses found no rows (there’s no bucket_0_row_count because it would always be 0).
- Bucket_1_access_count: how many index accesses found just one row (and there’s no bucket_1_row_count because that would always match the access count).
- Bucket_M_N_access_count: how many index accesses found between M and N rows.
- Bucket_M_N_row_count: sum of rows across all the index accesses that returned between M and N rows.
- Last_used: date and time of the last flush that updated this row of the table/view.
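The bucket definitions above can be mimicked in a few lines. The sketch below is my reading of how the view bins per-access row counts, not Oracle's actual code; the function name and the sample figures (878 scans of ~15 rows, 9 of ~183, 2 of 1,500, echoing the output shown earlier) are my own:

```python
# Bin per-access row counts into dba_index_usage-style buckets:
# 0 rows, 1 row, 2-10, 11-100, 101-1000, 1000+.
# Each bucket records [access count, total rows returned].
def bucket_accesses(rows_per_access):
    buckets = {"0": [0, 0], "1": [0, 0], "2_10": [0, 0],
               "11_100": [0, 0], "101_1000": [0, 0], "1000_plus": [0, 0]}
    for n in rows_per_access:
        if n == 0:
            key = "0"
        elif n == 1:
            key = "1"
        elif n <= 10:
            key = "2_10"
        elif n <= 100:
            key = "11_100"
        elif n <= 1000:
            key = "101_1000"
        else:
            key = "1000_plus"
        buckets[key][0] += 1   # one more access in this bucket
        buckets[key][1] += n   # rows returned by this access
    return buckets

# 878 scans of ~15 rows, 9 scans of ~183 rows, 2 scans of 1500 rows.
b = bucket_accesses([15] * 878 + [183] * 9 + [1500] * 2)
print(b["11_100"], b["101_1000"], b["1000_plus"])
```

Note how a single mixed workload spreads its accesses across several buckets at once, which is exactly why a single execution can report counts in multiple buckets.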
The most important omission in the descriptions given in the manuals is the difference between total_exec_count and total_access_count. (It was a comment on Maria Colgan’s blog note asking about the difference that persuaded me that I really had to write this note.) If you don’t know what an “access” is supposed to be you can’t really know how to interpret the rest of the numbers.
Take another look at the sample output above: it shows 45 executions and 889 accesses – I happen to know (because I did the test) that most of the work I’ve done in this interval has been reporting a two-table join that uses a nested loop from t2 into t1 using an index range scan on index t1_i1 to access table t1. I know my data well enough to know that every time I run my query it’s going to find about 20 rows in t2, and that for every row I find in t2 there will be roughly 15 rows that I will access in t1 through the index.
Give or take a little extra activity round the edges that blurs the numbers, I can see that the numbers make sense:
- 45 executions x 20 rows from t2 = 900 index range scans through t1_i1
- 878 index range scans x 15 rows per scan = 13,170 rows
The numbers are in the right ball-park to meet my expectations. But we do have 11 more accesses reported – 9 of them reported an average of 1,650/9 = 183 rows, 2 of them reported an average of 3,000/2 = 1,500 rows. Again, I know what I did, so I can explain why those numbers have appeared, but in real life you may have to do a little work to find a reasonable explanation. (Spoilers: be suspicious about gathering index stats.)
It’s possible, for example, that there are a few rows in the t2 table that have far more than the 15 row average in t1 and the larger numbers are just some examples from the nested loop query that happened to hit a couple of these outliers in t2. (It’s worth highlighting, as a follow-up to this suggestion, that a single execution could end up reporting accesses and row counts in multiple buckets.)
In fact the 9 “medium sized” accesses were the result of single-table queries using a “between” clause that ranged through 10 to 15 values of t1 (returning 150 to 225 rows each), and the two “large” accesses were the result of two index-only queries where I forced an index full scan and an index fast full scan that discarded half the rows of an index holding 3,000 entries.
As I said, I’ll be presenting a few examples in part 3, but a guideline that may be helpful when considering the executions, accesses, and rowcounts is this: if you’re familiar with the SQL Monitor report then you’ll know that each call to dbms_sql_monitor.report_sql_monitor() reports one execution – then the Starts column for any index operation will (probably) be the total access count, and the Rows (Actual) column will (probably) be the total rows returned. As noted above, though, any one execution may end up splitting the total Starts and Rows (Actual) across multiple buckets.
Some questions to investigate
I hope this has given you enough information to get you interested in Index Usage Tracking, and some idea of what you’re looking at when you start using the view. There are, however, some technical details you will need to know if you want to do some testing before taking any major steps in production. There are also some questions that ought to be addressed before jumping to conclusions about what the numbers mean, so I thought I’d list several questions that came to mind when I first read about the feature:
- Does a call to dbms_stats.gather_index_stats result in an update to the index usage stats, and does it matter?
- Does a call to explain plan result in an update to the index usage stats, and does it matter?
- Do referential integrity checks result in the parent or child indexes being reported in the usage stats? What if there is a parent delete with “on delete cascade” on the child?
- Do inserts, updates, deletes or merges produce any unexpected results (e.g. double / quadruple counting)? What if they’re PL/SQL forall bulk processing; what if (e.g.) you update or delete through a join view?
- Does an index skip scan count as a single access, or does Oracle count each skip as a separate access (I’d hope it would be one access)?
- If you have an index range scan with a filter predicate applied to each index entry after the access predicate, is the “rows returned” the number of index entries examined (accessed), or the number that survive the filter? (I would like it to be the number examined because that’s the real measure of the work done in the index, but the name suggests it counts the survivors.)
- Does an index fast full scan get reported correctly?
- Are IOTs accounted differently from ordinary B-tree indexes?
- For bitmap indexes, what is a “row” and what does the tracking information look like?
- If you have an “Inlist Iterator” operation, does this get summed into one access, or is it an access per iteration (which is what I would expect)? And how is the logic applied with partitioned table iteration?
- Does a UNION or UNION ALL operation count multiple accesses (I would expect so)? And what happens with things like nvl_or_expansion with “conditional” branches?
- Does a “connect by pump” through an index produce any unexpected results?
- Can index usage tracking tell us anything about Domain indexes?
- Are there any types of indexes that are not tracked (sys indexes, for example)?
If you can think of any other questions where “something different” might happen, feel free to add them as comments.
Summary
Index Usage Tracking (and the supporting view dba_index_usage) can give you a good insight into how Oracle is using your indexes. This note explains the meaning of the data reported in the view and offers a couple of ideas about how you may need to interpret the numbers for a single index.
In the next two articles we’ll look at some of the technical aspects of the feature (including how to enable and test it), and the results captured from different patterns of query execution, concluding (possibly in a 4th article) in suggestions of how to use the feature in a production system.
Footnote
At the start of this note I said it had been prompted by a question on one of the Oracle forums. The thread was about identifying indexes that could be dropped and the question was basically: “Is the old index monitoring obsolete?” The answer is “Yes, definitely, and it has been for years.”
Using Claude 3 Haiku Vision with Amazon Bedrock Locally
This video is a hands-on guide to using the vision features of Anthropic's Claude 3 Haiku AI model with Amazon Bedrock.
Code Used:
Compact Pickleball Course for Beginners
Everything you need to get a qualified start in pickleball. To the point, in a single day!
This 4-hour compact course is ideal for working people and athletically ambitious seniors.
The course costs 40 euros per person and is led by Uwe Hesse – an experienced player and DPB-certified coach.
It takes place in Düsseldorf.
The number of participants per course is limited to 8 to ensure intensive, individual coaching.
Topics include:
Basic shots: serve, return, volley, dink
Rules
Basic doubles strategies
Scoring
The role of the non-volley zone
3rd shot drop
Participant doubles with coach feedback
Upcoming dates:
Saturday, April 27
Saturday, May 4
Start is at 10:00 a.m. each day.
Please register exclusively by email to info@uhesse.com
For members of DJK Agon 08, the course costs only 20 euros.
If you subsequently join the club, 20 euros will be refunded.
Create AI Agent in AWS with Boto3 Code
This video is a step-by-step tutorial, with code, on creating Amazon Bedrock AI agents with boto3 in Python and integrating them with Lambda.
Code used: just use any Lambda of your choice with it.
PLS-00103: Encountered the symbol "RECORD" when expecting one of the following: array varray table object fixed varying opaque sparse
Musk Ox
Musk Ox is a fantastic instrumental chamber folk project from up in Canada. I recently stumbled on these guys and their guitarist Nathanael Larochette through some collaborations he did with the (now defunct, but often fantastic) Oregon-based post-metal/neofolk band Agalloch. Here's a neat documentary on the making of their album Woodfall.
Another cool project is the acoustic spin off that Nathanael did from the last Agalloch album.
His solo stuff is great. Music about trees and such.
Querying LONGs
Update for 23c: If your only need for using LONGs in predicates is to query the partitioning views by high_value you won’t need to read this note as the views now expose columns high_value_clob and high_value_json. (See comment #3 below.)
Despite their continued presence in the Oracle data dictionary, LONG columns are not an option that anyone should choose; you can’t do much with them and they introduce a number of strange space management problems. Nevertheless a recent thread on the Oracle database forum started with the following question: “How do you use LONG columns in a WHERE clause?”. The basic answer is: “You don’t”.
This user wanted to query all_tab_partitions for a given table_name and high_value, and the high_value is (still) a LONG, so attempts to use it resulted in Oracle error “ORA-00997: illegal use of LONG datatype”. A possible, fairly obvious, but undesirable solution to the requirement is to write a PL/SQL function that reads the current row from all_tab_partitions and returns the first N characters of the high_value as a varchar2(). Here’s a version (not quite the one I posted) of such a function, with a sample of use:
rem
rem Script: get_high_value.sql
rem Author: Jonathan Lewis
rem Dated: Mar 2024
rem
create or replace function get_high_value (
i_tab_owner varchar2,
i_tab_name varchar2,
i_part_name varchar2,
i_part_posn number
)
return varchar2
is
v1 varchar2(4000);
begin
select atp.high_value
into v1
from all_tab_partitions atp
where atp.table_owner = upper(i_tab_owner)
and atp.table_name = upper(i_tab_name)
and atp.partition_name = upper(i_part_name)
and atp.partition_position = upper(i_part_posn)
;
return v1;
end;
/
select
apt.table_owner, apt.table_name,
apt.tablespace_name,
apt.partition_name, apt.partition_position,
apt.high_value
from
all_tab_partitions apt
where
apt.table_owner = 'TEST_USER'
and apt.table_name = 'PT_RANGE'
and get_high_value(
apt.table_owner,
apt.table_name,
apt.partition_name,
apt.partition_position
) = '200'
/
This seemed to work quite well and sufficiently rapidly – but I only had two partitioned tables in my schema and a total of 12 partitions, so it’s not sensible to look at the clock to see how efficient the query is.
Another possible solution introduced me to a function that has been around for years (and many versions) which I had never come across: sys_dburigen(). PaulZip supplied the following code (which I’ve altered cosmetically and edited to pick up a table in my schema):
select *
from (
select
dbms_lob.substr(
sys_dburigen (
atp.table_owner,
atp.table_name,
atp.partition_name,
atp.partition_position,
atp.high_value,
'text()'
).getclob(), 4000, 1) high_value_str,
atp.table_owner, atp.table_name, atp.partition_name,
atp.tablespace_name, atp.high_value
from all_tab_partitions atp
where atp.table_owner = 'TEST_USER'
and atp.table_name = 'PT_RANGE'
)
where high_value_str = '200'
/
This was so cute, and looked like a much nicer (i.e. “legal”) solution than my PL/SQL hacking, that I had to take a closer look at sys_dburigen() – first to understand what it was supposed to achieve (yes, I do RTFM), then to see how it actually worked.
Something I did first was simply to strip back the layers of the expression used to supply the high_value_str, which took me through the following four combinations (with and without ‘text()’, with and without ‘getclob()’). Each expression is followed by the result for the row selected above:
sys_dburigen (atp.table_owner, atp.table_name, atp.partition_name, atp.partition_position, atp.high_value, 'text()').getclob()
200
---
sys_dburigen (atp.table_owner, atp.table_name, atp.partition_name, atp.partition_position, atp.high_value).getclob()
<?xml version="1.0"?><HIGH_VALUE>200</HIGH_VALUE>
--
sys_dburigen (atp.table_owner, atp.table_name, atp.partition_name, atp.partition_position, atp.high_value, 'text()')
DBURITYPE('/PUBLIC/ALL_TAB_PARTITIONS/ROW[TABLE_OWNER=''TEST_USER'' and TABLE_NAME=''PT_RANGE'' and PARTITION_NAME=''P200'' and PARTITION_POSITION=''1'']/HIGH_VALUE/text()', NULL)
--
sys_dburigen (atp.table_owner, atp.table_name, atp.partition_name, atp.partition_position, atp.high_value)
DBURITYPE('/PUBLIC/ALL_TAB_PARTITIONS/ROW[TABLE_OWNER=''TEST_USER'' and TABLE_NAME=''PT_RANGE'' and PARTITION_NAME=''P200'' and PARTITION_POSITION=''1'']/HIGH_VALUE', NULL)
Working from the bottom pair up we see that we start by generating a dburitype which defines the type of thing we want to query and the restriction we want to use while querying. The ‘text()’ option simply adds an extra detail to the dburitype.
The top pair shows us that getclob() will then return the value we have requested, either as an XML value, or as the text value described by the XML value if we’ve supplied the ‘text()’ option.
Our call to sys_dburigen() has specified an object we want to access, 4 columns that will identify a unique row in that object, and a fifth column that we want returned either as an XML value or as a text value.
Tracing
I actually worked through the analysis in the opposite direction to the one I’ve been showing. When I saw the call to sys_dburigen() I suspected that it might be doing the same thing as my PL/SQL function call, so I ran the two queries with SQL tracing enabled to see what activity took place at the database.
Ignoring the driving query against all_tab_partitions, the content of the PL/SQL trace was basically 3 executions (I had 3 partitions in the pt_range table) of:
SELECT ATP.HIGH_VALUE
FROM
ALL_TAB_PARTITIONS ATP WHERE ATP.TABLE_OWNER = UPPER(:B4 ) AND
ATP.TABLE_NAME = UPPER(:B3 ) AND ATP.PARTITION_NAME = UPPER(:B2 ) AND
ATP.PARTITION_POSITION = UPPER(:B1 )
The content of the sys_dburigen() trace was 3 executions of a query like:
SELECT alias000$."HIGH_VALUE" AS HIGH_VALUE
FROM
"ALL_TAB_PARTITIONS" alias000$ WHERE 1 = 1 AND ((((alias000$."TABLE_OWNER"=
'TEST_USER') AND (alias000$."TABLE_NAME"='PT_RANGE')) AND
(alias000$."PARTITION_NAME"='P200')) AND (alias000$."PARTITION_POSITION"=
'1'))
Note particularly the literal values in the predicates in lines 4, 5 and 6. This version of the code has to generate and optimise (hard-parse) a new SQL statement for every partition in the table referenced in the driving query. For a table with a large number of partitions, and a system with a large number of partitioned tables, the disruption of the shared pool that this might cause could be severe if (as the user said at one point) “we will be frequently selecting from all_tab_partitions”. [Damage limitation: if the session sets cursor_sharing to FORCE temporarily then the generated SQL will be subject to bind variable substitution; but that’s not an ideal workaround.]
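The shared-pool point can be illustrated with a toy sketch (not Oracle code): a query that embeds literals produces a distinct SQL text per partition, and each distinct text is a candidate for a separate hard parse, while the bind-variable form produces a single reusable text. The helper names and the SQL strings here are my own invention for illustration:

```python
# Toy illustration of why literals defeat cursor sharing: each partition
# yields a distinct SQL text, while the bind form yields exactly one.
def literal_sql(partition_name):
    # Literal embedded in the text - a new statement per partition.
    return ("SELECT high_value FROM all_tab_partitions "
            f"WHERE partition_name = '{partition_name}'")

def bind_sql():
    # Bind placeholder - one shared statement for every partition.
    return ("SELECT high_value FROM all_tab_partitions "
            "WHERE partition_name = :b1")

partitions = ["P200", "P400", "P600"]
literal_texts = {literal_sql(p) for p in partitions}
bind_texts = {bind_sql() for p in partitions}
print(len(literal_texts), len(bind_texts))  # 3 1
```

The set sizes stand in for the number of distinct parent cursors the shared pool would have to hold for each approach.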
Summary
Using LONG columns in SQL predicates is not nice – and not likely to be efficient – but there are ways of working around the limitations of LONGs. It’s undesirable to use PL/SQL that calls SQL inside a SQL statement, but we can use a PL/SQL function to return a string from a LONG in the current row – and since that’s pretty much what Oracle seems to be doing with its call to sys_dburigen() it’s hard to insist that the PL/SQL strategy is inappropriate. (But maybe the call to sys_dburigen() in this context would be considered an abuse of a feature anyway – even though it seems much more elegant and flexible once you’ve learned a little about how it works.)
Footnote
As another detail on analysing the cost/benefit of different approaches – it would be possible to avoid creating the PL/SQL function by embedding it in the SQL as a “with function” clause:
with function get_high_value (
i_tab_owner varchar2,
i_tab_name varchar2,
i_part_name varchar2,
i_part_posn number
)
return varchar2
is
v1 varchar2(4000);
begin
select atp.high_value
into v1
from all_tab_partitions atp
where atp.table_owner = upper(i_tab_owner)
and atp.table_name = upper(i_tab_name)
and atp.partition_name = upper(i_part_name)
and atp.partition_position = upper(i_part_posn)
;
return v1;
end;
select
apt.table_owner, apt.table_name,
apt.tablespace_name,
apt.partition_name, apt.partition_position,
apt.high_value
from
all_tab_partitions apt
where
apt.table_owner = 'TEST_USER'
and apt.table_name = 'PT_RANGE'
and get_high_value(
apt.table_owner,
apt.table_name,
apt.partition_name,
apt.partition_position
) = '200'
/
I have asked the user why they want to query all_tab_partitions by high_value since it seems to be a slightly odd thing to do and there may be a better way of doing whatever it is that this query is supposed to support. They haven’t responded to the question, so I’ll take a guess that they want to rename (or move etc.) partitions that they don’t know the name for – perhaps because they are using interval partitioning or automatic list partitioning. If the guess is correct then the solutions offered are irrelevant – you don’t need to know the name of a partition to manipulate it, you need only know some value that is a legal member of the partition:
SQL> select partition_name from user_tab_partitions where table_name = 'PT_RANGE' order by partition_position;
PARTITION_NAME
----------------------
P200
P400
P600
3 rows selected.
SQL> alter table pt_range rename partition for (199) to pt_0200;
Table altered.
SQL> select partition_name from user_tab_partitions where table_name = 'PT_RANGE' order by partition_position;
PARTITION_NAME
----------------------
PT_0200
P400
P600
3 rows selected.
Missing Partition
Here’s a silly little detail about execution plans on (interval) partitioned tables that I hadn’t noticed until it showed up on this thread on a public Oracle forum: it’s an execution plan that claims that Oracle will be visiting a partition that clearly won’t be holding the data requested.
Here’s the starting section of a demonstration script – mostly by Solomon Yakobson with minor tweaks and additions from me:
rem
rem Script: non_existent_partition.sql
rem Author: Solomon Yakobson / Jonathan Lewis
rem Dated: Mar 2024
rem
rem Last tested
rem 19.11.0.0
rem
create table invoices_partitioned(
invoice_no number not null,
invoice_date date not null,
comments varchar2(500)
)
partition by range (invoice_date)
interval (interval '3' month)
(
partition invoices_past values less than (date '2023-01-01')
);
insert into invoices_partitioned
select level,
date '2023-01-01' + numtoyminterval(3 * (level - 1),'month'),
null
from dual
connect by level <= 6
/
insert into invoices_partitioned select * from invoices_partitioned;
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
commit
/
execute dbms_stats.gather_table_stats(user,'invoices_partitioned')
set linesize 156
column high_value format a80
select partition_position, num_rows,
partition_name,
high_value
from user_tab_partitions
where table_name = 'INVOICES_PARTITIONED'
order by partition_position
/
alter table invoices_partitioned drop partition for (date'2023-09-01');
purge recyclebin;
select partition_position, num_rows,
partition_name,
high_value
from user_tab_partitions
where table_name = 'INVOICES_PARTITIONED'
order by partition_position
/
The script creates an interval partitioned table, with an interval of 3 months, then inserts 131,072 rows per partition (the strange re-execution of “insert into x select from x” was my lazy way of increasing the volume of data from the original one row per partition without having to think too carefully).
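The row count falls out of simple doubling: the “connect by level &lt;= 6” insert puts one row in each of the 6 partitions, then the insert..select and its 16 “/” re-executions each double the table. A quick sanity check of the arithmetic:

```python
# One row per partition from the initial insert, then the
# "insert into x select from x" runs 17 times in total
# (the statement itself plus 16 "/" re-executions), doubling each time.
rows_per_partition = 1
for _ in range(1 + 16):
    rows_per_partition *= 2

print(rows_per_partition)   # 131072 rows per partition, i.e. 2**17
```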
After creating the data we report the partition names and high values in order, then drop (and purge) the partition that should hold the value 1st Sept 2023 (which will be the partition with the high_value of 1st Oct 2023) and report the partition names and high values again so that you can see the “gap” in the high values and the adjustment to the partition_position values. Here are the “before” and “after” outputs:
PARTITION_POSITION NUM_ROWS PARTITION_NAME HIGH_VALUE
------------------ ---------- ---------------------- --------------------------------------------------------------------------------
1 0 INVOICES_PAST TO_DATE(' 2023-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
2 131072 SYS_P39375 TO_DATE(' 2023-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
3 131072 SYS_P39376 TO_DATE(' 2023-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
4 131072 SYS_P39377 TO_DATE(' 2023-10-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
5 131072 SYS_P39378 TO_DATE(' 2024-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
6 131072 SYS_P39379 TO_DATE(' 2024-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
7 131072 SYS_P39380 TO_DATE(' 2024-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
7 rows selected.
PARTITION_POSITION NUM_ROWS PARTITION_NAME HIGH_VALUE
------------------ ---------- ---------------------- --------------------------------------------------------------------------------
1 0 INVOICES_PAST TO_DATE(' 2023-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
2 131072 SYS_P39375 TO_DATE(' 2023-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
3 131072 SYS_P39376 TO_DATE(' 2023-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
4 131072 SYS_P39378 TO_DATE(' 2024-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
5 131072 SYS_P39379 TO_DATE(' 2024-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
6 131072 SYS_P39380 TO_DATE(' 2024-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIA
6 rows selected.
Now we check the execution plan for a query that would have accessed the partition we’ve just dropped:
explain plan for
select *
from invoices_partitioned
where invoice_date = date '2023-09-01';
select *
from dbms_xplan.display();
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------
Plan hash value: 1148008570
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 109K| 1173K| 104 (1)| 00:00:01 | | |
| 1 | PARTITION RANGE SINGLE| | 109K| 1173K| 104 (1)| 00:00:01 | 4 | 4 |
|* 2 | TABLE ACCESS FULL | INVOICES_PARTITIONED | 109K| 1173K| 104 (1)| 00:00:01 | 4 | 4 |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("INVOICE_DATE"=TO_DATE(' 2023-09-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
The execution plan says it’s going to visit partition number 4 (pstart/pstop) – which we know cannot be holding any relevant data. If this were an ordinary range-partitioned table – as opposed to interval partitioned – it would be the correct partition for 1st Sept 2023, of course, but it isn’t, so it feels like the pstart/pstop ought to say something like “non-existent” and all the numeric estimates should be zero.
A quick trick for making an interval partition appear without inserting data into it is to issue a “lock table … partition for () …” statement (See footnote to this blog note). I did wonder if the attempt to explain a plan that needed a non-existent partition had actually had the same effect of making Oracle create the partition, so I ran the query against user_tab_partitions again just to check that this hadn’t happened.
So what’s going to happen at run-time? Is this an example of “explain plan” telling us a story that’s not going to match what shows up in v$sql_plan (dbms_xplan.display_cursor)? Let’s run the query (with rowsource execution stats enabled) and find out:
set serveroutput off
alter session set statistics_level = all;
alter session set "_rowsource_execution_statistics"=true;
select *
from invoices_partitioned
where invoice_date = date '2023-09-01';
select *
from table(dbms_xplan.display_cursor(format=>'allstats last partition'));
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------
SQL_ID d42kw12htubhn, child number 0
-------------------------------------
select * from invoices_partitioned where invoice_date = date
'2023-09-01'
Plan hash value: 1148008570
---------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | Pstart| Pstop | A-Rows | A-Time |
---------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | | 0 |00:00:00.01 |
| 1 | PARTITION RANGE SINGLE| | 1 | 109K| 4 | 4 | 0 |00:00:00.01 |
|* 2 | TABLE ACCESS FULL | INVOICES_PARTITIONED | 0 | 109K| 4 | 4 | 0 |00:00:00.01 |
---------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("INVOICE_DATE"=TO_DATE(' 2023-09-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
It’s the same plan with the same “wrong” partition identified, and the same estimate for rows returned – but the access never actually happened: Starts = 0 on the table access full.
My hypothesis about this misleading reporting is that Oracle knows from the table definition everything about every partition that might eventually exist – the high_value for the “anchor” partition is known and the interval is known so the appropriate partition number for any partition key value can be derived. Then, at some point, a disconnect appears between the theoretical partition position and the set of physically instantiated partitions, so the optimizer gets the message “theoretically it’s in the 4th partition” and collects the stats from “partition_position = 4” to do the arithmetic and produce the plan.
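That hypothesis is easy to sketch. The following Python fragment (an illustration of the idea, not Oracle internals) computes a “theoretical” partition position purely from the anchor partition’s high_value and the interval – which is all the optimizer would need, and which takes no notice of whether the partition physically exists:

```python
from datetime import date

# Assumptions taken from the demo script: the anchor (range) partition
# INVOICES_PAST has high_value 2023-01-01 and the interval is 3 months.
ANCHOR = date(2023, 1, 1)
INTERVAL_MONTHS = 3

def theoretical_position(d: date) -> int:
    """Position 1 is the anchor partition; interval partitions follow it,
    one slot per interval, whether or not they have been instantiated."""
    if d < ANCHOR:
        return 1
    months = (d.year - ANCHOR.year) * 12 + (d.month - ANCHOR.month)
    return 2 + months // INTERVAL_MONTHS

# 2023-09-01 is 8 months past the anchor, i.e. in the 3rd interval slot,
# so the "theoretical" position is 4 - matching the Pstart/Pstop of 4
# reported in the plan even after that partition has been dropped.
print(theoretical_position(date(2023, 9, 1)))   # 4
```

Once the partition is dropped, the physically instantiated partitions renumber themselves (as the second user_tab_partitions report showed), but a calculation like this one never notices the gap – which would explain the mismatch.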
Fortunately there’s some metadata somewhere that means the run-time engine doesn’t try to access the wrong partition, so this little glitch doesn’t really matter for this simple query – beyond its ability to cause a little confusion.
It’s possible, though, that this behaviour leaves the optimizer with another reason for getting the arithmetic wrong and picking the wrong path if you have a number of “missing” partitions in an interval partitioned table that you’re querying with a range-based predicate that crosses several (non-existent) partitions. So treat this as a warning/clue if you recognise that pattern in any of your partitioned tables.